Script score query: uses a script to provide a custom score for returned documents.

The script_score query is useful if, for example, the scoring function is expensive and you only need to compute the score for a filtered set of documents.

The following script_score query assigns each returned document a score equal to the value of the likes field divided by 10.

GET /_search
{
  "query" : {
    "script_score" : {
      "query" : {
        "match": { "message": "elasticsearch" }
      },
      "script" : {
        "source" : "doc['likes'].value / 10 "
      }
    }
  }
}
The top-level parameters of script_score are listed below; a combined example follows the list.

- query: (Required, query object) The query used to return documents.
- script: (Required, script object) The script used to compute the score of the documents returned by the query. The final relevance score of a script_score query can never be negative; to support certain search optimizations, Lucene requires scores to be positive or 0.
- min_score: (Optional, float) Documents with a score below this floating-point number are excluded from the search results.
- boost: (Optional, float) The score produced by script is multiplied by boost to produce the final score of the document. Defaults to 1.0.
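For illustration, here is a minimal sketch that combines these parameters in one request; the likes field is reused from the example above, while the min_score and boost values are arbitrary assumptions:

GET /_search
{
  "query" : {
    "script_score" : {
      "query" : {
        "match": { "message": "elasticsearch" }
      },
      "script" : {
        "source" : "doc['likes'].value / 10"
      },
      "min_score" : 2,
      "boost" : 1.5
    }
  }
}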
Within a script, you can access the _score variable, which represents the current relevance score of a document.
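For example, a minimal sketch that scales the current relevance score by the likes field (the formula itself is only an illustration, not part of the original example):

"script" : {
  "source" : "_score * Math.log(2 + doc['likes'].value)"
}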
You can use any of the available painless functions in your script. You can also use the following predefined functions to customize scoring. We suggest using these predefined functions instead of writing your own; because of the way they are implemented inside Elasticsearch, they are more efficient.
saturation(value, k) = value / (k + value)

"script" : { "source" : "saturation(doc['likes'].value, 1)" }

sigmoid(value, k, a) = value^a / (k^a + value^a)

"script" : { "source" : "sigmoid(doc['likes'].value, 2, 1)" }
The random_score function generates scores that are uniformly distributed from 0 up to, but not including, 1. The syntax of the randomScore function is: randomScore(<seed>, <fieldName>). It has a required seed parameter as an integer value and an optional fieldName parameter as a string value.
"script" : { "source" : "randomScore(100, '_seq_no')" }
If the fieldName parameter is omitted, the internal Lucene document ids will be used as the source of randomness. This is very efficient, but unfortunately not reproducible, since documents may be renumbered by merges.
"script" : { "source" : "randomScore(100)" }
Note that documents within the same shard that have the same value for the chosen field will get the same score, so it is usually desirable to use a field whose values are unique for all documents across a shard. A good default choice may be the _seq_no field; its only drawback is that the score will change if the document is updated, because update operations also update the value of the _seq_no field.
You can read more about decay functions here. The numeric, geo, and date variants are listed below, each followed by an example.

- double decayNumericLinear(double origin, double scale, double offset, double decay, double docValue)
- double decayNumericExp(double origin, double scale, double offset, double decay, double docValue)
- double decayNumericGauss(double origin, double scale, double offset, double decay, double docValue)
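A sketch of a numeric decay call; the field name dval and the origin/scale/offset/decay values here are assumptions for illustration:

"script" : {
  "source" : "decayNumericLinear(params.origin, params.scale, params.offset, params.decay, doc['dval'].value)",
  "params" : {
    "origin" : 20,
    "scale" : 10,
    "offset" : 0,
    "decay" : 0.5
  }
}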
- double decayGeoLinear(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
- double decayGeoExp(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
- double decayGeoGauss(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
"script" : { "source" : "decayGeoExp(params.origin, params.scale, params.offset, params.decay, doc['location'].value)", "params": { "origin": "40, -70.12", "scale": "200km", "offset": "0km", "decay" : 0.2 } }
- double decayDateLinear(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
- double decayDateExp(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
- double decayDateGauss(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
"script" : { "source" : "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['date'].value)", "params": { "origin": "2008-01-01T01:00:00Z", "scale": "1h", "offset" : "0", "decay" : 0.5 } }
Decay functions on dates are limited to dates in the default format and default time zone. Computations with now are not supported either.
Functions for vector fields are accessible through the script_score query.
Script score queries will not be executed if search.allow_expensive_queries is set to false.
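For reference, a hedged sketch of toggling this setting dynamically; this assumes the setting is applied as a transient cluster setting in your deployment:

PUT _cluster/settings
{
  "transient" : {
    "search.allow_expensive_queries" : false
  }
}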
The script_score query calculates the score for every matching document, or hit. There are faster alternative query types that can efficiently skip non-competitive hits:
- If you want to boost documents on some static fields, use the rank_feature query.
- If you want to boost documents that are closer to a given date or geographical point, use the distance_feature query (a sketch follows this list).
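As an illustration of the second alternative, a minimal distance_feature sketch; the production_date field name and the pivot/origin values are assumptions:

GET /_search
{
  "query": {
    "distance_feature": {
      "field": "production_date",
      "pivot": "7d",
      "origin": "now"
    }
  }
}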
We are deprecating the function_score query and recommend using the script_score query instead. You can implement the following functions of the function_score query with a script_score query:
The weight function can be implemented in a script score query through the following script:
"script" : { "source" : "params.weight * _score", "params": { "weight": 2 } }
Use the randomScore function as described in the random score function section above.
The field_value_factor function can easily be implemented through a script:
"script" : { "source" : "Math.log10(doc['field'].value * params.factor)", "params" : { "factor" : 5 } }
To check whether a document has a missing value, you can use doc['field'].size() == 0. For example, this script will use a value of 1 if a document doesn't have the field field:
"script" : { "source" : "Math.log10((doc['field'].size() == 0 ? 1 : doc['field'].value()) * params.factor)", "params" : { "factor" : 5 } }
The following table lists how the field_value_factor modifiers can be implemented through a script:
| Modifier | Implementation in script score |
|---|---|
| none | - |
| log | Math.log10(doc['f'].value) |
| log1p | Math.log10(doc['f'].value + 1) |
| log2p | Math.log10(doc['f'].value + 2) |
| ln | Math.log(doc['f'].value) |
| ln1p | Math.log(doc['f'].value + 1) |
| ln2p | Math.log(doc['f'].value + 2) |
| square | Math.pow(doc['f'].value, 2) |
| sqrt | Math.sqrt(doc['f'].value) |
| reciprocal | 1.0 / doc['f'].value |

Here f stands for the name of the numeric field.
The script_score query has equivalent decay functions that can be used in scripts.
During vector function calculation, all matched documents are linearly scanned. Thus, expect the query time to grow linearly with the number of matched documents. For this reason, it is recommended to limit the number of matched documents with the query parameter.
Let's create an index with a mapping of type dense_vector, then index a couple of documents into it.
PUT my_index
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "status" : "published"
}

PUT my_index/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "status" : "published"
}

POST my_index/_refresh
The cosineSimilarity function calculates the cosine similarity between a given query vector and document vectors.
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0",
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
If a document's dense_vector field has a different number of dimensions than the query vector, an error will be thrown.
The dotProduct function calculates the dot product between a given query vector and document vectors.
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, 'my_dense_vector');
          return sigmoid(1, Math.E, -value);
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
The l1norm function calculates the L1 distance (Manhattan distance) between a given query vector and document vectors.
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
Unlike cosineSimilarity, which represents similarity, l1norm and l2norm represent distances or differences. This means that the more similar the vectors are, the lower the scores these functions produce. Because we want more similar documents to score higher, the script reverses their output, and 1 is added in the denominator to avoid division by zero when a document vector exactly matches the query vector.
The l2norm function calculates the L2 distance (Euclidean distance) between a given query vector and document vectors.
GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
If a document doesn't have a value for the vector field on which a vector function is executed, an error will be thrown. You can check whether the field my_vector of a document has a value with doc['my_vector'].size() == 0. The whole script may look like this:
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
Deprecated in 7.6. The sparse_vector type is deprecated and will be removed in 8.0.

Let's create an index with a mapping of type sparse_vector, then index a couple of documents into it:
PUT my_sparse_index
{
  "mappings": {
    "properties": {
      "my_sparse_vector": {
        "type": "sparse_vector"
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}
PUT my_sparse_index/_doc/1
{
  "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
  "status" : "published"
}

PUT my_sparse_index/_doc/2
{
  "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
  "status" : "published"
}

POST my_sparse_index/_refresh
The cosineSimilaritySparse function calculates the cosine similarity between a given query vector and document vectors.
GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilaritySparse(params.query_vector, 'my_sparse_vector') + 1.0",
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
The dotProductSparse function calculates the dot product between a given query vector and document vectors.
GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProductSparse(params.query_vector, 'my_sparse_vector');
          return sigmoid(1, Math.E, -value);
        """,
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
The l1normSparse function calculates the L1 distance between a given query vector and document vectors.
GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1normSparse(params.queryVector, 'my_sparse_vector'))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
The l2normSparse function calculates the L2 distance between a given query vector and document vectors.
GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2normSparse(params.queryVector, 'my_sparse_vector'))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
Using an explain request provides an explanation of how the score was computed. The script_score query can add its own explanation by setting the explanation parameter:
GET /twitter/_explain/0
{
  "query" : {
    "script_score" : {
      "query" : {
        "match": { "message": "elasticsearch" }
      },
      "script" : {
        "source" : """
          long likes = doc['likes'].value;
          double normalizedLikes = likes / 10;
          if (explanation != null) {
            explanation.set('normalized likes = likes / 10 = ' + likes + ' / 10 = ' + normalizedLikes);
          }
          return normalizedLikes;
        """
      }
    }
  }
}
Note that the explanation will be null when used in a normal _search request, so having a conditional guard is best practice.