Incorporating static relevance signals into the score | Elasticsearch 7.7 Definitive Guide (Chinese edition)

Original English version: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/static-scoring-signals.html. Copyright of the original documentation belongs to www.elastic.co.

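Important: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the documentation for the current version.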


Incorporating static relevance signals into the score

Many domains have static signals that are known to be correlated with relevance. For instance, PageRank and URL length are two features commonly used in web search to tune the score of web pages independently of the query.

There are two main queries that allow combining static score contributions with textual relevance, e.g. as computed with BM25:

- script_score query
- rank_feature query

For instance, imagine that you have a pagerank field that you wish to combine with the BM25 score so that the final score is equal to score = bm25_score + pagerank / (10 + pagerank).
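To make the target formula concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration of the formula above, not Elasticsearch code: the function names `saturation` and `combined_score` are hypothetical helpers, and the pivot of 10 is taken from the formula.

```python
def saturation(value: float, pivot: float) -> float:
    """Return value / (value + pivot), which grows from 0 toward 1.

    Hypothetical helper mirroring the formula in the text; it assumes a
    non-negative value and a positive pivot.
    """
    if value < 0 or pivot <= 0:
        raise ValueError("value must be >= 0 and pivot must be > 0")
    return value / (value + pivot)


def combined_score(bm25_score: float, pagerank: float, pivot: float = 10.0) -> float:
    """Add the saturated static signal to the text relevance score."""
    return bm25_score + saturation(pagerank, pivot)


if __name__ == "__main__":
    # A pagerank equal to the pivot contributes exactly 0.5.
    print(combined_score(2.0, 10.0))  # 2.5
```

Because the static contribution stays below 1 however large pagerank becomes, it adjusts the ranking without overwhelming the BM25 score.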

With the script_score query, the request would look like this:

GET index/_search
{
    "query" : {
        "script_score" : {
            "query" : {
                "match": { "body": "elasticsearch" }
            },
            "script" : {
                "source" : "_score + saturation(doc['pagerank'].value, 10)"
            }
        }
    }
}

pagerank must be mapped as a numeric field

With the rank_feature query, it would look like this:

GET _search
{
    "query" : {
        "bool" : {
            "must": {
                "match": { "body": "elasticsearch" }
            },
            "should": {
                "rank_feature": {
                    "field": "pagerank", 
                    "saturation": {
                        "pivot": 10
                    }
                }
            }
        }
    }
}

pagerank must be mapped as a rank_feature field

While both options would return similar scores, there are trade-offs: script_score provides a lot of flexibility, enabling you to combine the text relevance score with static signals as you prefer. On the other hand, the rank_feature query only exposes a couple of ways to incorporate static signals into the score. However, it relies on the rank_feature and rank_features fields, which index values in a special way that allows the rank_feature query to skip over non-competitive documents and get the top matches of a query faster.
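The idea of skipping non-competitive documents can be illustrated with a toy Python sketch. This is not how Elasticsearch or Lucene implements it. It only assumes that documents are grouped into blocks and that an upper bound on the score is available for each block. The names `top_k_with_skipping` and `blocks` are hypothetical.

```python
import heapq


def saturation(value, pivot):
    # Bounded, monotonic static contribution: value / (value + pivot).
    return value / (value + pivot)


def top_k_with_skipping(blocks, k, pivot=10.0):
    """Return (top-k hits, number of documents actually scored).

    blocks is a list of lists of (doc_id, bm25, pagerank) tuples.
    Assumption: a real engine would store each block's upper bound at
    index time; this toy recomputes it only to keep the example short.
    """
    heap = []  # min-heap of (score, doc_id); heap[0] is the weakest kept hit
    scored = 0
    for block in blocks:
        best_bm25 = max(bm25 for _, bm25, _ in block)
        best_pagerank = max(pagerank for _, _, pagerank in block)
        bound = best_bm25 + saturation(best_pagerank, pivot)
        if len(heap) == k and bound <= heap[0][0]:
            continue  # no document in this block can enter the top k
        for doc_id, bm25, pagerank in block:
            scored += 1
            score = bm25 + saturation(pagerank, pivot)
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), scored


if __name__ == "__main__":
    blocks = [
        [("a", 2.0, 90.0), ("b", 1.5, 10.0)],
        [("c", 0.2, 1.0), ("d", 0.1, 0.0)],
    ]
    hits, scored = top_k_with_skipping(blocks, k=2)
    print([doc_id for _, doc_id in hits], scored)
```

In this example the second block is never scored, because its upper bound falls below the weakest hit already collected. An arbitrary script gives the engine no such bound, which is one way to picture the trade-off described above.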
