Source: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/similarity.html. Copyright of the original documentation belongs to www.elastic.co.

similarity

Elasticsearch allows you to configure a scoring algorithm or similarity per field. The similarity setting provides a simple way of choosing a similarity algorithm other than the default BM25, such as TF/IDF.

Similarities are mostly useful for text fields, but can also apply to other field types.

Custom similarities can be configured by tuning the parameters of the built-in similarities. For more details about these expert options, see the similarity module.
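
As a rough sketch of what such tuning might look like, the request below defines a custom similarity in the index settings and references it from a field mapping. The index name my_custom_index, the similarity name my_bm25, the field name title, and the k1 and b values are all illustrative assumptions, not recommendations:

PUT my_custom_index
{
  "settings": {
    "index": {
      "similarity": {
        "my_bm25": {
          "type": "BM25",
          "k1": 1.5,
          "b": 0.6
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "similarity": "my_bm25"
      }
    }
  }
}

Here my_bm25 is a BM25 variant with non-default k1 and b, and the title field selects it by name in the same way a built-in similarity is selected.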

The only similarities that can be used out of the box, without any further configuration, are:

BM25
The Okapi BM25 algorithm. The algorithm used by default in Elasticsearch and Lucene.
classic
Deprecated in 7.0.0. The TF/IDF algorithm, the former default in Elasticsearch and Lucene.
boolean
A simple boolean similarity, which is used when full-text ranking is not needed and the score should only be based on whether the query terms match or not. Boolean similarity gives terms a score equal to their query boost.

The similarity can be set on the field level when a field is first created, as follows:

PUT my_index
{
  "mappings": {
    "properties": {
      "default_field": { 
        "type": "text"
      },
      "boolean_sim_field": {
        "type": "text",
        "similarity": "boolean" 
      }
    }
  }
}

The default_field uses the BM25 similarity.

The boolean_sim_field uses the boolean similarity.
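
To see the effect of the boolean similarity, one could index a document and run a boosted match query against boolean_sim_field. This is a minimal sketch that assumes the my_index mapping above; the document ID, the sample text, and the boost of 2.0 are made up for illustration:

PUT my_index/_doc/1?refresh
{
  "default_field": "quick brown fox",
  "boolean_sim_field": "quick brown fox"
}

GET my_index/_search
{
  "query": {
    "match": {
      "boolean_sim_field": {
        "query": "quick",
        "boost": 2.0
      }
    }
  }
}

Going by the description of the boolean similarity, the matching hit's _score should simply equal the query boost rather than a relevance value. Running the same query against default_field would be scored with BM25 instead.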