Script score query | Elasticsearch Guide [7.7]

Original URL: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/query-dsl-script-score-query.html. Copyright of the original document belongs to www.elastic.co.

IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

Script score query

Uses a script to provide a custom score for returned documents.

The script_score query is useful if, for example, a scoring function is expensive and you only need to calculate the score of a filtered set of documents.

Example request

The following script_score query assigns each returned document a score equal to the likes field value divided by 10.

GET /_search
{
    "query" : {
        "script_score" : {
            "query" : {
                "match": { "message": "elasticsearch" }
            },
            "script" : {
                "source" : "doc['likes'].value / 10 "
            }
        }
     }
}

Top-level parameters for `script_score`

query: (Required, query object) Query used to return documents.
script: (Required, script object) Script used to compute the score of documents returned by the query.

Important: Final relevance scores from the script_score query cannot be negative. To support certain search optimizations, Lucene requires scores to be positive or 0.
min_score: (Optional, float) Documents with a score lower than this floating point number are excluded from the search results.
boost: (Optional, float) Scores produced by the script are multiplied by boost to produce the final document scores. Defaults to 1.0.

Notes

Use relevance scores in a script

Within a script, you can access the _score variable which represents the current relevance score of a document.

Predefined functions

You can use any of the available Painless functions in your script. You can also use the predefined functions described in the following sections to customize scoring.

We suggest using these predefined functions instead of writing your own. These functions take advantage of efficiencies from Elasticsearch's internal mechanisms.

Saturation

saturation(value,k) = value/(k + value)

"script" : {
    "source" : "saturation(doc['likes'].value, 1)"
}
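As a rough illustration, the formula above can be reproduced outside Elasticsearch. The following Python sketch is not the Elasticsearch implementation; the function name only mirrors the Painless one, and the sample likes values are made up.

```python
def saturation(value: float, k: float) -> float:
    """Mirror of the documented formula: value / (k + value)."""
    return value / (k + value)


# With k = 1, as in the script above, the score grows with the
# number of likes but always stays below 1.
for likes in (0, 1, 9, 99):
    print(likes, saturation(likes, 1))
```

When value equals k the result is 0.5, so k can be read as the field value at which a document earns half of the maximum score.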

Sigmoid

sigmoid(value, k, a) = value^a/ (k^a + value^a)

"script" : {
    "source" : "sigmoid(doc['likes'].value, 2, 1)"
}
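The sigmoid formula can be checked the same way. This Python sketch only restates the documented formula; it is not the Elasticsearch implementation, and the sample values are illustrative.

```python
def sigmoid(value: float, k: float, a: float) -> float:
    """Mirror of the documented formula: value^a / (k^a + value^a)."""
    return value ** a / (k ** a + value ** a)


# With k = 2 and a = 1, as in the script above.
for likes in (0, 2, 6):
    print(likes, sigmoid(likes, 2, 1))
```

With a = 1 the formula reduces to value / (k + value), which is the saturation formula. Larger values of a make the transition around value = k steeper.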

Random score function

The randomScore function generates scores that are uniformly distributed from 0 up to but not including 1.

The randomScore function has the following syntax: randomScore(<seed>, <fieldName>). It takes a required parameter, seed, as an integer value, and an optional parameter, fieldName, as a string value.

"script" : {
    "source" : "randomScore(100, '_seq_no')"
}

If the fieldName parameter is omitted, the internal Lucene document ids will be used as a source of randomness. This is very efficient, but unfortunately not reproducible since documents might be renumbered by merges.

"script" : {
    "source" : "randomScore(100)"
}

Note that documents that are within the same shard and have the same value for the specified field get the same score, so it is usually desirable to use a field that has unique values for all documents across a shard. A good default choice might be the _seq_no field. Its only drawback is that scores change if the document is updated, because update operations also update the value of the _seq_no field.
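The reproducibility property described above can be illustrated with a small Python sketch. This is a hypothetical stand-in, not the hashing scheme Elasticsearch actually uses: random_score_sketch and its use of SHA-256 are assumptions made only to show that a score derived from (seed, field value) is repeatable and collides when field values collide.

```python
import hashlib


def random_score_sketch(seed: int, field_value: int) -> float:
    """Hypothetical stand-in: hash (seed, field value) into [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{field_value}".encode()).digest()
    # Keep the top 53 bits so the result is an exact float below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2 ** 53


# Two documents with the same field value receive the same score,
# which is why a field with unique values per document is preferable.
print(random_score_sketch(100, 42) == random_score_sketch(100, 42))
```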

Decay functions for numeric fields

For more information about decay functions, see the function_score query documentation.

double decayNumericLinear(double origin, double scale, double offset, double decay, double docValue)
double decayNumericExp(double origin, double scale, double offset, double decay, double docValue)
double decayNumericGauss(double origin, double scale, double offset, double decay, double docValue)

"script" : {
    "source" : "decayNumericLinear(params.origin, params.scale, params.offset, params.decay, doc['dval'].value)",
    "params": { 
        "origin": 20,
        "scale": 10,
        "decay" : 0.5,
        "offset" : 0
    }
}

Using params allows the script to be compiled only once, even if the parameter values change.
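To make the roles of origin, scale, offset, and decay concrete, here is a minimal Python sketch of the three decay shapes. It assumes the formulas given for decay functions in the function_score query documentation; the Python function names are illustrative and this is not the Elasticsearch source code.

```python
import math


def _distance(origin, offset, doc_value):
    # Distance from the origin, ignoring anything within `offset`.
    return max(0.0, abs(doc_value - origin) - offset)


def decay_numeric_linear(origin, scale, offset, decay, doc_value):
    s = scale / (1.0 - decay)
    return max(0.0, (s - _distance(origin, offset, doc_value)) / s)


def decay_numeric_exp(origin, scale, offset, decay, doc_value):
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(origin, offset, doc_value))


def decay_numeric_gauss(origin, scale, offset, decay, doc_value):
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-_distance(origin, offset, doc_value) ** 2 / (2.0 * sigma_sq))


# Same parameters as the script above: origin 20, scale 10, offset 0, decay 0.5.
for dval in (20, 25, 30, 40):
    print(dval, decay_numeric_linear(20, 10, 0, 0.5, dval))
```

Under these formulas, a document whose value lies exactly scale away from origin (beyond offset) receives a score equal to decay, and a document at the origin receives 1.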

Decay functions for geo fields

double decayGeoLinear(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
double decayGeoExp(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)
double decayGeoGauss(String originStr, String scaleStr, String offsetStr, double decay, GeoPoint docValue)

"script" : {
    "source" : "decayGeoExp(params.origin, params.scale, params.offset, params.decay, doc['location'].value)",
    "params": {
        "origin": "40, -70.12",
        "scale": "200km",
        "offset": "0km",
        "decay" : 0.2
    }
}

Decay functions for date fields

double decayDateLinear(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
double decayDateExp(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)
double decayDateGauss(String originStr, String scaleStr, String offsetStr, double decay, JodaCompatibleZonedDateTime docValueDate)

"script" : {
    "source" : "decayDateGauss(params.origin, params.scale, params.offset, params.decay, doc['date'].value)",
    "params": {
        "origin": "2008-01-01T01:00:00Z",
        "scale": "1h",
        "offset" : "0",
        "decay" : 0.5
    }
}

Decay functions on dates are limited to dates in the default format and default time zone. Calculations with now are also not supported.
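For dates, the distance is a time difference. The following Python sketch applies the Gaussian decay shape to the parameters from the script above. It assumes the Gaussian formula from the function_score query documentation and represents scale and offset as timedelta objects rather than strings; the function name is illustrative and this is not the Elasticsearch implementation.

```python
import math
from datetime import datetime, timedelta, timezone


def decay_date_gauss(origin, scale, offset, decay, doc_date):
    # Work in seconds; anything within `offset` of the origin counts as distance 0.
    distance = max(0.0, abs((doc_date - origin).total_seconds()) - offset.total_seconds())
    scale_s = scale.total_seconds()
    sigma_sq = -(scale_s ** 2) / (2.0 * math.log(decay))
    return math.exp(-distance ** 2 / (2.0 * sigma_sq))


origin = datetime(2008, 1, 1, 1, 0, 0, tzinfo=timezone.utc)
one_hour_later = origin + timedelta(hours=1)
print(decay_date_gauss(origin, timedelta(hours=1), timedelta(0), 0.5, one_hour_later))
```

A document dated exactly one scale (here 1h) from the origin scores approximately the decay value, 0.5.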

Functions for vector fields

Functions for vector fields are accessible through the script_score query.

Allow expensive queries

Script score queries will not be executed if search.allow_expensive_queries is set to false.

Faster alternatives

The script_score query calculates the score for every matching document, or hit. There are faster alternative query types that can efficiently skip non-competitive hits:

If you want to boost documents on some static fields, use the rank_feature query.
If you want to boost documents closer to a date or geographic point, use the distance_feature query.

Transition from the function score query

We are deprecating the function_score query. We recommend using the script_score query instead.

You can implement the following functions from the function_score query using the script_score query:

`script_score`

You can copy the script you used in script_score of the Function Score query into the Script Score query without changes.

`weight`

The weight function can be implemented in the Script Score query through the following script:

"script" : {
    "source" : "params.weight * _score",
    "params": {
        "weight": 2
    }
}

`random_score`

Use the randomScore function as described in the Random score function section.

`field_value_factor`

The field_value_factor function can be implemented through a script:

"script" : {
    "source" : "Math.log10(doc['field'].value * params.factor)",
    "params" : {
        "factor" : 5
    }
}

To check whether a document is missing a value, use doc['field'].size() == 0. For example, this script uses a value of 1 if a document doesn't have the field named field:

"script" : {
    "source" : "Math.log10((doc['field'].size() == 0 ? 1 : doc['field'].value) * params.factor)",
    "params" : {
        "factor" : 5
    }
}

This table lists how field_value_factor modifiers can be implemented through a script:

Modifier	Implementation in Script Score
`none`	-
`log`	`Math.log10(doc['f'].value)`
`log1p`	`Math.log10(doc['f'].value + 1)`
`log2p`	`Math.log10(doc['f'].value + 2)`
`ln`	`Math.log(doc['f'].value)`
`ln1p`	`Math.log(doc['f'].value + 1)`
`ln2p`	`Math.log(doc['f'].value + 2)`
`square`	`Math.pow(doc['f'].value, 2)`
`sqrt`	`Math.sqrt(doc['f'].value)`
`reciprocal`	`1.0 / doc['f'].value`
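The table can be read as a lookup from modifier name to a plain math function. The Python sketch below expresses that mapping; the names MODIFIERS and field_value_factor are illustrative, treating none as the identity function is an assumption, and the optional missing argument stands in for the doc['field'].size() == 0 guard shown earlier.

```python
import math

# Modifier name -> function applied to the (factor-scaled) field value.
MODIFIERS = {
    "none": lambda v: v,  # assumption: no modifier means the value is used as-is
    "log": lambda v: math.log10(v),
    "log1p": lambda v: math.log10(v + 1),
    "log2p": lambda v: math.log10(v + 2),
    "ln": lambda v: math.log(v),
    "ln1p": lambda v: math.log(v + 1),
    "ln2p": lambda v: math.log(v + 2),
    "square": lambda v: math.pow(v, 2),
    "sqrt": lambda v: math.sqrt(v),
    "reciprocal": lambda v: 1.0 / v,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Apply modifier(value * factor); fall back to `missing` when value is None."""
    if value is None:
        value = missing
    return MODIFIERS[modifier](value * factor)


# Same shape as Math.log10(doc['field'].value * params.factor) with factor 5.
print(field_value_factor(2, factor=5, modifier="log"))
```

As with the Painless versions, log and ln are undefined for values of 0 or below, and the final score must not be negative, so log1p or ln1p is usually the safer choice for fields that can hold small values.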

`decay` functions

The script_score query has equivalent decay functions that can be used in a script.

Functions for vector fields

During vector function calculation, all matched documents are linearly scanned. Expect the query time to grow linearly with the number of matched documents. For this reason, we recommend limiting the number of matched documents with a query parameter.

`dense_vector` functions

Let’s create an index with a dense_vector mapping and index a couple of documents into it.

PUT my_index
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "status" : "published"
}

PUT my_index/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "status" : "published"
}

POST my_index/_refresh

The cosineSimilarity function calculates the measure of cosine similarity between a given query vector and document vectors.

GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" 
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", 
        "params": {
          "query_vector": [4, 3.4, -0.2]  
        }
      }
    }
  }
}

	To restrict the number of documents on which script score calculation is applied, provide a filter.
	The script adds 1.0 to the cosine similarity to prevent the score from being negative.
	To take advantage of the script optimizations, provide a query vector as a script parameter.

If a document’s dense vector field has a number of dimensions different from the query’s vector, an error will be thrown.
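To see what the script above computes, the following Python sketch evaluates cosine similarity plus 1.0 for the two sample documents. It uses the textbook definition of cosine similarity and illustrative function names; it is not the Elasticsearch implementation, and the dimension check only imitates the error described above.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(query_vector, doc_vector):
    if len(query_vector) != len(doc_vector):
        raise ValueError("query and document vectors have different dimensions")
    return dot(query_vector, doc_vector) / (
        math.sqrt(dot(query_vector, query_vector)) * math.sqrt(dot(doc_vector, doc_vector))
    )


query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

for doc_id, vector in docs.items():
    # Cosine similarity lies in [-1, 1], so adding 1.0 keeps the score non-negative.
    print(doc_id, cosine_similarity(query_vector, vector) + 1.0)
```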

The dotProduct function calculates the measure of dot product between a given query vector and document vectors.

GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, 'my_dense_vector');
          return sigmoid(1, Math.E, -value); 
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}

Using the standard sigmoid function prevents scores from being negative.
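The call sigmoid(1, Math.E, -value) works because, under the sigmoid formula given earlier, it expands to 1^(-value) / (e^(-value) + 1^(-value)) = 1 / (1 + e^(-value)), which is the standard logistic function. The Python sketch below checks that identity numerically; the function names are illustrative and the sample inputs are arbitrary.

```python
import math


def sigmoid(value, k, a):
    """Documented formula: value^a / (k^a + value^a)."""
    return value ** a / (k ** a + value ** a)


def logistic(x):
    """Standard logistic function, always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


for dot_product in (-2.0, 0.0, 3.0):
    print(dot_product, sigmoid(1, math.e, -dot_product), logistic(dot_product))
```

A negative dot product maps to a score below 0.5 and a positive one to a score above 0.5, so the ordering of documents by dot product is preserved while the score stays non-negative.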

The l1norm function calculates L¹ distance (Manhattan distance) between a given query vector and document vectors.

GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", 
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}

Unlike cosineSimilarity, which represents similarity, l1norm and l2norm (shown below) represent distances or differences. The more similar the vectors are, the lower the values produced by l1norm and l2norm. Because more similar vectors should score higher, the scripts invert the output of l1norm and l2norm. To avoid division by 0 when a document vector matches the query exactly, they add 1 to the denominator.

The l2norm function calculates L² distance (Euclidean distance) between a given query vector and document vectors.

GET my_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
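The two distance-based scores can be sketched in Python as follows. The functions use the textbook definitions of Manhattan and Euclidean distance with illustrative names; they are not the Elasticsearch implementation. The vectors are the query vector and document 1 from the examples above.

```python
import math


def l1norm(query_vector, doc_vector):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(q - d) for q, d in zip(query_vector, doc_vector))


def l2norm(query_vector, doc_vector):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query_vector, doc_vector)))


query_vector = [4, 3.4, -0.2]
doc_vector = [0.5, 10, 6]

# Inverting the distance makes closer vectors score higher;
# the extra 1 keeps the denominator non-zero for an exact match.
print(1 / (1 + l1norm(query_vector, doc_vector)))
print(1 / (1 + l2norm(query_vector, doc_vector)))
```

An exact match has distance 0 and therefore the maximum score of 1; scores approach 0 as the distance grows.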

If a document doesn’t have a value for a vector field on which a vector function is executed, an error will be thrown.

You can check whether a document is missing a value for the field my_vector with doc['my_vector'].size() == 0. Your overall script can look like this:

"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"

`sparse_vector` functions

Deprecated in 7.6.

The sparse_vector type is deprecated and will be removed in 8.0.

Let’s create an index with a sparse_vector mapping and index a couple of documents into it.

PUT my_sparse_index
{
  "mappings": {
    "properties": {
      "my_sparse_vector": {
        "type": "sparse_vector"
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my_sparse_index/_doc/1
{
  "my_sparse_vector": {"2": 1.5, "15" : 2, "50": -1.1, "4545": 1.1},
  "status" : "published"
}

PUT my_sparse_index/_doc/2
{
  "my_sparse_vector": {"2": 2.5, "10" : 1.3, "55": -2.3, "113": 1.6},
  "status" : "published"
}

POST my_sparse_index/_refresh

The cosineSimilaritySparse function calculates cosine similarity between a given query vector and document vectors.

GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilaritySparse(params.query_vector, 'my_sparse_vector') + 1.0",
        "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}

The dotProductSparse function calculates dot product between a given query vector and document vectors.

GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProductSparse(params.query_vector, 'my_sparse_vector');
          return sigmoid(1, Math.E, -value);
        """,
         "params": {
          "query_vector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}
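For sparse vectors, a dot product only involves dimensions present in both vectors. The Python sketch below represents sparse vectors as dictionaries, as in the requests above, and assumes that a dimension absent from a vector contributes zero. The function name is illustrative and this is not the Elasticsearch implementation.

```python
def dot_product_sparse(query_vector, doc_vector):
    """Sum the products over dimensions that appear in both vectors."""
    return sum(
        weight * doc_vector[dim]
        for dim, weight in query_vector.items()
        if dim in doc_vector
    )


query_vector = {"2": 0.5, "10": 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
docs = {
    "1": {"2": 1.5, "15": 2, "50": -1.1, "4545": 1.1},
    "2": {"2": 2.5, "10": 1.3, "55": -2.3, "113": 1.6},
}

for doc_id, vector in docs.items():
    print(doc_id, dot_product_sparse(query_vector, vector))
```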

The l1normSparse function calculates L¹ distance between a given query vector and document vectors.

GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1normSparse(params.queryVector, 'my_sparse_vector'))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}

The l2normSparse function calculates L² distance between a given query vector and document vectors.

GET my_sparse_index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2normSparse(params.queryVector, 'my_sparse_vector'))",
        "params": {
          "queryVector": {"2": 0.5, "10" : 111.3, "50": -1.3, "113": 14.8, "4545": 156.0}
        }
      }
    }
  }
}

Explain request

Using an explain request provides an explanation of how the parts of a score were computed. The script_score query can add its own explanation by setting the explanation parameter:

GET /twitter/_explain/0
{
    "query" : {
        "script_score" : {
            "query" : {
                "match": { "message": "elasticsearch" }
            },
            "script" : {
                "source" : """
                  long likes = doc['likes'].value;
                  double normalizedLikes = likes / 10;
                  if (explanation != null) {
                    explanation.set('normalized likes = likes / 10 = ' + likes + ' / 10 = ' + normalizedLikes);
                  }
                  return normalizedLikes;
                """
            }
        }
     }
}

Note that explanation is null in a normal _search request, so guarding it with a conditional is best practice.
