Cross-fields Entity Search | Elasticsearch: The Definitive Guide [2.x]

原文地址: https://www.elastic.co/guide/en/elasticsearch/guide/current/_cross_fields_entity_search.html, 版权归 www.elastic.co 所有

WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.

This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.

» » »

« Most Fields Field-Centric Queries »

Cross-fields Entity Searchedit

Now we come to a common pattern: cross-fields entity search. With entities like person, product, or address, the identifying information is spread across several fields. We may have a person indexed as follows:

{
    "firstname":  "Peter",
    "lastname":   "Smith"
}

Or an address like this:

{
    "street":   "5 Poland Street",
    "city":     "London",
    "country":  "United Kingdom",
    "postcode": "W1V 3DG"
}

This sounds a lot like the example we described in Multiple Query Strings, but there is a big difference between these two scenarios. In Multiple Query Strings, we used a separate query string for each field. In this scenario, we want to search across multiple fields with a single query string.

Our user might search for the person “Peter Smith” or for the address “Poland Street W1V.” Each of those words appears in a different field, so using a dis_max / best_fields query to find the single best-matching field is clearly the wrong approach.

A Naive Approachedit

Really, we want to query each field in turn and add up the scores of every field that matches, which sounds like a job for the bool query:

{
  "query": {
    "bool": {
      "should": [
        { "match": { "street":    "Poland Street W1V" }},
        { "match": { "city":      "Poland Street W1V" }},
        { "match": { "country":   "Poland Street W1V" }},
        { "match": { "postcode":  "Poland Street W1V" }}
      ]
    }
  }
}

Repeating the query string for every field soon becomes tedious. We can use the multi_match query instead, and set the type to most_fields to tell it to combine the scores of all matching fields:

{
  "query": {
    "multi_match": {
      "query":       "Poland Street W1V",
      "type":        "most_fields",
      "fields":      [ "street", "city", "country", "postcode" ]
    }
  }
}

Problems with the most_fields Approachedit

The most_fields approach to entity search has some problems that are not immediately obvious:

It is designed to find the most fields matching any words, rather than to find the most matching words across all fields.
It can’t use the operator or minimum_should_match parameters to reduce the long tail of less-relevant results.
Term frequencies are different in each field and could interfere with each other to produce badly ordered results.

« Most Fields Field-Centric Queries »