WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Cross-fields Entity Search
Now we come to a common pattern: cross-fields entity search. With entities
like person, product, or address, the identifying information is spread
across several fields. We may have a person indexed as follows:
{
    "firstname": "Peter",
    "lastname":  "Smith"
}
Or an address like this:
{
    "street":   "5 Poland Street",
    "city":     "London",
    "country":  "United Kingdom",
    "postcode": "W1V 3DG"
}
This sounds a lot like the example we described in Multiple Query Strings, but there is a big difference between these two scenarios. In Multiple Query Strings, we used a separate query string for each field. In this scenario, we want to search across multiple fields with a single query string.
Our user might search for the person “Peter Smith” or for the address
“Poland Street W1V.” Each of those words appears in a different field, so
using a dis_max / best_fields query to find the single best-matching
field is clearly the wrong approach.
A Naive Approach
Really, we want to query each field in turn and add up the scores of every
field that matches, which sounds like a job for the bool query:
{
  "query": {
    "bool": {
      "should": [
        { "match": { "street":   "Poland Street W1V" }},
        { "match": { "city":     "Poland Street W1V" }},
        { "match": { "country":  "Poland Street W1V" }},
        { "match": { "postcode": "Poland Street W1V" }}
      ]
    }
  }
}
Repeating the query string for every field soon becomes tedious. We can use
the multi_match query instead, and set the type to most_fields to tell it to
combine the scores of all matching fields:
{
  "query": {
    "multi_match": {
      "query":  "Poland Street W1V",
      "type":   "most_fields",
      "fields": [ "street", "city", "country", "postcode" ]
    }
  }
}
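As a rough sketch of what the multi_match shorthand saves, the following Python helpers build both request bodies as plain dictionaries. The function names bool_should_query and most_fields_query are hypothetical, and no Elasticsearch client is involved. The point is only that the bool form repeats the query string once per field, while the multi_match form carries it once.

```python
import json


def bool_should_query(query_string, fields):
    """Build the long-hand form: one match clause per field."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: query_string}} for field in fields
                ]
            }
        }
    }


def most_fields_query(query_string, fields):
    """Build the multi_match shorthand with type most_fields."""
    return {
        "query": {
            "multi_match": {
                "query": query_string,
                "type": "most_fields",
                "fields": list(fields),
            }
        }
    }


fields = ["street", "city", "country", "postcode"]
long_form = bool_should_query("Poland Street W1V", fields)
short_form = most_fields_query("Poland Street W1V", fields)

# Print the shorthand body so it can be compared with the JSON shown earlier.
print(json.dumps(short_form, indent=2))
```

Adding a fifth field to the list changes one element in the shorthand, whereas the long-hand form grows by a whole clause.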
Problems with the most_fields Approach
The most_fields approach to entity search has some problems that are not
immediately obvious:
- It is designed to find the most fields matching any words, rather than to find the most matching words across all fields.
- It can’t use the operator or minimum_should_match parameters to reduce the long tail of less-relevant results.
- Term frequencies are different in each field and could interfere with each other to produce badly ordered results.
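The first two problems can be illustrated with a toy model in Python. This is a deliberately simplified sketch and not how Elasticsearch scores documents: it counts matches instead of computing relevance, it uses whitespace splitting as a stand-in for a real analyzer, and the second document is invented for the illustration. All function names are hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real analyzer)."""
    return set(text.lower().split())


def matching_fields(doc, query):
    """Field-centric view: count fields containing at least one query word."""
    words = tokenize(query)
    return sum(1 for value in doc.values() if words & tokenize(value))


def matching_words(doc, query):
    """Term-centric view: count distinct query words found in any field."""
    words = tokenize(query)
    found = set()
    for value in doc.values():
        found |= words & tokenize(value)
    return len(found)


def all_words_in_one_field(doc, query):
    """What requiring every word means when applied field by field."""
    words = tokenize(query)
    return any(words <= tokenize(value) for value in doc.values())


query = "Poland Street W1V"

# The address from the text.
soho = {
    "street": "5 Poland Street",
    "city": "London",
    "country": "United Kingdom",
    "postcode": "W1V 3DG",
}

# A contrived, hypothetical document that repeats one query word in many fields.
contrived = {
    "street": "1 Poland Road",
    "city": "Poland",
    "country": "Poland",
    "postcode": "00-001",
}

for name, doc in [("soho", soho), ("contrived", contrived)]:
    print(name, matching_fields(doc, query), matching_words(doc, query))

# No single field of the intended address holds all three words.
print(all_words_in_one_field(soho, query))
```

In this model the contrived document matches more fields (three against two) even though the intended address matches more of the query words (three against one). The last line shows why requiring all words per field does not help: the words of the intended address are spread across the street and postcode fields, so no single field contains them all.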