WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
cross-fields Queries
The custom _all approach is a good solution, as long as you thought about setting it up before you indexed your documents. However, Elasticsearch also provides a search-time solution to the problem: the multi_match query with type cross_fields.
The cross_fields type takes a term-centric approach, quite different from the field-centric approach taken by best_fields and most_fields. It treats all of the fields as one big field, and looks for each term in any field.
To illustrate the difference between field-centric and term-centric queries, look at the explanation for this field-centric most_fields query:
GET /_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "peter smith",
      "type": "most_fields",
      "operator": "and",
      "fields": [ "first_name", "last_name" ]
    }
  }
}
For a document to match, both peter and smith must appear in the same field, either the first_name field or the last_name field:
(+first_name:peter +first_name:smith) (+last_name:peter +last_name:smith)
A term-centric approach would use this logic instead:
+(first_name:peter last_name:peter) +(first_name:smith last_name:smith)
In other words, the term peter must appear in either field, and the term smith must appear in either field.
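The two Boolean structures can be reproduced with a small sketch. It is only an illustration of the logic described above, not how Elasticsearch builds its queries internally; the function names field_centric and term_centric are hypothetical.

```python
def field_centric(terms, fields):
    """Every term must match within the same field (most_fields with operator "and")."""
    clauses = []
    for field in fields:
        required = " ".join(f"+{field}:{term}" for term in terms)
        clauses.append(f"({required})")
    return " ".join(clauses)


def term_centric(terms, fields):
    """Each term must match in at least one of the fields (cross_fields-style logic)."""
    clauses = []
    for term in terms:
        alternatives = " ".join(f"{field}:{term}" for field in fields)
        clauses.append(f"+({alternatives})")
    return " ".join(clauses)


terms = ["peter", "smith"]
fields = ["first_name", "last_name"]

print(field_centric(terms, fields))
print(term_centric(terms, fields))
```

The first function loops over fields and requires all terms inside each one; the second loops over terms and accepts any field for each one. Swapping the nesting of the two loops is the whole difference between the two approaches.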
The cross_fields type first analyzes the query string to produce a list of terms, and then it searches for each term in any field. That difference alone solves two of the three problems that we listed in Field-Centric Queries, leaving us just with the issue of differing inverse document frequencies.
Fortunately, the cross_fields type solves this too, as can be seen from this validate-query request:
GET /_validate/query?explain
{
  "query": {
    "multi_match": {
      "query": "peter smith",
      "type": "cross_fields",
      "operator": "and",
      "fields": [ "first_name", "last_name" ]
    }
  }
}
It addresses the problem of differing IDFs by blending inverse document frequencies across fields:
+blended("peter", fields: [first_name, last_name]) +blended("smith", fields: [first_name, last_name])
In other words, it looks up the IDF of smith in both the first_name and the last_name fields and uses the minimum of the two as the IDF for both fields. The fact that smith is a common last name means that it will be treated as a common first name too.
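A minimal sketch of the blending step follows. It assumes the classic Lucene TF/IDF formula, 1 + ln(numDocs / (docFreq + 1)), and uses made-up document counts; the helper names idf and blended_idf are hypothetical, and the actual blending inside Lucene is more involved than this.

```python
import math


def idf(doc_freq, num_docs):
    # Assumed formula (classic Lucene TF/IDF): 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def blended_idf(doc_freqs, num_docs):
    """Return one IDF shared by all fields: the minimum of the per-field IDFs."""
    return min(idf(doc_freq, num_docs) for doc_freq in doc_freqs.values())


# Hypothetical statistics: "smith" is rare as a first name, common as a last name.
num_docs = 10_000
smith_doc_freqs = {"first_name": 3, "last_name": 900}

per_field = {field: idf(df, num_docs) for field, df in smith_doc_freqs.items()}
shared = blended_idf(smith_doc_freqs, num_docs)

for field, value in per_field.items():
    print(f"{field}: own idf={value:.2f}, blended idf={shared:.2f}")
```

Because the minimum IDF belongs to the field where the term is most common, a rare match on smith in first_name no longer outweighs the expected match in last_name.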
For the cross_fields query type to work optimally, all fields should have the same analyzer. Fields that share an analyzer are grouped together as blended fields.
If you include fields with a different analysis chain, they will be added to the query in the same way as for best_fields. For instance, if we added the title field to the preceding query (assuming it uses a different analyzer), the explanation would be as follows:
(+title:peter +title:smith) ( +blended("peter", fields: [first_name, last_name]) +blended("smith", fields: [first_name, last_name]) )
This is particularly important when using the minimum_should_match and operator parameters.
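The grouping behavior can be sketched as follows. The analyzer names and the helpers group_by_analyzer and explain are assumptions made for illustration, and the output only approximates the shape of the explanation shown above.

```python
from collections import defaultdict


def group_by_analyzer(field_analyzers):
    """Group field names by analyzer, keeping first-seen order."""
    groups = defaultdict(list)
    for field, analyzer in field_analyzers.items():
        groups[analyzer].append(field)
    return list(groups.values())


def explain(terms, field_analyzers):
    """Build an explanation-like string with one clause per analyzer group."""
    parts = []
    for group in group_by_analyzer(field_analyzers):
        if len(group) == 1:
            # A field with its own analyzer is queried on its own.
            clause = " ".join(f"+{group[0]}:{term}" for term in terms)
        else:
            # Fields sharing an analyzer are blended together.
            names = ", ".join(group)
            clause = " ".join(
                f'+blended("{term}", fields: [{names}])' for term in terms
            )
        parts.append(f"({clause})")
    return " ".join(parts)


# Hypothetical mapping: title uses a different analyzer from the name fields.
field_analyzers = {
    "title": "english",
    "first_name": "standard",
    "last_name": "standard",
}
print(explain(["peter", "smith"], field_analyzers))
```

With operator set to and, both terms must match inside a single group, which is why a field that falls into its own group behaves field-centrically again.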
Per-Field Boosting
One of the advantages of using the cross_fields query over custom _all fields is that you can boost individual fields at query time.
For fields of equal value like first_name and last_name, this generally isn't required, but if you were searching for books using the title and description fields, you might want to give more weight to the title field.
This can be done as described before with the caret (^) syntax:
GET /books/_search
{
  "query": {
    "multi_match": {
      "query": "peter smith",
      "type": "cross_fields",
      "fields": [ "title^2", "description" ]
    }
  }
}
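If the boosts come from application settings, the request body can be assembled in code. The sketch below only builds the JSON body and does not send it; the helper name cross_fields_query and the boosts dictionary are hypothetical.

```python
import json


def cross_fields_query(text, boosts):
    """Build a cross_fields multi_match body, using the caret syntax for boosts."""
    fields = [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": fields,
            }
        }
    }


# Hypothetical weights: title counts twice as much as description.
body = cross_fields_query("peter smith", {"title": 2, "description": 1})
print(json.dumps(body, indent=2))
```

Keeping the weights in one dictionary makes it easy to tune them without touching the query structure.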
The advantage of being able to boost individual fields should be weighed against the cost of querying multiple fields instead of querying a single custom _all field. Use whichever of the two solutions delivers the most bang for your buck.