WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Phrase Matching
In the same way that the match query is the go-to query for standard full-text search, the match_phrase query is the one you should reach for when you want to find words that are near each other:
GET /my_index/my_type/_search
{
    "query": {
        "match_phrase": {
            "title": "quick brown fox"
        }
    }
}
Like the match query, the match_phrase query first analyzes the query string to produce a list of terms. It then searches for all the terms, but keeps only documents that contain all of the search terms, in the same positions relative to each other. A query for the phrase quick fox would not match any of our documents, because no document contains the word quick immediately followed by fox.
The match_phrase query can also be written as a match query with type phrase:
"match": {
    "title": {
        "query": "quick brown fox",
        "type": "phrase"
    }
}
Term Positions
When a string is analyzed, the analyzer returns not only a list of terms, but also the position, or order, of each term in the original string:
GET /_analyze?analyzer=standard
Quick brown fox
This returns the following:
{
    "tokens": [
        {
            "token": "quick",
            "start_offset": 0,
            "end_offset": 5,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "brown",
            "start_offset": 6,
            "end_offset": 11,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "fox",
            "start_offset": 12,
            "end_offset": 15,
            "type": "<ALPHANUM>",
            "position": 3
        }
    ]
}
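The response above can be approximated with a few lines of Python. This is only a rough sketch: the hypothetical `analyze` helper splits on runs of word characters and lowercases them, which is far simpler than the real standard analyzer, and it numbers positions from 1 to mirror the sample output.

```python
import re

def analyze(text):
    """Rough stand-in for an analyzer (an assumption, not the real standard
    analyzer): emit lowercased word tokens with offsets and positions."""
    tokens = []
    # Positions start at 1 here only to mirror the sample response above.
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

for token in analyze("Quick brown fox"):
    print(token)
```

The point of the sketch is that every token carries its position alongside the term itself, and that position is what phrase matching relies on.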
Positions can be stored in the inverted index, and position-aware queries like the match_phrase query can use them to match only documents that contain all the words in exactly the order specified, with no words in between.
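To make the idea of storing positions concrete, here is a minimal sketch of a positional inverted index in Python. The `build_positional_index` helper, the whitespace tokenizer, and the sample documents are all assumptions made for illustration; Lucene's on-disk structures are considerably more involved.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} (hypothetical helper).

    Tokenization is a naive lowercase-and-split, and positions are
    numbered from 1.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split(), start=1):
            index[term].setdefault(doc_id, []).append(position)
    return index

# Illustrative documents, not taken from the index used in this chapter.
docs = {
    1: "The quick brown fox",
    2: "Brown fox quick brown",
}
index = build_positional_index(docs)
print(index["brown"])
```

With positions stored per document, a query can ask not only whether a term occurs in a document, but also where it occurs.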
What Is a Phrase
For a document to be considered a match for the phrase “quick brown fox”, the following must be true:
- quick, brown, and fox must all appear in the field.
- The position of brown must be 1 greater than the position of quick.
- The position of fox must be 2 greater than the position of quick.
If any of these conditions is not met, the document is not considered a match.
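The conditions above can be sketched as a small Python check. This is an illustration under simplifying assumptions, not how Elasticsearch implements phrase matching: `term_positions` and `matches_phrase` are hypothetical helpers, and documents are given as already-analyzed token lists.

```python
def term_positions(tokens):
    """Return {term: [positions]} for one document's analyzed tokens."""
    positions = {}
    for position, term in enumerate(tokens, start=1):
        positions.setdefault(term, []).append(position)
    return positions

def matches_phrase(doc_tokens, phrase_tokens):
    """True if the phrase terms occur at consecutive positions, in order."""
    positions = term_positions(doc_tokens)
    # Condition 1: every phrase term must appear in the field.
    if not phrase_tokens or any(t not in positions for t in phrase_tokens):
        return False
    first, rest = phrase_tokens[0], phrase_tokens[1:]
    # Conditions 2 and 3: the n-th following term must sit exactly n
    # positions after some occurrence of the first term.
    for start in positions[first]:
        if all(start + offset in positions[term]
               for offset, term in enumerate(rest, start=1)):
            return True
    return False

doc = "the quick brown fox".split()
print(matches_phrase(doc, ["quick", "brown", "fox"]))
print(matches_phrase(doc, ["quick", "fox"]))
```

Every occurrence of the first term is tried as a candidate start, so a document that repeats quick still matches if any one occurrence begins the full phrase.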
Internally, the match_phrase query uses the low-level span query family to do position-aware matching. Span queries are term-level queries, so they have no analysis phase; they search for the exact term specified.
Thankfully, most people never need to use the span queries directly, as the match_phrase query is usually good enough. However, certain specialized fields, like patent searches, use these low-level queries to perform very specific, carefully constructed positional searches.
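For readers curious what a hand-built positional query might look like, the following Python sketch assembles a span_near request body from span_term clauses. The `span_near_phrase` helper is hypothetical, and the body is meant as a rough positional equivalent of the phrase query, not as what match_phrase literally generates. Because span queries skip analysis, the terms must be supplied exactly as they were indexed (here, already lowercased).

```python
import json

def span_near_phrase(field, terms, slop=0, in_order=True):
    """Build a span_near request body from pre-analyzed terms.

    Hypothetical helper: slop=0 with in_order=True asks for the terms
    to be adjacent and in the given order.
    """
    return {
        "query": {
            "span_near": {
                "clauses": [{"span_term": {field: term}} for term in terms],
                "slop": slop,
                "in_order": in_order,
            }
        }
    }

body = span_near_phrase("title", ["quick", "brown", "fox"])
print(json.dumps(body, indent=2))
```

The resulting JSON could be sent to the same _search endpoint used earlier; raising slop would allow other terms to sit between the clauses.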