WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Using Synonyms
Synonyms can replace existing tokens or be added to the token stream by using the synonym token filter:
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "british,english",
            "queen,monarch"
          ]
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_synonym_filter"
          ]
        }
      }
    }
  }
}
First, we define a token filter of type synonym. We discuss synonym formats in Formatting Synonyms. Then we create a custom analyzer that uses the my_synonym_filter token filter.
Synonyms can be specified inline with the synonyms parameter, or in a synonyms file that must be present on every node in the cluster. The path to the synonyms file should be specified with the synonyms_path parameter, and should be either absolute or relative to the Elasticsearch config directory. See Updating Stopwords for techniques that can be used to refresh the synonyms list.
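As a minimal sketch of the file-based alternative, the same filter could load its rules from a file instead of listing them inline. The index name my_file_index and the path analysis/synonyms.txt are hypothetical. The path is written relative to the config directory, and the file is assumed to hold one rule per line (british,english on the first line and queen,monarch on the second).

```json
# my_file_index and analysis/synonyms.txt are hypothetical names
PUT /my_file_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym_filter": {
          "type": "synonym",
          "synonyms_path": "analysis/synonyms.txt"
        }
      },
      "analyzer": {
        "my_synonyms": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_synonym_filter" ]
        }
      }
    }
  }
}
```

Because the file must be present on every node, it would need to be copied to the same location on each node before the index is created.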
Testing our analyzer with the analyze API shows the following:

GET /my_index/_analyze
{
  "analyzer": "my_synonyms",
  "text": "Elizabeth is the English queen"
}
A document like this will match queries for any of the following: English queen, British queen, English monarch, or British monarch. Even a phrase query will work, because the position of each term has been preserved.
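To try the phrase-query claim, one could map a field to the my_synonyms analyzer, index the sentence, and search for a synonym phrase. This is only a sketch in 2.x syntax. The type name my_type and the field name title are hypothetical and do not appear in the original example.

```json
# my_type and title are hypothetical names
PUT /my_index/_mapping/my_type
{
  "properties": {
    "title": {
      "type": "string",
      "analyzer": "my_synonyms"
    }
  }
}

# refresh=true makes the document searchable immediately
PUT /my_index/my_type/1?refresh=true
{ "title": "Elizabeth is the English queen" }

GET /my_index/my_type/_search
{
  "query": {
    "match_phrase": { "title": "British monarch" }
  }
}
```

If the analyzer behaves as described, this query should return the document even though neither British nor monarch appears in the indexed text.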
Using the same synonym token filter at both index time and search time is redundant. If, at index time, we replace English with the two terms english and british, then at search time we need to search for only one of those terms. Alternatively, if we don’t use synonyms at index time, then at search time, we would need to convert a query for English into a query for english OR british.
Whether to do synonym expansion at search or index time can be a difficult choice. We will explore the options more in Expand or contract.
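One way to avoid applying the filter twice is to give a field different analyzers for indexing and for searching. The sketch below assumes index-time expansion. The type name my_type and the field name body are hypothetical, and search_analyzer is set to the built-in standard analyzer so that query terms are not expanded a second time.

```json
# hypothetical type and field; synonyms are applied at index time only
PUT /my_index/_mapping/my_type
{
  "properties": {
    "body": {
      "type": "string",
      "analyzer": "my_synonyms",
      "search_analyzer": "standard"
    }
  }
}
```

Swapping the two analyzer values would give search-time expansion instead.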