WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Shared Index
We can use a large shared index for the many smaller forums by indexing the forum identifier in a field and using it as a filter:
PUT /forums
{
  "settings": {
    "number_of_shards": 10
  },
  "mappings": {
    "post": {
      "properties": {
        "forum_id": {
          "type":  "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

PUT /forums/post/1
{
  "forum_id": "baking",
  "title":    "Easy recipe for ginger nuts",
  ...
}

The first request creates an index large enough to hold thousands of smaller forums. Each post must include a forum_id field to identify which forum it belongs to.
We can use the forum_id as a filter to search within a single forum. The filter will exclude most of the documents in the index (those from other forums), and caching will ensure that responses are fast:
GET /forums/post/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "ginger nuts"
        }
      },
      "filter": {
        "term": {
          "forum_id": "baking"
        }
      }
    }
  }
}
This approach works, but we can do better. The posts from a single forum would fit easily onto one shard, but currently they are scattered across all ten shards in the index. This means that every search request has to be forwarded to a primary or replica of all ten shards. What would be ideal is to ensure that all the posts from a single forum are stored on the same shard.
In Routing a Document to a Shard, we explained that a document is allocated to a particular shard by using this formula:
shard = hash(routing) % number_of_primary_shards
The routing value defaults to the document’s _id, but we can override that and provide our own custom routing value, such as forum_id. All documents with the same routing value will be stored on the same shard:
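PUT /forums/post/1?routing=baking
{
  "forum_id": "baking",
  "title":    "Easy recipe for ginger nuts",
  ...
}

Using forum_id as the routing value ensures that all posts from the same forum are stored on the same shard.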
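To make the effect of a custom routing value concrete, the following Python sketch applies the formula above to a handful of posts. It is an illustration only: shard_for is a hypothetical helper, and zlib.crc32 merely stands in for the hash function that Elasticsearch applies internally, so the shard numbers it produces will not match a real cluster.

```python
import zlib

NUMBER_OF_PRIMARY_SHARDS = 10  # same value as "number_of_shards" in the index settings


def shard_for(routing, number_of_primary_shards=NUMBER_OF_PRIMARY_SHARDS):
    """Apply shard = hash(routing) % number_of_primary_shards.

    zlib.crc32 is an assumed stand-in for the real hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Twenty posts from the same forum, with _id values "1" to "20".
post_ids = [str(n) for n in range(1, 21)]

# Default routing: each post is routed by its own _id,
# so the forum's posts are spread over several shards.
shards_by_id = {shard_for(post_id) for post_id in post_ids}

# Custom routing: every post is routed by the same forum_id,
# so all of them map to a single shard.
shards_by_forum = {shard_for("baking") for _ in post_ids}

print("shards used with _id routing:     ", sorted(shards_by_id))
print("shards used with forum_id routing:", sorted(shards_by_forum))
```

Because the result depends only on the routing value and the number of primary shards, the same forum_id always resolves to the same shard, at both index time and search time.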
When we search for posts in a particular forum, we can pass the same routing
value to ensure that the search request is run on only the single shard that
holds our documents:
GET /forums/post/_search?routing=baking
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "ginger nuts"
        }
      },
      "filter": {
        "term": {
          "forum_id": "baking"
        }
      }
    }
  }
}

The query is run on only the shard that corresponds to this routing value. We still need the filtering query, as a single shard can hold posts from many forums.
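A small simulation shows why routing alone is not enough. The Python sketch below models shards as in-memory lists. The forum names other than baking are invented for illustration, shard_for is a hypothetical helper, and zlib.crc32 is an assumed stand-in for the real hash function.

```python
import zlib
from collections import defaultdict

NUMBER_OF_PRIMARY_SHARDS = 10


def shard_for(routing, number_of_primary_shards=NUMBER_OF_PRIMARY_SHARDS):
    # Stand-in for hash(routing) % number_of_primary_shards.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Far more forums than shards (names are made up for this example).
forum_ids = ["baking"] + [f"forum-{n}" for n in range(50)]

# Index three posts per forum, routing each post by its forum_id.
shards = defaultdict(list)
for forum_id in forum_ids:
    for n in range(3):
        doc = {"forum_id": forum_id, "title": f"post {n} in {forum_id}"}
        shards[shard_for(forum_id)].append(doc)

# A search with routing=baking visits only this one shard...
routed_shard = shards[shard_for("baking")]
forums_on_shard = {doc["forum_id"] for doc in routed_shard}

# ...but the shard may also hold other forums, so the filter is still required.
baking_posts = [doc for doc in routed_shard if doc["forum_id"] == "baking"]

print(f"forums sharing the routed shard: {len(forums_on_shard)}")
print(f"posts on the shard: {len(routed_shard)}, after filtering: {len(baking_posts)}")
```

With 51 forums and only 10 shards, at least one shard must hold several forums, so routing narrows where the search runs while the filter decides which documents match.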
Multiple forums can be queried by passing a comma-separated list of routing values, and including each forum_id in a terms query:
GET /forums/post/_search?routing=baking,cooking,recipes
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": "ginger nuts"
        }
      },
      "filter": {
        "terms": {
          "forum_id": [ "baking", "cooking", "recipes" ]
        }
      }
    }
  }
}
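As a rough model of how a comma-separated routing parameter narrows a search, the Python sketch below resolves each routing value to a shard and collects the distinct results. Both shards_for_search and shard_for are hypothetical helpers, and zlib.crc32 is an assumed stand-in for the real hash function, so the shard numbers are illustrative only.

```python
import zlib

NUMBER_OF_PRIMARY_SHARDS = 10


def shard_for(routing, number_of_primary_shards=NUMBER_OF_PRIMARY_SHARDS):
    # Stand-in for hash(routing) % number_of_primary_shards.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def shards_for_search(routing_param, number_of_primary_shards=NUMBER_OF_PRIMARY_SHARDS):
    """Return the distinct shards that a comma-separated routing parameter maps to."""
    values = [value.strip() for value in routing_param.split(",") if value.strip()]
    return sorted({shard_for(value, number_of_primary_shards) for value in values})


shards = shards_for_search("baking,cooking,recipes")
print(f"shards to query: {shards} (at most 3 of {NUMBER_OF_PRIMARY_SHARDS})")
```

Two routing values can hash to the same shard, so three forums resolve to at most three shards and possibly fewer. Either way, the terms filter is what restricts the results to the requested forums.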
While this approach is technically efficient, it looks a bit clumsy because of the need to specify routing values and terms queries on every query or indexing request. Index aliases to the rescue!