Use constant_keyword to speed up filtering
There is a general rule that the cost of a filter is mostly a function of the
number of matched documents. Imagine that you have an index containing cycles.
There are a large number of bicycles and many searches perform a filter on
cycle_type: bicycle. This very common filter is unfortunately also very costly
since it matches most documents. There is a simple way to avoid running this
filter: move bicycles to their own index and filter bicycles by searching this
index instead of adding a filter to the query.
Unfortunately, this can make client-side logic tricky, which is where
constant_keyword helps. By mapping cycle_type as a constant_keyword with value
bicycle on the index that contains bicycles, clients can keep running the exact
same queries as they used to run on the monolithic index, and Elasticsearch will
do the right thing on the bicycles index by ignoring filters on cycle_type if
the value is bicycle and returning no hits otherwise.
Here is what the mappings could look like:
PUT bicycles
{
  "mappings": {
    "properties": {
      "cycle_type": {
        "type": "constant_keyword",
        "value": "bicycle"
      },
      "name": {
        "type": "text"
      }
    }
  }
}

PUT other_cycles
{
  "mappings": {
    "properties": {
      "cycle_type": {
        "type": "keyword"
      },
      "name": {
        "type": "text"
      }
    }
  }
}
We are splitting our index in two: one that contains only bicycles, and another that contains other cycles such as unicycles and tricycles. At search time, we need to search both indices, but we don’t need to modify queries.
GET bicycles,other_cycles/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "description": "dutch"
        }
      },
      "filter": {
        "term": {
          "cycle_type": "bicycle"
        }
      }
    }
  }
}
On the bicycles index, Elasticsearch will simply ignore the cycle_type filter
and rewrite the search request to the one below:
GET bicycles,other_cycles/_search
{
  "query": {
    "match": {
      "description": "dutch"
    }
  }
}
On the other_cycles index, Elasticsearch will quickly figure out that bicycle
doesn’t exist in the terms dictionary of the cycle_type field and return a
search response with no hits.
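The per-index behavior described above can be modeled with a small Python sketch. This is a toy model for illustration only, not Elasticsearch's implementation: the function name rewrite_for_index and the constants argument are hypothetical, and the sketch only handles a bool query with a single must clause and a single short-form term filter.

```python
from typing import Any, Dict

# Stand-in for "no document in this index can match".
MATCH_NONE: Dict[str, Any] = {"match_none": {}}


def rewrite_for_index(query: Dict[str, Any], constants: Dict[str, str]) -> Dict[str, Any]:
    """Toy model of resolving a term filter against one index.

    `constants` (hypothetical) maps a field name to the single value that
    every document of the index holds, i.e. its constant_keyword value.
    """
    bool_query = query.get("bool")
    if not isinstance(bool_query, dict):
        return query  # not a bool query: nothing to simplify
    filter_clause = bool_query.get("filter")
    if not isinstance(filter_clause, dict):
        return query  # no filter, or a list of filters (not modeled here)
    term = filter_clause.get("term", {})
    if len(term) != 1:
        return query
    field, value = next(iter(term.items()))
    if field not in constants or not isinstance(value, str):
        return query  # regular field: the filter really has to run
    if constants[field] != value:
        return MATCH_NONE  # the constant differs, so nothing can match
    # Every document matches the filter, so only the must clause remains.
    return bool_query.get("must", {"match_all": {}})
```

With the request shown earlier, passing {"cycle_type": "bicycle"} as constants (the bicycles index) returns just the match clause, while passing an empty dict (the other_cycles index, where cycle_type is a regular keyword) returns the query unchanged, so the term filter still runs there.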
This is a powerful way of making queries cheaper by putting common values in a
dedicated index. This idea can also be combined across multiple fields: for
instance, if you track the color of each cycle and your bicycles index ends up
having a majority of black bikes, you could split it into a bicycles-black
index and a bicycles-other-colors index.
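As a rough illustration of the two-field split, the following Python sketch builds mappings bodies for the two indices. The field name color and the helper cycle_mappings are assumptions made for this example; only the index names come from the text above.

```python
import json
from typing import Dict


def cycle_mappings(constants: Dict[str, str]) -> dict:
    """Build a mappings body in which the given fields are constant_keyword.

    Fields missing from `constants` fall back to a regular keyword mapping.
    The `color` field name is an assumption, not taken from the original text.
    """
    properties: Dict[str, dict] = {"name": {"type": "text"}}
    for field in ("cycle_type", "color"):
        if field in constants:
            properties[field] = {"type": "constant_keyword", "value": constants[field]}
        else:
            properties[field] = {"type": "keyword"}
    return {"mappings": {"properties": properties}}


# Both indices hold only bicycles; only one of them holds a single color.
indices = {
    "bicycles-black": cycle_mappings({"cycle_type": "bicycle", "color": "black"}),
    "bicycles-other-colors": cycle_mappings({"cycle_type": "bicycle"}),
}

if __name__ == "__main__":
    for name, body in indices.items():
        print(f"PUT {name}")
        print(json.dumps(body, indent=2))
```

In this sketch a filter on color: black could be ignored on bicycles-black, while on bicycles-other-colors the color field stays a regular keyword and is filtered as usual.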
The constant_keyword field is not strictly required for this optimization: it
is also possible to update the client-side logic in order to route queries to
the relevant indices based on filters. However, constant_keyword makes this
transparent and decouples search requests from the index topology in exchange
for very little overhead.
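For comparison, here is a minimal sketch of what the client-side alternative might look like in Python. The routing table INDEX_CYCLE_TYPE and the function route are hypothetical; the point is that the client has to carry knowledge of the index topology that constant_keyword would otherwise keep on the Elasticsearch side.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical routing table: index name -> the only cycle_type it contains,
# or None for an index that mixes several values.
INDEX_CYCLE_TYPE: Dict[str, Optional[str]] = {
    "bicycles": "bicycle",
    "other_cycles": None,
}


def route(cycle_type: Optional[str]) -> Tuple[List[str], bool]:
    """Return (indices to search, whether the cycle_type filter is still needed)."""
    if cycle_type is None:
        # No filter on cycle_type: every index may contain hits.
        return list(INDEX_CYCLE_TYPE), False
    dedicated = [name for name, value in INDEX_CYCLE_TYPE.items() if value == cycle_type]
    if dedicated:
        # A dedicated index exists, so the filter itself can be dropped.
        return dedicated, False
    # Otherwise only mixed indices can match, and the filter must be kept.
    mixed = [name for name, value in INDEX_CYCLE_TYPE.items() if value is None]
    return mixed, True
```

Here route("bicycle") selects only the bicycles index and drops the filter, while route("unicycle") selects other_cycles and keeps it. Any change to the index layout means updating this table in every client.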