WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Dealing with Null Values
Think back to our earlier example, where documents have a field named tags. This is a multivalue field. A document may have one tag, many tags, or potentially no tags at all. If a field has no values, how is it stored in an inverted index?
That’s a trick question, because the answer is: it isn’t stored at all. Let’s look at that inverted index from the previous section:
Token       | DocIDs
open_source | 2
search      | 1,2
How would you store a field that doesn’t exist in that data structure? You can’t! An inverted index is simply a list of tokens and the documents that contain them. If a field doesn’t exist, it doesn’t hold any tokens, which means it won’t be represented in an inverted index data structure.
Ultimately, this means that a null, [] (an empty array), and [null] are all equivalent. They simply don’t exist in the inverted index!
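To make this concrete, here is a minimal Python sketch that models how a field value turns into tokens for an inverted index. It is not Elasticsearch’s indexing code: the helper name field_tokens is hypothetical, and the sketch assumes exact (not analyzed) string values. It shows that null (None), an empty array, and [null] all contribute zero tokens.

```python
from typing import Any, List


def field_tokens(value: Any) -> List[str]:
    """Return the tokens a field value would contribute to an inverted index.

    Hypothetical model: values are treated as exact (not analyzed) strings,
    and a None value, or a None inside an array, contributes nothing.
    """
    if value is None:
        return []
    values = value if isinstance(value, list) else [value]
    return [str(v) for v in values if v is not None]


# null, [] and [null] all produce no tokens, so none of them
# leaves any trace in the inverted index.
for value in (None, [], [None]):
    print(repr(value), "->", field_tokens(value))
```

Because there is no token to record, there is nothing the index could use to tell these three cases apart.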
Obviously, the world is not simple, and data is often missing fields, or contains explicit nulls or empty arrays. To deal with these situations, Elasticsearch has a few tools to work with null or missing values.
exists Query
The first tool in your arsenal is the exists query. This query returns documents that have any value in the specified field. Let’s use the tagging example and index some example documents:
POST /my_index/posts/_bulk
{ "index": { "_id": "1" }}
{ "tags" : ["search"] }
{ "index": { "_id": "2" }}
{ "tags" : ["search", "open_source"] }
{ "index": { "_id": "3" }}
{ "other_field" : "some data" }
{ "index": { "_id": "4" }}
{ "tags" : null }
{ "index": { "_id": "5" }}
{ "tags" : ["search", null] }
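Document 1: the tags field has one value.
Document 2: the tags field has two values.
Document 3: the tags field is missing altogether.
Document 4: the tags field is set to null.
Document 5: the tags field has one value and a null.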
The resulting inverted index for our tags field will look like this:
Token       | DocIDs
open_source | 2
search      | 1,2,5
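As a rough model, the following Python sketch builds that inverted index from the five example documents. It is a simplification, not Lucene’s implementation: the names DOCS, field_tokens, and build_inverted_index are hypothetical, and values are assumed to be exact (not analyzed) strings. A missing field and an explicit null are handled identically, which is why documents 3 and 4 never appear.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set

# The five example documents from the bulk request, keyed by _id.
DOCS: Dict[int, Dict[str, Any]] = {
    1: {"tags": ["search"]},
    2: {"tags": ["search", "open_source"]},
    3: {"other_field": "some data"},
    4: {"tags": None},
    5: {"tags": ["search", None]},
}


def field_tokens(value: Any) -> List[str]:
    """Tokens a value contributes; None and None elements contribute nothing."""
    if value is None:
        return []
    values = value if isinstance(value, list) else [value]
    return [str(v) for v in values if v is not None]


def build_inverted_index(docs: Dict[int, Dict[str, Any]], field: str) -> Dict[str, Set[int]]:
    """Map each token in `field` to the set of document IDs that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, source in docs.items():
        # source.get() returns None for a missing field, so a missing field
        # and an explicit null both contribute no tokens.
        for token in field_tokens(source.get(field)):
            index[token].add(doc_id)
    return dict(index)


index = build_inverted_index(DOCS, "tags")
for token in sorted(index):
    print(token, "|", ",".join(str(d) for d in sorted(index[token])))
```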
Our objective is to find all documents where a tag is set. We don’t care what the tag is, so long as it exists within the document. In SQL parlance, we would use an IS NOT NULL query:
SELECT tags FROM posts WHERE tags IS NOT NULL
In Elasticsearch, we use the exists query:
GET /my_index/posts/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "exists" : { "field" : "tags" }
            }
        }
    }
}
Our query returns three documents:
"hits" : [
    { "_id" : "1", "_score" : 1.0, "_source" : { "tags" : ["search"] } },
    { "_id" : "5", "_score" : 1.0, "_source" : { "tags" : ["search", null] } },
    { "_id" : "2", "_score" : 1.0, "_source" : { "tags" : ["search", "open_source"] } }
]
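Document 5 is returned even though it contains a null value. The field exists because a real tag value ("search") was indexed, so the null has no effect on the filter.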
The results are easy to understand. Any document that has terms in the tags field was returned as a hit. The only two documents that were excluded were documents 3 and 4.
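One way to picture why these are the hits: in a simplified model (not how Lucene actually executes the query), exists on a field matches the union of every posting list for that field. This Python sketch hard-codes the tags index for the five example documents (open_source in document 2; search in documents 1, 2, and 5); the names TAGS_INDEX and exists are hypothetical.

```python
from typing import Dict, Set

# Hypothetical hard-coded tags index: token -> document IDs.
TAGS_INDEX: Dict[str, Set[int]] = {
    "open_source": {2},
    "search": {1, 2, 5},
}


def exists(index: Dict[str, Set[int]]) -> Set[int]:
    """Documents with at least one token in the field: the union of all postings."""
    matches: Set[int] = set()
    for doc_ids in index.values():
        matches |= doc_ids
    return matches


print(sorted(exists(TAGS_INDEX)))  # [1, 2, 5]
```

Documents 3 and 4 never appear in any posting list, so no union of postings can ever return them.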
missing Query
The missing query is essentially the inverse of exists: it returns documents where there is no value for a particular field, much like this SQL:
SELECT tags FROM posts WHERE tags IS NULL
Let’s swap the exists query for a missing query in our previous example:
GET /my_index/posts/_search
{
    "query" : {
        "constant_score" : {
            "filter": {
                "missing" : { "field" : "tags" }
            }
        }
    }
}
And, as you would expect, we get back the two documents that have no real values in the tags field, documents 3 and 4:
"hits" : [
    { "_id" : "3", "_score" : 1.0, "_source" : { "other_field" : "some data" } },
    { "_id" : "4", "_score" : 1.0, "_source" : { "tags" : null } }
]
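Continuing the same simplified model, missing can be thought of as "all documents minus those that appear in any posting list for the field". This Python sketch assumes the index holds exactly the five example documents; ALL_DOC_IDS, TAGS_INDEX, and missing are hypothetical names, not Elasticsearch APIs.

```python
from typing import Dict, Set

# Hypothetical model: every document ID in the index, plus the tags postings.
ALL_DOC_IDS: Set[int] = {1, 2, 3, 4, 5}
TAGS_INDEX: Dict[str, Set[int]] = {
    "open_source": {2},
    "search": {1, 2, 5},
}


def missing(all_ids: Set[int], index: Dict[str, Set[int]]) -> Set[int]:
    """Documents with no token in the field: every document not in any posting list."""
    has_value: Set[int] = set().union(*index.values())
    return all_ids - has_value


print(sorted(missing(ALL_DOC_IDS, TAGS_INDEX)))  # [3, 4]
```

Note that the model cannot distinguish document 3 (field absent) from document 4 (field explicitly null), matching the behavior described above.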
exists/missing on Objects
The exists and missing queries also work on inner objects, not just core types. With the following document
{ "name" : { "first" : "John", "last" : "Smith" } }
you can check for the existence of name.first and name.last, but also just name. However, in Types and Mappings, we said that an object like the preceding one is flattened internally into a simple field-value structure, much like this:
{ "name.first" : "John", "name.last" : "Smith" }
So how can we use an exists or missing query on the name field, which doesn’t really exist in the inverted index?
It works because a filter like
{ "exists" : { "field" : "name" } }
is really executed as
{
    "bool": {
        "should": [
            { "exists": { "field": "name.first" }},
            { "exists": { "field": "name.last" }}
        ]
    }
}
That also means that if first and last were both empty, the name namespace would not exist.
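The flattening and the should-style expansion can be modeled together in a short Python sketch. This is a simplification under stated assumptions: the helper names flatten, has_value, and exists are hypothetical, and the sketch reads the subfields from the document itself, whereas the expansion described above is in terms of the object’s subfields (name.first, name.last).

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted field names, e.g. name.first."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def has_value(value: Any) -> bool:
    """True if the value would contribute a token; None, [] and [None] do not."""
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)


def exists(source: Dict[str, Any], field: str) -> bool:
    """Model of exists: true if the field, or any subfield under it, has a value.

    For an object field this behaves like a bool/should over its leaf fields.
    """
    flat = flatten(source)
    return any(
        has_value(value)
        for path, value in flat.items()
        if path == field or path.startswith(field + ".")
    )


doc = {"name": {"first": "John", "last": "Smith"}}
print(flatten(doc))         # {'name.first': 'John', 'name.last': 'Smith'}
print(exists(doc, "name"))  # True

empty = {"name": {"first": None, "last": None}}
print(exists(empty, "name"))  # False: no subfield has a value
```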