Finding Multiple Exact Values | Elasticsearch: The Definitive Guide [2.x]

Original source: https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_multiple_exact_values.html. Copyright belongs to www.elastic.co.

WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.

This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.


Finding Multiple Exact Values

The term query is useful for finding a single value, but often you’ll want to search for multiple values. What if you want to find documents that have a price of $20 or $30?

Rather than using multiple term queries, you can use a single terms query (note the s at the end). The terms query is simply the plural counterpart of the term query.

It looks nearly identical to a plain term query, too. Instead of specifying a single price, we now specify an array of values:

{
    "terms" : {
        "price" : [20, 30]
    }
}

As with the term query, we place it inside the filter clause of a constant_score query to use it:

GET /my_store/products/_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "terms" : { 
                    "price" : [20, 30]
                }
            }
        }
    }
}

The terms query as seen previously, but placed inside the constant_score query

The query will return the second, third, and fourth documents:

"hits" : [
    {
        "_id" :    "2",
        "_score" : 1.0,
        "_source" : {
          "price" :     20,
          "productID" : "KDKE-B-9947-#kL5"
        }
    },
    {
        "_id" :    "3",
        "_score" : 1.0,
        "_source" : {
          "price" :     30,
          "productID" : "JODL-X-1937-#pV7"
        }
    },
    {
        "_id":     "4",
        "_score":  1.0,
        "_source": {
           "price":     30,
           "productID": "QQPX-R-3956-#aD8"
        }
     }
]
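To make the matching rule concrete, here is a minimal Python sketch of how a terms lookup can be thought of: one posting list per value, combined with OR. It is an in-memory illustration only, not how Elasticsearch is implemented. The names `PRODUCTS`, `build_price_index`, and `terms_lookup` are hypothetical, and document 1 with a price of 10 is assumed so that there is something to exclude.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the products index. Document 1 and its
# price are assumed for illustration; documents 2-4 mirror the hits above.
PRODUCTS = {
    "1": {"price": 10},
    "2": {"price": 20},
    "3": {"price": 30},
    "4": {"price": 30},
}


def build_price_index(docs):
    """Map each price to the set of document IDs that carry it."""
    index = defaultdict(set)
    for doc_id, source in docs.items():
        index[source["price"]].add(doc_id)
    return index


def terms_lookup(index, values):
    """Union the posting lists of every requested value (OR semantics)."""
    matched = set()
    for value in values:
        matched |= index.get(value, set())
    return sorted(matched)


price_index = build_price_index(PRODUCTS)
print(terms_lookup(price_index, [20, 30]))  # ['2', '3', '4']
```

A document matches as soon as any one of the listed values is present, which is why the assumed document 1 is left out.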

Contains, but Does Not Equal

It is important to understand that term and terms are contains operations, not equals. What does that mean?

If you have a term query for { "term" : { "tags" : "search" } }, it will match both of the following documents:

{ "tags" : ["search"] }
{ "tags" : ["search", "open_source"] }

The second document is returned even though it contains terms other than search.

Recall how the term query works: it checks the inverted index for all documents that contain a term, and then constructs a bitset. In our simple example, we have the following inverted index:

Token	DocIDs
`open_source`	`2`
`search`	`1`,`2`

When a term query is executed for the token search, it goes straight to the corresponding entry in the inverted index and extracts the associated doc IDs. As you can see, both document 1 and document 2 contain the token in the inverted index. Therefore, they are both returned as a result.
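The lookup described above can be sketched in a few lines of Python. This is a toy model of an inverted index, assuming the two example documents have doc IDs 1 and 2 as in the table; `build_inverted_index` and `term_lookup` are illustrative names, not Elasticsearch APIs.

```python
from collections import defaultdict

# The two example documents, keyed by the doc IDs used in the table above.
DOCS = {
    1: {"tags": ["search"]},
    2: {"tags": ["search", "open_source"]},
}


def build_inverted_index(docs, field):
    """Map every token found in `field` to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, source in docs.items():
        for token in source[field]:
            index[token].add(doc_id)
    return index


def term_lookup(index, token):
    """Go straight to the token's entry and return its doc IDs."""
    return sorted(index.get(token, set()))


tags_index = build_inverted_index(DOCS, "tags")
print(term_lookup(tags_index, "search"))       # [1, 2]
print(term_lookup(tags_index, "open_source"))  # [2]
```

The lookup never inspects what else a document holds, so document 2 is returned for search regardless of its second tag.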

The nature of an inverted index also means that entire field equality is rather difficult to calculate. How would you determine whether a particular document contains only your requested term? You would have to find the term in the inverted index, extract the document IDs, and then scan every row in the inverted index, looking for those IDs to see whether a doc has any other terms.

As you might imagine, that would be tremendously inefficient and expensive. For that reason, term and terms are must-contain operations, not must-equal-exactly operations.
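A rough Python sketch of that naive equality check shows where the cost comes from. It uses a toy dictionary in place of the inverted index from the table above, and `docs_equal_to` is a hypothetical helper written for this illustration.

```python
# Toy inverted index matching the table earlier in this section.
INVERTED_INDEX = {
    "open_source": {2},
    "search": {1, 2},
}


def docs_equal_to(index, token):
    """Return doc IDs whose field holds `token` and nothing else.

    Also returns how many index rows had to be visited to decide that.
    """
    candidates = set(index.get(token, set()))
    rows_scanned = 0
    for other_token, doc_ids in index.items():
        rows_scanned += 1
        if other_token != token:
            # Any candidate that appears under another token has extra terms.
            candidates -= doc_ids
    return sorted(candidates), rows_scanned


matches, rows_scanned = docs_equal_to(INVERTED_INDEX, "search")
print(matches, rows_scanned)  # [1] 2
```

With two rows the scan is trivial, but the number of rows visited grows with the number of distinct terms in the field, whereas a plain term lookup touches a single row.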

Equals Exactly

If you do want that behavior—entire field equality—the best way to accomplish it involves indexing a secondary field. In this field, you index the number of values that your field contains. Using our two previous documents, we now include a field that maintains the number of tags:

{ "tags" : ["search"], "tag_count" : 1 }
{ "tags" : ["search", "open_source"], "tag_count" : 2 }
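One way to keep the count in step with the tags is to compute it in application code just before indexing. The following Python sketch assumes `tags` is always a list and counts distinct values; `with_tag_count` is a hypothetical helper, not part of any Elasticsearch client.

```python
def with_tag_count(doc):
    """Return a copy of `doc` with a tag_count field derived from its tags.

    Assumption: tags is a list, and duplicate values count only once.
    """
    enriched = dict(doc)
    enriched["tag_count"] = len(set(doc.get("tags", [])))
    return enriched


docs = [
    {"tags": ["search"]},
    {"tags": ["search", "open_source"]},
]
for doc in docs:
    print(with_tag_count(doc))
```

Deriving the field in one place avoids the count drifting out of sync with the tags when documents are updated.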

Once you have the count information indexed, you can construct a constant_score query that enforces the appropriate number of terms:

GET /my_index/my_type/_search
{
    "query": {
        "constant_score" : {
            "filter" : {
                 "bool" : {
                    "must" : [
                        { "term" : { "tags" : "search" } }, 
                        { "term" : { "tag_count" : 1 } } 
                    ]
                }
            }
        }
    }
}

	Find all documents that have the term `search`.
	But make sure the document has only one tag.

This query will now match only the document that has a single tag that is search, rather than any document that contains search.
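The logic of the two must clauses can be mimicked over plain Python dictionaries. This is only a sketch of the filter's semantics using the two example documents; `matches_exactly` is a hypothetical function and nothing here calls Elasticsearch.

```python
# The two example documents, already carrying their tag_count field.
DOCS = [
    {"tags": ["search"], "tag_count": 1},
    {"tags": ["search", "open_source"], "tag_count": 2},
]


def matches_exactly(doc, tag):
    """Mimic the bool filter: both must clauses have to hold."""
    contains_tag = tag in doc["tags"]     # the term clause on tags
    single_value = doc["tag_count"] == 1  # the term clause on tag_count
    return contains_tag and single_value


exact = [doc for doc in DOCS if matches_exactly(doc, "search")]
print(exact)  # [{'tags': ['search'], 'tag_count': 1}]
```

Both conditions remain simple contains-style lookups, which is what lets the inverted index answer them cheaply.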
