WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Sorting
In order to sort by relevance, we need to represent relevance as a value. In
Elasticsearch, the relevance score is represented by the floating-point
number returned in the search results as the _score, so the default sort
order is _score descending.
Sometimes, though, you don’t have a meaningful relevance score. For instance,
the following query just returns all tweets whose user_id field has the
value 1:
GET /_search
{
    "query" : {
        "bool" : {
            "filter" : {
                "term" : {
                    "user_id" : 1
                }
            }
        }
    }
}
There isn’t a meaningful score here: because we are using a filter, we are
indicating that we just want the documents that match user_id: 1 with no
attempt to determine relevance. Documents will be returned in effectively
random order, and each document will have a score of zero.
If a score of zero makes your life difficult for logistical reasons, you can use
a constant_score query instead:
GET /_search
{
    "query" : {
        "constant_score" : {
            "filter" : {
                "term" : {
                    "user_id" : 1
                }
            }
        }
    }
}
This will apply a constant score (default of 1) to all documents. It will
perform the same as the above query, and all documents will still be returned
in effectively random order; they’ll just have a score of one instead of zero.
Sorting by Field Values
In this case, it probably makes sense to sort tweets by recency, with the most
recent tweets first. We can do this with the sort parameter:
GET /_search
{
    "query" : {
        "bool" : {
            "filter" : { "term" : { "user_id" : 1 }}
        }
    },
    "sort": { "date": { "order": "desc" }}
}
You will notice two differences in the results:
"hits" : {
    "total" :     6,
    "max_score" : null,
    "hits" : [ {
        "_index" :  "us",
        "_type" :   "tweet",
        "_id" :     "14",
        "_score" :  null,
        "_source" : {
            "date": "2014-09-24",
            ...
        },
        "sort" :    [ 1411516800000 ]
    },
    ...
    ]
}
The first is that we have a new element in each result called sort, which
contains the value(s) used for sorting. In this case, we sorted on date,
which internally is indexed as milliseconds since the epoch. The long
number 1411516800000 is equivalent to the date string
2014-09-24 00:00:00 UTC.
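The conversion can be checked by hand. The following is a minimal Python sketch using only the standard library; the variable names are illustrative and are not part of Elasticsearch.

```python
from datetime import datetime, timezone

# Sort value copied from the example response (milliseconds since the epoch).
sort_value_ms = 1411516800000

# Convert milliseconds to seconds, then build a timezone-aware UTC datetime.
as_date = datetime.fromtimestamp(sort_value_ms / 1000, tz=timezone.utc)

print(as_date.isoformat())  # 2014-09-24T00:00:00+00:00
```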
The second is that the _score and max_score are both null. Calculating
the _score can be quite expensive, and usually its only purpose is for
sorting; we’re not sorting by relevance, so it doesn’t make sense to keep
track of the _score. If you want the _score to be calculated regardless,
you can set the track_scores parameter to true.
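As a rough illustration, the request body for the date-sorted query with score tracking turned on might be assembled as below. This sketch only builds and prints the JSON; it does not contact a cluster, and it assumes track_scores sits at the top level of the body next to query and sort.

```python
import json

# Same filter and sort as the date-sorted request, with track_scores enabled
# so that _score is still computed even though results are ordered by date.
search_body = {
    "query": {"bool": {"filter": {"term": {"user_id": 1}}}},
    "sort": {"date": {"order": "desc"}},
    "track_scores": True,  # assumed placement: top-level key of the body
}

# Serialize to the JSON that would be sent as the body of GET /_search.
print(json.dumps(search_body, indent=2))
```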
As a shortcut, you can specify just the name of the field to sort on:
"sort": "number_of_children"
Fields will be sorted in ascending order by default, and
the _score value in descending order.
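One way to picture these defaults is to expand the shortcut into its explicit form. The helper below, expand_sort, is hypothetical and exists only for this illustration; Elasticsearch applies the defaults itself.

```python
def expand_sort(field):
    """Expand a bare field name into an explicit sort clause.

    Hypothetical helper: mirrors the documented defaults, where _score
    sorts descending and every other field sorts ascending.
    """
    order = "desc" if field == "_score" else "asc"
    return {field: {"order": order}}


print(expand_sort("number_of_children"))
print(expand_sort("_score"))
```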
Multilevel Sorting
Perhaps we want to combine the _score from a query with the date, and
show all matching results sorted first by date, then by relevance:
GET /_search
{
    "query" : {
        "bool" : {
            "must":   { "match": { "tweet": "manage text search" }},
            "filter" : { "term" : { "user_id" : 2 }}
        }
    },
    "sort": [
        { "date":   { "order": "desc" }},
        { "_score": { "order": "desc" }}
    ]
}
Order is important. Results are sorted by the first criterion first. Only
results whose first sort value is identical will then be sorted by the
second criterion, and so on.
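The same ordering rule can be imitated locally with a tuple sort key. This is only a sketch of the semantics, not of how Elasticsearch sorts internally; the hits list and its values are made up for the example.

```python
# Made-up hits; ISO-formatted dates compare correctly as plain strings.
hits = [
    {"id": "a", "date": "2014-09-22", "score": 0.9},
    {"id": "b", "date": "2014-09-24", "score": 0.3},
    {"id": "c", "date": "2014-09-24", "score": 0.7},
    {"id": "d", "date": "2014-09-22", "score": 0.1},
]

# Tuples compare element by element: date decides first, and score is only
# consulted when two dates are equal. reverse=True makes both descending.
ordered = sorted(hits, key=lambda h: (h["date"], h["score"]), reverse=True)

print([h["id"] for h in ordered])  # ['c', 'b', 'a', 'd']
```

Note that "a" has the highest score but still lands third, because its date loses on the first criterion.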
Multilevel sorting doesn’t have to involve the _score. You could sort
by using several different fields, on geo-distance or on a custom value
calculated in a script.
Query-string search also supports custom sorting, using the sort parameter
in the query string:
GET /_search?sort=date:desc&sort=_score&q=search
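If the query string is built programmatically, the repeated sort parameter has to be kept in order. A minimal sketch with Python's standard library follows; passing a list of pairs (rather than a dict) is what allows sort to appear twice, and safe=":" keeps the colon unescaped so the output matches the request as written.

```python
from urllib.parse import urlencode

# A list of pairs preserves order and allows the repeated "sort" key.
params = [("sort", "date:desc"), ("sort", "_score"), ("q", "search")]

# safe=":" leaves the colon in "date:desc" unescaped.
query_string = urlencode(params, safe=":")

print("/_search?" + query_string)  # /_search?sort=date:desc&sort=_score&q=search
```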
Sorting on Multivalue Fields
When sorting on fields with more than one value, remember that the values do not have any intrinsic order; a multivalue field is just a bag of values. Which one do you choose to sort on?
For numbers and dates, you can reduce a multivalue field to a single value
by using the min, max, avg, or sum sort modes. For instance, you
could sort on the earliest date in each dates field by using the following:
"sort": { "dates": { "order": "asc", "mode": "min" } }
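To see what each mode does to the ordering, the reduction can be imitated locally. This is a sketch of the idea only, not Elasticsearch's implementation; the documents, their values, and the rank helper are invented for the example.

```python
from statistics import mean

# Each sort mode reduces a bag of values to the single value used for sorting.
SORT_MODES = {"min": min, "max": max, "avg": mean, "sum": sum}

# Invented documents, each with a multivalue numeric field.
docs = {"doc1": [5, 20], "doc2": [10, 12], "doc3": [1, 50]}


def rank(documents, mode):
    """Return document IDs in ascending order of their reduced value."""
    reduce_values = SORT_MODES[mode]
    return sorted(documents, key=lambda doc_id: reduce_values(documents[doc_id]))


print(rank(docs, "min"))  # ['doc3', 'doc1', 'doc2']
print(rank(docs, "max"))  # ['doc2', 'doc1', 'doc3']
```

The same three documents come back in a different order depending on the mode, which is why the choice has to be made explicitly for multivalue fields.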