The Root Object
The uppermost level of a mapping is known as the root object. It may contain the following:
- A properties section, which lists the mapping for each field that a document may contain
- Various metadata fields, all of which start with an underscore, such as _type, _id, and _source
- Settings, which control how the dynamic detection of new fields is handled, such as analyzer, dynamic_date_formats, and dynamic_templates
- Other settings, which can be applied both to the root object and to fields of type object, such as enabled, dynamic, and include_in_all
Properties
We have already discussed the three most important settings for document fields or properties in Core Simple Field Types and Complex Core Field Types:
- type: The datatype that the field contains, such as string or date
- index: Whether a field should be searchable as full text (analyzed), searchable as an exact value (not_analyzed), or not searchable at all (no)
- analyzer: Which analyzer to use for a full-text field, both at index time and at search time
We will discuss other field types such as ip, geo_point, and geo_shape in the appropriate sections later in the book.
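To make the parts of the root object concrete, here is a minimal sketch that assembles a mapping body as a Python dictionary and prints it as a create-index request. The index, type, and field names (my_index, my_type, title, tag, created) and the chosen setting values are hypothetical, and the body is only built and printed, not sent to a cluster.

```python
import json

# Hypothetical root object for a type called "my_type".
# Every name and value below is illustrative only.
root_object = {
    # Metadata fields start with an underscore.
    "_source": {"enabled": True},
    # A setting that controls dynamic detection of new fields.
    "dynamic_date_formats": ["yyyy-MM-dd"],
    # A setting that can also be applied to fields of type object.
    "dynamic": "strict",
    # The mapping for each field that a document may contain.
    "properties": {
        "title": {"type": "string", "index": "analyzed", "analyzer": "english"},
        "tag": {"type": "string", "index": "not_analyzed"},
        "created": {"type": "date"},
    },
}

body = {"mappings": {"my_type": root_object}}

print("PUT /my_index")
print(json.dumps(body, indent=2))
```

The title field is searchable as full text with its own analyzer, while tag is searchable only as an exact value.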
Metadata: _source Field
By default, Elasticsearch stores the JSON string representing the document body in the _source field. Like all stored fields, the _source field is compressed before being written to disk.
This is almost always desired functionality because it means the following:
- The full document is available directly from the search results—no need for a separate round-trip to fetch the document from another data store.
- Partial update requests will not function without the _source field.
- When your mapping changes and you need to reindex your data, you can do so directly from Elasticsearch instead of having to retrieve all of your documents from another (usually slower) data store.
- Individual fields can be extracted from the _source field and returned in get or search requests when you don’t need to see the whole document.
- It is easier to debug queries, because you can see exactly what each document contains, rather than having to guess their contents from a list of IDs.
That said, storing the _source field does use disk space. If none of the preceding reasons is important to you, you can disable the _source field with the following mapping:
PUT /my_index
{
    "mappings": {
        "my_type": {
            "_source": {
                "enabled": false
            }
        }
    }
}
In a search request, you can ask for only certain fields by specifying the _source parameter in the request body:
GET /_search
{
    "query":   { "match_all": {}},
    "_source": [ "title", "created" ]
}
Values for these fields will be extracted from the _source field and returned instead of the full _source.
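The effect of the _source parameter can be pictured with a small sketch. This is a simplified illustration and not how Elasticsearch implements source filtering: it handles only exact top-level field names, and the function name filter_source and the sample document are hypothetical.

```python
def filter_source(source, fields):
    """Return only the requested top-level fields of a stored _source.

    Simplified on purpose: exact top-level names only, and requested
    fields that the document lacks are silently skipped.
    """
    return {name: source[name] for name in fields if name in source}


# Hypothetical stored document body.
stored_source = {
    "title": "Intro to mappings",
    "created": "2014-09-01",
    "body": "A long block of text that the caller does not need.",
}

hit_source = filter_source(stored_source, ["title", "created"])
print(hit_source)
```

Only title and created come back; the stored document itself is left unchanged.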
Metadata: _all Field
In Search Lite, we introduced the _all field: a special field that indexes the values from all other fields as one big string. The query_string query clause (and searches performed as ?q=john) defaults to searching in the _all field if no other field is specified.

The _all field is useful during the exploratory phase of a new application, while you are still unsure about the final structure that your documents will have. You can throw any query string at it and you have a good chance of finding the document you’re after:
GET /_search
{
    "query": {
        "match": {
            "_all": "john smith marketing"
        }
    }
}
As your application evolves and your search requirements become more exacting, you will find yourself using the _all field less and less. The _all field is a shotgun approach to search. By querying individual fields, you have more flexibility, power, and fine-grained control over which results are considered to be most relevant.
One of the important factors taken into account by the relevance algorithm is the length of the field: the shorter the field, the more weight each matching term carries. A term that appears in a short title field is likely to be more important than the same term that appears somewhere in a long content field. This distinction between field lengths disappears in the _all field.
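A rough way to see why the distinction disappears is to compute a field-length norm by hand. The sketch below assumes the classic TF/IDF similarity, where the norm is 1 / sqrt(number of terms in the field), and ignores index-time boosts and the lossy way norms are stored. The term counts are made up for illustration.

```python
import math


def field_length_norm(num_terms):
    """Assumed classic TF/IDF norm: shorter fields get a larger value."""
    return 1.0 / math.sqrt(num_terms)


# Hypothetical document: a 4-term title and a 400-term content field.
title_norm = field_length_norm(4)
content_norm = field_length_norm(400)

# In _all, both fields are indexed together as one long string,
# so a term from the title gets the same norm as one from the content.
all_norm = field_length_norm(4 + 400)

print(title_norm, content_norm, all_norm)
```

A match in the short title scores ten times the norm of a match in the content field, but once both are merged into _all that advantage is gone.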
If you decide that you no longer need the _all field, you can disable it with this mapping:
PUT /my_index/_mapping/my_type
{
    "my_type": {
        "_all": {
            "enabled": false
        }
    }
}
Inclusion in the _all field can be controlled on a field-by-field basis by using the include_in_all setting, which defaults to true. Setting include_in_all on an object (or on the root object) changes the default for all fields within that object.
You may find that you want to keep the _all field around to use as a catchall full-text field just for specific fields, such as title, overview, summary, and tags. Instead of disabling the _all field completely, disable include_in_all for all fields by default, and enable it only on the fields you choose:
PUT /my_index/my_type/_mapping
{
    "my_type": {
        "include_in_all": false,
        "properties": {
            "title": {
                "type":           "string",
                "include_in_all": true
            },
            ...
        }
    }
}
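The inheritance rule for include_in_all can be sketched as a small recursive function. This only models the rule as described here (a field's own setting wins, otherwise the enclosing object's default applies); it is not Elasticsearch's implementation, and the function name fields_in_all and the sample mapping are hypothetical.

```python
def fields_in_all(mapping, inherited=True, prefix=""):
    """Return the full names of fields whose values would go into _all."""
    # An object's own include_in_all overrides the inherited default.
    default = mapping.get("include_in_all", inherited)
    included = []
    for name, field in mapping.get("properties", {}).items():
        path = prefix + name
        if "properties" in field:
            # Object field: recurse, passing the current default down.
            included += fields_in_all(field, default, path + ".")
        elif field.get("include_in_all", default):
            included.append(path)
    return included


# Hypothetical mapping: excluded by default, enabled selectively.
my_type = {
    "include_in_all": False,
    "properties": {
        "title": {"type": "string", "include_in_all": True},
        "body": {"type": "string"},
        "author": {
            "type": "object",
            "include_in_all": True,
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string", "include_in_all": False},
            },
        },
    },
}

print(fields_in_all(my_type))
```

Here title and author.name are included, while body inherits the root default of false and author.email opts out explicitly.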
Remember that the _all field is just an analyzed string field. It uses the default analyzer to analyze its values, regardless of which analyzer has been set on the fields where the values originate. And like any string field, you can configure which analyzer the _all field should use:
PUT /my_index/my_type/_mapping
{
    "my_type": {
        "_all": {
            "analyzer": "whitespace"
        }
    }
}
Metadata: Document Identity
There are four metadata fields associated with document identity:
- _id: The string ID of the document
- _type: The type name of the document
- _index: The index where the document lives
- _uid: The _type and _id concatenated together as type#id
By default, the _uid field is stored (can be retrieved) and indexed (searchable). The _type field is indexed but not stored, and the _id and _index fields are neither indexed nor stored, meaning they don’t really exist.
In spite of this, you can query the _id field as though it were a real field. Elasticsearch uses the _uid field to derive the _id. Although you can change the index and store settings for these fields, you almost never need to do so.
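The relationship between _uid, _type, and _id can be sketched in a few lines. This is an illustration of the type#id format described above, not Elasticsearch's own code. It assumes the type name never contains a # character, so the split happens at the first #. The function names and sample values are hypothetical.

```python
def make_uid(doc_type, doc_id):
    """Build a _uid by joining the type and the ID with '#'."""
    return f"{doc_type}#{doc_id}"


def split_uid(uid):
    """Recover (type, id) from a _uid.

    Assumption: the type contains no '#', so only the first '#' is a
    separator and any later '#' belongs to the ID.
    """
    doc_type, _, doc_id = uid.partition("#")
    return doc_type, doc_id


uid = make_uid("blog", "123")
print(uid)
print(split_uid(uid))
```

Because the type and the ID can both be recovered from the single _uid value, neither needs to exist as a separate indexed or stored field.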