WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Indexing a Document
Documents are indexed (stored and made searchable) by using the index API.
But first, we need to decide where the document lives. As we just discussed,
a document’s _index, _type, and _id uniquely identify the document. We can
either provide our own _id value or let the index API generate one for us.
Using Our Own ID
If your document has a natural identifier (for example, a user_account field
or some other value that identifies the document), you should provide your
own _id, using this form of the index API:
PUT /{index}/{type}/{id}
{
  "field": "value",
  ...
}
For example, if our index is called website, our type is called blog, and we
choose the ID 123, then the index request looks like this:
PUT /website/blog/123
{
  "title": "My first blog entry",
  "text":  "Just trying this out...",
  "date":  "2014/01/01"
}
Elasticsearch responds as follows:
{
  "_index":   "website",
  "_type":    "blog",
  "_id":      "123",
  "_version": 1,
  "created":  true
}
The response indicates that the document has been successfully created. It
includes the _index, _type, and _id metadata, along with a new element:
_version.
Every document in Elasticsearch has a version number. Every time a change is
made to a document (including deleting it), the _version number is
incremented. In Dealing with Conflicts, we discuss how to use the _version
number to ensure that one part of your application doesn’t overwrite changes
made by another part.
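The version counting described above can be pictured with a small in-memory toy. This is only a sketch for intuition: the class ToyDocumentStore and its methods are hypothetical names, nothing here talks to Elasticsearch, and the returned dictionaries only loosely imitate the response shown earlier.

```python
class ToyDocumentStore:
    """Hypothetical in-memory stand-in; it models only the version counter."""

    def __init__(self):
        self._versions = {}  # (index, type, id) -> latest version number
        self._docs = {}      # (index, type, id) -> current document body

    def index(self, index, doc_type, doc_id, body):
        key = (index, doc_type, doc_id)
        created = key not in self._docs
        # Every change bumps the version, whether it creates or replaces.
        self._versions[key] = self._versions.get(key, 0) + 1
        self._docs[key] = dict(body)
        return {"_index": index, "_type": doc_type, "_id": doc_id,
                "_version": self._versions[key], "created": created}

    def delete(self, index, doc_type, doc_id):
        key = (index, doc_type, doc_id)
        # Toy limitation: deleting a missing document raises KeyError.
        del self._docs[key]
        # A delete counts as a change too, so the version still goes up.
        self._versions[key] += 1
        return {"_index": index, "_type": doc_type, "_id": doc_id,
                "_version": self._versions[key]}


store = ToyDocumentStore()
first = store.index("website", "blog", "123", {"title": "My first blog entry"})
second = store.index("website", "blog", "123", {"title": "Edited title"})
removed = store.delete("website", "blog", "123")
print(first["_version"], second["_version"], removed["_version"])
```

Indexing the same ID twice and then deleting it walks the toy counter through three consecutive versions, which is the behavior the paragraph above describes.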
Autogenerating IDs
If our data doesn’t have a natural ID, we can let Elasticsearch autogenerate
one for us. The structure of the request changes: instead of using the PUT
verb (“store this document at this URL”), we use the POST verb (“store this
document under this URL”).
The URL now contains just the _index and the _type:
POST /website/blog/
{
  "title": "My second blog entry",
  "text":  "Still trying this out...",
  "date":  "2014/01/01"
}
The response is similar to what we saw before, except that the _id field has
been generated for us:
{
  "_index":   "website",
  "_type":    "blog",
  "_id":      "AVFgSgVHUP18jI2wRx0w",
  "_version": 1,
  "created":  true
}
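The choice between the two request forms (PUT with our own ID, POST without one) can be sketched as a small Python helper. This is a minimal illustration, not a client library: build_index_request is a hypothetical name, and the function only assembles the verb, path, and JSON body without sending anything to a cluster.

```python
import json
from urllib.parse import quote


def build_index_request(index, doc_type, document, doc_id=None):
    """Return (method, path, body) for an index request; nothing is sent."""
    # Percent-encode each path segment so unusual characters stay URL-safe.
    base = "/{}/{}".format(quote(index, safe=""), quote(doc_type, safe=""))
    body = json.dumps(document)
    if doc_id is None:
        # No natural ID: POST under the type URL and let the server pick one.
        return "POST", base + "/", body
    # Natural ID available: PUT at the full document URL.
    return "PUT", base + "/" + quote(str(doc_id), safe=""), body


post = {"title": "My first blog entry", "date": "2014/01/01"}
print(build_index_request("website", "blog", post, doc_id=123)[:2])
print(build_index_request("website", "blog", post)[:2])
```

The returned tuple could then be handed to whichever HTTP client you already use.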
Autogenerated IDs are 20 characters long, URL-safe, Base64-encoded GUID strings. These GUIDs are generated from a modified FlakeID scheme, which allows multiple nodes to generate unique IDs in parallel with essentially zero chance of collision.
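To see how a 20-character, URL-safe Base64 string can arise, the following Python sketch encodes 15 bytes (120 bits, or 20 groups of 6 bits). It is an illustration of the encoding arithmetic only: it uses random bytes rather than the FlakeID scheme, the 15-byte input length is an assumption chosen because it yields 20 characters, and random_url_safe_id is a hypothetical name.

```python
import base64
import os


def random_url_safe_id(num_bytes=15):
    """Encode random bytes as URL-safe Base64 (NOT the FlakeID scheme)."""
    raw = os.urandom(num_bytes)
    # urlsafe_b64encode uses '-' and '_' in place of '+' and '/'.
    encoded = base64.urlsafe_b64encode(raw).decode("ascii")
    # 15 bytes divide evenly into 6-bit groups, so no '=' padding appears;
    # rstrip keeps the output clean for other input lengths as well.
    return encoded.rstrip("=")


example_id = random_url_safe_id()
print(example_id, len(example_id))
```

Unlike this sketch, a FlakeID-style scheme does not depend on randomness alone to avoid collisions between nodes.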