WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.
This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.
Document Metadata
A document doesn’t consist only of its data. It also has metadata—information about the document. The three required metadata elements are as follows:
- _index: Where the document lives
- _type: The class of object that the document represents
- _id: The unique identifier for the document
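The three elements work together as an address, so it can help to picture them as one value. The sketch below is a hypothetical Python helper. The class DocumentMetadata and its method path are illustrative names and are not part of any Elasticsearch client. The helper joins _index, _type, and _id into a path of the form /{_index}/{_type}/{_id}, which is how Elasticsearch 2.x addresses a single document over its REST API. The ID "123" is an arbitrary example.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class DocumentMetadata:
    """The three required metadata elements of a document (hypothetical helper)."""

    index: str     # _index: where the document lives
    doc_type: str  # _type: the class of object the document represents
    doc_id: str    # _id: the unique identifier for the document

    def path(self) -> str:
        """Build the /{_index}/{_type}/{_id} path that addresses this document."""
        # Percent-encode each segment so a value such as "a/b" stays a single segment.
        parts = (quote(p, safe="") for p in (self.index, self.doc_type, self.doc_id))
        return "/" + "/".join(parts)


meta = DocumentMetadata(index="website", doc_type="blog", doc_id="123")
print(meta.path())  # /website/blog/123
```

Encoding each segment separately is a design choice of this sketch. Without it, an _id that contains a slash would be read as an extra path segment.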
_index
An index is a collection of documents that should be grouped together for a
common reason. For example, you may store all your products in a products
index, while all your sales transactions go in sales. Although it is possible
to store unrelated data together in a single index, it is often considered an
anti-pattern.
Actually, in Elasticsearch, our data is stored and indexed in shards, while an index is just a logical namespace that groups together one or more shards. However, this is an internal detail; our application shouldn’t care about shards at all. As far as our application is concerned, our documents live in an index. Elasticsearch takes care of the details.
We cover how to create and manage indices ourselves in Index Management,
but for now we will let Elasticsearch create the index for us. All we have to
do is choose an index name. This name must be lowercase, cannot begin with an
underscore, and cannot contain commas. Let's use website as our index name.
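The three naming rules above can be checked mechanically before an index name is used. The following is a minimal sketch. The function is_valid_index_name is a hypothetical helper and checks only the rules listed in this section. Elasticsearch enforces further restrictions that this sketch does not model. Rejecting an empty name is an added assumption.

```python
def is_valid_index_name(name: str) -> bool:
    """Check only the index-name rules listed in this section (hypothetical helper).

    Elasticsearch enforces further restrictions that are not modeled here.
    """
    if not name:
        return False  # assumption: an empty string is not a usable name
    return (
        name == name.lower()          # must be lowercase
        and not name.startswith("_")  # cannot begin with an underscore
        and "," not in name           # cannot contain commas
    )


for candidate in ["website", "Website", "_website", "web,site"]:
    print(candidate, is_valid_index_name(candidate))
```

Only website passes. The other three candidates each break exactly one of the stated rules.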
_type
Data may be grouped loosely together in an index, but often there are sub-partitions inside that data that are worth defining explicitly. For example, all your products may go inside a single index, but you have different categories of products, such as "electronics", "kitchen", and "lawn-care".
The documents all share an identical (or very similar) schema: they have a title, description, product code, price. They just happen to belong to sub-categories under the umbrella of "Products".
Elasticsearch exposes a feature called types which allows you to logically partition data inside of an index. Documents in different types may have different fields, but it is best if they are highly similar. We’ll talk more about the restrictions and applications of types in Types and Mappings.
A _type name can be lowercase or uppercase, but shouldn't begin with an
underscore or period. It also may not contain commas, and is limited to a
length of 256 characters. We will use blog for our type name.
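The _type rules differ from the index rules: case is unrestricted, a leading period is disallowed as well as a leading underscore, and there is a length cap. Here is a minimal sketch of a check for them. The function is_valid_type_name and the constant MAX_TYPE_NAME_LENGTH are hypothetical names. The sketch treats "shouldn't begin with" as a hard rejection and counts the 256-character limit in Python characters. Both choices are assumptions.

```python
MAX_TYPE_NAME_LENGTH = 256  # limit from the text; assumption: counted in characters


def is_valid_type_name(name: str) -> bool:
    """Check the _type naming rules from this section (hypothetical helper).

    Upper- and lowercase letters are both allowed, so case is not checked.
    """
    return (
        0 < len(name) <= MAX_TYPE_NAME_LENGTH  # non-empty and within the length limit
        and not name.startswith(("_", "."))    # no leading underscore or period
        and "," not in name                    # no commas anywhere
    )


print(is_valid_type_name("blog"))   # True
print(is_valid_type_name(".blog"))  # False
```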
_id
The ID is a string that, when combined with the _index and _type, uniquely
identifies a document in Elasticsearch. When creating a new document, you can
either provide your own _id or let Elasticsearch generate one for you.
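There are two ways to obtain an ID: use the caller's value if one is given, or generate one. The sketch below shows that choice. The function choose_document_id is hypothetical, and its generator is a stand-in. It encodes a random UUID4 as URL-safe Base64 without padding so the result fits in a URL path segment. This is not Elasticsearch's own ID-generation scheme. It only illustrates that a generated ID must be unique and safe to use in a request path.

```python
import base64
import uuid
from typing import Optional


def choose_document_id(provided_id: Optional[str] = None) -> str:
    """Return the caller's ID if given, otherwise a generated one (hypothetical helper).

    The generated value is a stand-in built from a random UUID4; Elasticsearch
    uses its own scheme for autogenerated IDs.
    """
    if provided_id is not None:
        return provided_id  # the caller chose their own _id
    raw = uuid.uuid4().bytes  # 16 random bytes
    # URL-safe Base64 with the trailing '=' padding stripped: 22 characters.
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


print(choose_document_id("123"))  # 123
print(choose_document_id())       # a different 22-character string on each call
```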
Other Metadata
There are several other metadata elements, which are presented in Types and Mappings. With the elements listed previously, we are already able to store a document in Elasticsearch and to retrieve it by ID—in other words, to use Elasticsearch as a document store.
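To make the "document store" idea concrete, here is a toy in-memory model in which the only way to reach a document is through its _index, _type, and _id. The class ToyDocumentStore and its put and get methods are hypothetical and are not an Elasticsearch API. The sketch ignores everything beyond the three metadata elements. In this sketch, putting a document under an existing key replaces the earlier one.

```python
from typing import Any, Dict, Optional, Tuple

# Key used to locate a document: (_index, _type, _id).
DocKey = Tuple[str, str, str]


class ToyDocumentStore:
    """In-memory stand-in for using Elasticsearch as a document store (hypothetical).

    Documents are stored and retrieved only by the (_index, _type, _id) triple.
    """

    def __init__(self) -> None:
        self._docs: Dict[DocKey, Dict[str, Any]] = {}

    def put(self, index: str, doc_type: str, doc_id: str, source: Dict[str, Any]) -> None:
        # In this sketch, storing under an existing key replaces the previous document.
        self._docs[(index, doc_type, doc_id)] = dict(source)

    def get(self, index: str, doc_type: str, doc_id: str) -> Optional[Dict[str, Any]]:
        # Returns None when no document has this (_index, _type, _id) combination.
        doc = self._docs.get((index, doc_type, doc_id))
        return dict(doc) if doc is not None else None


store = ToyDocumentStore()
store.put("website", "blog", "123", {"title": "My first blog entry"})
print(store.get("website", "blog", "123"))   # {'title': 'My first blog entry'}
print(store.get("website", "pages", "123"))  # None: same _id, different _type
```

The second lookup shows why the _id alone is not enough. The same ID under a different _type is a different address.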