ignore_above | Elasticsearch Guide [7.7]

原文地址: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/ignore-above.html, 原文档版权归 www.elastic.co 所有

IMPORTANT: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.

» » » ignore_above

« format ignore_malformed »

`ignore_above`edit

Strings longer than the ignore_above setting will not be indexed or stored. For arrays of strings, ignore_above will be applied for each array element separately and string elements longer than ignore_above will not be indexed or stored.

All strings/array elements will still be present in the _source field, if the latter is enabled which is the default in Elasticsearch.

PUT my_index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 20 
      }
    }
  }
}

PUT my_index/_doc/1 
{
  "message": "Syntax error"
}

PUT my_index/_doc/2 
{
  "message": "Syntax error with some long stacktrace"
}

GET my_index/_search 
{
  "aggs": {
    "messages": {
      "terms": {
        "field": "message"
      }
    }
  }
}

	This field will ignore any string longer than 20 characters.
	This document is indexed successfully.
	This document will be indexed, but without indexing the `message` field.
	Search returns both documents, but only the first is present in the terms aggregation.

The ignore_above setting can be updated on existing fields using the PUT mapping API.

This option is also useful for protecting against Lucene’s term byte-length limit of 32766.

The value for ignore_above is the character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191 since UTF-8 characters may occupy at most 4 bytes.

« format ignore_malformed »

ignore_aboveedit

`ignore_above`edit