Standard Analyzer | Elasticsearch 7.7 Definitive Guide (Chinese Edition)

Original English version: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-standard-analyzer.html. The original documentation is copyright www.elastic.co.
Local English version: ../en/analysis-standard-analyzer.html

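Important: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.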

Standard Analyzer

The standard analyzer is the default analyzer, used when none is specified. It provides grammar-based tokenization (based on the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29) and works well for most languages.

Example output

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

The above sentence would produce the following terms:

[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]

Configuration

The standard analyzer accepts the following parameters:

`max_token_length`	The maximum token length. A token that exceeds this length is split at `max_token_length` intervals. Defaults to `255`.
`stopwords`	A pre-defined stop words list like `_english_` or an array containing a list of stop words. Defaults to `_none_`.
`stopwords_path`	The path to a file containing stop words.

See the Stop Token Filter for more information about stop word configuration.
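
As a minimal sketch of the array form of `stopwords`, the request below defines an analyzer that removes only two words. The index name `my_stopwords_index`, the analyzer name `my_custom_stop_analyzer`, and the chosen stop words are hypothetical and serve only as illustration:

```console
PUT my_stopwords_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_stop_analyzer": {
          "type": "standard",
          "stopwords": [ "the", "over" ]
        }
      }
    }
  }
}

POST my_stopwords_index/_analyze
{
  "analyzer": "my_custom_stop_analyzer",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
```

Because the lowercase filter runs before the stop filter, both `The` and `the` should be removed along with `over`, leaving terms along the lines of [ 2, quick, brown, foxes, jumped, lazy, dog's, bone ].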

Example configuration

In this example, we configure the standard analyzer with a `max_token_length` of 5 (for demonstration purposes) and the pre-defined list of English stop words:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_english_analyzer": {
          "type": "standard",
          "max_token_length": 5,
          "stopwords": "_english_"
        }
      }
    }
  }
}

POST my_index/_analyze
{
  "analyzer": "my_english_analyzer",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

The above example produces the following terms:

[ 2, quick, brown, foxes, jumpe, d, over, lazy, dog's, bone ]

Definition

The standard analyzer consists of:

Tokenizer

Standard Tokenizer

Token Filters

Lower Case Token Filter
Stop Token Filter (disabled by default)

To customize the standard analyzer beyond its configuration parameters, recreate it as a custom analyzer and modify it, usually by adding token filters. The following request recreates the built-in standard analyzer and can serve as a starting point:

PUT /standard_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_standard": {
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}

Add any further token filters after `lowercase` in the `filter` array.
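
For instance, a sketch that appends the built-in `asciifolding` token filter after `lowercase` might look like the following. The index name `standard_folding_example` and the analyzer name `standard_with_folding` are hypothetical, and `asciifolding` is just one possible choice of filter:

```console
PUT /standard_folding_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "standard_with_folding": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  }
}

POST /standard_folding_example/_analyze
{
  "analyzer": "standard_with_folding",
  "text": "Déjà Vu café"
}
```

With this configuration, the sample text should yield the terms [ deja, vu, cafe ]: the standard tokenizer splits on word boundaries, `lowercase` normalizes case, and `asciifolding` then converts the accented characters to their ASCII equivalents.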
