ASCII folding token filter | Elasticsearch 7.7: The Definitive Guide (Chinese edition)

Original English version: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-asciifolding-tokenfilter.html. Copyright of the original documentation belongs to www.elastic.co.
Local English version: ../en/analysis-asciifolding-tokenfilter.html

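Important: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the documentation for the current version.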

ASCII folding token filter

Converts alphabetic, numeric, and symbolic characters that are not in the Basic Latin Unicode block (the first 128 code points, U+0000 to U+007F, which correspond to ASCII) to their ASCII equivalent, if one exists. For example, the filter changes à to a.

This filter uses Lucene’s ASCIIFoldingFilter.
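To make the idea of folding concrete, here is a minimal Python sketch. It is only an approximation: it strips combining marks after Unicode NFKD decomposition, which is not how ASCIIFoldingFilter works internally (Lucene uses its own mapping table). The function name fold_to_ascii is hypothetical and is not part of Elasticsearch or Lucene.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Approximate ASCII folding (assumption: NOT Lucene's mapping table).

    Each non-ASCII character is decomposed with NFKD and its combining
    marks are dropped. If the result is not pure ASCII, the original
    character is kept unchanged.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # Already in the Basic Latin block: nothing to fold.
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        # Keep the original character when no ASCII equivalent results.
        out.append(stripped if stripped and stripped.isascii() else ch)
    return "".join(out)


print(fold_to_ascii("açaí à la carte"))  # acai a la carte
```

Characters that have no Unicode decomposition, such as ø, pass through this sketch unchanged, so its output can differ from what the real filter produces.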

Example

The following analyze API request uses the asciifolding filter to drop the diacritical marks in açaí à la carte:

GET /_analyze
{
  "tokenizer" : "standard",
  "filter" : ["asciifolding"],
  "text" : "açaí à la carte"
}

The filter produces the following tokens:

[ acai, a, la, carte ]

Add to an analyzer

The following create index API request uses the asciifolding filter to configure a new custom analyzer.

PUT /asciifold_example
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "standard_asciifolding" : {
                    "tokenizer" : "standard",
                    "filter" : ["asciifolding"]
                }
            }
        }
    }
}

Configurable parameters

preserve_original: (Optional, boolean) If true, emit both original tokens and folded tokens. Defaults to false.
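The effect of preserve_original on a token stream can be sketched in Python. This is an illustration under stated assumptions, not the filter's implementation: fold is a rough stand-in for the real folding logic, asciifold_stream is a hypothetical name, and the sketch assumes that the original token is emitted right after its folded form, at the same position, and only when folding changed the token.

```python
import unicodedata
from typing import Iterable, List, Tuple


def fold(token: str) -> str:
    # Rough stand-in for ASCII folding: drop combining marks after NFKD.
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def asciifold_stream(
    tokens: Iterable[str], preserve_original: bool = False
) -> List[Tuple[str, int]]:
    """Return (term, position) pairs for a list of input tokens."""
    result = []
    for position, token in enumerate(tokens):
        folded = fold(token)
        result.append((folded, position))
        # Assumption: the original is added only if folding changed the
        # token, and it shares the position of the folded form.
        if preserve_original and folded != token:
            result.append((token, position))
    return result


tokens = ["açaí", "à", "la", "carte"]
print(asciifold_stream(tokens))
print(asciifold_stream(tokens, preserve_original=True))
```

Keeping both forms lets a query match with or without diacritics, at the cost of indexing extra terms.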

Customize

To customize the asciifolding filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.

For example, the following request creates a custom asciifolding filter with preserve_original set to true:

PUT /asciifold_example
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "standard_asciifolding" : {
                    "tokenizer" : "standard",
                    "filter" : ["my_ascii_folding"]
                }
            },
            "filter" : {
                "my_ascii_folding" : {
                    "type" : "asciifolding",
                    "preserve_original" : true
                }
            }
        }
    }
}
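If you generate index settings from a script, the request body above can be built programmatically. The following Python sketch shows one possible way to do that. The helper name asciifolding_settings is hypothetical, and the sketch only builds and prints the JSON body; sending it to a cluster is left out.

```python
import json


def asciifolding_settings(
    analyzer_name: str, filter_name: str, preserve_original: bool
) -> dict:
    """Build a create-index body with a custom asciifolding filter.

    Hypothetical helper: it mirrors the structure of the request above
    and does not contact Elasticsearch.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "tokenizer": "standard",
                        "filter": [filter_name],
                    }
                },
                "filter": {
                    filter_name: {
                        "type": "asciifolding",
                        "preserve_original": preserve_original,
                    }
                },
            }
        }
    }


body = asciifolding_settings("standard_asciifolding", "my_ascii_folding", True)
print(json.dumps(body, indent=4))
```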
