Pattern replace token filter | ElasticSearch 7.7 Definitive Guide (Chinese Edition)

Original English version: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-pattern_replace-tokenfilter.html. Copyright of the original document belongs to www.elastic.co.
Local English version: ../en/analysis-pattern_replace-tokenfilter.html

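Important: No further bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.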


Pattern replace token filter

Uses a regular expression to match and replace token substrings.

The pattern_replace filter uses Java’s regular expression syntax. By default, the filter replaces matching substrings with an empty substring ("").

Replacement substrings can use Java’s $g syntax, where g is a capture group number (for example, $1), to reference capture groups from the original token text.
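Because replacements follow Java’s regular expression rules, you can preview a pattern and replacement in plain Java before adding them to index settings. The following is a minimal sketch. It assumes the filter applies the replacement to each token the way java.util.regex.Matcher.replaceAll does. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementSyntaxDemo {
    // Replaces every match of `regex` in a single token with `replacement`,
    // using java.util.regex replacement rules ($1, $2, ... = capture groups).
    static String replaceAllInToken(String token, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(token);
        return m.replaceAll(replacement);
    }

    public static void main(String[] args) {
        // An empty replacement (the filter's default) deletes each match.
        System.out.println(replaceAllInToken("foo-bar-baz", "-", ""));              // foobarbaz
        // $2 and $1 refer to capture groups, so parts of a token can be reordered.
        System.out.println(replaceAllInToken("2020-05", "(\\d+)-(\\d+)", "$2/$1"));  // 05/2020
        // A literal dollar sign must be escaped; Matcher.quoteReplacement does this.
        System.out.println(replaceAllInToken("price", "^", Matcher.quoteReplacement("$"))); // $price
    }
}
```

The last call matters in practice: an unescaped $ in a replacement is read as a group reference, not as a literal character.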

A poorly written regular expression may run slowly or throw a StackOverflowError, causing the node running the expression to exit suddenly.

This filter uses Lucene’s PatternReplaceFilter.

Example

The following analyze API request uses the pattern_replace filter to prepend watch to the substring dog in foxes jump lazy dogs.

GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "pattern_replace",
      "pattern": "(dog)",
      "replacement": "watch$1"
    }
  ],
  "text": "foxes jump lazy dogs"
}

The filter produces the following tokens.

[ foxes, jump, lazy, watchdogs ]
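You can approximate the request above outside Elasticsearch to check a pattern before sending it. This is a simplified sketch, not Elasticsearch’s implementation. It assumes the whitespace tokenizer can be approximated by splitting on runs of whitespace, and that the filter behaves like Matcher.replaceAll on each token. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class AnalyzeSketch {
    // Rough stand-in for the whitespace tokenizer: split on runs of whitespace.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Applies the pattern/replacement pair to every token, replacing all matches.
    static List<String> patternReplace(List<String> tokens, String regex, String replacement) {
        Pattern p = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(p.matcher(t).replaceAll(replacement));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = whitespaceTokenize("foxes jump lazy dogs");
        // prints [foxes, jump, lazy, watchdogs]
        System.out.println(patternReplace(tokens, "(dog)", "watch$1"));
    }
}
```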

Configurable parameters

all: (Optional, boolean) If true, all substrings matching the pattern parameter’s regular expression are replaced. If false, the filter replaces only the first matching substring in each token. Defaults to true.
pattern: (Required, string) Regular expression, written in Java’s regular expression syntax. The filter replaces token substrings matching this pattern with the substring in the replacement parameter.
replacement: (Optional, string) Replacement substring. Defaults to an empty substring ("").
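The all parameter chooses between replacing every match in a token and replacing only the first one. The following minimal sketch illustrates that choice. It assumes all maps onto java.util.regex.Matcher’s replaceAll and replaceFirst methods. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AllParameterSketch {
    // Applies the documented `all` semantics to one token:
    // all = true replaces every match, all = false replaces only the first.
    static String apply(String token, String regex, String replacement, boolean all) {
        Matcher m = Pattern.compile(regex).matcher(token);
        return all ? m.replaceAll(replacement) : m.replaceFirst(replacement);
    }

    public static void main(String[] args) {
        System.out.println(apply("a-b-c", "-", "_", true));   // a_b_c
        System.out.println(apply("a-b-c", "-", "_", false));  // a_b-c
    }
}
```

When all is false, the second dash in a-b-c is left unchanged. Because the filter works token by token, "first" means the first match within each token, not within the whole text.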

Customize and add to an analyzer

To customize the pattern_replace filter, duplicate it to create the basis for a new custom token filter. You can modify the filter using its configurable parameters.

The following create index API request configures a new custom analyzer using a custom pattern_replace filter, my_pattern_replace_filter.

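The my_pattern_replace_filter filter uses the regular expression [£|€] to match and remove the currency symbols £ and €. Note that | has no special meaning inside a character class, so this pattern also matches a literal pipe character; [£€] would match only the two currency symbols. The filter’s all parameter is false, so only the first matching symbol in each token is removed. Because my_analyzer uses the keyword tokenizer, the entire input is a single token, so only the first matching symbol in the whole text is removed.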

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "keyword",
          "filter": [
            "my_pattern_replace_filter"
          ]
        }
      },
      "filter": {
        "my_pattern_replace_filter": {
          "type": "pattern_replace",
          "pattern": "[£|€]",
          "replacement": "",
          "all": false
        }
      }
    }
  }
}
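To preview what my_analyzer does to a string without creating an index, you can reproduce it in plain Java. This sketch makes two assumptions: the keyword tokenizer passes the whole input through as one token, and "all": false corresponds to Matcher.replaceFirst. The class and method names are hypothetical. The currency symbols are written as Unicode escapes (\u00A3 is £, \u20AC is €) so the source compiles regardless of file encoding. Because | inside a character class is a literal, the second call shows the pattern removing a pipe.

```java
import java.util.regex.Pattern;

public class CurrencyFilterSketch {
    // Same character class as the index settings: [pound|euro].
    // Inside [...], '|' is a literal character, not alternation.
    static final Pattern CURRENCY = Pattern.compile("[\u00A3|\u20AC]");

    // Hypothetical stand-in for my_analyzer: the keyword tokenizer yields the
    // whole input as a single token, and all=false means only the first match
    // in that token is replaced (here with the empty replacement "").
    static String myAnalyzer(String text) {
        return CURRENCY.matcher(text).replaceFirst("");
    }

    public static void main(String[] args) {
        // Only the leading pound sign is removed; the euro sign remains.
        System.out.println(myAnalyzer("\u00A3100 or \u20AC90"));
        // The pipe is the first match here, so it is removed instead of the euro sign.
        System.out.println(myAnalyzer("|\u20AC5"));
    }
}
```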
