Original English version: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-charfilters.html. Copyright of the original documentation belongs to www.elastic.co.
Local English version: ../en/analysis-charfilters.html
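Important: No additional bug fixes or documentation updates will be released for this version. For the latest information, see the current release documentation.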
Character filters reference
Character filters are used to preprocess the stream of characters before it is passed to the tokenizer.
A character filter receives the original text as a stream of characters and
can transform the stream by adding, removing, or changing characters. For
instance, a character filter could be used to convert Hindu-Arabic numerals
(٠١٢٣٤٥٦٧٨٩) into their Arabic-Latin equivalents (0123456789), or to strip HTML
elements like <b> from the stream.
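The idea of transforming a character stream before tokenization can be sketched in plain Python. This is only an illustration of the concept, not how Elasticsearch implements its filters: the names `map_digits`, `strip_html`, and `apply_char_filters` are hypothetical, and the tag-stripping regular expression is deliberately naive.

```python
import html
import re
from typing import Callable, Iterable

# A character filter is modelled here as a plain text -> text function.
CharFilter = Callable[[str], str]

# Arabic-Indic digits U+0660..U+0669 mapped to the Latin digits 0..9.
DIGIT_TABLE = {0x0660 + i: str(i) for i in range(10)}


def map_digits(text: str) -> str:
    """Replace Arabic-Indic digits with their Latin equivalents."""
    return text.translate(DIGIT_TABLE)


def strip_html(text: str) -> str:
    """Naively drop anything that looks like a tag, then decode entities."""
    without_tags = re.sub(r"<[^>]+>", "", text)
    return html.unescape(without_tags)


def apply_char_filters(text: str, filters: Iterable[CharFilter]) -> str:
    """Run the filters in order; the result is what a tokenizer would see."""
    for char_filter in filters:
        text = char_filter(text)
    return text


if __name__ == "__main__":
    raw = "<b>\u0662\u0665\u0660\u0661\u0665</b> &amp; more"
    print(apply_char_filters(raw, [strip_html, map_digits]))
```

The filters run in the order given, so a later filter sees the output of the earlier ones.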
Elasticsearch has a number of built-in character filters that can be used to build custom analyzers.
- HTML Strip Character Filter
  The html_strip character filter strips out HTML elements like <b> and decodes HTML entities like &amp;.
- Mapping Character Filter
  The mapping character filter replaces any occurrences of the specified strings with the specified replacements.
- Pattern Replace Character Filter
  The pattern_replace character filter replaces any characters matching a regular expression with the specified replacement.
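As a rough sketch of how the three filters above might be combined in a custom analyzer, the following Python builds an index-settings body as a dictionary and prints it as JSON. The function name `build_index_settings` and the names `my_analyzer`, `digit_mapping`, and `dash_to_underscore` are made up for illustration, and the choice of the `standard` tokenizer is an assumption; the body would be sent when creating an index.

```python
import json


def build_index_settings() -> dict:
    """Build a settings body that wires three character filters into one analyzer."""
    # One "key => value" rule per Arabic-Indic digit (U+0660..U+0669).
    digit_rules = [f"{chr(0x0660 + i)} => {i}" for i in range(10)]
    return {
        "settings": {
            "analysis": {
                "char_filter": {
                    # Hypothetical name; a mapping filter with explicit rules.
                    "digit_mapping": {
                        "type": "mapping",
                        "mappings": digit_rules,
                    },
                    # Hypothetical name; turns "123-456" into "123_456".
                    "dash_to_underscore": {
                        "type": "pattern_replace",
                        "pattern": r"(\d+)-(?=\d)",
                        "replacement": "$1_",
                    },
                },
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # html_strip is built in and needs no definition here.
                        "char_filter": [
                            "html_strip",
                            "digit_mapping",
                            "dash_to_underscore",
                        ],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_settings(), ensure_ascii=False, indent=2))
```

Character filters are listed in the order they should run, so here the HTML is stripped before the digit and dash replacements are applied.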