Original English version: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-thai-tokenizer.html. The original documentation is copyright www.elastic.co.
Local English version: ../en/analysis-thai-tokenizer.html
Important: This version will not receive additional bug fixes or documentation updates. For the latest information, see the current version of the documentation.
Thai Tokenizer
The thai tokenizer segments Thai text into words, using the Thai segmentation algorithm included with Java. Text in other languages is in general treated the same as by the standard tokenizer.
This tokenizer may not be supported by all JREs. It is known to work with Sun/Oracle and OpenJDK. If your application needs to be fully portable, consider using the ICU Tokenizer instead.
Example output
POST _analyze
{
  "tokenizer": "thai",
  "text": "การที่ได้ต้องแสดงว่างานดี"
}
The above sentence would produce the following terms:
[ การ, ที่, ได้, ต้อง, แสดง, ว่า, งาน, ดี ]
Configuration
The thai tokenizer is not configurable.
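Although the tokenizer itself accepts no parameters, it can still be referenced from a custom analyzer in the index settings. A minimal sketch (the index name thai_example and the analyzer name my_thai_analyzer are illustrative, not part of the original document):

```
PUT /thai_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_thai_analyzer": {
          "tokenizer": "thai"
        }
      }
    }
  }
}
```

Fields mapped with my_thai_analyzer would then be segmented with the Thai word-boundary rules shown in the example above.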