Original source: https://www.elastic.co/guide/en/elasticsearch/reference/7.7/analysis-whitespace-tokenizer.html (copyright of the original document belongs to www.elastic.co)
IMPORTANT: No additional bug fixes or documentation updates
will be released for this version. For the latest information, see the
current release documentation.
Whitespace Tokenizer
The whitespace tokenizer breaks text into terms whenever it encounters a
whitespace character.
Example output
POST _analyze
{
  "tokenizer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
The above sentence would produce the following terms:
[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]
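The terms above can be approximated locally without a running cluster. The following is a minimal Python sketch, not Elasticsearch's implementation: it assumes that Python's `str.split()` definition of whitespace is close enough to the tokenizer's for this sentence, and the function name `whitespace_tokenize` is hypothetical.

```python
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Split text on runs of whitespace, leaving case and punctuation untouched."""
    # str.split() with no arguments splits on any run of whitespace
    # and discards empty strings, so no empty terms are produced.
    return text.split()


terms = whitespace_tokenize("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone.")
print(terms)
```

Note that nothing is lowercased and punctuation stays attached to its term ("dog's", "bone."), which matches the output listed above.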
Configuration
The whitespace tokenizer accepts the following parameters:
max_token_length | The maximum token length. If a token is seen that exceeds this length then it is split at max_token_length intervals. Defaults to 255.
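
The splitting rule for over-long tokens can be illustrated with a small Python sketch. This is an approximation under stated assumptions, not Elasticsearch's code: it assumes the parameter is named `max_token_length` and defaults to 255, as in the Elasticsearch tokenizer reference, and that an over-long token is cut into consecutive chunks of at most that many characters. The function name `tokenize_with_max_length` is hypothetical.

```python
from typing import List


def tokenize_with_max_length(text: str, max_token_length: int = 255) -> List[str]:
    """Whitespace-split text, then cut any over-long token into fixed-size chunks."""
    if max_token_length < 1:
        raise ValueError("max_token_length must be at least 1")
    terms: List[str] = []
    for token in text.split():
        # Emit the token in consecutive slices of at most max_token_length characters.
        # Tokens that already fit are emitted unchanged as a single slice.
        for start in range(0, len(token), max_token_length):
            terms.append(token[start:start + max_token_length])
    return terms


print(tokenize_with_max_length("Brown-Foxes jumped", max_token_length=5))
# prints ['Brown', '-Foxe', 's', 'jumpe', 'd']
```

With the limit set to 5 for demonstration, "Brown-Foxes" is emitted as three terms and "jumped" as two, while shorter tokens would pass through unchanged.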