Trim token filter
Removes leading and trailing whitespace from each token in a stream. While this
can change the length of a token, the trim filter does not change a token’s
offsets: start_offset and end_offset still refer to the untrimmed span in the
original text.
The trim filter uses Lucene’s TrimFilter.
Many commonly used tokenizers, such as the standard or whitespace tokenizer,
remove whitespace by default. When using these tokenizers, you don’t need to add
a separate trim filter.
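To illustrate why a trim filter is redundant after such tokenizers, here is a rough Python model of a whitespace tokenizer. It is an approximation, not Elasticsearch’s implementation: the `Token` record and `whitespace_tokenize` function are hypothetical, and splitting on the regular expression `\S+` only approximates how the real tokenizer decides what counts as whitespace. Because each token is a maximal run of non-whitespace characters, trimming it would change nothing.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    # Hypothetical simplified token: term text plus character offsets
    # into the original input, mirroring fields in the _analyze response.
    token: str
    start_offset: int
    end_offset: int
    position: int


def whitespace_tokenize(text):
    """Approximate a whitespace tokenizer: each maximal run of
    non-whitespace characters becomes one token."""
    return [
        Token(m.group(), m.start(), m.end(), pos)
        for pos, m in enumerate(re.finditer(r"\S+", text))
    ]


tokens = whitespace_tokenize(" quick  fox ")
for t in tokens:
    # A trim filter would be a no-op on every token produced this way.
    assert t.token == t.token.strip()
    print(t)
```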
Example
To see how the trim filter works, you first need to produce a token containing
whitespace. The following analyze API request uses the keyword tokenizer to
produce a token for " fox ".
GET _analyze
{
  "tokenizer" : "keyword",
  "text" : " fox "
}
The API returns the following response. Note that the " fox " token contains the
original text’s whitespace, and its start_offset and end_offset (0 and 5) cover
those spaces.
{
  "tokens": [
    {
      "token": " fox ",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
To remove the whitespace, add the trim filter to the previous analyze API
request.
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["trim"],
  "text" : " fox "
}
The API returns the following response. The returned fox token does not
include any leading or trailing whitespace. Although the token is now shorter,
its start_offset and end_offset are still 0 and 5.
{
  "tokens": [
    {
      "token": "fox",
      "start_offset": 0,
      "end_offset": 5,
      "type": "word",
      "position": 0
    }
  ]
}
Add to an analyzer
The following create index API request uses the trim
filter to configure a new custom analyzer.
PUT trim_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_trim": {
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        }
      }
    }
  }
}
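A custom analyzer runs its tokenizer first and then applies the listed token filters in order. As a minimal in-memory sketch of what the keyword_trim definition does (assuming hypothetical helpers `keyword_tokenizer`, `trim`, and `make_analyzer`, none of which are Elasticsearch APIs), the Python model here chains the two steps. The trim step is approximated with `str.strip()`, and it copies the offsets unchanged, so " fox " yields fox with offsets 0 and 5, matching the analyze response shown earlier.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Token:
    # Hypothetical simplified token record (term text plus character offsets).
    token: str
    start_offset: int
    end_offset: int
    position: int


def keyword_tokenizer(text):
    # Emit the entire input as a single token.
    return [Token(text, 0, len(text), 0)]


def trim(tokens):
    # Strip surrounding whitespace from the term text only;
    # start_offset, end_offset, and position are left unchanged.
    return [replace(t, token=t.token.strip()) for t in tokens]


def make_analyzer(tokenizer, filters):
    """Return a function that tokenizes text, then applies each
    token filter in the order listed."""
    def analyze(text):
        tokens = tokenizer(text)
        for f in filters:
            tokens = f(tokens)
        return tokens
    return analyze


# Hypothetical in-memory counterpart of the "keyword_trim" analyzer settings.
keyword_trim = make_analyzer(keyword_tokenizer, [trim])
result = keyword_trim(" fox ")
print(result)
```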