KStem token filter
Provides KStem-based stemming for the English language. The kstem filter combines algorithmic stemming with a built-in dictionary.
The kstem filter tends to stem less aggressively than other English stemmer filters, such as the porter_stem filter.
The kstem filter is equivalent to the stemmer filter’s light_english variant.
This filter uses Lucene’s KStemFilter.
Example
The following analyze API request uses the kstem filter to stem "the foxes jumping quickly" to "the fox jump quick":
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "kstem" ],
  "text": "the foxes jumping quickly"
}
The filter produces the following tokens:
[ the, fox, jump, quick ]
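To see how the kstem filter compares with the more aggressive porter_stem filter mentioned earlier, the same request can be run with porter_stem swapped in. This is only a sketch for side-by-side comparison: the text is reused from the example above, and the tokens returned may differ from the kstem output.

```console
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [ "porter_stem" ],
  "text": "the foxes jumping quickly"
}
```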
Add to an analyzer
The following create index API request uses the kstem filter to configure a new custom analyzer.
To work properly, the kstem filter requires lowercase tokens. To ensure tokens are lowercased, add the lowercase filter before the kstem filter in the analyzer configuration.
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "kstem"
          ]
        }
      }
    }
  }
}
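Once the index exists, one way to check the analyzer is to call the analyze API against the index and name the custom analyzer. The request below is a minimal sketch that assumes the my_index index and my_analyzer analyzer were created as shown above. The mixed-case sample text is an illustrative choice, picked so that the effect of placing lowercase before kstem is visible in the returned tokens.

```console
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "The Foxes Jumping Quickly"
}
```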