Stop Token Filter · my-elasticsearch-cn

# Stop Token Filter（Stop 词元过滤器） **stop** 类型的词元过滤器，用于从词元流中移除**stop words**。以下是 **stop** 类型的词元过滤器的可选设置： | `stopwords` | 停止词列表. 默认 **`_english_`** 无效词. | | `stopwords_path` | 无效词配置文件路径（绝对或者相对路径）. 每个停止词自占一行 (换行分隔). 文件必须是 **UTF-8** 编码. | | `ignore_case` | 设置为 **`true` **所有词被转为小写. 默认 **`false`**. | | `remove_trailing` | 设置为 **false**，以便不忽略搜索的最后一个字词，如果它是无效词。这对于完成建议者非常有用，因为像** green a** 一样的查询可以扩展到 **green apple** ，即使你删除一般的无效词。默认为 **true**。 | **stopwords** 参数接受一个数组类型的无效词： PUT /my_index { "settings": { "analysis": { "filter": { "my_stop": { "type": "stop", "stopwords": ["and", "is", "the"] } } } } } 或预定义的语言特定列表： PUT /my_index { "settings": { "analysis": { "filter": { "my_stop": { "type": "stop", "stopwords": "_english_" } } } } } **Elasticsearch** 提供以下预定义语言列表： **`_arabic_`, `_armenian_`, `_basque_`, `_brazilian_`, `_bulgarian_`, `_catalan_`, `_czech_`, `_danish_`, `_dutch_`, `_english_`, `_finnish_`, `_french_`, `_galician_`, `_german_`, `_greek_`, `_hindi_`, `_hungarian_`,`_indonesian_`, `_irish_`, `_italian_`, `_latvian_`, `_norwegian_`, `_persian_`, `_portuguese_`, `_romanian_`, `_russian_`, `_sorani_`, `_spanish_`, `_swedish_`, `_thai_`, `_turkish_`.** 空的无效词列表（禁用无效词）使用：**`\_none_。` **