💎一站式轻松地调用各大LLM模型接口,支持GPT4、智谱、星火、月之暗面及文生图 广告
# Synonym Token Filter(Synonym 词元过滤器) 原文链接 : [https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html) 译文链接 : [http://www.apache.wiki/pages/viewpage.action?pageId=10028859](http://www.apache.wiki/pages/viewpage.action?pageId=10028859) 贡献者 : [fucker](/display/~caizhongjie),[ApacheCN](/display/~apachecn),[Apache中文网](/display/~apachechina) **`synonym` (同义词)**词元过滤器允许在分析过程中轻松处理同义词。 同义词使用配置文件配置。示例如下: ``` PUT /test_index { "settings": { "index" : { "analysis" : { "analyzer" : { "synonym" : { "tokenizer" : "whitespace", "filter" : ["synonym"] } }, "filter" : { "synonym" : { "type" : "synonym", "synonyms_path" : "analysis/synonym.txt" } } } } } } ``` 以上配置一个 **`synonym` **(同义词)过滤器,其中包含一个路径 **analysis/synonym.txt**(相对于 **config **的位置)。 然后使用过滤器配置 **`synonym`  **同义词分析器。 其他设置有:**ignore_case**(默认为 **false**),和 **`expand` **(默认为 **true**)。 **tokenizer** 参数控制将用于标记同义词的分词器,并且默认为 **`whitespace`**  分词器。 支持两种同义词格式:**Solr,WordNet**。 #### **Solr synonyms** 以下是文件的示例格式: ``` # Blank lines and lines starting with pound are comments. # Explicit mappings match any token sequence on the LHS of "=>" # and replace with all alternatives on the RHS. These types of mappings # ignore the expand parameter in the schema. # Examples: i-pod, i pod => ipod, sea biscuit, sea biscit => seabiscuit # Equivalent synonyms may be separated with commas and give # no explicit mapping. In this case the mapping behavior will # be taken from the expand parameter in the schema. This allows # the same synonym file to be used in different synonym handling strategies. # Examples: ipod, i-pod, i pod foozball , foosball universe , cosmos lol, laughing out loud # If expand==true, "ipod, i-pod, i pod" is equivalent # to the explicit mapping: ipod, i-pod, i pod => ipod, i-pod, i pod # If expand==false, "ipod, i-pod, i pod" is equivalent # to the explicit mapping: ipod, i-pod, i pod => ipod # Multiple synonym mapping entries are merged. foo => foo bar foo => baz # is equivalent to foo => foo bar, baz ``` 您也可以在配置文件中直接给过滤器定义同义词(请注意使用 **synonyms **而不是 **synonyms_path **): ``` PUT /test_index { "settings": { "index" : { "analysis" : { "filter" : { "synonym" : { "type" : "synonym", "synonyms" : [ "i-pod, i pod => ipod", "universe, cosmos" ] } } } } } } ``` 但是,建议使用 **synonyms_path **在文件中定义大型同义词集,因为内联指定会不必要地增加群集大小。 #### **WordNet synonyms** 基于 WordNet 格式的 同义词 可以如下使用格式声明: ```  PUT /test_index { "settings": { "index" : { "analysis" : { "filter" : { "synonym" : { "type" : "synonym", "format" : "wordnet", "synonyms" : [ "s(100000001,1,'abstain',v,1,0).", "s(100000001,2,'refrain',v,1,0).", "s(100000001,3,'desist',v,1,0)." ] } } } } } } ``` 同时支持使用 **`synonyms_path`  **在文本中定义 **WordNet synonyms**。