自定义分析器 · my-elasticsearch-cn

# 自定义分析器当内置分析器不能满足您的需求时，您可以创建一个custom分析器，它使用以下相应的组合： * 零个或多个[字符过滤器](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-custom-analyzer.html) * 一个[ 分析器](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-tokenizers.html) * 零个或多个[token过滤器](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-custom-analyzer.html)。 ## 配置 custom（自定义）分析器接受以下的参数： | `tokenizer` | 内置或定制的标记器。（需要） | | `char_filter` | 内置或自定义[字符过滤器](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-charfilters.html)的可选阵列。 | | `filter` | 可选的内置或定制token过滤器阵列。 | | `position_increment_gap` | 在索引文本值数组时，Elasticsearch会在一个值的最后一个值和下一个值的第一个项之间插入假的“间隙”，以确保短语查询与不同数组元素的两个术语不匹配。默认为100.有关更多信息，请参阅[position_increment_gap](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/position-increment-gap.html)。 | ## 配置示例以下是一个结合以下内容的示例：字符过滤器 * [HTML Strip Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-htmlstrip-charfilter.html "HTML Strip Char Filter") 分词器 * [Standard Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-standard-tokenizer.html "Standard Tokenizer") Token 分析器 * [Lowercase Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html "Lowercase Token Filter") * [ASCII-Folding Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-asciifolding-tokenfilter.html "ASCII Folding Token Filter") | `PUT my_index` `{` `"settings"``: {` `"analysis"``: {` `"analyzer"``: {` `"my_custom_analyzer"``: {` `"type"``: ``"custom"``,` `"tokenizer"``: ``"standard"``,` `"char_filter"``: [` `"html_strip"` `],` `"filter"``: [` `"lowercase"``,` `"asciifolding"` `]` `}` `}` `}` `}` `}` `POST my_index/_analyze` `{` `"analyzer"``: ``"my_custom_analyzer"``,` `"text"``: ``"Is this <b>déjà vu</b>?"` `}` | 上述句子将产生以下词语： | `[ is, ``this``, deja, vu ]` | 前面的例子使用了默认配置的tokenizer，令牌过滤器和字符过滤器，但是可以创建每个配置的版本并在自定义分析器中使用它们。以下是一个比较复杂的例子：字符过滤器 * [Mapping Character Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-mapping-charfilter.html "Mapping Char Filter"), 分词器 * [Pattern Tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-pattern-tokenizer.html "Pattern Tokenizer"), 配置为分割标点符号 Token 分析器 * [Lowercase Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-lowercase-tokenfilter.html "Lowercase Token Filter")(小写分析器) * [Stop Token Filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-stop-tokenfilter.html "Stop Token Filter")(停止分析器), 配置为使用预定义的英文停止词列表 ## 示例 | `PUT my_index` `{` `"settings"``: {` `"analysis"``: {` `"analyzer"``: {` `"my_custom_analyzer"``: {` `"type"``: ``"custom"``,` `"char_filter"``: [` `"emoticons"` `],` `"tokenizer"``: ``"punctuation"``,` `"filter"``: [` `"lowercase"``,` `"english_stop"` `]` `}` `},` `"tokenizer"``: {` `"punctuation"``: {` `"type"``: ``"pattern"``,` `"pattern"``: ``"[ .,!?]"` `}` `},` `"char_filter"``: {` `"emoticons"``: {` `"type"``: ``"mapping"``,` `"mappings"``: [` `":) => _happy_"``,` `":( => _sad_"` `]` `}` `},` `"filter"``: {` `"english_stop"``: {` `"type"``: ``"stop"``,` `"stopwords"``: ``"_english_"` `}` `}` `}` `}` `}` `POST my_index/_analyze` `{` `"analyzer"``: ``"my_custom_analyzer"``,` `"text"``: ``"I'm a :) person, and you?"` `}` | | [![](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/images/icons/callouts/1.png)](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-custom-analyzer.html#CO283-1) | 表情符号字符过滤器，标点符号化器和english_stop令牌过滤器是在相同索引设置中定义的自定义实现。 | 以上示例产生以下词语： | `[ i'm, _happy_, person, you ]` |