analyzer（分析器） · my-elasticsearch-cn

# analyzer（分析器） `[analyzed](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/mapping-index.html)（`被分析）的 **string** **fields**（字符串字段）的值通过 [`analyzer`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis.html)（分析器）来传递，将字符串转换为一串 **`tokens`**（标记）标记或者 **`terms`**（词条）。例如，基于某种分析器，字符串 "**The quick Brown Foxes**" 被解析为 : **`quick `**`，`**`brown`，`fox `**`。`这些是索引该字段的实际 **`terms`**（词条），可以用来有效地搜索大块文本内的单个单词。这样的分析过程不仅发生在索引的时候，而且在查询时也需要 : 查询字符串需要通过相同（或类似的）**`analyzer `**分析器传递，以便尝试查找那些存在于索引的相同格式的 **`terms`**（词条）。 **Elasticsearch **内置了许多 [`pre-defined analyzers`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-analyzers.html)（预定义的分析器），可以在不进一步配置的情况下使用。它还附带许多 [`character filters`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-charfilters.html)（字符过滤器），[`tokenizers`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-tokenizers.html)（分词器）和[`Token Filters`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-tokenfilters.html)（标记过滤器）。可以用来组合配置每个索引的自定义`analyzer`（分析器）。每一个查询，每一个字段或索引都可以指定分析器，在索引的时候，**Elasticsearch **将按以下顺序查找 **`analyzer`**（分析器）: * 定义在字段映射中的 **`analyzer`**（分析器）。 * 索引设置中 **`default`**（默认）的 **`analyzer`**（分析器）。 * [`standard`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-standard-analyzer.html)（标准的）**`analyzer`**（分析器）。在查询时，还有几层 : * 在 [`full-text query`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/full-text-queries.html)（全文查找）中定义的 **`analyzer`**（分析器）。 * 在字段映射中定义的 **`search_analyzer`**（搜索分析器）。 * 在字段映射中定义的 **`analyzer`**（分析器）。 * 在索引配置中 **`default_search`**（默认搜索的）**`analyzer`**（分析器）。 * 索引设置中 **`default`**（默认）的 **`analyzer`**（分析器）。 * [`standard`](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/analysis-standard-analyzer.html)（标准的）**`analyzer`**（分析器）。为特定字段指定分析器的最简单的方法是在字段映射中进行定义，如下所示 : | `curl -XPUT ``'localhost:9200/my_index?pretty'` `-H ``'Content-Type: application/json'` `-d'` `{` `"mappings"``: {` `"my_type"``: {` `"properties"``: {` `"text"``: { ``# 1` `"type"``: ``"text"``,` `"fields"``: {` `"english"``: { ``# 2` `"type"``: ``"text"``,` `"analyzer"``: ``"english"` `}` `}` `}` `}` `}` `}` `}` `'` `curl -XGET ``'localhost:9200/my_index/_analyze?pretty'` `-H ``'Content-Type: application/json'` `-d' ``# 3` `{` `"field"``: ``"text"``,` `"text"``: ``"The quick Brown Foxes."` `}` `'` `curl -XGET ``'localhost:9200/my_index/_analyze?pretty'` `-H ``'Content-Type: application/json'` `-d' ``# 4` `{` `"field"``: ``"text.english"``,` `"text"``: ``"The quick Brown Foxes."` `}` `'` | | 1 | `**text** `字段使用默认的 **`standard`**（标准的）分析器。 | | 2 | **`text.english `**多字段使用 **`english `**分词器，可以删除 **`stop words`**（停用词）并应用于 **`stemming `**词干。 | | 3 | 返回 **`tokens`**（标记）: [**`the`**，**`quick`**，**`brown`**，**`foxes`**]。 | | 4 | 返回 **`tokens`**（标记）: [**`quick`**，**`brown`**，**`fox`**]。 | ## search_quote_analyzer（搜索引用分析器） `该 **search_quote_analyzer **`设置允许你为短语指定 **`analyzer`**（分析器），这在处理禁用短语的 **`stop words`**（停用词）时特别有用。要使用三个 **`analyzer`**（分析器）设置来禁用短语的停用词 : 1. 一个 **`analyzer`**（分析器）设置成索引所有的 **`terms`**（词条）包括 **`stop words`**（停用词）。 2. 一个 **`search_analyzer `**设置成将移除 **`stop words`**（停用词）的非短语查询。 3. 一个 `**search_quote_analyzer** `设置不会移除 **`stop words`**（停用词）的短语查询。 | `curl -XPUT ``'localhost:9200/my_index?pretty'` `-H ``'Content-Type: application/json'` `-d'` `{` `"settings"``:{` `"analysis"``:{` `"analyzer"``:{` `"my_analyzer"``:{ ``# 1` `"type"``:``"custom"``,` `"tokenizer"``:``"standard"``,` `"filter"``:[` `"lowercase"` `]` `},` `"my_stop_analyzer"``:{ ``# 2` `"type"``:``"custom"``,` `"tokenizer"``:``"standard"``,` `"filter"``:[` `"lowercase"``,` `"english_stop"` `]` `}` `},` `"filter"``:{` `"english_stop"``:{` `"type"``:``"stop"``,` `"stopwords"``:``"_english_"` `}` `}` `}` `},` `"mappings"``:{` `"my_type"``:{` `"properties"``:{` `"title"``: {` `"type"``:``"text"``,` `"analyzer"``:``"my_analyzer"``, ``# 3` `"search_analyzer"``:``"my_stop_analyzer"``, ``# 4` `"search_quote_analyzer"``:``"my_analyzer"` `# 5` `}` `}` `}` `}` `}` `'` | | `PUT my_index``/my_type/1` `{` `"title"``:``"The Quick Brown Fox"` `}` `PUT my_index``/my_type/2` `{` `"title"``:``"A Quick Brown Fox"` `}` `GET my_index``/my_type/_search` `{` `"query"``:{` `"query_string"``:{` `"query"``:``"\"the quick brown fox\""` `# 1` `}` `}` `}` | | 1 | **`my_analyzer `**分析器，用于标识所有 `terms`（词条）包括 **`stop words`**（停用词）。 | | 2 | 移除 `**stop** **words**`（停用词）的 **`my_stop_analyzer `**分析器。 | | 3 | **`analyzer`**（分析器）设置指向将在索引时使用的 **`my_analyzer `**分析器。 | | 4 | **`search_analyzer `**设置指向 **`my_stop_analyzer`**，并移除非短语查询的**`stop words`**（停用词）。 | | 5 | **`search_quote_analyzer `**设置指向 **`my_analyzer `**分析器，并确保 **`stop words`**（停用词）不会从短语查询中移除。 | | 1 | 由于查询时用括号括起来的,因此它被检测为短语查询。因此**`search_quote_analyzer `**会启动并确保停用词不会从查询中移除。**`my_analyzer `**分析器将返回与其中一个文档相匹配的 **`terms`**（词条）[`**the**,``**quick**,``**brown**,`**`fox`**]。同时，将通过 **`my_stop_analyzer `**分析器分析**`terms`**（词条）查询，该分析器将过滤掉 **`stop words`**（停用词）。因此，搜索 **`The quick brown fox`** 或 **`A quick brown fox`** 将返回两个文档，因为这两个文档都包含以下 **`tokens`**（词元）[`**quick**,``**brown**,`**`fox`**]。没有**`search_quote_analyzer`**，将不可能对 **phrase** **queries**（短语查询）做到精确匹配，因为短语查询时 **`stop words`**（停用词）会被删除，从而导致两个文档都会被匹配到。 |