Glossary of terms · my-elasticsearch-cn

# Glossary of terms (词汇表) ##### analysis（分析） **Analysis**（分析）是将 [full text](https://www.elastic.co/guide/en/elasticsearch/reference/5.4/glossary.html#glossary-text)（全文）转化为 [terms](https://www.elastic.co/guide/en/elasticsearch/reference/5.4/glossary.html#glossary-term)（词条）的过程。使用不同的 **analyzer**（分词器），`**FOO BAR**，``**Foo-Bar**<span style="font-family: Arial, sans-serif;">，</span>``**foo**，**bar** ``这些短语可能都会生成 `**foo **和 `**`b``ar `**`两个词条，实际的 **index**（索引）里面存储的就是这些 **terms**（词条）`*`。`*`针对`` **FoO:bAR**的 **full text query**（全文检索），会先将其分析成为 **foo,**`**bar** 这样的词条，然后匹配存储在 **index**（索引）中的 **term**（词条）。正是这个 **analysis**（分析）的过程（发生在索引和搜索时）使得 **elasticsearch **能够执行 **full text queries**（全文检索）。<span style="color: rgb(68, 68, 68);">也可以参阅 </span>[text](https://www.elastic.co/guide/en/elasticsearch/reference/5.4/glossary.html#glossary-text)（文本）<span style="color: rgb(68, 68, 68);">和 </span>[term](https://www.elastic.co/guide/en/elasticsearch/reference/5.4/glossary.html#glossary-term)（词条）了解更多细节信息。``` </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">cluster （集群）</span> **cluster**（集群）是由拥有同一个集群名的一个或者多个节点组成。每个集群拥有一个主节点，它由集群自行选举出来，在当前主节点挂了，能被其他节点取代。 </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">document （文档）</span> **document（文档）**是存储在**elasticsearch**中的**json**文档。类似于关系型数据库中的一行记录。每个文档存储在一个**index**（索引）中，它具有一个**type**（类型）和一个**id**。文档是包含零到多个**fields**（属性）或者键值对的**json**对象（类似于其他语言中的hash/hashmap/associative array）。当一个文档被**indexed**（索引）的时候，它的原始**json**文档会被存储成**_source**属性，对该文档进行**get**或者**search**操作时，默认返回的就是改属性。 </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">id </span> 文档的**ID**标识一个文档。文档的index/type/id必须唯一。如果没有提供**ID**，**elasticsearch**会自动生成一个**ID**。（查询**routing**（路由）获取更多信息） </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">field（属性） </span> 一个文档包涵一系列的属性或者键值对。它的值可以是简单标量值（如字符串，整型数，日期），或者是像数组和对象一样的嵌套结构。属性类似于关系型数据库中的列。每个属性的**mapping**（映射）都有其类型（不同于**document**（文档）的**type**（类型）），表明该属性能存储成改类型的数据，例如 `integer`, `string`, `object。**mapping**（映射）也允许你定义属性的值是否需要**analyzed**（分词）。` </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">index （索引）</span> <span class="glossterm" style="color: rgb(0, 0, 0);">**index**（索引）类似于关系型数据库中的表。它有一个**mapping**（映射）来定义索引中的**fields**（属性），这些属性被分组成多种**type**（类型）。索引是一个逻辑命名空间，它对应一到多个**primary** **shards**（主分片）和零到多个**replica shards**（副本分片）。</span> </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">mapping （映射）</span> <span style="color: rgb(0, 0, 0);">**mapping**（映射）类似于关系型数据库中的元数据定义。每一个**index**（索引）对应一个**mapping**（映射），它定义了**index**（索引）中的每一个**type**（类型），另外还有一些索引级别的设置。**mapping**（映射）可以显式定义，或者当一个文档进行索引时自动生成。</span> </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">node （节点）</span> <span class="glossterm" style="color: rgb(0, 0, 0);">**node**（节点）是从属于一个**elasticsearch**集群的正在运行的节点。当以测试为目的时，可以在一台主机上启动多个节点，但是通常一台主机最好运行一个节点。在启动时，节点会使用广播的方式，自动感知（网络中）具有相同集群名的集群，并尝试加入它。</span> </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">primary shard （主分片）</span> <span class="glossterm" style="color: rgb(0, 0, 0);">每个文档存储在单**primary shard** （主分片）中。当索引一个文档时，它会首先被索引到主分片上，然后索引到主分片的所有副本上。默认情况下，一个**index**（索引）有5个**primary shard** （主分片）。根据**index**（索引）的处理能力，你可以指定更少或者更多的**primary shard** （主分片）来扩展文档数量。当**index**（索引）创建之后，**primary shard** （主分片）的数量不可更改。查询**routing**（路由）获取更多信息。</span> </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">replica shard （副本分片）</span> <span class="glossterm" style="color: rgb(0, 0, 0);">每一个**primary shard** （主分片）拥有零到多个副本。副本是**primary shard** （主分片）的拷贝，它的存在有两个目的：</span> 1. <span class="glossterm" style="color: rgb(0, 0, 0);">增加容错：当主分片失败时，一个**replica shard**（副本分片）可以提升为**primary shard** （主分片）</span> 2. <span class="glossterm" style="color: rgb(0, 0, 0);">提升性能：**primary shard** （主分片）和**replica shard**（副本分片）都能处理**get**和**shearch**请求。默认情况下，每个**primary shard** （主分片）有一个副本，副本的个数可以动态的修改。**replica shard**（副本分片）不会和**primary shard** （主分片）分配在同一个节点上。</span> </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">routing （路由）</span> <span style="color: rgb(43, 69, 144);"> 当你索引一个文档时，它会被存储在一个单独的主分片上。通过对routing值进行哈希计算来决定具体是哪一个主分片。默认情况下，routing值是来自于文档ID，如果文档指定了一个父文档，则通过其父文档ID（保证父子文档存储在同一个分片上）。如果你不想使用默认的文档ID来作为routing值,你可以在索引时直接指定一个routing值，或者在mapping中指定一个字段的值来作为routing值。</span> </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span style="color: rgb(43, 69, 144);">shard （分片）</span> <span class="glossterm" style="color: rgb(0, 0, 0);">**shard**（分片）是一个**Lucene**实例。它是由**elasticsearch**管理的低层次的工作单元。**index**（索引）是指向主分片和副本分片的逻辑命名空间。除了定义**index**（索引）应该具有的**primary** **shard**（主分片）和**replica shard**（副本分片）的数量之外，你不需要对**shard**（分片）作其它的工作。相反，你的代码应该只处理**index**（索引）。**elasticsearch**将**shards**（分片）分配到整个集群的所有节点上，当节点失败时可以自动将分片迁移到其他节点或者新增的节点上。</span> </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">source field （源属性）</span> <span class="glossterm" style="color: rgb(0, 0, 0);">在默认情况下，你索引的**json** **document**（文档）会存储在_source **field**（属性）中，**get**和**search**请求会返回该**field**（属性）。这样可以直接在搜索结果中获取原始文档对象，不需要通过**ID**再检索一次文档对象。</span> </div> </div> </div> <div class="columnLayout single" data-layout="single" style="margin: 0px 0px 8px; padding: 0px; display: table; table-layout: fixed; width: 587px; color: rgb(51, 51, 51); font-family: Arial, sans-serif; line-height: 20px; widows: 1;"> <div class="cell normal" data-type="normal" style="margin: 8px 0px; padding: 0px 15px; box-sizing: border-box; word-wrap: break-word; border-radius: 5px; display: table-cell; vertical-align: top;"> <div class="innerCell" style="margin: 0px; padding: 0px; overflow-x: auto;"> ##### <span class="glossterm" style="color: rgb(43, 69, 144);">term （词条）</span> <span class="glossterm" style="color: rgb(0, 0, 0);">**term**（词条）是**elasticsearch**中被索引的确切值。`foo`, `Foo`, `FOO 这些**term**（词条）不相等。**term**（词条）可以通过词条搜索来检索。查询**text**（文本）和**anaylsis**（分词）获取更多信息。` ##### text （文本） **text**（文本）（或者说全文）是普通的非结构化文本，如一个段落。默认情况下，**text**（文本）会被**analyzed**（分词）成**term**（词条），term（词条）是实际存储在索引中的内容。文本的**field**（属性）必须在索引时完成**analyzed**（分词）来支持全文检索的功能，全文检索使用的关键词也必须在搜索时**analyzed**（分词）成索引时产生的相同**term**（词条）。查询**term**（词条）和**analysis**（分词）获取更多信息。 ##### type （类型） **type**（类型）代表文档的类型，如一封邮件，一个用户，一条推文。搜索API可以通过文档类型来过滤。**index**（索引）可以包涵多个类型，每一个**type**（类型）有一系列的**fields**（属性）。同一个**index**（索引）中不同**type**（类型）的同名**fields**（属性）必须使用相同的**mapping**（映射）（定义文档的属性如何索引以及是文档能被搜索）。