Property Methods · JAVA

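[TOC]

# Mapping parameters

![](https://img.kancloud.cn/35/8e/358e31e4b4e0838b58a32f56e88104e2_931x531.png)

## analyzer

* The analyzer for the field; defaults to the standard analyzer. It tokenizes the field's text both when the field is indexed and when it is searched.

## boost

* The field's weight; defaults to 1.0.

## dynamic

* Once a field's type has been set in a mapping, it cannot be changed directly, because an inverted index built by Lucene cannot be modified after it is generated.
* The only way to change it is to create a new index and reindex the data.
* Adding new fields is allowed by default.
* The dynamic parameter controls whether new fields may be added:
    * true (default): new fields are added automatically.
    * false: new fields are not added automatically. Documents are still written normally, but the new fields cannot be queried.
    * strict: the document is not written and an error is returned.

~~~
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic": false,
      "properties": {
        "user": {
          "properties": {
            "name": { "type": "text" },
            "social_networks": {
              "dynamic": true,
              "properties": {}
            }
          }
        }
      }
    }
  }
}
~~~

With this mapping, new fields cannot be added automatically at the top level of my\_index, but new sub-fields can still be added automatically under user.social\_networks.

## copy\_to

* Copies the field's value into a target field, giving an effect similar to \_all.
* The target field does not appear in \_source; it is only used for searching.

~~~
DELETE my_index

PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "first_name": { "type": "text", "copy_to": "full_name" },
        "last_name": { "type": "text", "copy_to": "full_name" },
        "full_name": { "type": "text" }
      }
    }
  }
}

PUT my_index/doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}
~~~

## index

* Controls whether the field is indexed. Defaults to true (indexed); with false the field is not indexed and therefore cannot be searched.

## index\_options

* Controls what information is added to the inverted index for searching and highlighting. Possible values: docs, freqs, positions, offsets.
* docs: only the doc id is indexed.
* freqs: the doc id and term frequency are indexed; term frequency may be needed for scoring.
* positions: the doc id, term frequency, and term positions are indexed; positions may be needed for proximity or phrase queries.
* offsets: the doc id, term frequency, positions, and start and end character offsets are indexed; highlighting uses the offsets.

## fielddata

* Whether in-memory fielddata is enabled for the field; defaults to false.
* By default fielddata is loaded lazily: the first query that needs it makes Elasticsearch load the inverted index of every segment for this field into memory in full.
* With some 5 GB segments and 10 GB of fielddata to load into memory, this can take tens of seconds.
* Loading fielddata eagerly moves the cost of loading from query time to index refresh time, which greatly improves the search experience.
* Reference: [Preloading fielddata](https://www.elastic.co/guide/cn/elasticsearch/guide/current/preload-fielddata.html)

## eager\_global\_ordinals

* Whether global ordinals are built eagerly; defaults to false.
* Reference: [Eager global ordinals](https://www.elastic.co/guide/cn/elasticsearch/guide/current/preload-fielddata.html#global-ordinals)

## doc\_values

* Reference: [Doc Values and Fielddata](https://www.elastic.co/guide/cn/elasticsearch/guide/current/docvalues-and-fielddata.html)

## fields

* The purpose of this parameter is to implement multi-fields.
* One field, several data types.
* For example, a field city of type text is used for full-text search; through fields it can also be given a keyword sub-field that is used for sorting and aggregation.

~~~
# Define the mapping
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "city": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        }
      }
    }
  }
}

# Index two documents
PUT my_index/_doc/1
{ "city": "New York" }

PUT my_index/_doc/2
{ "city": "York" }

# Search: city is used for the full-text match, city.raw for sorting and aggregation
GET my_index/_search
{
  "query": {
    "match": { "city": "york" }
  },
  "sort": { "city.raw": "asc" },
  "aggs": {
    "Cities": {
      "terms": { "field": "city.raw" }
    }
  }
}
~~~

## format

* JSON has no date type, so Elasticsearch uses the format parameter to define date formats in advance. Strings that match are recognized as dates and converted to a timestamp (in milliseconds).
* The default format is `strict_date_optional_time||epoch_millis`.
* Date formats built into Elasticsearch:

![](https://img.kancloud.cn/4d/b9/4db9e4fdd6aed8f81227c904799fb12c_362x693.png)

* Prefixing one of the names above with `strict_` selects the strict variant of that format.
* See the documentation for more.

## properties

* Used to define **sub-fields** for \_doc and for fields of type object and nested.

~~~
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "manager": {
          "properties": {
            "age": { "type": "integer" },
            "name": { "type": "text" }
          }
        },
        "employees": {
          "type": "nested",
          "properties": {
            "age": { "type": "integer" },
            "name": { "type": "text" }
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "region": "US",
  "manager": { "name": "Alice White", "age": 30 },
  "employees": [
    { "name": "John Smith", "age": 34 },
    { "name": "Peter Brown", "age": 26 }
  ]
}
~~~

## normalizer

* Similar to analyzer, except that analyzer is for text fields and tokenizes the value into several tokens, while normalizer is for keyword fields and produces a single token (the whole field value is one token rather than being split into several).
* The example below defines a custom normalizer that uses the uppercase and asciifolding filters.

~~~
PUT test_index_4
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["uppercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

# Index some documents
POST test_index_4/_doc/1
{ "foo": "hello world" }

POST test_index_4/_doc/2
{ "foo": "Hello World" }

POST test_index_4/_doc/3
{ "foo": "hello elasticsearch" }

# Searching for "hello" returns nothing, not 3 documents:
# each value is indexed as one whole token, e.g. "HELLO WORLD"
GET test_index_4/_search
{
  "query": {
    "match": { "foo": "hello" }
  }
}

# Searching for "hello world" returns 2 documents, 1 and 2
GET test_index_4/_search
{
  "query": {
    "match": { "foo": "hello world" }
  }
}
~~~

## Other parameters

* coerce
    * Type coercion: converts the value in the JSON to the data type of the ES field, for example the string "5" to the integer 5.
    * coerce defaults to true.
    * If coerce is set to false, a document whose JSON value does not match the ES field type is rejected.
    * Set it for a whole index with `"settings": { "index.mapping.coerce": false }`.
* enabled
    * Whether the field is indexed; defaults to true.
    * Can be set at two levels: on \_doc and on an individual field.
* ignore\_above
    * Sets the maximum length of a value that will be indexed.
    * A value longer than this is not indexed, so it cannot be searched and does not show up in a terms aggregation; it is still present in \_source.
* null\_value
    * Defines how a null value is handled. The default is null, meaning no value, in which case ES ignores it.
    * Setting this parameter supplies the default value to use when the field is null.
* ignore\_malformed
    * When the data type does not match and coercion fails, the default behavior is to throw an exception and reject the whole document.
    * With this parameter set to true the exception is ignored: the malformed field is not indexed, while the other fields of the document are processed as usual.
* norms
    * norms store various normalization factors that are later used at query time to compute how well a document scores against the query.
    * norms are useful for **scoring**, but they take up a lot of disk space.
    * If a field does not need to be scored, norms can be disabled for that field.
* position\_increment\_gap
    * Related to proximity queries and phrase queries.
    * Defaults to 100.
* search\_analyzer
    * The analyzer used at search time.
    * Defaults to the same analyzer as analyzer.
* similarity
    * Sets the relevance algorithm; the default in ES 5.x and ES 6.x is BM25.
    * classic and boolean are also available.
* store
    * Controls whether the field is also stored separately, outside \_source; defaults to false.
    * When storing data, ES puts the JSON object into the "\_source" field, which keeps all fields together as a single stored document (reading it takes one I/O); a specific field is retrieved through source filtering.
    * When there are many fields or the content is large, and not all fields need to be retrieved, set store to true on specific fields so that they are stored separately (reading them takes one I/O), and exclude them in \_source.
    * For more on this parameter, see: [Setting the store mapping property in ES](https://blog.csdn.net/helllochun/article/details/52136954)
* term\_vector
    * Related to the inverted index.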