2.文档基本CRUD、集群健康检查 · java核心知识整理

## 一、 document数据格式 ### 面向文档的搜索分析引擎： 1. 应用系统的数据结构都是面向对象的，复杂的 2. 对象数据存储到数据库中，只能拆解开来，变为扁平的多张表，每次查询的时候还得还原回对象格式，相当麻烦 3. ES是面向文档的，文档中存储的数据结构，与面向对象的数据结构是一样的，基于这种文档数据结构，es可以提供复杂的索引，全文检索，分析聚合等功能 4. es的document用json数据格式来表达 ~~~ public class Employee { private String email; private String firstName; private String lastName; private EmployeeInfo info; private Date joinDate; } private class EmployeeInfo { private String bio; // 性格 private Integer age; private String[] interests; // 兴趣爱好 } EmployeeInfo info = new EmployeeInfo(); info.setBio("curious and modest"); info.setAge(30); info.setInterests(new String[]{"bike", "climb"}); Employee employee = new Employee(); employee.setEmail("zhangsan@sina.com"); employee.setFirstName("san"); employee.setLastName("zhang"); employee.setInfo(info); employee.setJoinDate(new Date()); employee对象：里面包含了Employee类自己的属性，还有一个EmployeeInfo对象两张表：employee表，employee_info表，将employee对象的数据重新拆开来，变成Employee数据和EmployeeInfo数据 employee表：email，first_name，last_name，join_date，4个字段 employee_info表：bio，age，interests，3个字段；此外还有一个外键字段，比如employee_id，关联着employee表 { "email": "zhangsan@sina.com", "first_name": "san", "last_name": "zhang", "info": { "bio": "curious and modest", "age": 30, "interests": [ "bike", "climb" ] }, "join_date": "2020/12/01" } 我们就明白了es的document数据格式和数据库的关系型数据格式的区别复制代码 ~~~ ## 二、简单的集群管理 ### 1、快速检查集群的健康状况 ~~~ es提供了一套api，叫做cat api，可以查看es中各种各样的数据 GET /_cat/health?v epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 1488006741 15:12:21 elasticsearch yellow 1 1 1 1 0 0 1 0 - 50.0% epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 1488007113 15:18:33 elasticsearch green 2 2 2 1 0 0 0 0 - 100.0% epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent 1488007216 15:20:16 elasticsearch yellow 1 1 1 1 0 0 1 0 - 50.0% 复制代码 ~~~ ### 2、集群的健康状况？green、yellow、red？ 1. **green**：每个索引的primary shard和replica shard都是active状态的 2. **yellow**：每个索引的primary shard都是active状态的，但是部分replica shard不是active状态，处于不可用的状态 3. **red**：不是所有索引的primary shard都是active状态的，部分索引有数据丢失了 ### 3、快速查看集群中有哪些索引？ ~~~ GET /_cat/indices?v health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open .kibana rUm9n9wMRQCCrRDEhqneBg 1 1 1 0 3.1kb 3.1kb 复制代码 ~~~ ### 4、简单的索引操作 ~~~ 创建索引：PUT /test_index?pretty health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open test_index XmS9DTAtSkSZSwWhhGEKkQ 5 1 0 0 650b 650b yellow open .kibana rUm9n9wMRQCCrRDEhqneBg 1 1 1 0 3.1kb 3.1kb 删除索引：DELETE /test_index?pretty health status index uuid pri rep docs.count docs.deleted store.size pri.store.size yellow open .kibana rUm9n9wMRQCCrRDEhqneBg 1 1 1 0 3.1kb 3.1kb 复制代码 ~~~ ## 三、面向document的CRUD ### 1、新增商品：新增文档，建立索引 ~~~ PUT /index/type/id { "json数据" } PUT /ecommerce/product/1 { "name" : "gaolujie yagao", "desc" : "gaoxiao meibai", "price" : 30, "producer" : "gaolujie producer", "tags": [ "meibai", "fangzhu" ] } { "_index": "ecommerce", "_type": "product", "_id": "1", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": true } PUT /ecommerce/product/2 { "name" : "jiajieshi yagao", "desc" : "youxiao fangzhu", "price" : 25, "producer" : "jiajieshi producer", "tags": [ "fangzhu" ] } PUT /ecommerce/product/3 { "name" : "zhonghua yagao", "desc" : "caoben zhiwu", "price" : 40, "producer" : "zhonghua producer", "tags": [ "qingxin" ] } es会自动建立index和type，不需要提前创建，而且es默认会对document每个field都建立倒排索引，让其可以被搜索复制代码 ~~~ ### 2、查询商品：检索文档 ~~~ GET /index/type/id GET /ecommerce/product/1 { "_index": "ecommerce", "_type": "product", "_id": "1", "_version": 1, "found": true, "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] } } 复制代码 ~~~ ### 3、修改商品：替换文档 ~~~ PUT /ecommerce/product/1 { "name" : "jiaqiangban gaolujie yagao", "desc" : "gaoxiao meibai", "price" : 30, "producer" : "gaolujie producer", "tags": [ "meibai", "fangzhu" ] } { "_index": "ecommerce", "_type": "product", "_id": "1", "_version": 1, "result": "created", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": true } { "_index": "ecommerce", "_type": "product", "_id": "1", "_version": 2, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 }, "created": false } PUT /ecommerce/product/1 { "name" : "jiaqiangban gaolujie yagao" } 替换方式有一个不好，即使必须带上所有的field，才能去进行信息的修改复制代码 ~~~ ### 4、修改商品：更新文档 ~~~ POST /ecommerce/product/1/_update { "doc": { "name": "jiaqiangban gaolujie yagao" } } { "_index": "ecommerce", "_type": "product", "_id": "1", "_version": 8, "result": "updated", "_shards": { "total": 2, "successful": 1, "failed": 0 } } 复制代码 ~~~ ### 5、删除商品：删除文档 ~~~ DELETE /ecommerce/product/1 { "found": true, "_index": "ecommerce", "_type": "product", "_id": "1", "_version": 9, "result": "deleted", "_shards": { "total": 2, "successful": 1, "failed": 0 } } { "_index": "ecommerce", "_type": "product", "_id": "1", "found": false } 复制代码 ~~~ ## 四、多种搜索方式 ### 1、query string search ~~~ 搜索全部商品：GET /ecommerce/product/_search took：耗费了几毫秒 timed_out：是否超时，这里是没有 _shards：数据拆成了5个分片，所以对于搜索请求，会打到所有的primary shard（或者是它的某个replica shard也可以） hits.total：查询结果的数量，3个document hits.max_score：score的含义，就是document对于一个search的相关度的匹配分数，越相关，就越匹配，分数也高 hits.hits：包含了匹配搜索的document的详细数据 { "took": 2, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 3, "max_score": 1, "hits": [ { "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 1, "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 1, "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 1, "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] } } ] } } query string search的由来，因为search参数都是以http请求的query string来附带的搜索商品名称中包含yagao的商品，而且按照售价降序排序：GET /ecommerce/product/_search?q=name:yagao&sort=price:desc 适用于临时的在命令行使用一些工具，比如curl，快速的发出请求，来检索想要的信息；但是如果查询请求很复杂，是很难去构建的在生产环境中，几乎很少使用query string search 复制代码 ~~~ ### 2、query DSL ~~~ DSL：Domain Specified Language，特定领域的语言 http request body：请求体，可以用json的格式来构建查询语法，比较方便，可以构建各种复杂的语法，比query string search肯定强大多了查询所有的商品 GET /ecommerce/product/_search { "query": { "match_all": {} } } 查询名称包含yagao的商品，同时按照价格降序排序 GET /ecommerce/product/_search { "query" : { "match" : { "name" : "yagao" } }, "sort": [ { "price": "desc" } ] } 分页查询商品，总共3条商品，假设每页就显示1条商品，现在显示第2页，所以就查出来第2个商品 GET /ecommerce/product/_search { "query": { "match_all": {} }, "from": 1, "size": 1 } 指定要查询出来商品的名称和价格就可以 GET /ecommerce/product/_search { "query": { "match_all": {} }, "_source": ["name", "price"] } 更加适合生产环境的使用，可以构建复杂的查询复制代码 ~~~ ### 3、query filter ~~~ 搜索商品名称包含yagao，而且售价大于25元的商品 GET /ecommerce/product/_search { "query" : { "bool" : { "must" : { "match" : { "name" : "yagao" } }, "filter" : { "range" : { "price" : { "gt" : 25 } } } } } } 复制代码 ~~~ ### 4、full-text search（全文检索） ~~~ GET /ecommerce/product/_search { "query" : { "match" : { "producer" : "yagao producer" } } } { "took": 4, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 4, "max_score": 0.70293105, "hits": [ { "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 0.70293105, "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "1", "_score": 0.25811607, "_source": { "name": "gaolujie yagao", "desc": "gaoxiao meibai", "price": 30, "producer": "gaolujie producer", "tags": [ "meibai", "fangzhu" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "3", "_score": 0.25811607, "_source": { "name": "zhonghua yagao", "desc": "caoben zhiwu", "price": 40, "producer": "zhonghua producer", "tags": [ "qingxin" ] } }, { "_index": "ecommerce", "_type": "product", "_id": "2", "_score": 0.1805489, "_source": { "name": "jiajieshi yagao", "desc": "youxiao fangzhu", "price": 25, "producer": "jiajieshi producer", "tags": [ "fangzhu" ] } } ] } } 复制代码 ~~~ ### 5、phrase search（短语搜索） ~~~ 跟全文检索相对应，相反，全文检索会将输入的搜索串拆解开来，去倒排索引里面去一一匹配，只要能匹配上任意一个拆解后的单词，就可以作为结果返回 phrase search，要求输入的搜索串，必须在指定的字段文本中，完全包含一模一样的，才可以算匹配，才能作为结果返回 GET /ecommerce/product/_search { "query" : { "match_phrase" : { "producer" : "yagao producer" } } } { "took": 11, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.70293105, "hits": [ { "_index": "ecommerce", "_type": "product", "_id": "4", "_score": 0.70293105, "_source": { "name": "special yagao", "desc": "special meibai", "price": 50, "producer": "special yagao producer", "tags": [ "meibai" ] } } ] } } 复制代码 ~~~ ### 6、highlight search（高亮搜索结果） ~~~ GET /ecommerce/product/_search { "query" : { "match" : { "producer" : "producer" } }, "highlight": { "fields" : { "producer" : {} } } } 复制代码 ~~~ ## 五、聚合分析 ### 1、计算每个tag下的商品数量 ~~~ GET /ecommerce/product/_search { "size": 0, "aggs": { "all_tags": { "terms": { "field": "tags" } } } } { "took": 20, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 4, "max_score": 0, "hits": [] }, "aggregations": { "group_by_tags": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "fangzhu", "doc_count": 2 }, { "key": "meibai", "doc_count": 2 }, { "key": "qingxin", "doc_count": 1 } ] } } } 复制代码 ~~~ ### 3、对名称中包含yagao的商品，计算每个tag下的商品数量 ~~~ GET /ecommerce/product/_search { "size": 0, "query": { "match": { "name": "yagao" } }, "aggs": { "all_tags": { "terms": { "field": "tags" } } } } 复制代码 ~~~ ### 4、先分组，再算每组的平均值，计算每个tag下的商品的平均价格 ~~~ GET /ecommerce/product/_search { "size": 0, "aggs" : { "group_by_tags" : { "terms" : { "field" : "tags" }, "aggs" : { "avg_price" : { "avg" : { "field" : "price" } } } } } } { "took": 8, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 4, "max_score": 0, "hits": [] }, "aggregations": { "group_by_tags": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 0, "buckets": [ { "key": "fangzhu", "doc_count": 2, "avg_price": { "value": 27.5 } }, { "key": "meibai", "doc_count": 2, "avg_price": { "value": 40 } }, { "key": "qingxin", "doc_count": 1, "avg_price": { "value": 40 } } ] } } } 复制代码 ~~~ ### 5、计算每个tag下的商品的平均价格，并且按照平均价格降序排序 ~~~ GET /ecommerce/product/_search { "size": 0, "aggs" : { "all_tags" : { "terms" : { "field" : "tags", "order": { "avg_price": "desc" } }, "aggs" : { "avg_price" : { "avg" : { "field" : "price" } } } } } } 复制代码 ~~~ ### 6、按照指定的价格范围区间进行分组，然后在每组内再按照tag进行分组，最后再计算每组的平均价格 ~~~ GET /ecommerce/product/_search { "size": 0, "aggs": { "group_by_price": { "range": { "field": "price", "ranges": [ { "from": 0, "to": 20 }, { "from": 20, "to": 40 }, { "from": 40, "to": 50 } ] }, "aggs": { "group_by_tags": { "terms": { "field": "tags" }, "aggs": { "average_price": { "avg": { "field": "price" } } } } } } } } ~~~ 作者：Leo\_CX330 链接：https://juejin.cn/post/6908348672308838408 来源：掘金著作权归作者所有。商业转载请联系作者获得授权，非商业转载请注明出处。