高亮 · TUNA-daily

1. 一个最基本的高亮例子 ~~~ PUT /blog_website { "mappings": { "blogs": { "properties": { "title": { "type": "text", "analyzer": "ik_max_word" }, "content": { "type": "text", "analyzer": "ik_max_word" } } } } } ~~~ ~~~ PUT /blog_website/blogs/1 { "title": "我的第一篇博客", "content": "大家好，这是我写的第一篇博客，特别喜欢这个博客网站！！！" } GET /blog_website/blogs/_search { "query": { "match": { "title": "博客" } }, "highlight": { "fields": { "title": {} } } } ~~~ ~~~ { "took": 103, "timed_out": false, "_shards": { "total": 5, "successful": 5, "failed": 0 }, "hits": { "total": 1, "max_score": 0.28582606, "hits": [ { "_index": "blog_website", "_type": "blogs", "_id": "1", "_score": 0.28582606, "_source": { "title": "我的第一篇博客", "content": "大家好，这是我写的第一篇博客，特别喜欢这个博客网站！！！" }, "highlight": { "title": [ "我的第一篇博客" ] } } ] } } ~~~ 表现，会变成红色，所以说你的指定的field中，如果包含了那个搜索词的话，就会在那个field的文本中，对搜索词进行红色的高亮显示 ~~~ GET /blog_website/blogs/_search { "query": { "bool": { "should": [ { "match": { "title": "博客" } }, { "match": { "content": "博客" } } ] } }, "highlight": { "fields": { "title": {}, "content": {} } } } ~~~ highlight中的field，必须跟query中的field一一对齐的 2. 三种highlight介绍 plain highlight，lucene highlight，默认 posting highlight，index_options=offsets （1）性能比plain highlight要高，因为不需要重新对高亮文本进行分词（2）对磁盘的消耗更少（3）将文本切割为句子，并且对句子进行高亮，效果更好 ~~~ PUT /blog_website { "mappings": { "blogs": { "properties": { "title": { "type": "text", "analyzer": "ik_max_word" }, "content": { "type": "text", "analyzer": "ik_max_word", "index_options": "offsets" } } } } } ~~~ PUT /blog_website/blogs/1 { "title": "我的第一篇博客", "content": "大家好，这是我写的第一篇博客，特别喜欢这个博客网站！！！" } ~~~ GET /blog_website/blogs/_search { "query": { "match": { "content": "博客" } }, "highlight": { "fields": { "content": {} } } } ~~~ fast vector highlight index-time term vector设置在mapping中，就会用fast verctor highlight （1）对大field而言（大于1mb），性能更高 ~~~ PUT /blog_website { "mappings": { "blogs": { "properties": { "title": { "type": "text", "analyzer": "ik_max_word" }, "content": { "type": "text", "analyzer": "ik_max_word", "term_vector" : "with_positions_offsets" } } } } } ~~~ 强制使用某种highlighter，比如对于开启了term vector的field而言，可以强制使用plain highlight ~~~ GET /blog_website/blogs/_search { "query": { "match": { "content": "博客" } }, "highlight": { "fields": { "content": { "type": "plain" } } } } ~~~ 总结一下，其实可以根据你的实际情况去考虑，一般情况下，用plain highlight也就足够了，不需要做其他额外的设置如果对高亮的性能要求很高，可以尝试启用posting highlight 如果field的值特别大，超过了1M，那么可以用fast vector highlight 3. 设置高亮html标签，默认是标签 ~~~ GET /blog_website/blogs/_search { "query": { "match": { "content": "博客" } }, "highlight": { "pre_tags": ["<tag1>"], "post_tags": ["</tag2>"], "fields": { "content": { "type": "plain" } } } } ~~~ 4. 高亮片段fragment的设置 ~~~ GET /_search { "query" : { "match": { "user": "kimchy" } }, "highlight" : { "fields" : { "content" : {"fragment_size" : 150, "number_of_fragments" : 3, "no_match_size": 150 } } } } ~~~ > fragment_size: > 你一个Field的值，比如有长度是1万，但是你不可能在页面上显示这么长啊。。。设置要显示出来的fragment文本判断的长度，默认是100（highlight部分显示最大长度） > number_of_fragments： > 你可能你的高亮的fragment文本片段有多个片段，你可以指定就显示几个片段(高亮部分显示几个) "no_match_size": 150: