<!-- 秀川 -->
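### Combining Queries
In "Combining Filters" we discussed how to use the `bool` filter to combine multiple filter clauses with `and`, `or`, and `not` logic. In query land, the `bool` query does a similar job, but with one important difference.
Filters make a binary decision: should this document be included in the results list or not? Queries are more subtle: they decide not only whether to include a document, but also how _relevant_ that document is.
Like its filter equivalent, the `bool` query accepts multiple query clauses under the `must`, `must_not`, and `should` parameters. For instance: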
```Javascript
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "quick" }},
      "must_not": { "match": { "title": "lazy"  }},
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "dog"   }}
      ]
    }
  }
}
```
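The results from the preceding query include any document whose `title` field contains the term `quick`, except for those that also contain `lazy`. So far, this is pretty similar to how the `bool` filter works.
The difference comes in with the two `should` clauses: a document is _not required_ to contain either `brown` or `dog`, but if it does, it should be considered _more relevant_: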
```Javascript
{
  "hits": [
    {
      "_id":    "3",
      "_score": 0.70134366, <1>
      "_source": {
        "title": "The quick brown fox jumps over the quick dog"
      }
    },
    {
      "_id":    "1",
      "_score": 0.3312608,
      "_source": {
        "title": "The quick brown fox"
      }
    }
  ]
}
```
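<1> Document 3 scores higher because it contains both `brown` and `dog`.
#### Score Calculation
The `bool` query calculates the relevance `_score` for each document by adding together the `_score` from all of the matching `must` and `should` clauses, and then dividing by the total number of `must` and `should` clauses.
The `must_not` clauses do not affect the score; their only purpose is to exclude documents that might otherwise have been included.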
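
The rule described above can be sketched in a few lines of JavaScript. This is only a simplified model of the description in this section, not Elasticsearch's actual scoring code; the function name `boolScore` and the per-clause scores are hypothetical values chosen for illustration.

```javascript
// Simplified model of the rule stated above: add up the scores of the
// matching must/should clauses, then divide by the total number of
// must/should clauses. `clauseScores` holds one entry per clause;
// null means the clause did not match the document.
function boolScore(clauseScores) {
  if (clauseScores.length === 0) return 0;
  const sum = clauseScores
    .filter((s) => s !== null)
    .reduce((acc, s) => acc + s, 0);
  return sum / clauseScores.length;
}

// Hypothetical per-clause scores for the clauses quick, brown, dog.
const doc3 = boolScore([0.5, 0.5, 0.5]);  // all three clauses match
const doc1 = boolScore([0.5, 0.5, null]); // the "dog" clause does not match
console.log(doc3 > doc1); // true
```

Under this model, a document that misses a `should` clause is penalized twice: it loses that clause's score, and the sum is still divided by the full clause count.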
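#### Controlling Precision
All the `must` clauses must match, and all the `must_not` clauses must not match, but how many `should` clauses should match? By default, none of the `should` clauses are required to match, with one exception: if there are no `must` clauses, then at least one `should` clause must match.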
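
As a rough illustration of these default rules, the sketch below decides whether a title would be included. It assumes each clause is a single-term `match` on `title`, approximated by a case-insensitive whole-word check; `containsTerm` and `matchesBool` are hypothetical helpers, not part of any Elasticsearch client.

```javascript
// Stand-in for a single-term `match` clause: true if `title` contains
// `term` as a whole word, ignoring case. Real analysis is more involved.
function containsTerm(title, term) {
  return title.toLowerCase().split(/\W+/).includes(term.toLowerCase());
}

// Applies the default bool rules described above to one title.
function matchesBool(title, { must = [], mustNot = [], should = [] }) {
  const allMust = must.every((t) => containsTerm(title, t));
  const noMustNot = !mustNot.some((t) => containsTerm(title, t));
  // should clauses are optional unless there are no must clauses,
  // in which case at least one of them has to match.
  const shouldOk =
    must.length > 0 ||
    should.length === 0 ||
    should.some((t) => containsTerm(title, t));
  return allMust && noMustNot && shouldOk;
}

const query = { must: ["quick"], mustNot: ["lazy"], should: ["brown", "dog"] };
console.log(matchesBool("The quick brown fox", query)); // true
console.log(matchesBool("The quick lazy dog", query));  // false
console.log(matchesBool("The quick fox", { should: ["brown", "dog"] })); // false
```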
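Just as we can control the precision of the `match` query, we can control how many `should` clauses need to match by using the `minimum_should_match` parameter, either as an absolute number or as a percentage: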
```Javascript
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "brown" }},
        { "match": { "title": "fox"   }},
        { "match": { "title": "dog"   }}
      ],
      "minimum_should_match": 2 <1>
    }
  }
}
```
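<1> This could also be expressed as a percentage.
The results would include only documents whose `title` field contains `"brown" AND "fox"`, `"brown" AND "dog"`, or `"fox" AND "dog"`. If a document contains all three, it would be considered more relevant than those that contain just two of the three.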
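
To make the percentage form concrete, here is a minimal sketch of how such a value might be turned into a required clause count. It assumes that percentages are rounded down, which is how the Elasticsearch reference describes `minimum_should_match`, and it handles only positive integers and positive percentages; `requiredShouldCount` and `countMatches` are hypothetical helper names.

```javascript
// Turn a minimum_should_match value (positive integer or a percentage
// string such as "75%") into the number of should clauses required.
function requiredShouldCount(spec, shouldCount) {
  if (typeof spec === "string" && spec.endsWith("%")) {
    const pct = parseFloat(spec);
    return Math.floor((shouldCount * pct) / 100); // assumption: round down
  }
  return Number(spec);
}

// Count how many of the given terms appear in the title as whole words.
function countMatches(title, terms) {
  const words = title.toLowerCase().split(/\W+/);
  return terms.filter((t) => words.includes(t)).length;
}

const terms = ["brown", "fox", "dog"];
const needed = requiredShouldCount("75%", terms.length); // 75% of 3 is 2.25
console.log(needed); // 2
console.log(countMatches("The quick brown fox", terms) >= needed); // true
console.log(countMatches("A lazy dog", terms) >= needed);          // false
```

With three `should` clauses, `"75%"` therefore behaves like the absolute value `2` used in the request above.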