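### Single Query String
The `bool` query is the mainstay of multiclause queries. It works well in many cases, especially when you can map different query strings to individual fields.
The problem is that users today expect to type all of their search terms into a single field and have the application figure out how to return the right results. It is somewhat ironic that a search form with multiple fields is known as Advanced Search: it may look more advanced to the user, but it is actually much simpler to implement.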
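To see why the multifield form is the easy case, here is a minimal sketch of how such a form could be turned into a `bool` query: each non-empty form field becomes one `match` clause on the corresponding document field. The helper name `advanced_search_query` and the form fields (`title`, `author`, `translator`) are assumptions made for illustration, not part of any Elasticsearch API.

```python
import json


def advanced_search_query(form):
    """Build a bool query with one match clause per non-empty form field."""
    clauses = [
        {"match": {field: text}}
        for field, text in form.items()
        if text  # skip inputs the user left blank
    ]
    return {"query": {"bool": {"should": clauses}}}


# Hypothetical form input; the field names are illustrative only.
form = {"title": "War and Peace", "author": "Leo Tolstoy", "translator": ""}
print(json.dumps(advanced_search_query(form), indent=2))
```

No analysis of the user's intent is needed here, because the form already says which words belong to which field. A single search box gives no such hint.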
There is no one-size-fits-all approach to multiword, multifield queries. To get the best results, you have to know your data and know how to use the appropriate tools.
#### Know Your Data
When the only user input is a single query string, you will frequently encounter three scenarios:
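##### 1. Best fields
When searching for words that represent a concept, such as "brown fox", the words mean more together than they do individually. Fields like `title` and `body`, while related, can be considered to be in competition with each other. Documents should have as many of the search words as possible in the same field, and the score should come from the best-matching field.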
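One way to express the best-fields scenario is a `multi_match` query with `type` set to `best_fields`. The sketch below only builds the request body as a Python dictionary. The helper name `best_fields_query` and the `tie_breaker` value of 0.3 are assumptions chosen for illustration, while `title` and `body` are the fields mentioned above.

```python
import json


def best_fields_query(text, fields, tie_breaker=0.0):
    """Build a multi_match body that scores each document by its best-matching field."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "best_fields",
                "fields": list(fields),
                # 0.0 uses only the best field's score; values above 0.0 let
                # the other matching fields contribute a fraction of theirs.
                "tie_breaker": tie_breaker,
            }
        }
    }


body = best_fields_query("brown fox", ["title", "body"], tie_breaker=0.3)
print(json.dumps(body, indent=2))
```

With a `tie_breaker` of 0.0 the other fields are ignored entirely. A small positive value is a common compromise, but the right number depends on the data.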
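##### 2. Most fields
A common technique for fine-tuning relevance is to index the same data into multiple fields, each with its own analysis chain.
The main field may contain words in their stemmed form, synonyms, and words stripped of their diacritics (accents). It is used to match as many documents as possible.
The same text can then be indexed in other fields to provide more precise matching. One field may contain the unstemmed words, another the original words with their diacritics, and a third might use shingles to provide information about [word proximity](http://blog.csdn.net/dm_vincent/article/details/41800351).
These additional fields act as signals that increase the relevance score of each matching document. The more fields that match, the more relevant the document.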
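The most-fields scenario can be sketched with a `multi_match` query whose `type` is `most_fields`, which combines the scores of all matching fields instead of picking only the best one. The sub-field names `title.unstemmed` and `title.shingles` are hypothetical: they assume a mapping in which the same text is indexed several times with different analyzers, and the helper name `most_fields_query` is likewise illustrative.

```python
import json


def most_fields_query(text, main_field, signal_fields):
    """Build a multi_match body in which every matching field adds to the score."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "most_fields",
                # The broad main field finds candidates; the stricter
                # signal fields reward documents that also match precisely.
                "fields": [main_field, *signal_fields],
            }
        }
    }


# Hypothetical field names; they depend entirely on your own mapping.
body = most_fields_query(
    "jumping rabbits",
    "title",
    ["title.unstemmed", "title.shingles"],
)
print(json.dumps(body, indent=2))
```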
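##### 3. Cross fields
For some entities, the identifying information is spread across multiple fields, each of which contains just a part of the whole:
* Person: `first_name` and `last_name`
* Book: `title`, `author`, and `description`
* Address: `street`, `city`, `country`, and `postcode`
In this case, we want to find as many words as possible in any of the listed fields. We need to search across multiple fields as if they were one big field.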
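As a rough sketch of the cross-fields scenario, a `multi_match` query with `type` set to `cross_fields` treats the listed fields as if they were one combined field. The example uses the `first_name` and `last_name` fields from the Person entity above. The sample name, the helper name `cross_fields_query`, and the choice of `operator: and` (every word must appear in at least one of the fields) are assumptions made for illustration.

```python
import json


def cross_fields_query(text, fields, operator="and"):
    """Build a multi_match body that searches several fields as one big field."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": list(fields),
                # "and": each query term must be found in at least one field.
                # "or": any single term matching any field is enough.
                "operator": operator,
            }
        }
    }


# Hypothetical person search; "Peter Smith" is sample input only.
body = cross_fields_query("Peter Smith", ["first_name", "last_name"])
print(json.dumps(body, indent=2))
```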
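All of these are multiword, multifield queries, but each requires a different strategy. We will examine each strategy in turn in the rest of this chapter.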