Popularity · Elasticsearch权威指南（中文版）

[[boosting-by-popularity]] === Boosting by Popularity Imagine that we have a website that hosts blog posts and enables users to vote for the blog posts that they like.((("relevance", "controlling", "boosting by popularity")))((("popularity", "boosting by")))((("boosting", "by popularity"))) We would like more-popular posts to appear higher in the results list, but still have the full-text score as the main relevance driver. We can do this easily by storing the number of votes with each blog post: [role="pagebreak-before"] [source,json] ------------------------------- PUT /blogposts/post/1 { "title": "About popularity", "content": "In this post we will talk about...", "votes": 6 } ------------------------------- At search time, we can use the `function_score` query ((("function_score query", "field_value_factor function")))((("field_value_factor function")))with the `field_value_factor` function to combine the number of votes with the full-text relevance score: [source,json] ------------------------------- GET /blogposts/post/_search { "query": { "function_score": { <1> "query": { <2> "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { <3> "field": "votes" <4> } } } } ------------------------------- <1> The `function_score` query wraps the main query and the function we would like to apply. <2> The main query is executed first. <3> The `field_value_factor` function is applied to every document matching the main `query`. <4> Every document _must_ have a number in the `votes` field for the `function_score` to work. In the preceding example, the final `_score` for each document has been altered as follows: new_score = old_score * number_of_votes This will not give us great results. The full-text `_score` range usually falls somewhere between 0 and 10. As can be seen in <<img-popularity-linear>>, a blog post with 10 votes will completely swamp the effect of the full-text score, and a blog post with 0 votes will reset the score to zero. [[img-popularity-linear]] .Linear popularity based on an original `_score` of `2.0` image::images/elas_1701.png[Linear popularity based on an original `_score` of `2.0`] ==== modifier A better way to incorporate popularity is to smooth out the `votes` value with some `modifier`. ((("modifier parameter")))((("field_value_factor function", "modifier parameter")))In other words, we want the first few votes to count a lot, but for each subsequent vote to count less. The difference between 0 votes and 1 vote should be much bigger than the difference between 10 votes and 11 votes. A typical `modifier` for this use case is `log1p`, which changes the formula to the following: new_score = old_score * log(1 + number_of_votes) The `log` function smooths out the effect of the `votes` field to provide a curve like the one in <<img-popularity-log>>. [[img-popularity-log]] .Logarithmic popularity based on an original `_score` of `2.0` image::images/elas_1702.png[Logarithmic popularity based on an original `_score` of `2.0`] The request with the `modifier` parameter looks like the following: [source,json] ------------------------------- GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p" <1> } } } } ------------------------------- <1> Set the `modifier` to `log1p`. [role="pagebreak-before"] The available modifiers are `none` (the default), `log`, `log1p`, `log2p`, `ln`, `ln1p`, `ln2p`, `square`, `sqrt`, and `reciprocal`. You can read more about them in the http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_field_value_factor[`field_value_factor` documentation]. ==== factor The strength of the popularity effect can be increased or decreased by multiplying the value((("factor (function_score)")))((("field_value_factor function", "factor parameter"))) in the `votes` field by some number, called the `factor`: [source,json] ------------------------------- GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 2 <1> } } } } ------------------------------- <1> Doubles the popularity effect Adding in a `factor` changes the formula to this: new_score = old_score * log(1 + factor * number_of_votes) A `factor` greater than `1` increases the effect, and a `factor` less than `1` decreases the effect, as shown in <<img-popularity-factor>>. [[img-popularity-factor]] .Logarithmic popularity with different factors image::images/elas_1703.png[Logarithmic popularity with different factors] ==== boost_mode Perhaps multiplying the full-text score by the result of the `field_value_factor` function ((("function_score query", "boost_mode parameter")))((("boost_mode parameter")))still has too large an effect. We can control how the result of a function is combined with the `_score` from the query by using the `boost_mode` parameter, which accepts the following values: `multiply`:: Multiply the `_score` with the function result (default) `sum`:: Add the function result to the `_score` `min`:: The lower of the `_score` and the function result `max`:: The higher of the `_score` and the function result `replace`:: Replace the `_score` with the function result If, instead of multiplying, we add the function result to the `_score`, we can achieve a much smaller effect, especially if we use a low `factor`: [source,json] ------------------------------- GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 0.1 }, "boost_mode": "sum" <1> } } } ------------------------------- <1> Add the function result to the `_score`. The formula for the preceding request now looks like this (see <<img-popularity-sum>>): new_score = old_score + log(1 + 0.1 * number_of_votes) [[img-popularity-sum]] .Combining popularity with `sum` image::images/elas_1704.png["Combining popularity with `sum`"] ==== max_boost Finally, we can cap the maximum effect((("function_score query", "max_boost parameter")))((("max_boost parameter"))) that the function can have by using the `max_boost` parameter: [source,json] ------------------------------- GET /blogposts/post/_search { "query": { "function_score": { "query": { "multi_match": { "query": "popularity", "fields": [ "title", "content" ] } }, "field_value_factor": { "field": "votes", "modifier": "log1p", "factor": 0.1 }, "boost_mode": "sum", "max_boost": 1.5 <1> } } } ------------------------------- <1> Whatever the result of the `field_value_factor` function, it will never be greater than `1.5`. NOTE: The `max_boost` applies a limit to the result of the function only, not to the final `_score`.