Not Quite Not | Elasticsearch: The Definitive Guide [2.x]

原文地址: https://www.elastic.co/guide/en/elasticsearch/guide/current/not-quite-not.html, 版权归 www.elastic.co 所有

WARNING: The 2.x versions of Elasticsearch have passed their EOL dates. If you are running a 2.x version, we strongly advise you to upgrade.

This documentation is no longer maintained and may be removed. For the latest information, see the current Elasticsearch documentation.

» » »

« Manipulating Relevance with Query Structure Ignoring TF/IDF »

Not Quite Notedit

A search on the Internet for “Apple” is likely to return results about the company, the fruit, and various recipes. We could try to narrow it down to just the company by excluding words like pie, tart, crumble, and tree, using a must_not clause in a bool query:

GET /_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "text": "apple"
        }
      },
      "must_not": {
        "match": {
          "text": "pie tart fruit crumble tree"
        }
      }
    }
  }
}

But who is to say that we wouldn’t miss a very relevant document about Apple the company by excluding tree or crumble? Sometimes, must_not can be too strict.

boosting Queryedit

The boosting query solves this problem. It allows us to still include results that appear to be about the fruit or the pastries, but to downgrade them—to rank them lower than they would otherwise be:

GET /_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "text": "apple"
        }
      },
      "negative": {
        "match": {
          "text": "pie tart fruit crumble tree"
        }
      },
      "negative_boost": 0.5
    }
  }
}

It accepts a positive query and a negative query. Only documents that match the positive query will be included in the results list, but documents that also match the negative query will be downgraded by multiplying the original _score of the document with the negative_boost.

For this to work, the negative_boost must be less than 1.0. In this example, any documents that contain any of the negative terms will have their _score cut in half.

« Manipulating Relevance with Query Structure Ignoring TF/IDF »