I have the following search query, and when I adjust the boost value of the 'wildcard' clause, none of the result scores change. Our search objective: if the search phrase is in the title, that is most important. Otherwise, search the body as well. If that still turns up nothing, try fuzzy matches. Recency is prioritized with a time decay function: information from the last 30 days is far more valuable than information from 9 months ago.

Any thoughts?

{
  "from": 0,
  "size": 10,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "wildcard": {
                "title": {
                  "value": "*blackwell*",
                  "boost": 1000
                }
              }
            },
            {
              "multi_match": {
                "fields": [
                  "title",
                  "body"
                ],
                "query": "blackwell",
                "type": "phrase",
                "boost": 100
              }
            },
            {
              "multi_match": {
                "fields": [
                  "title",
                  "body"
                ],
                "query": "blackwell",
                "operator": "or",
                "boost": 10
              }
            },
            {
              "multi_match": {
                "fields": [
                  "title",
                  "body"
                ],
                "query": "blackwell",
                "fuzziness": "AUTO",
                "operator": "and",
                "boost": 1
              }
            }
          ],
          "minimum_should_match": 1
        }
      },
      "functions": [
        {
          "exp": {
            "modifiedAt": {
              "offset": "30d",
              "scale": "360d",
              "decay": 0.01
            }
          }
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

Here is the output from _explain:

{
  "_index": "534e9cac-96c9-47af-9321-67860f3b33ba_record_chunks",
  "_id": "836bba2c-8ba6-491e-a1e2-da7e91de78f8_0011",
  "matched": true,
  "explanation": {
    "value": 1254.3658,
    "description": "function score, product of:",
    "details": [
      {
        "value": 1254.3658,
        "description": "sum of:",
        "details": [
          {
            "value": 1254.3658,
            "description": "weight(body:blackwel in 257) [PerFieldSimilarity], result of:",
            "details": [
              {
                "value": 1254.3658,
                "description": "score(freq=2.0), computed as boost * idf * tf from:",
                "details": [
                  {
                    "value": 244.20001,
                    "description": "boost",
                    "details": []
                  },
                  {
                    "value": 7.447975,
                    "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                    "details": [
                      {
                        "value": 544,
                        "description": "n, number of documents containing term",
                        "details": []
                      },
                      {
                        "value": 934570,
                        "description": "N, total number of documents with field",
                        "details": []
                      }
                    ]
                  },
                  {
                    "value": 0.68966836,
                    "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                    "details": [
                      {
                        "value": 2,
                        "description": "freq, occurrences of term within document",
                        "details": []
                      },
                      {
                        "value": 1.2,
                        "description": "k1, term saturation parameter",
                        "details": []
                      },
                      {
                        "value": 0.75,
                        "description": "b, length normalization parameter",
                        "details": []
                      },
                      {
                        "value": 56,
                        "description": "dl, length of field (approximate)",
                        "details": []
                      },
                      {
                        "value": 84.00777,
                        "description": "avgdl, average length of field",
                        "details": []
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "value": 1,
        "description": "min of:",
        "details": [
          {
            "value": 1,
            "description": "Function for field modifiedAt:",
            "details": [
              {
                "value": 1,
                "description": "exp(- MIN[Math.max(Math.abs(1.731852097E12(=doc value) - 1.732904021506E12(=origin))) - 2.592E9(=offset), 0)] * 1.4805716904539902E-10)",
                "details": []
              }
            ]
          },
          {
            "value": 3.4028235e38,
            "description": "maxBoost",
            "details": []
          }
        ]
      }
    ]
  }
}

Asked Nov 23, 2024 by JackBurton; edited Nov 30, 2024 by Paulo.

1 Answer

TL;DR

It should work; see my demo.

Maybe your wildcard query never matches, and so never contributes to the score. But I am shooting in the dark without the mapping and sample documents.

Could you perhaps provide a minimal reproducible example?

To investigate

Have you thought about using the explain API?

GET 79216980/_explain/<doc _id>
{
  "query": {
    "wildcard": {
      "data.keyword": {
        "value": "*pot*",
        "boost": 2
      }
    }
  }
}

It explains why a document matched and how its score was computed:

{
  "_index": "79216980",
  "_id": "S1jPY5MB7I4Yfde2NMmi",
  "matched": true,
  "explanation": {
    "value": 2,
    "description": "data.keyword:*pot*^2.0",
    "details": []
  }
}

Analysis of the explain output (edit)

From the explain output I can tell you that:

  • the title field never matched; only the body field did.
  • the wildcard query did not match either.

The score is computed solely from the match on the body, which explains why the score does not change when you change the wildcard boost.
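To double-check that reading, here is a small Python sketch that recomputes both factors of the function_score product from the numbers in the question's explain output. Nothing here talks to Elasticsearch; the variable names are mine and the values are copied from the explain. One inference on my side, not something the explain states: the boost of 244.2 divided by (k1 + 1) = 2.2 comes out at about 111, which would be consistent with the three multi_match boosts (100 + 10 + 1) being combined on the body term, with the wildcard's 1000 absent.

```python
import math

# Values copied from the _explain output in the question.
boost = 244.20001
n, N = 544, 934570            # docs containing the term / docs with the field
freq, k1, b = 2.0, 1.2, 0.75
dl, avgdl = 56, 84.00777

# BM25 pieces, using the formulas printed in the explain descriptions.
idf = math.log(1 + (N - n + 0.5) / (n + 0.5))
tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
query_score = boost * idf * tf

# Decay factor, following the expression in the explain description:
# exp(-max(|doc - origin| - offset, 0) * lambda), all times in milliseconds.
doc_ms, origin_ms = 1.731852097e12, 1.732904021506e12
offset_ms = 2.592e9                # offset: 30d
lam = 1.4805716904539902e-10       # -ln(decay) / scale, with decay 0.01 and scale 360d
age_past_offset = max(abs(doc_ms - origin_ms) - offset_ms, 0)
decay_factor = math.exp(-age_past_offset * lam)

# boost_mode "multiply": final score is the product of the two factors.
final_score = query_score * decay_factor

# Assumption (inference only): the boost may be (k1 + 1) times the summed clause boosts.
summed_clause_boost = boost / (k1 + 1)

print(f"idf={idf:.4f} tf={tf:.4f} query_score={query_score:.2f}")
print(f"decay_factor={decay_factor} final_score={final_score:.2f}")
print(f"boost / (k1 + 1) = {summed_clause_boost:.3f}")
```

The document is about 12 days old, which is inside the 30 day offset, so the decay factor is 1 and the final score equals the body match score. Nothing in the product comes from the title or from the wildcard clause.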

Could it be that title is of type keyword, which makes matching case sensitive?

I would suggest testing your query one clause at a time to see which ones match.
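One way to do that without hand-editing JSON four times is a small Python sketch like this one. It only builds the request bodies (the helper name single_clause_bodies is mine); you would then paste each body into GET <your index>/_search or into the _explain call shown above.

```python
import copy
import json

# The four should clauses from the question's bool query.
should_clauses = [
    {"wildcard": {"title": {"value": "*blackwell*", "boost": 1000}}},
    {"multi_match": {"fields": ["title", "body"], "query": "blackwell",
                     "type": "phrase", "boost": 100}},
    {"multi_match": {"fields": ["title", "body"], "query": "blackwell",
                     "operator": "or", "boost": 10}},
    {"multi_match": {"fields": ["title", "body"], "query": "blackwell",
                     "fuzziness": "AUTO", "operator": "and", "boost": 1}},
]


def single_clause_bodies(clauses):
    """Return one search body per clause, each running that clause on its own."""
    return [{"size": 10, "query": copy.deepcopy(clause)} for clause in clauses]


bodies = single_clause_bodies(should_clauses)
for body in bodies:
    # Print one request body per line, ready to paste into the console.
    print(json.dumps(body))
```

If the wildcard-only body returns no hits, the boost on that clause has nothing to act on, whatever value you give it.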

You could also set case_insensitive: true on the wildcard clause:

{
  "wildcard": {
    "title": {
      "value": "*blackwell*",
      "boost": 1000,
      "case_insensitive": true
    }
  }
}
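To illustrate why a wildcard can silently miss, here is a rough Python stand-in that uses fnmatch as an approximation of matching a pattern against a single indexed term. It is not how Elasticsearch implements wildcard queries, and the title value is hypothetical since I don't have your documents. The second half is a guess worth checking: your explain shows the body term indexed as blackwel (stemmed), and if title uses the same analyzer, the pattern *blackwell* would be compared against blackwel and fail, because stemming is not applied to the wildcard pattern.

```python
from fnmatch import fnmatchcase


def wildcard_matches(term: str, pattern: str, case_insensitive: bool = False) -> bool:
    """Rough stand-in for matching a wildcard pattern against ONE indexed term."""
    if case_insensitive:
        term, pattern = term.lower(), pattern.lower()
    return fnmatchcase(term, pattern)


pattern = "*blackwell*"

# keyword field: the whole value is one term, stored as-is (hypothetical title).
keyword_term = "Blackwell quarterly report"
keyword_sensitive = wildcard_matches(keyword_term, pattern)
keyword_insensitive = wildcard_matches(keyword_term, pattern, case_insensitive=True)

# text field with a stemmer: "blackwel" is the term the explain shows for body.
stemmed_term = "blackwel"
stemmed_full_pattern = wildcard_matches(stemmed_term, pattern)
stemmed_short_pattern = wildcard_matches(stemmed_term, "*blackwel*")

print(keyword_sensitive, keyword_insensitive)       # False True
print(stemmed_full_pattern, stemmed_short_pattern)  # False True
```

So case_insensitive helps in the keyword case, but it would not help if the indexed terms are stemmed. Checking the mapping of title will tell you which situation you are in.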

Demo

POST _bulk
{"index":{"_index":"79216980"}}
{"data": "I love potatoes"}
{"index":{"_index":"79216980"}}
{"data": "I love potage"}

GET 79216980/_search
{
  "query": {
    "wildcard": {
      "data.keyword": {
        "value": "*pot*",
        "boost": 2
      }
    }
  }
}

Gives:

{
...
  "hits": {
    ...
    "max_score": 2,
    "hits": [
      {
        "_index": "79216980",
        "_id": "S1jPY5MB7I4Yfde2NMmi",
        "_score": 2,
        "_source": {
          "data": "I love potatoes"
        }
      },
      {
        "_index": "79216980",
        "_id": "TFjPY5MB7I4Yfde2NMmi",
        "_score": 2,
        "_source": {
          "data": "I love potage"
        }
      }
    ]
  }
}
