admin管理员组

文章数量:1424377

I am performing a fuzzy elasticsearch query on 'text' and 'keywords' fields. I have two documents in elasticsearch, one with 'text' "testPhone 5" and the other "testPhone 4s". When I perform a fuzzy query with "testPhone 5", I am seeing that both documents are being given the exact same score value. Why is this occurring?

Extra info: I am indexing documents using the 'uax_url_email' tokenizer and 'lowercase' filter.

This is the query I am making:

{
    query : {
        bool: {
            // match one or the other fuzzy query
            should: [
                {
                    fuzzy: {
                        text: {
                            min_similarity: 0.4,
                            value: 'testphone 5',
                            prefix_length: 0,
                            boost: 5,
                        }
                    }
                },
                {
                    fuzzy: {
                        keywords: {
                            min_similarity: 0.4,
                            value: 'testphone 5',
                            prefix_length: 0,
                            boost: 1,
                        }
                    }
                }
            ]
        }
    },
    sort: [ 
        '_score'
    ],
    explain: true
}

This is the result:

{ max_score: 0.47213298,
  total: 2,
  hits:
  [ { _index: 'test',
     _shard: 0,
     _id: '51fbf95f82e89ae8c300002c',
     _node: '0Mtfzbe1RDinU71Ordx-Ag',
     _source:
    { next: { id: '51fbf95f82e89ae8c3000027' },
      cards: [ '51fbf95f82e89ae8c3000027', [length]: 1 ],
      other: false,
      _id: '51fbf95f82e89ae8c300002c',
      category: '51fbf95f82e89ae8c300002b',
      image: '.png',
      text: 'testPhone 5',
      keywords: [ [length]: 0 ],
      __v: 0 },
   _type: 'productgroup',
   _explanation:
    { details:
       [ { details:
            [ { details:
                 [ { details:
                      [ { details:
                           [ { value: 3.8888888, description: 'boost' },
                             { value: 1.5108256,
                               description: 'idf(docFreq=2, maxDocs=5)' },
                             { value: 0.17020021,
                               description: 'queryNorm' },
                             [length]: 3 ],
                          value: 0.99999994,
                          description: 'queryWeight, product of:' },
                        { details:
                           [ { details:
                                [ { value: 1, description: 'termFreq=1.0' },
                                  [length]: 1 ],
                               value: 1,
                               description: 'tf(freq=1.0), with freq of:' },
                             { value: 1.5108256,
                               description: 'idf(docFreq=2, maxDocs=5)' },
                             { value: 0.625,
                               description: 'fieldNorm(doc=0)' },
                             [length]: 3 ],
                          value: 0.944266,
                          description: 'fieldWeight in 0, product of:' },
                        [length]: 2 ],
                     value: 0.94426596,
                     description: 'score(doc=0,freq=1.0 = termFreq=1.0\n), product of:' },
                   [length]: 1 ],
                value: 0.94426596,
                description: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:' },
              [length]: 1 ],
           value: 0.94426596,
           description: 'sum of:' },
         { value: 0.5, description: 'coord(1/2)' },
         [length]: 2 ],
      value: 0.47213298,
      description: 'product of:' },
   _score: 0.47213298 },
 { _index: 'test',
   _shard: 4,
   _id: '51fbf95f82e89ae8c300002d',
   _node: '0Mtfzbe1RDinU71Ordx-Ag',
   _source:
    { next: { id: '51fbf95f82e89ae8c3000027' },
      cards: [ '51fbf95f82e89ae8c3000029', [length]: 1 ],
      other: false,
      _id: '51fbf95f82e89ae8c300002d',
      category: '51fbf95f82e89ae8c300002b',
      image: '.png',
      text: 'testPhone 4s',
      keywords: [ 'apple', [length]: 1 ],
      __v: 0 },
   _type: 'productgroup',
   _explanation:
    { details:
       [ { details:
            [ { details:
                 [ { details:
                      [ { details:
                           [ { value: 3.8888888, description: 'boost' },
                             { value: 1.5108256,
                               description: 'idf(docFreq=2, maxDocs=5)' },
                             { value: 0.17020021,
                               description: 'queryNorm' },
                             [length]: 3 ],
                          value: 0.99999994,
                          description: 'queryWeight, product of:' },
                        { details:
                           [ { details:
                                [ { value: 1, description: 'termFreq=1.0' },
                                  [length]: 1 ],
                               value: 1,
                               description: 'tf(freq=1.0), with freq of:' },
                             { value: 1.5108256,
                               description: 'idf(docFreq=2, maxDocs=5)' },
                             { value: 0.625,
                               description: 'fieldNorm(doc=0)' },
                             [length]: 3 ],
                          value: 0.944266,
                          description: 'fieldWeight in 0, product of:' },
                        [length]: 2 ],
                     value: 0.94426596,
                     description: 'score(doc=0,freq=1.0 = termFreq=1.0\n), product of:' },
                   [length]: 1 ],
                value: 0.94426596,
                description: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:' },
              [length]: 1 ],
           value: 0.94426596,
           description: 'sum of:' },
         { value: 0.5, description: 'coord(1/2)' },
         [length]: 2 ],
      value: 0.47213298,
      description: 'product of:' },
   _score: 0.47213298 },
 [length]: 2 ] }

I am performing a fuzzy elasticsearch query on 'text' and 'keywords' fields. I have two documents in elasticsearch, one with 'text' "testPhone 5" and the other "testPhone 4s". When I perform a fuzzy query with "testPhone 5", I am seeing that both documents are being given the exact same score value. Why is this occurring?

Extra info: I am indexing documents using the 'uax_url_email' tokenizer and 'lowercase' filter.

This is the query I am making:

{
    query : {
        bool: {
            // match one or the other fuzzy query
            should: [
                {
                    fuzzy: {
                        text: {
                            min_similarity: 0.4,
                            value: 'testphone 5',
                            prefix_length: 0,
                            boost: 5,
                        }
                    }
                },
                {
                    fuzzy: {
                        keywords: {
                            min_similarity: 0.4,
                            value: 'testphone 5',
                            prefix_length: 0,
                            boost: 1,
                        }
                    }
                }
            ]
        }
    },
    sort: [ 
        '_score'
    ],
    explain: true
}

This is the result:

{ max_score: 0.47213298,
  total: 2,
  hits:
  [ { _index: 'test',
     _shard: 0,
     _id: '51fbf95f82e89ae8c300002c',
     _node: '0Mtfzbe1RDinU71Ordx-Ag',
     _source:
    { next: { id: '51fbf95f82e89ae8c3000027' },
      cards: [ '51fbf95f82e89ae8c3000027', [length]: 1 ],
      other: false,
      _id: '51fbf95f82e89ae8c300002c',
      category: '51fbf95f82e89ae8c300002b',
      image: 'https://s3.amazonaws./sold_category_icons/Smartphones.png',
      text: 'testPhone 5',
      keywords: [ [length]: 0 ],
      __v: 0 },
   _type: 'productgroup',
   _explanation:
    { details:
       [ { details:
            [ { details:
                 [ { details:
                      [ { details:
                           [ { value: 3.8888888, description: 'boost' },
                             { value: 1.5108256,
                               description: 'idf(docFreq=2, maxDocs=5)' },
                             { value: 0.17020021,
                               description: 'queryNorm' },
                             [length]: 3 ],
                          value: 0.99999994,
                          description: 'queryWeight, product of:' },
                        { details:
                           [ { details:
                                [ { value: 1, description: 'termFreq=1.0' },
                                  [length]: 1 ],
                               value: 1,
                               description: 'tf(freq=1.0), with freq of:' },
                             { value: 1.5108256,
                               description: 'idf(docFreq=2, maxDocs=5)' },
                             { value: 0.625,
                               description: 'fieldNorm(doc=0)' },
                             [length]: 3 ],
                          value: 0.944266,
                          description: 'fieldWeight in 0, product of:' },
                        [length]: 2 ],
                     value: 0.94426596,
                     description: 'score(doc=0,freq=1.0 = termFreq=1.0\n), product of:' },
                   [length]: 1 ],
                value: 0.94426596,
                description: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:' },
              [length]: 1 ],
           value: 0.94426596,
           description: 'sum of:' },
         { value: 0.5, description: 'coord(1/2)' },
         [length]: 2 ],
      value: 0.47213298,
      description: 'product of:' },
   _score: 0.47213298 },
 { _index: 'test',
   _shard: 4,
   _id: '51fbf95f82e89ae8c300002d',
   _node: '0Mtfzbe1RDinU71Ordx-Ag',
   _source:
    { next: { id: '51fbf95f82e89ae8c3000027' },
      cards: [ '51fbf95f82e89ae8c3000029', [length]: 1 ],
      other: false,
      _id: '51fbf95f82e89ae8c300002d',
      category: '51fbf95f82e89ae8c300002b',
      image: 'https://s3.amazonaws./sold_category_icons/Smartphones.png',
      text: 'testPhone 4s',
      keywords: [ 'apple', [length]: 1 ],
      __v: 0 },
   _type: 'productgroup',
   _explanation:
    { details:
       [ { details:
            [ { details:
                 [ { details:
                      [ { details:
                           [ { value: 3.8888888, description: 'boost' },
                             { value: 1.5108256,
                               description: 'idf(docFreq=2, maxDocs=5)' },
                             { value: 0.17020021,
                               description: 'queryNorm' },
                             [length]: 3 ],
                          value: 0.99999994,
                          description: 'queryWeight, product of:' },
                        { details:
                           [ { details:
                                [ { value: 1, description: 'termFreq=1.0' },
                                  [length]: 1 ],
                               value: 1,
                               description: 'tf(freq=1.0), with freq of:' },
                             { value: 1.5108256,
                               description: 'idf(docFreq=2, maxDocs=5)' },
                             { value: 0.625,
                               description: 'fieldNorm(doc=0)' },
                             [length]: 3 ],
                          value: 0.944266,
                          description: 'fieldWeight in 0, product of:' },
                        [length]: 2 ],
                     value: 0.94426596,
                     description: 'score(doc=0,freq=1.0 = termFreq=1.0\n), product of:' },
                   [length]: 1 ],
                value: 0.94426596,
                description: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:' },
              [length]: 1 ],
           value: 0.94426596,
           description: 'sum of:' },
         { value: 0.5, description: 'coord(1/2)' },
         [length]: 2 ],
      value: 0.47213298,
      description: 'product of:' },
   _score: 0.47213298 },
 [length]: 2 ] }
Share Improve this question asked Aug 2, 2013 at 18:56 teztez 5746 silver badges13 bronze badges
Add a ment  | 

2 Answers 2

Reset to default 2

I ran into this issue myself recently. I can't tell you exactly why it is happening, but I CAN tell you how I fixed it:

I ran 2 queries over the same field, one with an exact match, and then the exact same query on the same field with fuzzy matches enabled and a lower boost.

That made sure that my exact matches always ended higher then the fuzzy matches.

P.S. I think they're scored equal because, because of the fuzziness, the both match and ES doesn't care that one is an exact match as long as the both match, but this is pure theory crafting on my end since i'm not intimately familiar with the scoring algorithm.

Fuzzy queries are not analyzed but the field is so your search for testphone 5 with a distance of 0.4 yields the analyzed term testphone for both documents and that term is used to further filter down the results

description: 'weight(text:testphone^3.8888888 in 0) [PerFieldSimilarity], result of:' },

See also @imotov excellent answer here: ElasticSearch's Fuzzy Query

You can see how exactly a string will be tokenized using the _analyze API

http://www.elasticsearch/guide/en/elasticsearch/reference/current/indices-analyze.html

i.e

http://localhost:9200/prefix_test/_analyze?field=text&text=testphone+5

will return:

{
   "tokens": [
      {
         "token": "testphone",
         "start_offset": 0,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "5",
         "start_offset": 10,
         "end_offset": 11,
         "type": "<NUM>",
         "position": 2
      }
   ]
}

So even if you index the value testphone sammsung a fuzzy query for "testphone samsunk" won't yield anything where as just samsunk will.

You may get better results by not analyzing (or using the keyword analyzer) the field.

If you want to have different analysis on a single field you can use the multi_field construct.

http://www.elasticsearch/guide/en/elasticsearch/reference/current/mapping-multi-field-type.html

本文标签: javascriptElasticsearch multi field fuzzy search not returning exact match firstStack Overflow