Hello, we ran into a problem with a lowercase normalizer in an aggregation query. Our initial mapping looks like this:

"mappings": {
    "properties": {
      "keyword_value": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      }
    }
  }

An aggregation query returns the aggregation results (sum, count, etc.) with keyword_value as the bucket key in lowercase.

The issue is that we want to retrieve keyword_value in its original case.

A basic search does return the keyword_value field in its original case.

We have a couple of approaches in mind. One is to make an additional query to retrieve the original values (this could hurt performance). Another is to update the mapping with a new field without the normalizer and populate it with an additional query (not suitable for us, since we don't want to reindex the data).

Could you suggest the best way to retrieve keyword_value in its original case? Can we somehow bypass the lowercase normalizer at query time? And why does the aggregation return the key in lowercase while a basic query returns it in the original case?

Asked Apr 2 at 5:48 by Ilya Basalyha, edited Apr 2 at 6:54
  • How exactly is this Java related? Seeing as you have "normalizer": "lowercase_normalizer", are you really surprised you get lowercase? Maybe it's worth looking at other Elasticsearch questions about the same topic, like: stackoverflow/questions/51664234/… – Stultuske Commented Apr 2 at 6:38
  • Removed the java tag, thanks for pointing that out. Yes, I have little experience with Elasticsearch and am currently working with existing code. Thanks for sharing the topic, but as stated in the question, we already have a similar approach in mind; we want to avoid updating the existing index mapping – Ilya Basalyha Commented Apr 2 at 7:01

1 Answer

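Of the options you list, "update mapping with a new field without normalizer" is the most efficient for your use case. Aggregations read the indexed (normalized) terms, whereas search hits return the original _source, so an aggregation can only see the original case through a field that has no normalizer. The reasons for choosing this approach are: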

  1. It is easy to implement.

  2. You don't need to reindex into a new index:

    1. New documents will populate both keyword_value and keyword_value.original.

    2. For existing documents, use an _update_by_query API call.

  3. It gives better search speed than the other solutions.

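Here is how to do it. The requests below create a test index, show the lowercase buckets, add the multi-field, update the existing documents in place, and then aggregate on both fields: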

PUT test_index_lowercase
{
  "mappings": {
    "properties": {
      "keyword_value": {
        "type": "keyword",
        "normalizer": "lowercase_normalizer"
      }
    }
  },
  "settings": {
    "analysis": {
      "normalizer": {
        "lowercase_normalizer": {
          "type": "custom",
          "filter": ["lowercase","asciifolding"]
        }
      }
    }
  }
}

PUT test_index_lowercase/_doc/1
{
  "keyword_value": "MuSaB"
}

PUT test_index_lowercase/_doc/2
{
  "keyword_value": "musab"
}

GET test_index_lowercase/_search
{
  "size": 0,
  "aggs": {
    "NAME": {
      "terms": {
        "field": "keyword_value"
      }
    }
  }
}

PUT test_index_lowercase/_mapping
{
  "properties": {
    "keyword_value": {
      "type": "keyword",
      "normalizer": "lowercase_normalizer",
      "fields": {
        "original": {
          "type": "keyword"
        }
      }
    }
  }
}

POST test_index_lowercase/_update_by_query?conflicts=proceed

GET test_index_lowercase/_search
{
  "size": 0,
  "aggs": {
    "1": {
      "terms": {
        "field": "keyword_value"
      }
    },
    "2": {
      "terms": {
        "field": "keyword_value.original"
      }
    }
  }
}
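
If changing the mapping really is off the table, one possible alternative is a top_hits sub-aggregation, which reads _source and so can return one original-case spelling per lowercase bucket. This is only a sketch against the test index above: the aggregation names by_value and sample are hypothetical, and it yields a single representative spelling per bucket, not every distinct original value.

```
# Sketch only: "by_value" and "sample" are hypothetical aggregation names.
GET test_index_lowercase/_search
{
  "size": 0,
  "aggs": {
    "by_value": {
      "terms": {
        "field": "keyword_value"
      },
      "aggs": {
        "sample": {
          "top_hits": {
            "size": 1,
            "_source": {
              "includes": ["keyword_value"]
            }
          }
        }
      }
    }
  }
}
```

Because top_hits fetches a document for every bucket, it will likely cost more per query than aggregating on keyword_value.original, so it is probably best kept for cases where the mapping cannot be touched.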
