admin管理员组

文章数量:1123951

I’m encountering an issue where the number of documents in my Elasticsearch index exceeds the number of rows returned by my MySQL query. Specifically:

  • MySQL Query Result: 4802 rows.
  • Elasticsearch Index Document Count: 5515 documents.

This discrepancy persists even though the query results in MySQL are consistent. I suspect something is happening in my Logstash pipeline or Elasticsearch indexing process.



Logstash Configuration:

Here is my Logstash pipeline:

Input Configuration:

jdbc {
  jdbc_connection_string => "jdbc:mysql://ipadresss:3306/SUPER_TEST_TRACK"
  jdbc_user => "admin"
  jdbc_password => "something"
  jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
  jdbc_driver_library => "/usr/share/java/mysql-connector-java-9.1.0.jar"
  jdbc_paging_enabled => true
  jdbc_page_size => 9000
  statement => "SELECT
                  s.site AS siteName,
                  DATE_FORMAT(p.date_production, '%Y-%m-%d') AS productionDate,
                  SUM(p.initial_quantity * pcv.criterion_value) AS ptotalMeterage
                FROM
                  palet p
                JOIN palettes_criteria pc ON p.palet_id = pc.palette_id
                JOIN criterion_value pcv ON pc.criterion_value_id = pcv.criterion_value_id
                JOIN criterion c ON pcv.criterion_id = c.id_criterion
                JOIN site s ON p.id_site = s.id_site
                WHERE
                  c.criterion_name = 'unp'
                GROUP BY
                  s.site, productionDate
                ORDER BY
                  productionDate"
  schedule => "* * * * *"
}

Filter Configuration:

mutate {
  rename => {
    "siteName" => "site_name"
    "productionDate" => "production_date"
    "ptotalMeterage" => "p_total_meterage"
  }
  remove_field => ["@timestamp", "@version"]
}

Output Configuration:

elasticsearch {
  hosts => ["http://ipaddress:9200"]
  index => "produced-palette-meterage-per-site-and-date"
  document_id => "palet-%{sitename}-%{productiondate}-%{ptotalmeterage}"
}

Observations:

  1. Consistent MySQL Query Results:
  • Running the SQL query directly in MySQL consistently returns 4802 rows.
  • Example row from MySQL:
| siteName  | productionDate | ptotalMeterage      |
| BERRECHID | 2023-12-17     | 14364.279999999972  |
  1. Elasticsearch Index:
  • Document count (docs.count): 5515
  • Example document in Elasticsearch:
{
  "_index": "produced-palette-meterage-per-site-and-date",
  "_type": "_doc",
  "_id": "palet-BERRECHID-2023-12-17-14364.279999999972",
  "_source": {
    "sitename": "BERRECHID",
    "productiondate": "2023-12-17",
    "ptotalmeterage": 14364.279999999972
  }
}

where is the issue and how to solve it?

本文标签: logstashElasticsearch Index Contains More Documents Than MySQL Query ResultsStack Overflow