I'm using AWS Glue to read data from a DynamoDB table where the sort key sk (string) is a timestamp in the format 2024-04-10T00:00:00.000000+00:00. I'm trying to apply a push_down_predicate to filter records within a specific time range, but I'm getting unexpected results, including timestamps outside the specified range.

What I've Tried:

  1. DynamoDB Query: When I query directly from DynamoDB using the same timestamp format, the results are as expected.
  2. AWS Glue Job:
         dynamic_frame = glueContext.create_dynamic_frame.from_catalog(
             database="my_database",
             table_name="my_dynamodb_table",
             push_down_predicate=f"sk >= '{start_timestamp}' AND sk < '{end_timestamp}'"
         )
Here, `start_timestamp` and `end_timestamp` match the format in DynamoDB.

Observed Behavior: Instead of getting filtered results within the specified timestamp range, I'm seeing a mix of timestamps, including many outside the range.

Question:

Why isn't the push_down_predicate filtering the DynamoDB data as expected through AWS Glue, and how can I correctly apply this filter to get only the timestamps within the specified range?

Asked Jan 30 at 13:23 by Parag Jadhav; edited Jan 30 at 20:33 by fedonev.

1 Answer

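The AWS Glue DynamoDB connector does not support push-down predicate filtering, which is why the `push_down_predicate` argument appears to be ignored and records outside the range come back. See the connector documentation: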

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect-dynamodb-home.html
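
Since the predicate is not pushed down, one possible workaround is to filter after the table has been loaded. The sketch below is only an illustration under some assumptions: `sk_in_range` and `filter_dynamic_frame` are hypothetical helper names, every `sk` value is assumed to use the same fixed-width format and the same `+00:00` offset (so plain string comparison matches chronological order), and the Glue part relies on the `Filter` transform from `awsglue.transforms`, which is only importable inside a Glue job. The sample records at the bottom are made up so the range check can be tried locally.

```python
def sk_in_range(record, start, end):
    """Half-open range check [start, end) on the string sort key `sk`.

    Assumes all timestamps share one fixed-width ISO 8601 format and the
    same UTC offset, so lexicographic order equals chronological order.
    """
    return start <= record["sk"] < end


def filter_dynamic_frame(dynamic_frame, start, end):
    """Apply the range check to a Glue DynamicFrame after it has been read.

    Hypothetical helper: the import is done lazily because `awsglue`
    is only available in the Glue runtime.
    """
    from awsglue.transforms import Filter

    return Filter.apply(
        frame=dynamic_frame,
        f=lambda rec: sk_in_range(rec, start, end),
    )


# Local check with made-up records (no Glue required).
sample = [
    {"sk": "2024-04-09T23:59:59.999999+00:00"},
    {"sk": "2024-04-10T00:00:00.000000+00:00"},
    {"sk": "2024-04-10T12:30:00.000000+00:00"},
    {"sk": "2024-04-11T00:00:00.000000+00:00"},
]
start_timestamp = "2024-04-10T00:00:00.000000+00:00"
end_timestamp = "2024-04-11T00:00:00.000000+00:00"

kept = [r["sk"] for r in sample if sk_in_range(r, start_timestamp, end_timestamp)]
print(kept)
```

In a Glue job this would be called as `filter_dynamic_frame(dynamic_frame, start_timestamp, end_timestamp)` on the frame returned by `from_catalog`, with the `push_down_predicate` argument dropped. Note that this filters after the read, so the job still scans the whole table; it does not reduce the read capacity consumed the way a direct DynamoDB Query does.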
