I have a Hudi table written by Spark with the following schema:

id: int64
content: string
create_date: timestamp[ns]

The table is very large. Most of the queries we run against it are range queries on create_date:

select xx from table where xxx and xxx and create_date >= '2024-01-01 00:00:00' and create_date <= '2024-01-02 00:00:00'

Every query spends a long time scanning the entire table, even when I only want to filter or aggregate the data for a single date. How should I build indexes on this Hudi table to speed up these queries?
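
For concreteness, here is a rough sketch of the kind of setup I am considering, not something I have verified: partition the table by a day value derived from create_date, and enable Hudi's metadata-table column statistics so that reads can skip files whose create_date range does not match. The column name create_day, the helper names, and the choice of create_date as the precombine field are my own assumptions; the option keys are standard Hudi configs, but their behavior may vary by Hudi version.

```python
from datetime import datetime
from typing import Dict, Tuple


def build_hudi_options(table_name: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (write_options, read_options) for a day-partitioned Hudi table.

    Hypothetical helper: only builds option dictionaries, it does not touch Spark.
    """
    write_options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": "id",
        # Assumption: the latest create_date wins when records are deduplicated.
        "hoodie.datasource.write.precombine.field": "create_date",
        # Assumption: create_day is an extra column derived from create_date.
        "hoodie.datasource.write.partitionpath.field": "create_day",
        # Keep per-file min/max statistics for create_date in the metadata table.
        "hoodie.metadata.enable": "true",
        "hoodie.metadata.index.column.stats.enable": "true",
        "hoodie.metadata.index.column.stats.column.list": "create_date",
    }
    read_options = {
        "hoodie.metadata.enable": "true",
        # Let the reader prune files using the column statistics above.
        "hoodie.enable.data.skipping": "true",
    }
    return write_options, read_options


def partition_value(create_date: datetime) -> str:
    """Derive the hypothetical create_day partition value (YYYY-MM-DD)."""
    return create_date.strftime("%Y-%m-%d")
```

The dictionaries would be passed as df.write.format("hudi").options(**write_options) on write and spark.read.format("hudi").options(**read_options) on read. As far as I understand, changing the partition field means rewriting the existing table, and queries benefit from partition pruning only if they also filter on the partition column.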
