admin管理员组文章数量:1334887
I have an issue with Athena when trying to query records between dates that are stored as string.
My csv dataset, spread among several files across directories, has a quote_date
column with 10/8/2024
format. I created the prices
table with quote_date
as string (not date) otherwise the column would result empty - in my understanding because the string->date conversion expects an ISO format.
ID,QUOTE_DATE,PRICE
435,10/2/2024,5.80
123,10/12/2024,5.62
489,10/28/2024,5.04
I want to query records between dates on that column.
For some reason this works fine:
select
customer_id,
q_date,
normalized_price
from (
select *, date_parse(quote_date,'%m/%d/%Y') as q_date
from prices
where customer_id is not null
) as tbl
where
customer_id = 435
order by q_date desc
# customer_id q_date normalized_price
1 435 2024-11-19 00:00:00.000 5.62
2 435 2024-11-19 00:00:00.000 5.54
3 435 2024-11-19 00:00:00.000 5.2
...
but for some reason this, adding the filter on q_date, breaks, and unexpectedly on the quote_date
column (?!):
select
customer_id,
q_date,
normalized_price
from (
select *, date_parse(quote_date,'%m/%d/%Y') as q_date
from prices
where customer_id is not null
) as tbl
where
customer_id = 435
AND
date(q_date) > date('2024-11-15')
order by q_date desc
INVALID_FUNCTION_ARGUMENT: Invalid format: “QUOTE_DATE” This query ran against the “athena_dtn” database, unless qualified by the query.
Checking the data I can't see any malformed/anexpected string format in the quote_date
values, so I really can't figure out what would be the issue.
Any suggestions?
UPDATE
It looks like there is something wrong somewhere when converting the date string.
The following query returns INVALID_FUNCTION_ARGUMENT: Invalid format: "QUOTE_DATE"
SELECT * FROM prices WHERE customer_id = 435 AND date_parse(quote_date, '%m/%d/%Y') > DATE('2024-11-15')
The following one, adding
try()
on the date parsing, works just fine (returning the expected results)SELECT * FROM prices WHERE customer_id = 435 AND try(date_parse(quote_date, '%m/%d/%Y')) > DATE('2024-11-15')
The following query works as well, but returning no results (!)
SELECT * FROM prices WHERE customer_id = 435 AND try(date_parse(quote_date, '%m/%d/%Y')) IS NULL
I guess this solves my blocking issue, however it would be nice to learn what might be the underlying problem if anybody can provide some insight on the conversion
I have an issue with Athena when trying to query records between dates that are stored as string.
My csv dataset, spread among several files across directories, has a quote_date
column with 10/8/2024
format. I created the prices
table with quote_date
as string (not date) otherwise the column would result empty - in my understanding because the string->date conversion expects an ISO format.
ID,QUOTE_DATE,PRICE
435,10/2/2024,5.80
123,10/12/2024,5.62
489,10/28/2024,5.04
I want to query records between dates on that column.
For some reason this works fine:
select
customer_id,
q_date,
normalized_price
from (
select *, date_parse(quote_date,'%m/%d/%Y') as q_date
from prices
where customer_id is not null
) as tbl
where
customer_id = 435
order by q_date desc
# customer_id q_date normalized_price
1 435 2024-11-19 00:00:00.000 5.62
2 435 2024-11-19 00:00:00.000 5.54
3 435 2024-11-19 00:00:00.000 5.2
...
but for some reason this, adding the filter on q_date, breaks, and unexpectedly on the quote_date
column (?!):
select
customer_id,
q_date,
normalized_price
from (
select *, date_parse(quote_date,'%m/%d/%Y') as q_date
from prices
where customer_id is not null
) as tbl
where
customer_id = 435
AND
date(q_date) > date('2024-11-15')
order by q_date desc
INVALID_FUNCTION_ARGUMENT: Invalid format: “QUOTE_DATE” This query ran against the “athena_dtn” database, unless qualified by the query.
Checking the data I can't see any malformed/anexpected string format in the quote_date
values, so I really can't figure out what would be the issue.
Any suggestions?
UPDATE
It looks like there is something wrong somewhere when converting the date string.
The following query returns INVALID_FUNCTION_ARGUMENT: Invalid format: "QUOTE_DATE"
SELECT * FROM prices WHERE customer_id = 435 AND date_parse(quote_date, '%m/%d/%Y') > DATE('2024-11-15')
The following one, adding
try()
on the date parsing, works just fine (returning the expected results)SELECT * FROM prices WHERE customer_id = 435 AND try(date_parse(quote_date, '%m/%d/%Y')) > DATE('2024-11-15')
The following query works as well, but returning no results (!)
SELECT * FROM prices WHERE customer_id = 435 AND try(date_parse(quote_date, '%m/%d/%Y')) IS NULL
I guess this solves my blocking issue, however it would be nice to learn what might be the underlying problem if anybody can provide some insight on the conversion
Share Improve this question edited Nov 21, 2024 at 18:52 Stefano asked Nov 20, 2024 at 20:18 StefanoStefano 7041 gold badge6 silver badges20 bronze badges 5- Can you show us some examples of the data in the actual input files, particularly how the date is stored? – John Rotenstein Commented Nov 20, 2024 at 23:13
- @JohnRotenstein I added a simplified extract of the csv files – Stefano Commented Nov 21, 2024 at 18:26
- Can you show us the actual data in the CSV file? It might be that the date is not in the format you expect. – John Rotenstein Commented Nov 21, 2024 at 21:02
- I have hundreds of similar files, with the same format. There might be though some unexpected non utf-8 character somewhere, but that doesn't show up on an HTML page... Maybe the try function helps with that? ¯_(ツ)_/¯ – Stefano Commented Nov 21, 2024 at 22:23
- It might be that the dates aren't in M/D/Y format, or it might be that some lines have bad formats or bad characters. You should start by creating a file with some random lines from the normal CSV files and see whether it is working as expected. It would be great to identify a bad line to figure out what is happening. – John Rotenstein Commented Nov 21, 2024 at 22:37
1 Answer
Reset to default 0date
is:
an alias for
CAST(x AS date)
.
It expects a date, not timestamp, you can parse the timestamp and process it:
-- sample data
WITH dataset(q_date) as (
values ('2024-11-19 00:00:00.000'),
('2024-11-18 01:00:00.000')
)
-- query
select date_parse(q_date, '%Y-%m-%d %T.%f')
from dataset
where date_parse(q_date, '%Y-%m-%d %T.%f') >= date '2024-11-19';
版权声明:本文标题:sql - Error in Athena when trying to query records between dates that are stored as string - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1742331474a2454781.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
发表评论