I am working on the Google Data Analytics Certificate and trying to clean a dataset consisting of 3 columns in BigQuery:
- An Id number
- A date in MM/DD/YYYY HH:MM:SS AM/PM format
- Number of calories
Columns 1 and 3 I was able to upload as integers but I’ve had many issues with the second column. I ended up just uploading column 2 as a string. Ideally, I want to replace it with the proper format (YYYY-MM-DD HH:MM:SS) and as a timestamp.
So from this:
4/25/2016 09:37:35 AM as a string
to this:
2016-04-25 09:37:35 UTC as a timestamp
I have been trying to fix this for a while now and am very new. Any feedback or recommendations at all are greatly appreciated. Thank you!
TL;DR: I have a string column (all dates) in BigQuery in MM/DD/YYYY HH:MM:SS AM/PM format and want it in YYYY-MM-DD HH:MM:SS format as a timestamp.
I tried a lot of different ways to fix this issue so far:
- I tried fixing the format in Excel like I did with other files, but it was too big to import.
- I tried casting it as a timestamp and got an error that it was improperly formatted. I tried fixing the format and got an error that it was the wrong datatype.
- I tried parsing it as a timestamp in the correct format, which worked. I saved it to a destination table, then cast this into a timestamp, and that worked as well. To add it to the main data table, I tried appending it to the file, where I would then drop the other poorly formatted column, but this gave me an error: Invalid schema update. Cannot add fields (field: f0_).
- I then rewrote the original query using a subquery to pull the Id and the fixed column together. I planned to join it to the original datatable on Id, but when I ran the query it gave me the error: scalar subquery produces more than one element.
- I tried overwriting the datatable too and that obviously didn’t work.
The code I used to parse the column:
SELECT
  parse_datetime('%m/%d/%Y %r', Time) AS Time1
FROM
  dataproject.bellabeat_fitness_data.412_512_heart
The subquery I used:
SELECT
  Id,
  (SELECT
    parse_datetime('%m/%d/%Y %r', Time) AS Time1
  FROM
    dataproject.bellabeat_fitness_data.412_512_heart
  )
FROM
  dataproject.bellabeat_fitness_data.412_512_heart
I tried UPDATE but before I could tweak anything I got an error that I needed to upgrade from the free tier to the upgraded free trial to use DML queries. This is the last thing I can think of to fix this issue but I don’t want to give payment information if I don’t have to.
The UPDATE code I tried using (not 100% sure if it would work since it wouldn't let me try to run it):
UPDATE
  dataproject.bellabeat_fitness_data.412_512_heart
SET
  Time = (SELECT
    parse_datetime('%m/%d/%Y %r', Time) AS Time1
  FROM
    dataproject.bellabeat_fitness_data.412_512_heart
  )
- I don't know BigQuery, but I would try converting to unix time, possibly using this function and then converting unix back to your desired format, possibly using answers to this question – DuesserBaest Commented Feb 20 at 7:12
1 Answer
Use parse_timestamp and format_timestamp along with the proper formatting elements.
Note that a datetime doesn't have a timezone. To include a timezone, you would use timestamp.
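As a rough illustration of that difference, the sketch below parses the sample value from the question both ways. parse_datetime returns a DATETIME with no zone attached, while parse_timestamp interprets the string in the zone you name. The timestamp(datetime, time_zone) call is one way to turn an already-parsed DATETIME into a TIMESTAMP. The column aliases are made up for the example.

```sql
-- Sketch: the same input string parsed as a DATETIME and as a TIMESTAMP.
select
  -- DATETIME: calendar date and wall-clock time, no time zone
  parse_datetime('%m/%d/%Y %I:%M:%S %p', '4/25/2016 09:37:35 AM') as as_datetime,
  -- TIMESTAMP: an absolute point in time; the string is read as UTC here
  parse_timestamp('%m/%d/%Y %I:%M:%S %p', '4/25/2016 09:37:35 AM', 'UTC') as as_timestamp,
  -- converting an already-parsed DATETIME to a TIMESTAMP by naming its zone
  timestamp(
    parse_datetime('%m/%d/%Y %I:%M:%S %p', '4/25/2016 09:37:35 AM'),
    'UTC'
  ) as datetime_to_timestamp
;
```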
-- get a timestamp
-- %I is the 12-hour clock hour, which pairs with %p (AM/PM);
-- %T expands to the 24-hour %H:%M:%S, so it is not a safe match for PM values
select parse_timestamp(
  '%m/%d/%Y %I:%M:%S %p',
  '4/25/2016 09:37:35 AM',
  'UTC'
)
;
-- 2016-04-25 09:37:35 UTC (string representation)

-- get the string representation you specified
select format_timestamp(
  '%F %T %Z',
  parse_timestamp(
    '%m/%d/%Y %I:%M:%S %p',
    '4/25/2016 09:37:35 AM',
    'UTC'
  )
)
;
-- 2016-04-25 09:37:35 UTC
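To apply this to the table in the question rather than to a literal, one option is to parse the column in the same SELECT that returns the other columns. The scalar subquery error came from the inner SELECT returning one row per table row where a single value was expected. Referencing Time directly parses each row's own value, so no subquery or join is needed. This is a minimal sketch, assuming the table and the Time column from the question. The destination table name 412_512_heart_clean is made up, and whether CREATE TABLE ... AS SELECT is allowed on the free tier is not confirmed here. If it is not, running the inner SELECT alone with a destination table set in the query settings, as the question already did once, is another route.

```sql
-- Sketch only: the destination table name below is hypothetical.
create table `dataproject.bellabeat_fitness_data.412_512_heart_clean` as
select
  -- keep every original column except the string version of Time
  * except (Time),
  -- parse each row's own Time value; no subquery is needed
  parse_timestamp('%m/%d/%Y %I:%M:%S %p', Time, 'UTC') as Time
from
  `dataproject.bellabeat_fitness_data.412_512_heart`
;
```

If some rows do not match the format, parse_timestamp fails the whole query. Swapping in safe.parse_timestamp returns NULL for those rows instead, which can help locate the bad values before writing the new table.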