I am working on the Google Data Analytics Certificate and trying to clean a dataset consisting of 3 columns in BigQuery:

  1. An Id number
  2. A date in MM/DD/YYYY HH:MM:SS AM/PM format
  3. Number of calories

Columns 1 and 3 I was able to upload as integers but I’ve had many issues with the second column. I ended up just uploading column 2 as a string. Ideally, I want to replace it with the proper format (YYYY-MM-DD HH:MM:SS) and as a timestamp.

So from this:

4/25/2016 09:37:35 AM as a string

to this:

2016-04-25 09:37:35 UTC as a timestamp

I have been trying to fix this for a while now and am very new. Any feedback or recommendations at all are greatly appreciated. Thank you!

TL;DR: I have a string column (all dates) in BigQuery in MM/DD/YYYY HH:MM:SS AM/PM format and want it in YYYY-MM-DD HH:MM:SS format as a timestamp.

I tried a lot of different ways to fix this issue so far:

I tried fixing the format in Excel like I did with other files but it was too big to import.

I tried casting it as a timestamp and I got an error that it was improperly formatted. I tried fixing the format and I got an error that it was the wrong datatype.

I tried parsing it as a timestamp in the correct format, which worked. I saved the result to a destination table and then cast it to a timestamp, and that worked as well. To add it to the main data table, I tried appending it to the file so that I could then drop the poorly formatted column, but this gave me an error: Invalid schema update. Cannot add fields (field: f0_). I then rewrote the original query with a subquery to pull the Id and the fixed column together, planning to join it to the original data table on Id, but the query failed with the error: scalar subquery produces more than one element. I also tried overwriting the data table, and that didn't work either.

The code I used to parse the column:

SELECT
  parse_datetime('%m/%d/%Y %r', Time) AS Time1
FROM
  dataproject.bellabeat_fitness_data.412_512_heart

The subquery I used:

SELECT
  Id,
  (SELECT
     parse_datetime('%m/%d/%Y %r', Time) AS Time1
   FROM
     dataproject.bellabeat_fitness_data.412_512_heart)
FROM
  dataproject.bellabeat_fitness_data.412_512_heart

I tried UPDATE but before I could tweak anything I got an error that I needed to upgrade from the free tier to the upgraded free trial to use DML queries. This is the last thing I can think of to fix this issue but I don’t want to give payment information if I don’t have to.

The UPDATE code I tried using (not 100% sure if it would work since it wouldn't let me try to run it):

UPDATE dataproject.bellabeat_fitness_data.412_512_heart
SET Time = (SELECT
              parse_datetime('%m/%d/%Y %r', Time) AS Time1
            FROM
              dataproject.bellabeat_fitness_data.412_512_heart)

Asked Feb 20 at 4:11 by Ryelyn Desch

Comments:
  • I don't know BigQuery, but I would try converting to unix time, possibly using this function and then converting unix back to your desired format, possibly using answers to this question – DuesserBaest, Feb 20 at 7:12

1 Answer

Use parse_timestamp and format_timestamp along with the proper formatting elements.

Note that a datetime doesn't have a timezone. To include a timezone, you would use timestamp.
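To make the datetime/timestamp distinction concrete, here is a minimal sketch that parses the sample string from the question both ways. The column aliases are made up for illustration, and the `%I:%M:%S %p` format is my choice for the 12-hour clock with AM/PM rather than something taken from the question.

```sql
-- Illustrative aliases only; the input string is the sample from the question.
SELECT
  -- DATETIME: a calendar date and clock time with no time zone attached
  PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', '4/25/2016 09:37:35 AM') AS as_datetime,
  -- TIMESTAMP: an absolute point in time; the string is interpreted as UTC here
  PARSE_TIMESTAMP('%m/%d/%Y %I:%M:%S %p', '4/25/2016 09:37:35 AM', 'UTC') AS as_timestamp,
  -- An existing DATETIME can be turned into a TIMESTAMP by naming a time zone
  TIMESTAMP(
    PARSE_DATETIME('%m/%d/%Y %I:%M:%S %p', '4/25/2016 09:37:35 AM'),
    'UTC'
  ) AS datetime_to_timestamp;
```

If the source times were recorded in a local zone rather than UTC, the third argument is where that would be stated; the question does not say which zone applies, so UTC is an assumption.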

-- get a timestamp
-- %I is the 12-hour hour element that pairs with %p (AM/PM);
-- %T expands to %H:%M:%S, which reads the hour on a 24-hour clock
select parse_timestamp(
  '%m/%d/%Y %I:%M:%S %p',
  '4/25/2016 09:37:35 AM',
  'UTC'
)
;
-- 2016-04-25 09:37:35 UTC (string representation)

-- get the string representation you specified
select format_timestamp(
  '%F %T %Z',
  parse_timestamp(
    '%m/%d/%Y %I:%M:%S %p',
    '4/25/2016 09:37:35 AM',
    'UTC'
  )
)
;
-- 2016-04-25 09:37:35 UTC
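To apply this to the whole table instead of a literal, one possible approach is to parse the column in the same SELECT that returns the other columns. This is only a sketch: the table and the `Time` column are the ones named in the question, the backticks are a precaution because the table name starts with a digit, and UTC is assumed as the time zone.

```sql
-- Returns every column unchanged, except that `Time` is replaced by
-- its parsed TIMESTAMP value (the name stays the same, the type changes).
SELECT
  * REPLACE (
    PARSE_TIMESTAMP('%m/%d/%Y %I:%M:%S %p', `Time`, 'UTC') AS `Time`
  )
FROM
  `dataproject.bellabeat_fitness_data.412_512_heart`;
```

Because the expression is evaluated once per row, no scalar subquery or join on Id is needed, which sidesteps the "scalar subquery produces more than one element" error. The explicit alias also keeps the column from getting an auto-generated name such as f0_, which is likely what the schema error was complaining about. Saving the result to a new destination table, as you already did once, should give you a cleaned copy without UPDATE; I have not checked what the free tier allows beyond what the question reports.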
