converting dates in two formats in a CSV to one standardised format in R - Stack Overflow

IT技术

更新时间：2025-04-185

admin管理员组
文章数量:1406942

I have some data from five temperature loggers as exported as csvs. Frustratingly it exports the data into a csv with two different date formats either mm/dd/yyyy hh:mm OR mm/dd/yy hh:mm:ss AM/PM in my DateTime_BST column.

I've tried changing the format of the data directly in in the csv but it does not work, it just reverts back to two different states or doesn't change at all. So I thought I could do it in R but I'm not really sure how.

I have the code below which imports all five files into R and combines them into one dataset

hobo_files <- list.files(path = "HOBO_DAT", pattern = "\\.csv$", full.names = TRUE)

hobo_dat <- rbindlist(lapply(hobo_files, fread), use.names = TRUE, fill = TRUE)

str(hobo_dat)

str(hobo_dat)
Classes ‘data.table’ and 'data.frame':  103375 obs. of  6 variables:
 $ hobo_sn     : int  21863784 21863784 21863784 21863784 21863784 21863784 21863784 21863784 21863784 21863784 ...
 $ rec_sn      : int  461703 461703 461703 461703 461703 461703 461703 461703 461703 461703 ...
 $ DateTime_BST: chr  "06/09/2024 14:13" "06/09/2024 14:13" "06/09/2024 14:23" "06/09/2024 14:33" ...
 $ Temp        : num  19.7 NA 20.9 21.3 21.6 ...
 $ DateTime_in : chr  "12/06/2024 09:32" "12/06/2024 09:32" "12/06/2024 09:32" "12/06/2024 09:32" ...
 $ DateTime_out: chr  "06/11/2024 16:05" "06/11/2024 16:05" "06/11/2024 16:05" "06/11/2024 16:05" ...

How do I get a standardised date format for my DateTime_BST column?

I have some data from five temperature loggers as exported as csvs. Frustratingly it exports the data into a csv with two different date formats either mm/dd/yyyy hh:mm OR mm/dd/yy hh:mm:ss AM/PM in my DateTime_BST column.

I've tried changing the format of the data directly in in the csv but it does not work, it just reverts back to two different states or doesn't change at all. So I thought I could do it in R but I'm not really sure how.

I have the code below which imports all five files into R and combines them into one dataset

hobo_files <- list.files(path = "HOBO_DAT", pattern = "\\.csv$", full.names = TRUE)

hobo_dat <- rbindlist(lapply(hobo_files, fread), use.names = TRUE, fill = TRUE)

str(hobo_dat)

str(hobo_dat)
Classes ‘data.table’ and 'data.frame':  103375 obs. of  6 variables:
 $ hobo_sn     : int  21863784 21863784 21863784 21863784 21863784 21863784 21863784 21863784 21863784 21863784 ...
 $ rec_sn      : int  461703 461703 461703 461703 461703 461703 461703 461703 461703 461703 ...
 $ DateTime_BST: chr  "06/09/2024 14:13" "06/09/2024 14:13" "06/09/2024 14:23" "06/09/2024 14:33" ...
 $ Temp        : num  19.7 NA 20.9 21.3 21.6 ...
 $ DateTime_in : chr  "12/06/2024 09:32" "12/06/2024 09:32" "12/06/2024 09:32" "12/06/2024 09:32" ...
 $ DateTime_out: chr  "06/11/2024 16:05" "06/11/2024 16:05" "06/11/2024 16:05" "06/11/2024 16:05" ...

How do I get a standardised date format for my DateTime_BST column?

Share Improve this question asked Mar 6 at 14:42 mikejwilliamson 5192 gold badges9 silver badges22 bronze badges

as.POSIXct(hobo_dat$DateTime_BST, format = "%m/%d/%Y %H:%M") should give you that column as a standard R Date-time object for the example data shown. The ones that are in mm/dd/yy hh:mm:ss am/pm format you can convert with as.POSIXct(hobo_dat$DateTime_BST, format = "%m/%d/%y %H:%M:%S %p") – Allan Cameron Commented Mar 6 at 15:06
See stackoverflow/a/70304571/3358272 for "candidate formats", a loop that uses multiple candidate formats for converting to POSIXt. – r2evans Commented Mar 6 at 15:06
2 @r2evans though we need to be careful with mm/dd/yy formats - some of them might convert "successfully" with the year eg 0024 instead of 2024. If I read the post correctly, we could pick out the two formats by noting that they have a different number of characters (fixed at 17 versus fixed at 19) – Allan Cameron Commented Mar 6 at 15:08

Add a comment |

2 Answers 2

Sorted by: Reset to default 2

Revision of my previous answer that addresses y2k ambiguity.

psxfun <- function(x) {
  candidates <- c("%d/%m/%y %H:%M:%S", "%d/%m/%y %H:%M", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M")
  out <- rep(.POSIXct(NA), length = length(x))
  for (fmt in candidates) {
    if (!length(isna <- is.na(out))) break
    out[isna] <- as.POSIXct(x[isna], format = fmt)
  }
  out
}

The order of formats in candidate_formats is important; for instance, %y will not match 2024 but %Y will match 24 and parse not how we intend here. Also, aside from the 2y/4y thing, it is slightly more efficient to have more-likely candidates first, as it will reduce repeats in the loop and calls to as.POSIXct; this is mostly a theoretical preference until you're dealing with millions of rows, where the extra calls might be palpable (though still manageable).

Sample data, where the last column has some 2y/4y ambiguity (mixed, even):

hobo_dat <- data.table::as.data.table(structure(list(hobo_sn = c(21863784L, 21863784L, 21863784L, 21863784L, 21863784L), rec_sn = c(461703L, 461703L, 461703L, 461703L, 461703L), DateTime_BST = c("06/09/2024 14:13", "06/09/2024 14:13", "06/09/2024 14:23", "06/09/2024 14:33", NA), Temp = c(19.7, NA, 20.9, 21.3, 21.6), DateTime_in = c("12/06/2024 09:32", "12/06/2024 09:32", "12/06/2024 09:32", "12/06/2024 09:32", NA), DateTime_out = c("06/11/24 16:05", "06/11/24 16:05", "06/11/2024 16:05", "06/11/2024 16:05", NA)), row.names = c(NA, -5L ), class = c("data.table", "data.frame")))
hobo_dat
#     hobo_sn rec_sn     DateTime_BST  Temp      DateTime_in     DateTime_out
#       <int>  <int>           <char> <num>           <char>           <char>
# 1: 21863784 461703 06/09/2024 14:13  19.7 12/06/2024 09:32   06/11/24 16:05
# 2: 21863784 461703 06/09/2024 14:13    NA 12/06/2024 09:32   06/11/24 16:05
# 3: 21863784 461703 06/09/2024 14:23  20.9 12/06/2024 09:32 06/11/2024 16:05
# 4: 21863784 461703 06/09/2024 14:33  21.3 12/06/2024 09:32 06/11/2024 16:05
# 5: 21863784 461703             <NA>  21.6             <NA>             <NA>

Each column, manually:

psxfun(hobo_dat$DateTime_BST)
# [1] "2024-09-06 14:13:00 EDT" "2024-09-06 14:13:00 EDT" "2024-09-06 14:23:00 EDT" "2024-09-06 14:33:00 EDT" NA                       
psxfun(hobo_dat$DateTime_in)
# [1] "2024-06-12 09:32:00 EDT" "2024-06-12 09:32:00 EDT" "2024-06-12 09:32:00 EDT" "2024-06-12 09:32:00 EDT" NA                       
psxfun(hobo_dat$DateTime_out)
# [1] "2024-11-06 16:05:00 EST" "2024-11-06 16:05:00 EST" "2024-11-06 16:05:00 EST" "2024-11-06 16:05:00 EST" NA

Since this is a data.table, assuming 1.16.0 or newer (for the names(.SD) := part):

hobo_dat[, names(.SD) := lapply(.SD, psxfun), .SDcols = c("DateTime_BST", "DateTime_in", "DateTime_out")]
#     hobo_sn rec_sn        DateTime_BST  Temp         DateTime_in        DateTime_out
#       <int>  <int>              <POSc> <num>              <POSc>              <POSc>
# 1: 21863784 461703 2024-09-06 14:13:00  19.7 2024-06-12 09:32:00 2024-11-06 16:05:00
# 2: 21863784 461703 2024-09-06 14:13:00    NA 2024-06-12 09:32:00 2024-11-06 16:05:00
# 3: 21863784 461703 2024-09-06 14:23:00  20.9 2024-06-12 09:32:00 2024-11-06 16:05:00
# 4: 21863784 461703 2024-09-06 14:33:00  21.3 2024-06-12 09:32:00 2024-11-06 16:05:00
# 5: 21863784 461703                <NA>  21.6                <NA>                <NA>

though it can be done just as easily with either of

## base R, even on a `data.table` object
for (nm in c("DateTime_BST", "DateTime_in", "DateTime_out")) hobo_dat[[nm]] <- psxfun(hobo_dat[[nm]])
## dplyr
library(dplyr)
hobo_dat <- mutate(hobo_dat, across(c(DateTime_BST, DateTime_in, DateTime_out), ~ psxfun(.x)))

I ended using an ifelse function to detect where rows contained an AM/PM and then specifying format using that, which works pretty well I think.

hobo_dat$DateTime_BST_cor <- ifelse(
  grepl("AM|PM", hobo_dat$DateTime_BST),  # If the datetime string contains AM/PM
  as.POSIXct(hobo_dat$DateTime_BST, format = "%m/%d/%y %I:%M:%S %p"),  # Parse with 12-hour format (AM/PM)
  as.POSIXct(hobo_dat$DateTime_BST, format = "%m/%d/%Y %H:%M")          # Parse with 24-hour format
)

本文标签： converting dates in two formats in a CSV to one standardised format in RStack Overflow

版权声明：本文标题：converting dates in two formats in a CSV to one standardised format in R - Stack Overflow 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.betaflare.com/web/1744968878a2635113.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

编程频道|软件玩家 - 软件改变生活！

converting dates in two formats in a CSV to one standardised format in R - Stack Overflow

2 Answers 2

更多相关文章