I have a very messy date column in Excel.

For example, the row with row_id = 23 was entered in Excel in dd/mm/yy format, so it should be read as 2023-09-07 when I import the worksheet into R. Instead, when I import the dataset with rio, the value arrives as the number "45116", and when I use this code:

df_dl2 <- df_dl %>%
  mutate(across(c(`specimen taken`, `specimen received at NHL`), 
                function(x) as_datetime(ifelse(grepl("^[0-9\\.]+$", x), 
                                              as.numeric(x) |> as_date(origin = "1899-12-30") |> as_datetime(), 
                                              as_datetime(x, format = "%d-%m-%Y"))),
                .names = "cleaned_{.col}"))

to convert it back to a date in dd/mm/yy order, the result is 2023-07-09 instead of the correct 2023-09-07.

This happens for every date that is stored as a number. They were all typed into Excel as 1/8/23, 1/9/23, and so on. They should all be read as dd/mm/yy, but both Excel and R treat them as mm/dd/yy and convert them accordingly.

I don't know where I can tell either Excel or R that this is not the U.S. mm/dd/yy format. It looks like the dates had already been turned into serial numbers under the mm/dd/yy reading by the time R read the dataset. How can I get the correct conversion in R?

asked Jan 31 at 14:33 by Rstudyer, edited Jan 31 at 15:00 by ThomasIsCoding

  • Could you edit your question so others can reproduce this issue? Images of data are highly discouraged for these reasons. Good luck! – jpsmith, commented Jan 31 at 14:43
  • It would be better if you could provide a simple example: a character vector with the different date formats, together with the corresponding desired output. – ThomasIsCoding, commented Jan 31 at 14:45

1 Answer

You can probably find some useful clues here: try each candidate format on every string and keep whichever one parses.

> as.Date(rowSums(sapply(fmts, as.Date, x = s), na.rm = TRUE), origin = "1970-01-01")
[1] "2023-08-14" "2023-09-01" "2023-08-17" "2023-09-07"

or, using dplyr::coalesce, which takes the first non-NA result per element:

> do.call(dplyr::coalesce, lapply(fmts, as.Date, x = s))
[1] "2023-08-14" "2023-09-01" "2023-08-17" "2023-09-07"
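
To see why the rowSums version works, it can help to look at the intermediate matrix. The following is only a sketch of the same idea in base R, reusing the s and fmts values from the Data section. The names parsed and n_matched and the final NA safeguard are illustrative additions, not part of the snippets above.

```r
# Same sample input and candidate formats as in the Data section
s <- c("14-08-2023", "1/9/23", "17-08-2023", "7/9/23")
fmts <- c("%d/%m/%y", "%d-%m-%Y")

# One column per format; each entry is the number of days since 1970-01-01,
# or NA where that format could not parse the string
parsed <- sapply(fmts, function(f) as.numeric(as.Date(s, format = f)))

# How many formats matched each string
n_matched <- rowSums(!is.na(parsed))

# With exactly one non-NA value per row, the NA-ignoring row sum is that value
result <- as.Date(rowSums(parsed, na.rm = TRUE), origin = "1970-01-01")

# Illustrative safeguard: if zero or several formats matched, the sum is
# meaningless, so blank those rows instead of returning a wrong date
result[n_matched != 1] <- NA

result
```

The safeguard matters mostly for the rowSums variant: if two formats both parsed the same string, their day counts would be added together, whereas coalesce would simply take the first match.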

Data

> s <- c("14-08-2023", "1/9/23", "17-08-2023", "7/9/23")

> fmts <- c("%d/%m/%y", "%d-%m-%Y")
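
The snippets above cover values that reach R as text. For the values that reach R as Excel serial numbers, such as the 45116 from the question, a possible approach is sketched below. It assumes that Excel read every such entry as mm/dd/yy when it was typed as dd/mm/yy, and that the workbook uses the 1900 date system (origin 1899-12-30). The second serial, 44935, is an illustrative value added here (Excel's number for 2023-01-09), and the variable names are hypothetical.

```r
# Excel serial numbers as they arrive in R
# 45116 is the value from the question; 44935 is an illustrative extra value
serial <- c(45116, 44935)

# Step 1: convert the serials to dates (assumes the Excel 1900 date system)
as_read <- as.Date(serial, origin = "1899-12-30")
as_read
# "2023-07-09" "2023-01-09": the dates as Excel understood them (month first)

# Step 2: swap day and month by writing the date out as year-day-month text
# and reading that text back in as year-month-day
swapped <- as.Date(format(as_read, "%Y-%d-%m"), format = "%Y-%m-%d")
swapped
# "2023-09-07" "2023-09-01"
```

The swap should only be applied to the values that came in as numbers. If some numeric cells were entered correctly in the first place, this sketch would corrupt them, so it is worth checking a few rows against the original spreadsheet.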

Tags: r, mixed date converted from default mmddyy in excel to ddmmyy, Stack Overflow