I am trying to parse a dataset with multiple issues in the date format. Here is a small example:

tibble(
  date = c("31.12.", "30.01.", "30.02.")
) 

I do some clean-up and end up with a warning I don't understand:

parse_dmy = function(x){
  d = lubridate::dmy(x, quiet = TRUE)
  errors = x[!is.na(x) & is.na(d)]
  if(length(errors) > 0){
    cli::cli_warn("Failed to parse some dates: {.val {errors}}")
  }
  d
}


tibble::tibble(
  date = c("31.12.", "30.01.", "30.02.")
) |>
  dplyr::mutate(
    date2 = dplyr::case_when(
      stringr::str_ends(date, ".12.") ~ parse_dmy(stringr::str_c(date, 2019)),
      stringr::str_starts(date, "30.02") ~ lubridate::rollforward(parse_dmy(stringr::str_c("01.02.",2022))),
      .default = parse_dmy(stringr::str_c(date, 2022))
    ),
    case = dplyr::case_when(
      stringr::str_ends(date, ".12.") ~ "december case",
      stringr::str_starts(date, "30.02") ~ "february case",
      .default = "default case"
    ),
  )

Here is the result:

# A tibble: 3 × 3
  date   date2      case         
  <chr>  <date>     <chr>        
1 31.12. 2019-12-31 december case
2 30.01. 2022-01-30 default case 
3 30.02. 2022-02-28 february case
Warning message:
There were 2 warnings in `dplyr::mutate()`.
The first warning was:
ℹ In argument: `date2 = dplyr::case_when(...)`.
Caused by warning:
! Failed to parse some dates: "30.02.2019"
ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning.

I tried changing the order of the two cases, but it doesn't change anything. I don't understand why it tries to parse 30.02.2019, unless it executes the "then" clause without checking the "when" clause first. Could you help me change the code to avoid this warning? I need warnings to stay on to catch further parsing errors.

asked Apr 1 at 2:39 by Ben
  • case_when parses all values for all logical tests, then selects the first TRUE outcome in the order specified. The warnings will appear regardless of the order. – thelatemail, commented Apr 1 at 3:36
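
The behaviour described in this comment can be checked with a small experiment. The sketch below uses a hypothetical helper, noisy(), which is not part of dplyr and only records each input vector it receives. It assumes dplyr 1.1.0 or later for the .default argument.

```r
library(dplyr)

# Hypothetical helper for illustration: stores every input it is called with,
# then returns the input unchanged.
seen <- list()
noisy <- function(x) {
  seen[[length(seen) + 1]] <<- x
  x
}

x <- c("a", "b", "c")

res <- case_when(
  x == "a" ~ noisy(paste0(x, "-first")),
  .default = noisy(paste0(x, "-default"))
)

# Each right-hand side was computed for all three elements,
# even though every element keeps only one of the two results.
lengths(seen)
res
```

This is why the December branch, parse_dmy(str_c(date, 2019)), also receives "30.02." and warns about "30.02.2019", whatever the order of the cases.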

2 Answers

To prevent the warnings, evaluate the conditions in sequence rather than all at once: first assign the corrected date to the problematic rows and leave every other row as NA, then parse only the rows that are still NA, so that each call to the parser sees only the strings it is meant to handle.

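library(lubridate)
library(stringr)

# Example data from the question; parse_dmy() is the helper defined there
df <- tibble::tibble(date = c("31.12.", "30.01.", "30.02."))

clean_data <- function(data) {
  # Step 1: fix the impossible 30 February rows; all other rows stay NA.
  # ifelse() drops the Date class, so convert back with an explicit origin
  # (required by as.Date() for numeric input before R 4.3).
  data <- within(data, 
               date2 <- as.Date(ifelse(
                 str_starts(date, "30.02"), 
                 rollforward(parse_dmy(str_c("01.02.", 2022))), 
                 NA), origin = "1970-01-01"))
  # Step 2: parse only the rows that are still NA, each subset with its own year
  cond_1 <- is.na(data$date2) & str_ends(data$date, ".12.")
  cond_2 <- is.na(data$date2) & !str_ends(data$date, ".12.")
  data$date2[cond_1] <- parse_dmy(str_c(data$date[cond_1], 2019))
  data$date2[cond_2] <- parse_dmy(str_c(data$date[cond_2], 2022))
  data
}

clean_data(df)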

Gives

# A tibble: 3 × 2
  date   date2     
  <chr>  <date>    
1 31.12. 2019-12-31
2 30.01. 2022-01-30
3 30.02. 2022-02-28

with no warnings.
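
Another way to keep case_when() without the spurious warnings is to let it choose between strings and to call the parser once on the result. This is only a sketch, not part of the answer above: it assumes that hard-coding "28.02.2022" for the February case is acceptable (it is the date rollforward() produces above), and date_str is an illustrative column name.

```r
library(dplyr)
library(stringr)

# parse_dmy() as defined in the question
parse_dmy <- function(x) {
  d <- lubridate::dmy(x, quiet = TRUE)
  errors <- x[!is.na(x) & is.na(d)]
  if (length(errors) > 0) {
    cli::cli_warn("Failed to parse some dates: {.val {errors}}")
  }
  d
}

out <- tibble::tibble(date = c("31.12.", "30.01.", "30.02.")) |>
  mutate(
    # case_when() only assembles strings here, so evaluating every branch
    # on every row is harmless
    date_str = case_when(
      str_ends(date, fixed(".12.")) ~ str_c(date, 2019),
      str_starts(date, fixed("30.02")) ~ "28.02.2022",
      .default = str_c(date, 2022)
    ),
    # the parser runs once, on the already corrected strings
    date2 = parse_dmy(date_str)
  )
out
```

A string that is still invalid after this step, for example "31.04.", would still trigger the warning from parse_dmy(), so genuine parsing errors remain visible.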


library(lubridate)
library(stringr)

1) We can avoid most of the string manipulation by reading the input into a data frame with numeric day (d) and month (m) columns. To avoid invalid dates at the end of February, first build the first day of the month, then take the smaller of (1) that date plus the day of the month and (2) the first day of the next month, and subtract 1 from the result. Note that the code uses the years 2002 and 1999 where the question uses 2022 and 2019; adjust the constant as needed.

library(dplyr)
library(lubridate)

date <- c("31.12.", "30.01.", "30.02.")

date %>%
 read.table(text = ., sep = ".", col.names = c("d", "m", "y")) %>%
 reframe(date = ymd(paste(2002 - 3 * (m == 12), m, 1, sep = "-")),
         date = pmin(date + d, date %m+% months(1)) - 1,
         case = case_when(
           m == 12 ~ "december case", 
           d == 30 & m == 2 ~ "february case",
           .default = "default case"))

##        date          case
## 1 1999-12-31 december case
## 2 2002-01-30  default case
## 3 2002-02-28 february case

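2) We can simplify that further by using the clock package instead of lubridate. With invalid = "previous", an invalid date such as 30 February is mapped to the last valid day before it. The input vector date is the one defined above.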

library(clock)
library(dplyr)

date %>%
 read.table(text = ., sep = ".", col.names = c("d", "m", "y")) %>%
 reframe(date = date_build(2002 - 3 * (m == 12), m, d, invalid = "previous"),
         case = case_when(
           m == 12 ~ "december case", 
           d == 30 & m == 2 ~ "february case",
           .default = "default case"))

Tags: r, Warning in parsing with lubridate, Stack Overflow