r - How to mark rows with NAs which were replaced by na.approx()? - Stack Overflow

IT技术

更新时间：2025-01-135

admin管理员组
文章数量:1129621

I have a dataset with multiple columns and more than 60k rows. I am using na.approx() to replace NAs in each column. I would like to identify the rows which had NA and were replaced, something like "Dataset - After removing NAs". I would appreciate if you could please help me with that.

Dataset - Before removing NAs
A    B    C
1.0  2.2  3.3
NA   5.7  8.8
6.2  7.2  NA
3.0  NA   1.1

Dataset - After removing NAs
A    A_NA   B     B_NA    C    C_NA
1.0  No     2.2   No      3.3  No
3.6  Yes    5.7   No      8.8  No
6.2  No     7.2   No      4.6  Yes
3.0  No     7.2   Yes     1.1  No

Sorry, I cannot share the original dataset, but if it is necessary I can prepare a better example.

Thanks

UPDATE

Sorry, I forgot to mention that in my original dataset, I have some columns in my dataset which are not numeric, also my dataset is a data.table.

Dataset - Before removing NAs
ID        Trial     Weight    Height    Depth     
1         Wheat     1.0       2.2       3.3
1         Wheat     NA        5.7       8.8
2         Maize     6.2       7.2       NA
4         Maize     3.0       NA        1.1

I have a dataset with multiple columns and more than 60k rows. I am using na.approx() to replace NAs in each column. I would like to identify the rows which had NA and were replaced, something like "Dataset - After removing NAs". I would appreciate if you could please help me with that.

Dataset - Before removing NAs
A    B    C
1.0  2.2  3.3
NA   5.7  8.8
6.2  7.2  NA
3.0  NA   1.1

Dataset - After removing NAs
A    A_NA   B     B_NA    C    C_NA
1.0  No     2.2   No      3.3  No
3.6  Yes    5.7   No      8.8  No
6.2  No     7.2   No      4.6  Yes
3.0  No     7.2   Yes     1.1  No

Sorry, I cannot share the original dataset, but if it is necessary I can prepare a better example.

Thanks

UPDATE

Sorry, I forgot to mention that in my original dataset, I have some columns in my dataset which are not numeric, also my dataset is a data.table.

Dataset - Before removing NAs
ID        Trial     Weight    Height    Depth     
1         Wheat     1.0       2.2       3.3
1         Wheat     NA        5.7       8.8
2         Maize     6.2       7.2       NA
4         Maize     3.0       NA        1.1

Share Improve this question edited Jan 9 at 1:21 asked Jan 8 at 12:17 Mee 1417 bronze badges

is.na(df) ? It will return a data.frame where cells with NA values will be TRUE, and the others FALSE. – Billy34 Commented Jan 8 at 12:48

Add a comment |

6 Answers 6

Sorted by: Reset to default 4

You could do this in base R by indexing:

#new columns if it is NA or not
df[paste0(names(df),"_NA")] <- is.na(df)

# use zoo::na.approx
df[-grep("_NA", names(df))] <- zoo::na.approx(df[-grep("_NA", names(df))])

# reorder (if desired)
df <- df[,sort(names(df))]

Output:

    A  A_NA   B  B_NA    C  C_NA
1 1.0 FALSE 2.2 FALSE 3.30 FALSE
2 3.6  TRUE 5.7 FALSE 8.80 FALSE
3 6.2 FALSE 7.2 FALSE 4.95  TRUE
4 3.0 FALSE  NA  TRUE 1.10 FALSE

If you wanted "Yes" or "No" instead of T/F, you could just replace is.na(df) with ifelse(is.na(df), "Yes", "No")

Data:

df <- read.table(text = "A    B    C
1.0  2.2  3.3
NA   5.7  8.8
6.2  7.2  NA
3.0  NA   1.1", header = TRUE)

EDIT Using data table, you can tweak the code by:

library(data.table)

# Identify numeric columns
idxcols <- sapply(dt, is.numeric)

dt[, paste0(names(idxcols)[idxcols], "_NA") := lapply(.SD, is.na), .SDcols = idxcols]

Data table data:

dt <- data.table::data.table(read.table(text = "ID        Trial     Weight    Height    Depth     
1         Wheat     1.0       2.2       3.3
1         Wheat     NA        5.7       8.8
2         Maize     6.2       7.2       NA
4         Maize     3.0       NA        1.1", header = TRUE))

Probably you should specify rule=2 when applying na.approx

cbind(
  zoo::na.approx(df, rule = 2),
  as.data.frame(
    ifelse(is.na(df), "Yes", "No"),
    col.names = paste0(names(df), "_NA")
  )
)[order(rep(seq_along(df), 2))]

which gives

    A A_NA   B B_NA    C C_NA
1 1.0   No 2.2   No 3.30   No
2 3.6  Yes 5.7   No 8.80   No
3 6.2   No 7.2   No 4.95  Yes
4 3.0   No 7.2  Yes 1.10   No

data

> dput(df)
structure(list(A = c(1, NA, 6.2, 3), B = c(2.2, 5.7, 7.2, NA),
    C = c(3.3, 8.8, NA, 1.1)), class = "data.frame", row.names = c(NA, 
-4L))

Write a function to take care of getting the NA's before interpolation and of putting the two tables together.
The code comments should explain how the function does it.

Dataset <- read.table(text = "
A    B    C
1.0  2.2  3.3
NA   5.7  8.8
6.2  7.2  NA
3.0  NA   1.1
", header = TRUE)


fun <- function(x) {
  # auxiliary variable
  cn <- colnames(x)
  # get the NA positions as a logical matrix
  na <- sapply(x, is.na)
  na_char <- matrix("", nrow = nrow(na), ncol = ncol(na))
  # assign the values "No" and "Yes"
  na_char[na] <- "Yes"
  na_char[!na] <- "No"
  # interpolate
  y <- sapply(x, zoo::na.approx, na.rm = FALSE) |> as.data.frame()
  # put the old and new columns in wanted order
  colnames(na_char) <- paste(colnames(na), "NA", sep = "_")
  i <- order(c(cn, colnames(na_char)))
  new_names <- c(cn, colnames(na_char))[i]
  # assign the new names and exit
  setNames(cbind(y, na_char)[, i], new_names)
}

fun(Dataset)
#>     A A_NA   B B_NA    C C_NA
#> 1 1.0   No 2.2   No 3.30   No
#> 2 3.6  Yes 5.7   No 8.80   No
#> 3 6.2   No 7.2   No 4.95  Yes
#> 4 3.0   No  NA  Yes 1.10   No

^{Created on 2025-01-08 with reprex v2.1.1}

Alternative answer using only base R:

You can just create the *_NA columns before running na.approx. You can e.g. loop over all columns with is.na and assign the result as a new column. The ordering of the columns in the way you show in your example could be done by sorting the columns alphabetically afterwards.


df <- data.frame(
  A = c(1.0, NA, 6.2, 3.0),
  B = c(2.2, 5.7, 7.2, NA),
  C = c(3.3, 8.8, NA, 1.1)
)
df_isna = lapply(df, is.na)
df_isna = lapply(df_isna, factor, labels = c("No","Yes"))
names(df_isna) = paste0(names(df_isna),"_NA")
df_new = as.data.frame(zoo::na.approx(df, rule = 2))
df_new = cbind(df_new, df_isna)
df_new = df_new[,sort(names(df_new))]

I'll add a dplyr flavor to already existing answers.

library(dplyr)

df %>%
  transmute(across(everything(), is.na, .names = "{col}_NA")) %>%
  bind_cols(zoo::na.approx(df, rule = 2)) %>%
  select(order(names(.)))

#    A  A_NA   B  B_NA    C  C_NA
#1 1.0 FALSE 2.2 FALSE 3.30 FALSE
#2 3.6  TRUE 5.7 FALSE 8.80 FALSE
#3 6.2 FALSE 7.2 FALSE 4.95  TRUE
#4 3.0 FALSE 7.2  TRUE 1.10 FALSE

1) Use na.approx and then bind it to the Yes/No data frame. Note that the pipe to setNames is within the second argument to cbind whereas the final pipe is taking the result of cbind.

library(zoo)

cbind(na.approx(DF), as.data.frame(ifelse(is.na(DF), "Yes", "No")) |>
             setNames(paste0(names(DF), "_NA")) ) |>
  ( \(x) x[order(names(x))] )()
##     A A_NA   B B_NA    C C_NA
## 1 1.0   No 2.2   No 3.30   No
## 2 3.6  Yes 5.7   No 8.80   No
## 3 6.2   No 7.2   No 4.95  Yes
## 4 3.0   No  NA  Yes 1.10   No

2) Translating that to tidyverse we have the following. This has the advantage that it goes strictly left to right and only explicitly references DF once at the beginning so that if it changes to some other name it only has to be replaced in one spot.

library(dplyr, exclude = c("filter", "lag"))
library(rlang)
library(zoo)

DF %>%
  { bind_cols(na.approx(.), data.frame(ifelse(is.na(.), "Yes", "No")) %>%
       set_names(~ paste0(.x, "_NA")))

  } %>%
  select(sort(names(.)))

##     A A_NA   B B_NA    C C_NA
## 1 1.0   No 2.2   No 3.30   No
## 2 3.6  Yes 5.7   No 8.80   No
## 3 6.2   No 7.2   No 4.95  Yes
## 4 3.0   No  NA  Yes 1.10   No

Note

DF <- data.frame(
  A = c(1, NA, 6.2, 3),
  B = c(2.2, 5.7, 7.2, NA),
  C = c(3.3, 8.8, NA, 1.1)
)

本文标签： rHow to mark rows with NAs which were replaced by naapprox()Stack Overflow

版权声明：本文标题：r - How to mark rows with NAs which were replaced by na.approx()? - Stack Overflow 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.betaflare.com/web/1736730295a1949953.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

编程频道|软件玩家 - 软件改变生活！

r - How to mark rows with NAs which were replaced by na.approx()? - Stack Overflow

6 Answers 6

data

Note

更多相关文章

r - How to mark rows with NAs which were replaced by na.approx()? - Stack Overflow

发表评论

推荐文章

sql - Mysql multi-column indexes approach when table used twice in one query - Stack Overflow

Repeat a string in JavaScript a number of times - Stack Overflow

javascript - Difference between throttling and debouncing a function - Stack Overflow

wpf - PyWinAuto.drag_mouse_input don&#39;t actually drag or drop - Stack Overflow

javascript - How to mock an exported const in jest - Stack Overflow

热门文章

Get the current URL with JavaScript? - Stack Overflow

javascript - How do I completely uninstall Node.js, and reinstall from beginning (Mac OS X) - Stack Overflow

javascript - In Node.js, how do I &quot;include&quot; functions from my other files? - Stack Overflow

excel - Preventing ClearContents on Combination of Merged and Unmerged Cells - Stack Overflow

c - Unexpected output from right rotate function - Stack Overflow

c# - How to check if a list contains an object of the same type as the caller - Stack Overflow

nginx reverse proxy redirecing to http instead of https - Stack Overflow

c - How do I differentiate between X11 and Wayland at compile time? - Stack Overflow

javascript - How to make exe files from a node.js app? - Stack Overflow

javascript - Tailwindcss not working with next.js; what is wrong with the configuration? - Stack Overflow

最新文章

Java入门级教学（IDEA的下载与安装与JDK的环境配置）

华硕笔记本电脑用U盘重装windows系统

物理网卡MAC修改器v3.0 - 真实网卡硬件MAC地址修改，重装系统不变！

如何一键安装win7系统(一键安装win7系统步骤)

Windows 11最稳定版本详解

javascript - With jQuery, how do I capitalize the first letter of a text field while the user is still editing that field? - Sta

Why is the admin bar appearing (on certain pages) for non logged-in users?

azure devops - How do I stop my build pipeline from running when I commit other branches than master? - Stack Overflow

javascript - Restart animation in CSS3: any better way than removing the element? - Stack Overflow

javascript - How to detect string which contains only spaces? - Stack Overflow

惠普OMEN 15-CE001TX 2EF91PA参数报价

苹果新款MacBook Pro 15英寸 i732GB1TBVega Pro 20参数报价

联想Y330A-PSE L参数报价

神舟战神Z7 D6 i7-12650H16GB512GBRTX4050旗舰版参数报价

神舟战神Z7 D6 i7-12650H16GB1TBRTX4050参数报价

wpf - PyWinAuto.drag_mouse_input don't actually drag or drop - Stack Overflow

javascript - In Node.js, how do I "include" functions from my other files? - Stack Overflow