Can I change a value in a data frame using apply, with a row-wise condition?
I want to change a data frame value in my 'type' column to 'H', based on a row-wise condition where any value in a row is greater than or equal to 340.
# make data frame
df <- data.frame(
cell_1 = c(50,10,10,125,110,300,75),
cell_2 = c(0,75,10,70,35,70,85),
cell_3 = c(340,230,10,10,110,10,80),
cell_4 = c(10,75,70,70,35,10,85),
cell_5 = c(0,10,300,125,110,10,75),
type = c('A','A','J','U','S','L','F'),
uniq_Id = c(1,2,3,4,5,6,7)
)
# change 'type' to 'H' if any of the cell values in each row are greater than or equal to 340
apply(df[,1:5], 1, function(i) {
if (any(i >= 340)) {
df$type = 'H'
}
})
Output to the console suggests it's working. However, there's no change in the data frame after the apply function. I want 'type' in row 1 to be 'H'.
cell_1 cell_2 cell_3 cell_4 cell_5 type uniq_Id
1 50 0 340 10 0 A 1
2 10 75 230 75 10 A 2
3 10 10 10 70 300 J 3
4 125 70 10 70 125 U 4
5 110 35 110 35 110 S 5
6 300 70 10 10 10 L 6
7 75 85 80 85 75 F 7
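Here is a minimal sketch of why the attempt leaves df untouched. It rebuilds the question's data and uses a hypothetical helper, set_type_local(), for illustration. An assignment inside a function, including the anonymous function passed to apply(), changes a local copy of df, and that copy is discarded when the function returns. apply() only collects each call's return value, which is what shows up in the console. A superassignment (df$type <<- 'H') would reach the global df, but it would set the entire type column to 'H', not just the matching row.

```r
# Rebuild the question's data frame (assumes R >= 4.0, so type is character)
df <- data.frame(
  cell_1 = c(50, 10, 10, 125, 110, 300, 75),
  cell_2 = c(0, 75, 10, 70, 35, 70, 85),
  cell_3 = c(340, 230, 10, 10, 110, 10, 80),
  cell_4 = c(10, 75, 70, 70, 35, 10, 85),
  cell_5 = c(0, 10, 300, 125, 110, 10, 75),
  type = c('A', 'A', 'J', 'U', 'S', 'L', 'F'),
  uniq_Id = c(1, 2, 3, 4, 5, 6, 7)
)

# Hypothetical helper: the assignment creates a local df inside the function
set_type_local <- function() {
  df$type <- 'H'   # modifies a local copy, not the global df
  df$type          # return the local column to show it did change
}
local_result <- set_type_local()
print(local_result)  # the local copy is all "H"
print(df$type)       # the global df still has its original types

# Same pattern as the question: each call returns "H" or NULL, nothing is written back
res <- apply(df[, 1:5], 1, function(i) {
  if (any(i >= 340)) {
    df$type <- 'H'
  }
})
print(res)  # a list: "H" for row 1, NULL for the other rows
```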
Asked Mar 27 at 18:40 by Ray J; edited Mar 29 at 10:08 by ThomasIsCoding.
3 Answers
Try any of these. They are non-destructive, i.e. they preserve the input df.
# Base R: flag rows where any of the first five columns is >= 340, then set type to "H" in those rows
transform(df, type = replace(type, apply(df[1:5] >= 340, 1, any), "H"))
transform(df, type = replace(type, apply(df[1:5], 1, max) >= 340, "H"))
transform(df, type = replace(type, do.call("pmax", df[1:5]) >= 340, "H"))
transform(df, type = replace(type, Reduce(pmax, df[1:5]) >= 340, "H"))
# dplyr (pick() and .by need dplyr >= 1.1.0): .by = uniq_Id puts each row in its own group
library(dplyr)
df %>%
  mutate(type = replace(type, any(pick(starts_with("cell")) >= 340), "H"), .by = uniq_Id)
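This is a minimal sketch that checks the two claims above using base R only: the four transform() variants agree, and the input df is left unchanged. It assumes R >= 4.0, so type is a character column rather than a factor. The dplyr variant is left out so the block runs without extra packages.

```r
df <- data.frame(
  cell_1 = c(50, 10, 10, 125, 110, 300, 75),
  cell_2 = c(0, 75, 10, 70, 35, 70, 85),
  cell_3 = c(340, 230, 10, 10, 110, 10, 80),
  cell_4 = c(10, 75, 70, 70, 35, 10, 85),
  cell_5 = c(0, 10, 300, 125, 110, 10, 75),
  type = c('A', 'A', 'J', 'U', 'S', 'L', 'F'),
  uniq_Id = c(1, 2, 3, 4, 5, 6, 7)
)

# transform() returns a new data frame; df itself is not modified
out1 <- transform(df, type = replace(type, apply(df[1:5] >= 340, 1, any), "H"))
out2 <- transform(df, type = replace(type, apply(df[1:5], 1, max) >= 340, "H"))
out3 <- transform(df, type = replace(type, do.call("pmax", df[1:5]) >= 340, "H"))
out4 <- transform(df, type = replace(type, Reduce(pmax, df[1:5]) >= 340, "H"))

# "any value >= 340" and "row maximum >= 340" select the same rows
stopifnot(identical(out1$type, out2$type),
          identical(out1$type, out3$type),
          identical(out1$type, out4$type))

print(out1$type)  # row 1 becomes "H"
print(df$type)    # the input still has its original types
```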
Hard-coding column selection is almost always bad practice; instead, we can identify such columns with a pattern: 'cell' followed by '_' and a digit.
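The answers on this page use two patterns: 'cell_\\d{1}' here and '^cell_\\d+$' in the matrixStats answer. This sketch compares them on a hypothetical set of column names that is not from the question. Without anchors, grep() matches the pattern anywhere in a name.

```r
# Hypothetical column names, chosen to show where the two patterns differ
names_demo <- c("cell_1", "cell_2", "cell_10", "type", "uniq_Id", "xcell_3")

unanchored <- grep('cell_\\d{1}', names_demo)  # also hits "xcell_3" (and "cell_10")
anchored   <- grep('^cell_\\d+$', names_demo)  # only names that are exactly cell_<digits>
print(unanchored)
print(anchored)

# On the question's data both patterns select the five cell columns
df <- data.frame(
  cell_1 = c(50, 10, 10, 125, 110, 300, 75),
  cell_2 = c(0, 75, 10, 70, 35, 70, 85),
  cell_3 = c(340, 230, 10, 10, 110, 10, 80),
  cell_4 = c(10, 75, 70, 70, 35, 10, 85),
  cell_5 = c(0, 10, 300, 125, 110, 10, 75),
  type = c('A', 'A', 'J', 'U', 'S', 'L', 'F'),
  uniq_Id = c(1, 2, 3, 4, 5, 6, 7)
)
cols <- grep('^cell_\\d+$', names(df))
print(cols)
```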
We create a Boolean matrix that marks the elements greater than or equal to 340 (the threshold). Afterwards we apply rowSums(): if a row has at least one value at or above the threshold, its row-wise sum is strictly greater than 0, since TRUE coerces to 1 and FALSE to 0.
Comparing these sums with 0 gives a Boolean vector whose length equals the number of rows in df. This lets us overwrite type with 'H' wherever the condition is met.
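The steps just described can be traced one at a time on the question's df. This is a minimal sketch, and it uses the anchored pattern '^cell_\\d+$' to pick the columns. It assumes R >= 4.0, so type is character.

```r
df <- data.frame(
  cell_1 = c(50, 10, 10, 125, 110, 300, 75),
  cell_2 = c(0, 75, 10, 70, 35, 70, 85),
  cell_3 = c(340, 230, 10, 10, 110, 10, 80),
  cell_4 = c(10, 75, 70, 70, 35, 10, 85),
  cell_5 = c(0, 10, 300, 125, 110, 10, 75),
  type = c('A', 'A', 'J', 'U', 'S', 'L', 'F'),
  uniq_Id = c(1, 2, 3, 4, 5, 6, 7)
)
cols <- grep('^cell_\\d+$', names(df))

hits <- df[cols] >= 340   # logical matrix: one row per row of df, one column per cell_*
print(hits)

n_hits <- rowSums(hits)   # TRUE counts as 1, FALSE as 0
print(n_hits)             # row 1 has one hit, every other row has none

flag <- n_hits > 0        # logical vector of length nrow(df)
df$type[flag] <- 'H'      # overwrite only the flagged rows
print(df)
```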
df$type[rowSums(df[grep('cell_\\d{1}', names(df))] >= 340) > 0] = 'H'
> df
cell_1 cell_2 cell_3 cell_4 cell_5 type uniq_Id
1 50 0 340 10 0 H 1
2 10 75 230 75 10 A 2
3 10 10 10 70 300 J 3
4 125 70 10 70 125 U 4
5 110 35 110 35 110 S 5
6 300 70 10 10 10 L 6
7 75 85 80 85 75 F 7
Alternatively, with Rfast::rowAny() (impressively slow on 1e6 rows, though):
df$type[Rfast::rowAny(df[grep('cell_\\d{1}', names(df))] >= 340)] = 'H'
Using matrixStats::rowAnys()
cols <- grep('^cell_\\d+$', names(df))                        # columns named exactly cell_<digits>
replace(df$type, matrixStats::rowAnys(df[cols] >= 340), "H")   # returns a modified copy of type
df$type[matrixStats::rowAnys(df[cols] >= 340)] <- "H"          # updates df$type directly
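Note that with the default na.rm = FALSE, both matrixStats::rowAnys() and base::any() return TRUE for a row that has a value >= 340 even if the row also contains NAs. They return NA for a row whose remaining entries are all FALSE or NA. A logical NA index selects nothing when the replacement value has length one, so such rows keep their original type. In effect, the NAs are ignored, which may or may not be what you want.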
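Here is a minimal sketch of that NA behaviour on hypothetical toy data that is not from the question. It uses base::any() through apply() so it runs without matrixStats. It assumes rowAnys() follows the same convention with its default na.rm = FALSE. The rowSums() approach is shown for contrast: there, any NA in a row makes the whole sum NA.

```r
# Hypothetical toy data with missing values
toy <- data.frame(
  cell_1 = c(400, NA, 10),
  cell_2 = c(NA, 10, 20),
  type   = c("A", "B", "C")
)
hits <- as.matrix(toy[c("cell_1", "cell_2")]) >= 340
print(hits)

# any()-style reduction: TRUE if some TRUE, NA if only FALSE/NA, else FALSE
any_flag <- apply(hits, 1, any)
print(any_flag)   # TRUE NA FALSE

# rowSums()-style reduction: an NA anywhere in the row makes the sum NA
sum_flag <- rowSums(hits) > 0
print(sum_flag)   # NA NA FALSE

# NA indices select nothing when the replacement has length one
type_any <- replace(toy$type, any_flag, "H")
type_sum <- replace(toy$type, sum_flag, "H")
print(type_any)   # "H" "B" "C"
print(type_sum)   # "A" "B" "C"
```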
Benchmark on 1e6 rows
df <- df[sample.int(nrow(df), 1e6, replace=TRUE), ]
options(width=200)
microbenchmark::microbenchmark(
GG1=replace(df$type, apply(df[cols] >= 340, 1, any), "H"),
GG2=replace(df$type, apply(df[cols], 1, max) >= 340, "H"),
GG3=replace(df$type, do.call("pmax", df[cols]) >= 340, "H"),
GG4=replace(df$type, Reduce(pmax, df[cols]) >= 340, "H"),
FRI={df$type[rowMeans(df[cols] >= 340) > 0] = 'H'},
FR2=replace(df$type, rowSums(df[cols] >= 340) > 0, "H"),
JAY={df$type[matrixStats::rowAnys(df[cols] >= 340)] <- "H"},
JY2=replace(df$type, matrixStats::rowAnys(df[cols] >= 340), "H"),
times=10L
)
$ Rscript --vanilla foo.R
Unit: milliseconds
expr min lq mean median uq max neval cld
GG1 733.31394 787.81932 891.00432 886.00872 962.07881 1152.0256 100 a
GG2 923.65633 988.94131 1077.83096 1066.62161 1143.83696 1569.8646 100 b
GG3 12.38325 12.55271 21.71313 12.67726 14.81317 155.7037 100 c
GG4 13.49660 13.75258 32.01618 25.36118 35.68346 150.1286 100 cd
FRI 22.15796 37.56721 59.07614 45.35287 59.47257 223.6276 100 e
FR2 21.38604 23.57986 45.93177 40.56488 47.35564 252.3459 100 c e
JAY 15.19671 34.01841 51.37185 38.70390 54.58358 265.5255 100 de
JY2 16.16235 18.64446 41.94715 35.59563 41.59243 153.6724 100 c e
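Before comparing timings, it is worth confirming that the benchmarked expressions compute the same thing. This minimal sketch uses base R only, so JY2 (matrixStats) is left out. It resamples to 1e4 rows instead of 1e6 to keep the check quick. FRI and JAY are also left out: their braces evaluate to the assigned value 'H', not to the updated vector, so their results cannot be compared this way.

```r
df <- data.frame(
  cell_1 = c(50, 10, 10, 125, 110, 300, 75),
  cell_2 = c(0, 75, 10, 70, 35, 70, 85),
  cell_3 = c(340, 230, 10, 10, 110, 10, 80),
  cell_4 = c(10, 75, 70, 70, 35, 10, 85),
  cell_5 = c(0, 10, 300, 125, 110, 10, 75),
  type = c('A', 'A', 'J', 'U', 'S', 'L', 'F'),
  uniq_Id = c(1, 2, 3, 4, 5, 6, 7)
)
cols <- grep('^cell_\\d+$', names(df))

# Resample rows as in the benchmark, but smaller
set.seed(1)
big <- df[sample.int(nrow(df), 1e4, replace = TRUE), ]

candidates <- list(
  GG1 = replace(big$type, apply(big[cols] >= 340, 1, any), "H"),
  GG2 = replace(big$type, apply(big[cols], 1, max) >= 340, "H"),
  GG3 = replace(big$type, do.call("pmax", big[cols]) >= 340, "H"),
  GG4 = replace(big$type, Reduce(pmax, big[cols]) >= 340, "H"),
  FR2 = replace(big$type, rowSums(big[cols] >= 340) > 0, "H")
)

# Each candidate should match the first one element for element
same <- vapply(candidates,
               function(x) identical(unname(x), unname(candidates[[1]])),
               logical(1))
print(same)
```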
apply(). Instead, try
df$type[apply(df[,1:5], 1, function(i) any(i >= 340))] <- 'H'
or
df$type <- ifelse(apply(df[, 1:5], 1, function(i) any(i >= 340)), "H", df$type)
(zephryl, commented Mar 27 at 18:59)
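This is a minimal sketch checking that the two fixes in the comment give the same type column. It runs on the question's df and assumes R >= 4.0, so type is character. Both fixes compute the row flags once with apply() and then write them back outside the function. One uses logical subsetting on the left-hand side, the other rebuilds the column with ifelse().

```r
df <- data.frame(
  cell_1 = c(50, 10, 10, 125, 110, 300, 75),
  cell_2 = c(0, 75, 10, 70, 35, 70, 85),
  cell_3 = c(340, 230, 10, 10, 110, 10, 80),
  cell_4 = c(10, 75, 70, 70, 35, 10, 85),
  cell_5 = c(0, 10, 300, 125, 110, 10, 75),
  type = c('A', 'A', 'J', 'U', 'S', 'L', 'F'),
  uniq_Id = c(1, 2, 3, 4, 5, 6, 7)
)

# apply() only computes the per-row flags; the assignment happens outside it
flag <- apply(df[, 1:5], 1, function(i) any(i >= 340))

v1 <- df
v1$type[flag] <- 'H'                        # replace only the flagged rows

v2 <- df
v2$type <- ifelse(flag, "H", v2$type)       # rebuild the whole column

stopifnot(identical(unname(v1$type), unname(v2$type)))
print(v1$type)
```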