If I have a data frame like this:
df <- cbind.data.frame(c("a", "b", "a", "b", "b"), c(1,0,0,1,0), c(0, NA, 0, 0, 1))
What should I do to return 1 for column 3 regardless of whether I've included the character column?
apply(df, 2, FUN = function(x){sum(x == 1 & !is.na(x))})
Returns 0 for column 3
apply(df[,2:3], 2, FUN = function(x){sum(x == 1 & !is.na(x))})
Returns 1 for column 3
Asked Nov 21, 2024 at 17:36 by etulf

1 Answer
An explanation of why apply on the whole data set gives different results than on the subset (df vs. df[,2:3]).
See how apply treats the given data when it is heterogeneous (character and numeric):
apply(df, 2, FUN = function(x) x)
c("a", "b", "a", "b", "b") c(1, 0, 0, 1, 0) c(0, NA, 0, 0, 1)
[1,] "a" "1" " 0"
[2,] "b" "0" NA
[3,] "a" "0" " 0"
[4,] "b" "1" " 0"
[5,] "b" "0" " 1"
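To look at the third column in isolation, here is a minimal sketch that assumes the df from the question. It calls as.matrix directly, which is the conversion apply performs on a data frame; the names m and col3 are introduced only for this illustration.

```r
# The data frame from the question (column names are auto-generated).
df <- cbind.data.frame(c("a", "b", "a", "b", "b"), c(1, 0, 0, 1, 0), c(0, NA, 0, 0, 1))

# With a character column present, as.matrix() turns every column into character.
m <- as.matrix(df)
col3 <- unname(m[, 3])
col3
# " 0" NA " 0" " 0" " 1"

# In the comparison, 1 is coerced to "1", and " 1" is not equal to "1".
sum(col3 == 1, na.rm = TRUE)          # 0

# Stripping the padding makes the comparison match again.
sum(trimws(col3) == 1, na.rm = TRUE)  # 1
```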
Since apply(X, MARGIN, ...) expects X to be an array (including a matrix), the data frame is first converted to a matrix. Because it includes the first character column, the whole result is cast to character (only data.frame and list can hold different data types). In the 3rd column the widest cell is the NA, with length 2, so all elements are extended to length 2 by padding with a space (giving " 1", which is != 1). There is a workaround using trimws, but that overcomplicates things. Instead, use apply on the homogeneous subset, which stays numeric:
apply(df[,2:3], 2, function(x) x)
c(1, 0, 0, 1, 0) c(0, NA, 0, 0, 1)
[1,] 1 0
[2,] 0 NA
[3,] 0 0
[4,] 1 0
[5,] 0 1
or use sapply, since we're operating on columns anyway.
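A minimal sketch of the sapply route, assuming the df from the question; the name counts is only for illustration. sapply visits each column as a separate vector, so the numeric columns are never converted to character. The colSums variant suggested in the comments is included for comparison.

```r
# The data frame from the question (column names are auto-generated).
df <- cbind.data.frame(c("a", "b", "a", "b", "b"), c(1, 0, 0, 1, 0), c(0, NA, 0, 0, 1))

# Each column is passed to the function with its own type intact.
# For the character column, "a" == 1 is simply FALSE, so it contributes 0.
counts <- sapply(df, function(x) sum(x == 1, na.rm = TRUE))
unname(counts)
# 0 2 1

# Column-wise comparison on the data frame itself, then column sums.
unname(colSums(df == 1, na.rm = TRUE))
# 0 2 1
```

Both calls return 1 for column 3 whether or not the character column is included.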
Tags: r. Original question: "Apply returns different values depending on which columns are included" (Stack Overflow).
sum(x == 1, na.rm = TRUE) – Maël, commented Nov 21, 2024 at 17:46
colSums(df == 1, na.rm = TRUE) – Maël, commented Nov 21, 2024 at 17:47
sum(x == 1, na.rm = TRUE) sum(x == 1, na.rm = T) and sum(na.omit(x) == 1). But colSums(df == 1, na.rm = TRUE) does the trick. – etulf, commented Nov 21, 2024 at 17:50