If I have a data frame like this:

df <- cbind.data.frame(c("a", "b", "a", "b", "b"), c(1,0,0,1,0), c(0, NA, 0, 0, 1))

What should I do to return 1 for column 3 regardless of whether I've included the character column?

apply(df, 2, FUN = function(x){sum(x == 1 & !is.na(x))})

Returns 0 for column 3

apply(df[,2:3], 2, FUN = function(x){sum(x == 1 & !is.na(x))})

Returns 1 for column 3

asked Nov 21, 2024 at 17:36 by etulf
  • use sum(x == 1, na.rm = TRUE) – Maël Commented Nov 21, 2024 at 17:46
  • Also, note your code can be simplified to colSums(df == 1, na.rm = TRUE) – Maël Commented Nov 21, 2024 at 17:47
  • ...which also improves the code. – Friede Commented Nov 21, 2024 at 17:48
  • Thanks! I actually get the same thing as before when I use sum(x == 1, na.rm = TRUE), sum(x == 1, na.rm = T), and sum(na.omit(x) == 1). But colSums(df == 1, na.rm = TRUE) does the trick. – etulf Commented Nov 21, 2024 at 17:50
  • No it does not. It throws an error. – Friede Commented Nov 21, 2024 at 18:11

1 Answer

This explains why apply on the whole data frame gives a different result than on the subset (df vs. df[,2:3]).

See how apply treats the data when it is heterogeneous (character and numeric):

apply(df, 2, FUN = function(x) x)
     c("a", "b", "a", "b", "b") c(1, 0, 0, 1, 0) c(0, NA, 0, 0, 1)
[1,] "a"                        "1"              " 0"
[2,] "b"                        "0"              NA
[3,] "a"                        "0"              " 0"
[4,] "b"                        "1"              " 0"
[5,] "b"                        "0"              " 1"
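The leading spaces in the third column can be reproduced without apply. The sketch below assumes that apply converts a data frame with as.matrix before looping over it, and inspects the third column of that matrix directly. The names m and col3 are introduced here for illustration only.

```r
df <- cbind.data.frame(c("a", "b", "a", "b", "b"), c(1, 0, 0, 1, 0), c(0, NA, 0, 0, 1))

# Convert the mixed data frame to a matrix, as apply() does internally.
m <- as.matrix(df)

# Third column as a plain character vector; note the padded strings.
col3 <- unname(m[, 3])
print(col3)

# The number 1 is coerced to "1" for the comparison, and " 1" != "1",
# so no element matches.
print(sum(col3 == 1, na.rm = TRUE))

# Stripping the padding with trimws() restores the match.
print(sum(trimws(col3) == 1, na.rm = TRUE))
```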

apply(X, MARGIN, ...) expects X to be an array (a matrix included), so a data frame is first converted to a matrix.

Because df includes the character column, the whole matrix is cast to character (only data.frame and list can hold different data types). The 3rd column contains an NA, so its maximum cell width is 2, and every element in that column is padded with a space to width 2 (giving " 1", which is != 1). There is a workaround using trimws, but that overcomplicates things.

Instead, use apply on the homogeneous subset, which stays numeric:

apply(df[,2:3], 2, function(x) x)
     c(1, 0, 0, 1, 0) c(0, NA, 0, 0, 1)
[1,]                1                 0
[2,]                0                NA
[3,]                0                 0
[4,]                1                 0
[5,]                0                 1

Alternatively, use sapply, since we're operating on columns anyway.
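A minimal sketch of the sapply approach: sapply iterates over the columns of the data frame as a list, so each column keeps its own type and nothing is cast to character. The helper count_ones and the variables counts, num_cols, and counts_num are hypothetical names, not part of the original answer.

```r
df <- cbind.data.frame(c("a", "b", "a", "b", "b"), c(1, 0, 0, 1, 0), c(0, NA, 0, 0, 1))

# Count the elements equal to 1, ignoring NA.
count_ones <- function(x) sum(x == 1, na.rm = TRUE)

# Column-wise over the whole data frame; the character column simply yields 0.
counts <- sapply(df, count_ones)
print(unname(counts))

# Optionally restrict to numeric columns first, without hard-coding 2:3.
num_cols <- vapply(df, is.numeric, logical(1))
counts_num <- vapply(df[num_cols], count_ones, integer(1))
print(unname(counts_num))
```

With this, the count for column 3 is 1 whether or not the character column is included.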

Tags: r. Title: Apply returns different values depending on which columns are included (Stack Overflow)