I'm writing a function to clean some CEX data (the details don't really matter), and I cannot figure out why I am unable to use %in% to subset a data frame with a vector of values when I can perform the analogous operation with == on a single value. What I am trying to do is like f_fails() below. Unless I'm mistaken, I need to be able to pass the column name as a string, but that does not work.
Is there something distinct about %in% in items 6 and 8 below that does not apply to ==? How can I perform 6 and 8 in another way?
# Test Data
set.seed(123)
df <- data.frame(
  NEWID = rep(1:10, 1, each = 10),
  COST = rnorm(100, 1000, 10),
  UCC = round(runif(100, 3995, 4005))
)
# All of these work except the 6th one
# 1.
df[df$UCC == 4000,]
# 2.
df[df$"UCC" == 4000,]
# 3.
df[df["UCC"] == 4000,]
# 4.
df[df$UCC %in% c(4000,4001),]
# 5.
df[df$"UCC" %in% c(4000,4001),]
# 6. The one I need does not work
df[df["UCC"] %in% c(4000,4001),]
# 7. This works fine
f_works <- function(data, filter_var, one_val){
  # I can feed values with == and filter
  d <- data[data[filter_var] == one_val,]
  d
}
# 8. This (what I want) returns an empty data frame.
f_fails <- function(data = df, filter_var, many_vals){
  # I cannot feed 2+ values with %in% and filter
  d <- data[data[filter_var] %in% many_vals,]
  d
}
f_works(df, "UCC", 4000)
f_fails(df, "UCC", c(4000,4001))
asked Nov 23, 2024 at 0:42 by dcoy
2 Answers
In this case, %in% expects a vector on each side, but data[filter_var] returns a data frame on the left. Use [[ ]] instead, which extracts the column as a plain vector:
f <- function(data = df, filter_var, many_vals){
  # [[ ]] returns the column as a vector, so %in% compares it row by row
  d <- data[data[[filter_var]] %in% many_vals,]
  d
}
head(f(df, "UCC", c(4000, 4001)))
# NEWID COST UCC
# 3 1 1015.587 4001
# 4 1 1000.705 4000
# 11 2 1012.241 4000
# 27 3 1008.378 4000
# 28 3 1001.534 4001
# 31 4 1004.265 4001
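To see the difference between the two extraction operators, and to apply the same [[ ]] fix to case 6 outside a function, here is a minimal sketch. It rebuilds the question's test data so it runs on its own; the names via_brackets and via_dollar are purely illustrative.

```r
# Rebuild the test data from the question
set.seed(123)
df <- data.frame(
  NEWID = rep(1:10, 1, each = 10),
  COST = rnorm(100, 1000, 10),
  UCC = round(runif(100, 3995, 4005))
)

# [ ] keeps the data frame structure; [[ ]] extracts the column itself
is.data.frame(df["UCC"])   # TRUE
is.numeric(df[["UCC"]])    # TRUE

# Case 6 rewritten with [[ ]]: same rows as case 5, which uses $
via_brackets <- df[df[["UCC"]] %in% c(4000, 4001), ]
via_dollar <- df[df$UCC %in% c(4000, 4001), ]
identical(via_brackets, via_dollar)  # TRUE
```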
If you use the class() or str() functions, you will see that df$UCC is a numeric vector:
class(df$UCC)
## [1] "numeric"
By contrast, df["UCC"] is a data frame:
class(df["UCC"])
## [1] "data.frame"
You can compare a numeric vector with a single value using ==, or test it against a set of values using the %in% operator:
df$UCC == 4000
## [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## etc.
df$UCC %in% c(4000, 4001)
## [1] FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## etc.
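With a single value on the right, %in% gives the same answer as == on a plain vector, with one exception worth knowing when writing a filter function: for a missing value, == returns NA while %in% returns FALSE. An NA in a logical row index produces a row of NAs in the subset, so %in% is the safer choice when the column may contain NA. A small sketch with a made-up vector x:

```r
# Hypothetical vector with a missing value
x <- c(4000, 3999, NA, 4001)

x == 4000               # TRUE FALSE    NA FALSE
x %in% 4000             # TRUE FALSE FALSE FALSE
x %in% c(4000, 4001)    # TRUE FALSE FALSE  TRUE

# Base R defines %in% as match(x, table, nomatch = 0) > 0
identical(x %in% c(4000, 4001), match(x, c(4000, 4001), nomatch = 0) > 0)  # TRUE
```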
If you instead compare a data frame with a value (its UCC column has the same numeric type), == is applied to each column and you get a logical matrix as a result:
class( df["UCC"] == 4000)
## [1] "matrix" "array"
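The matrix still has one entry per row of df, which is why case 3 and f_works happen to work: when it is used as a row index, the 100 x 1 logical matrix behaves like a logical vector of length 100. A sketch checking this on the question's test data (m, rows_matrix and rows_vector are illustrative names):

```r
# Rebuild the test data from the question
set.seed(123)
df <- data.frame(
  NEWID = rep(1:10, 1, each = 10),
  COST = rnorm(100, 1000, 10),
  UCC = round(runif(100, 3995, 4005))
)

m <- df["UCC"] == 4000
dim(m)   # 100 1: one logical value per row of df

# Used as a row index, the one-column matrix selects the same rows as the vector
rows_matrix <- df[m, ]
rows_vector <- df[df$UCC == 4000, ]
identical(rows_matrix, rows_vector)  # TRUE
```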
When you use the %in% operator, you ask whether each element of the object on the left matches one of the values in the set on the right. A data frame is not a numeric vector, so the comparison does not happen row by row:
class( df["UCC"] %in% c(4000, 4001))
## [1] "logical"
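That single logical value is why case 6 and f_fails return an empty data frame. A data frame is a list with one element per column, so %in% treats df["UCC"] as one element (the whole column) rather than as 100 numbers. That element matches nothing in c(4000, 4001), the result is one FALSE, and a length-1 FALSE row index is recycled to every row, so no rows are kept. A sketch that checks each step on the question's test data (hit is an illustrative name):

```r
# Rebuild the test data from the question
set.seed(123)
df <- data.frame(
  NEWID = rep(1:10, 1, each = 10),
  COST = rnorm(100, 1000, 10),
  UCC = round(runif(100, 3995, 4005))
)

length(df["UCC"])                    # 1: one column, not 100 values
hit <- df["UCC"] %in% c(4000, 4001)
hit                                  # FALSE
length(hit)                          # 1

# A single FALSE is recycled across all rows, so the subset is empty
nrow(df[hit, ])                      # 0
```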
If instead you use the numeric vector df$UCC, it works, since both sides of the %in% operator are numeric vectors:
df$UCC %in% c(4000, 4001)
## [1] FALSE FALSE TRUE TRUE FALSE FALSE
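The easiest way to implement your function is to use the dplyr package, for example (f_dplyr is just an illustrative name):
library(dplyr)
f_dplyr <- function(data, filter_var, many_vals){
  filter(data, get({{filter_var}}) %in% many_vals)  # get() looks up the column named by filter_var
}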
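Copyright notice: Article title: r - Why does the %in% operator not behave analogously to the == operator with data frame indexing - Stack Overflow. The content was contributed by users and the views are solely the author's. To reprint, please contact the author and cite the source: http://www.betaflare.com/web/1736300224a1930740.html. This site only provides information storage space, does not own the copyright, and bears no related legal responsibility. Any content on this site found to be plagiarized or infringing will be removed once verified.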