I have a df with multiple columns that look like this:
mydata <- data.frame("Code_1" = c("x000", "x001", "x002", "y003"), "Code_2" = c("y000", "y001", "y002", "y003"), "Code_3" = c("z000", "z001", "z002", "z003"))
I need to check the number of characters in each column (I have 24 in total) and would like to do this efficiently. Ideally I'd like the output in tabular form.
I have tried the following:
fun_nchar <- function(mydata, x) { mydata[[nchar(x, type = "chars", allowNA=TRUE, keepNA=NA]] <= table(x) }
When I try to run this with the first column, I keep getting the error message: object 'Code_1' not found.
Ideally, I would like a table to check that all 24 columns contain either 3 or 4 characters (this is part of data cleaning).
Asked by Kerrie on Feb 3 at 10:53, edited Feb 3 at 11:03 | 3 Answers
You could use a combination of all + nchar:
sapply(mydata, \(x) all(nchar(na.omit(x)) %in% c(3, 4)))
# Code_1 Code_2 Code_3
# TRUE TRUE TRUE
Or, to check in one go that every column contains only 3- or 4-character values, wrap the result in another all:
all(sapply(mydata, \(x) all(nchar(na.omit(x)) %in% c(3, 4))))
#[1] TRUE
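Since the question asks for tabular output, here is a minimal sketch that cross-tabulates column name against character count. It assumes the columns are character vectors (not factors); the names len and len_table are only illustrative.

```r
mydata <- data.frame(
  Code_1 = c("x000", "x001", "x002", "y003"),
  Code_2 = c("y000", "y001", "y002", "y003"),
  Code_3 = c("z000", "z001", "z002", "z003")
)

# Character count of every cell: a matrix with one column per variable
len <- sapply(mydata, nchar)

# Cross-tabulate column name against length.
# useNA = "ifany" adds an NA column only if missing lengths occur.
len_table <- table(
  column = colnames(len)[col(len)],
  n_char = len,
  useNA  = "ifany"
)
len_table
```

Each row is a column of mydata and each table column is a length that occurs in the data, so any length other than 3 or 4 shows up as an extra column.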
You can try colMeans + nchar, like below:
> colMeans(`dim<-`(nchar(as.matrix(mydata)) %in% c(3, 4), dim(mydata))) == 1
[1] TRUE TRUE TRUE
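The one-liner above can be unrolled into steps, which may make it easier to follow. This is only a sketch of the same idea; the intermediate names (len, in_range, share_ok) are made up for illustration. Note that NA %in% c(3, 4) is FALSE, so with this approach a missing value counts as a failure.

```r
mydata <- data.frame(
  Code_1 = c("x000", "x001", "x002", "y003"),
  Code_2 = c("y000", "y001", "y002", "y003"),
  Code_3 = c("z000", "z001", "z002", "z003")
)

len <- nchar(as.matrix(mydata))        # matrix of character counts
in_range <- len %in% c(3, 4)           # %in% returns a plain vector, dims are lost
dim(in_range) <- dim(len)              # restore the matrix shape
colnames(in_range) <- names(mydata)    # re-attach names for readable output

share_ok <- colMeans(in_range)         # proportion of valid values per column
share_ok == 1                          # TRUE where every value is valid
```

Keeping share_ok around also tells you how much of a column is affected, not just whether it passes.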
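Try this. Be aware that var == 0 only tests whether the in-range indicator is constant within a column, so a column in which every value fails would also return TRUE; using apply(2, all) instead avoids that.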
> sapply(mydata, nchar) |> {\(x) !is.na(x) & x >= 3 & x <= 4}() |> apply(2, var) == 0
Code_1 Code_2 Code_3
TRUE TRUE TRUE
Test:
> mydata[1, 1] <- 'x0'
> sapply(mydata, nchar) |> {\(x) !is.na(x) & x >= 3 & x <= 4}() |> apply(2, var) == 0
Code_1 Code_2 Code_3
FALSE TRUE TRUE
You can easily incorporate a check into the script:
> stopifnot(sapply(mydata, nchar) |> {\(x) !is.na(x) & x >= 3 & x <= 4}() |> apply(2, var) == 0)
Error: apply({ .... are not all TRUE
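For a data-cleaning script it can also help to know which columns fail, not only that something failed. A minimal sketch, reusing the deliberately bad value from the test above; ok and bad_cols are hypothetical names, and missing values are skipped via na.omit.

```r
mydata <- data.frame(
  Code_1 = c("x000", "x001", "x002", "y003"),
  Code_2 = c("y000", "y001", "y002", "y003"),
  Code_3 = c("z000", "z001", "z002", "z003")
)
mydata[1, 1] <- "x0"   # introduce one value that is too short

# TRUE for each column whose non-missing values all have 3 or 4 characters
ok <- vapply(
  mydata,
  \(x) all(nchar(na.omit(x)) %in% c(3, 4)),
  logical(1)
)

# Names of the columns that violate the rule
bad_cols <- names(ok)[!ok]
bad_cols
```

Passing ok to stopifnot() would then halt the script whenever bad_cols is non-empty.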
lapply(mydata, \(x) table(nchar(x)))? – Maël Commented Feb 3 at 10:59
An *apply function runs the same function over a set of elements (for example, the columns of a data.frame); it's useful when you need to repeat a function over multiple columns. sapply returns a simplified version of the output (e.g., a vector, if possible), while lapply always returns a list. – Maël Commented Feb 3 at 11:17
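To illustrate the difference described in the comment, here is a small sketch applying nchar to the example data with both functions; as_list and as_matrix are illustrative names only.

```r
mydata <- data.frame(
  Code_1 = c("x000", "x001", "x002", "y003"),
  Code_2 = c("y000", "y001", "y002", "y003"),
  Code_3 = c("z000", "z001", "z002", "z003")
)

# lapply always returns a list: one integer vector per column
as_list <- lapply(mydata, nchar)
str(as_list)

# sapply simplifies when it can: equal-length results become a matrix
as_matrix <- sapply(mydata, nchar)
dim(as_matrix)
```

The lapply form is the safer choice when the per-column results can differ in length, as with table(nchar(x)).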