I have a database with a large number of columns which contain multiple options from a selection. I need to pivot all of these into true or false columns so that they can be analysed in PowerBI.
Without using a function I can do this manually using the code below, but this is not feasible given the number of columns I need to adjust and the number of options within each column:
# This shows all the unique values in barriers column
sort(unique(unlist(str_split(bound_wf$barriers, " AND "))))
# This creates a new df, test, with new columns for barriers_brexit and barriers_econ which have a Boolean value depending on the appearance of the phrase in the original barriers column
test <- bound_wf %>%
  mutate(
    barriers_brexit = grepl("Brexit", barriers),
    barriers_econ = grepl("Current economic climate", barriers)
  )
I have the following pseudocode for a function to do this, but I can't make it work. My guess is that R is not reading my parameters in the way I want it to.
function (column, separator = " AND ")
  #get list of items
  list <- sort(unique(unlist(str_split(bound_wf$column, separator))))
  #create a column per item named column_item with grepl
  for item in list
    test <- bound_wf %>%
      mutate(column_item = grepl(item, column)
Here is where I got to with the actual function.
widen <- function(column, separator){
  column <- as.vector(column)
  separator <- as.vector(separator)
  df <- as.vector(str_c("bound_wf$", column))
  #get list of items
  list <- sort(unique(unlist(str_split(df, separator))))
  print(list)
  #create a column per item named column_item with grepl
  for (item in list) {
    test <- bound_wf %>%
      mutate(column_item = grepl(item, column))
  }
}
When I run this, list prints as bound_wf$barrier, so it is just returning the df variable instead of the desired list of unique options. I think I just need help with how to get R to read parameters as the items I want them to be. I am not really familiar with functions (obviously!)
For reproducibility, the data frame would look like this:
df <- data.frame(
  id = 1:10,
  column = c("option one AND option three", "option three", "option one AND option two AND option three", "option one", "option two AND option three", "option one AND option four", "option two", "option one AND option two", "option three AND option four", "option one AND option two AND option four")
)
Thanks to anyone who has time to comment on this.
Asked Jan 29 at 20:14 by LuluLulu; edited Jan 29 at 21:54 by ThomasIsCoding.
3 Answers
Here is a simple version of widen():
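library(dplyr)
library(tidyr)
widen <- function(df, col, sep) {
  # split each cell into its options, then put one option per row
  df |> mutate(v = strsplit({{ col }}, sep)) |>
    unnest(cols = v) |>
    # one logical column per option; options absent for an id are filled with FALSE
    pivot_wider(id_cols = id, names_from = v, values_from = v,
                values_fn = \(x) !is.na(x), values_fill = FALSE)
}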
Usage:
widen(df, column, " AND ")
Output:
id `option one` `option three` `option two` `option four`
<int> <lgl> <lgl> <lgl> <lgl>
1 1 TRUE TRUE FALSE FALSE
2 2 FALSE TRUE FALSE FALSE
3 3 TRUE TRUE TRUE FALSE
4 4 TRUE FALSE FALSE FALSE
5 5 FALSE TRUE TRUE FALSE
6 6 TRUE FALSE FALSE TRUE
7 7 FALSE FALSE TRUE FALSE
8 8 TRUE FALSE TRUE FALSE
9 9 FALSE TRUE FALSE TRUE
10 10 TRUE FALSE TRUE TRUE
This is not quite as flexible as @margusi's version above.
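One way to get closer to that flexibility is to pass the column and id names as strings and prefix the new columns with the source column name. The following is only a sketch under those assumptions: `widen_prefixed`, `id_col` and the `.item` / `.present` helper columns are hypothetical names, and missing values in the source column are not handled specially.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  id = 1:10,
  column = c("option one AND option three", "option three",
             "option one AND option two AND option three", "option one",
             "option two AND option three", "option one AND option four",
             "option two", "option one AND option two",
             "option three AND option four",
             "option one AND option two AND option four")
)

# Hypothetical helper: `col` and `id_col` are column names given as strings
widen_prefixed <- function(data, col, sep = " AND ", id_col = "id") {
  data |>
    # fixed = TRUE treats the separator as literal text, not a regex
    mutate(.item = strsplit(as.character(.data[[col]]), sep, fixed = TRUE)) |>
    unnest(cols = .item) |>
    # keep one row per id/option pair so repeated options do not clash
    select(all_of(id_col), .item) |>
    distinct() |>
    mutate(.present = TRUE) |>
    pivot_wider(
      names_from = .item,
      values_from = .present,
      names_prefix = paste0(col, "_"),
      values_fill = FALSE
    )
}

wide_prefixed <- widen_prefixed(df, "column")
names(wide_prefixed)
```

Because only the id and the new logical columns are returned, the result can be joined back to the original data by id if the other columns are still needed.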
You can try strsplit + stack + table:
with(
df,
t(table(stack(strsplit(setNames(column, id), " AND "))) > 0)
)
which gives
values
ind option four option one option three option two
1 FALSE TRUE TRUE FALSE
2 FALSE FALSE TRUE FALSE
3 FALSE TRUE TRUE TRUE
4 FALSE TRUE FALSE FALSE
5 FALSE FALSE TRUE TRUE
6 TRUE TRUE FALSE FALSE
7 FALSE FALSE FALSE TRUE
8 FALSE TRUE FALSE TRUE
9 TRUE FALSE TRUE FALSE
10 TRUE TRUE FALSE TRUE
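If the result needs to be a data frame with prefixed column names (for example, to export to Power BI), the same idea could be wrapped in a small function. This is a sketch rather than part of the answer: `widen_table` and its `prefix` argument are hypothetical, and it assumes the ids are unique.

```r
# Sample data from the question
df <- data.frame(
  id = 1:10,
  column = c("option one AND option three", "option three",
             "option one AND option two AND option three", "option one",
             "option two AND option three", "option one AND option four",
             "option two", "option one AND option two",
             "option three AND option four",
             "option one AND option two AND option four")
)

# Hypothetical wrapper around strsplit + stack + table
widen_table <- function(values, ids, sep = " AND ", prefix = "") {
  parts <- strsplit(setNames(as.character(values), ids), sep, fixed = TRUE)
  counts <- table(stack(parts))        # rows: options, columns: ids
  flags <- t(unclass(counts) > 0)      # plain logical matrix, one row per id
  out <- as.data.frame(flags)
  # make.names() turns "option one" into the syntactic name "option.one"
  names(out) <- paste0(prefix, make.names(colnames(flags)))
  rownames(out) <- NULL
  out
}

wide <- cbind(df["id"], widen_table(df$column, df$id, prefix = "column_"))
names(wide)
```

The `unclass()` call matters here: calling `as.data.frame()` on a table object would produce a long frequency table instead of one column per option.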
dplyr::bind_rows() can create a frame from a list of named vectors, even if their lengths do not match; so we could just name the vectors from strsplit() / str_split(), collect those into a frame, recode non-NA & NA values to TRUE / FALSE (this creates a matrix), turn it back into a data.frame and sort the columns.
The function can be applied across multiple columns with mutate(across(...)):
library(dplyr, warn.conflicts = FALSE)
df <- tibble(
id = 1:10,
column = c("option one AND option three", "option three", "option one AND option two AND option three", "option one", "option two AND option three", "option one AND option four", "option two", "option one AND option two", "option three AND option four", "option one AND option two AND option four")
)
# pass all arguments to strsplit()
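widen <- function(...){
  strsplit(...) |>
    # lapply() always returns a list; sapply() would simplify to a matrix
    # when every row holds the same number of options
    lapply(\(x) setNames(x, make.names(x))) |>
    bind_rows() |>
    Negate(is.na)() |>
    as.data.frame() |>
    # drop = FALSE keeps a data.frame even if only one option exists
    {\(x) x[, order(names(x)), drop = FALSE]}()
}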
df |>
mutate(across(-id, \(x) widen(x, split = " AND ", fixed = TRUE), .unpack = TRUE))
#> # A tibble: 10 × 6
#> id column column_option.four column_option.one column_option.three column_option.two
#> <int> <chr> <lgl> <lgl> <lgl> <lgl>
#> 1 1 option one AND option three FALSE TRUE TRUE FALSE
#> 2 2 option three FALSE FALSE TRUE FALSE
#> 3 3 option one AND option two AND option three FALSE TRUE TRUE TRUE
#> 4 4 option one FALSE TRUE FALSE FALSE
#> 5 5 option two AND option three FALSE FALSE TRUE TRUE
#> 6 6 option one AND option four TRUE TRUE FALSE FALSE
#> 7 7 option two FALSE FALSE FALSE TRUE
#> 8 8 option one AND option two FALSE TRUE FALSE TRUE
#> 9 9 option three AND option four TRUE FALSE TRUE FALSE
#> 10 10 option one AND option two AND option four TRUE TRUE FALSE TRUE
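The call above passes fixed = TRUE through to strsplit(). That matters because strsplit() and grepl() treat their pattern as a regular expression by default, so separators or option labels containing characters such as | or ( can behave unexpectedly. A small base R illustration, using a made-up string that is not from the question's data:

```r
# Made-up example value; the " | " separator is an assumption for illustration
x <- "cost (high) | time"

# As a regex, " | " means "a space OR a space", so every space is a split point
regex_split <- strsplit(x, " | ")[[1]]

# With fixed = TRUE the separator is matched literally
fixed_split <- strsplit(x, " | ", fixed = TRUE)[[1]]

# The same applies to grepl(): the parentheses form a regex group here,
# so the pattern looks for "cost high" and does not match
regex_hit <- grepl("cost (high)", x)
fixed_hit <- grepl("cost (high)", x, fixed = TRUE)

print(regex_split)
print(fixed_split)
print(c(regex = regex_hit, fixed = fixed_hit))
```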
As a side note, df <- as.vector(str_c("bound_wf$", column)) would not work: it only builds a character string rather than retrieving the column. What you could use in this scenario is bound_wf[[column]].
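Building on the bound_wf[[column]] idea, the loop from the question could be made to work roughly as follows. This is a base R sketch under a few assumptions: the column name is passed as a string, the data frame is passed in as an argument instead of being read from the global bound_wf, and `widen_loop` is a hypothetical name.

```r
# Sample data from the question
df <- data.frame(
  id = 1:10,
  column = c("option one AND option three", "option three",
             "option one AND option two AND option three", "option one",
             "option two AND option three", "option one AND option four",
             "option two", "option one AND option two",
             "option three AND option four",
             "option one AND option two AND option four")
)

# Hypothetical helper: `data` is a data frame, `column` a column name as a string
widen_loop <- function(data, column, separator = " AND ") {
  # data[[column]] looks the column up by name
  parts <- strsplit(as.character(data[[column]]), separator, fixed = TRUE)
  items <- sort(unique(unlist(parts)))
  for (item in items) {
    new_name <- paste(column, make.names(item), sep = "_")
    # TRUE when the item is one of the row's options (whole-option match,
    # unlike grepl(), which would also match substrings)
    data[[new_name]] <- vapply(parts, function(p) item %in% p, logical(1))
  }
  data  # return the modified copy; the caller decides where to store it
}

result <- widen_loop(df, "column")
names(result)
```

Assigning with data[[new_name]] <- ... also solves the naming problem from the original attempt, where mutate(column_item = ...) always created a column literally called column_item.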