I've found ways to create groupings from variables but not how to create a single sub-group.
df <- data.frame(days = c(0,1,2,3,4,5,6,8,10), n = c(408,51,103,112,35,17,7,6,1))
What I want to do is have an output like the one below in order to create a histogram with "days" on the x-axis. I can't figure out how to create a "6+" group that also sums the corresponding n values.
days | n |
---|---|
0 | 408 |
1 | 51 |
2 | 103 |
3 | 112 |
4 | 35 |
5 | 17 |
6+ | 14 |
I would then use ggplot2 and geom_bar() to ideally make a histogram.
This is what I tried but it gave me an error. I'm not sure if there is an easier way to group the numbers 6 and up together?
df$days <- ifelse(df$days > 5, "6+", df$days) %>% summarise(df$n = sum(n), .groups = 'drop')
Asked Feb 12 at 1:14 by user27294105
4 Answers
library(dplyr) # I presume you're using its "summarise"
df |>
mutate(grp = if_else(days >= 6, "6+", paste(days))) |>
summarize(n = sum(n), .by = grp)
or shorter:
df |>
count(grp = if_else(days >= 6, "6+", paste(days)), wt = n)
Result
grp n
1 0 408
2 1 51
3 2 103
4 3 112
5 4 35
6 5 17
7 6+ 14
Either way, I'm making a group called "6+" and turning the other groups into text with paste (so 1 becomes "1"), then combining the n's for each grp.
In some cases you might want to make the grouping into a factor to make it sort as you want. For instance, if you wanted a group of "up to 4" you could use
df |>
count(grp = if_else(days <= 4, "up to 4", paste(days)) |>
forcats::fct_reorder(days), wt = n)
to make the "up to 4" category appear first, even though by default it would otherwise appear alphabetically last.
grp n
1 up to 4 709
2 5 17
3 6 7
4 8 6
5 10 1
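Since the goal in the question is a bar chart, here is a minimal sketch of how the grouped result could be plotted. It assumes ggplot2 is installed; the names grouped and p are just illustrative. It uses geom_col() rather than geom_bar() because the heights are already in n (geom_bar() counts rows by default, although geom_bar(stat = "identity") would behave the same way).

```r
library(dplyr)
library(ggplot2)

df <- data.frame(days = c(0, 1, 2, 3, 4, 5, 6, 8, 10),
                 n = c(408, 51, 103, 112, 35, 17, 7, 6, 1))

# Collapse days >= 6 into "6+" and sum n within each group.
grouped <- df |>
  count(grp = if_else(days >= 6, "6+", paste(days)), wt = n)

# Set the bar order explicitly; plain character labels are sorted
# alphabetically on the x-axis (so "10" would come before "2").
grouped$grp <- factor(grouped$grp, levels = c(0:5, "6+"))

# geom_col() draws bars whose heights are the values in n.
p <- ggplot(grouped, aes(x = grp, y = n)) +
  geom_col() +
  labs(x = "days", y = "n")
p
```

With only single-digit labels plus "6+", the alphabetical order happens to match the numeric one, so the factor step mainly matters once labels such as "10" appear.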
In base R, you can similarly use an ifelse statement in aggregate. I wrapped it in setNames to rename the columns, but that's just cosmetic:
setNames(
aggregate(n ~ ifelse(days >= 6, "6+", days), df, sum),
c("days", "n"))
Output:
days n
1 0 408
2 1 51
3 2 103
4 3 112
5 4 35
6 5 17
7 6+ 14
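If the group should be a factor with a fixed order, a possible base R variation is to bin with cut() instead of ifelse. This is only a sketch: it assumes days holds whole, non-negative numbers (a value such as 5.5 would land in the "6+" bin), and the names grp and res are illustrative.

```r
df <- data.frame(days = c(0, 1, 2, 3, 4, 5, 6, 8, 10),
                 n = c(408, 51, 103, 112, 35, 17, 7, 6, 1))

# Bins are (-Inf, 0], (0, 1], ..., (4, 5], (5, Inf]; the last one is "6+".
df$grp <- cut(df$days, breaks = c(-Inf, 0:5, Inf), labels = c(0:5, "6+"))

# Sum n within each bin; grp stays a factor with levels in bin order.
res <- aggregate(n ~ grp, data = df, FUN = sum)
res
```

Because grp is a factor, the order of its levels carries over to anything that respects factor levels, such as plot axes.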
It's great that you provided your attempted code and it may be worthwhile to walk through why your attempt returned an error, as there may be some confusion. Here is your attempt, reposted for convenience:
df$days <- ifelse(df$days > 5, "6+", df$days) %>%
summarise(df$n = sum(n), .groups = 'drop')
There are a few different types of problems. First, the pipe operator (%>%) takes the result of the left-hand side expression and passes it as the first argument to the function on the right-hand side (here, summarize). So ifelse(df$days > 5, "6+", df$days) returns a simple character vector ([1] "0" "1" "2" "3" "4" "5" "6+" "6+" "6+"), which is then piped into the next step. Now summarize is (1) expecting a data frame, not a vector, and so (2) can't find n in the input, since the only thing it receives is the "0", "1", ..., "6+" vector. Second, the use of $ notation in pipes is discouraged, since you can simply refer to the column by name (i.e., n, not df$n).
A corrected version of your attempted code may be:
df %>%
mutate(days = ifelse(days > 5, "6+", days)) %>%
summarise(n = sum(n), .by = days)
However, I would use @JonSpring's elegant solution if you wanted a dplyr approach. Hope this helps clarify, and happy coding!
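To see the first problem in isolation, the sketch below checks what ifelse returns and what happens when that value is piped into summarise. The names v, piped and fixed are illustrative, and the tryCatch wrapper is only there so the script keeps running after the error. Separately, as far as I can tell, writing df$n = sum(n) inside summarise() is rejected by R before anything runs, because an argument name cannot contain $.

```r
library(dplyr)

df <- data.frame(days = c(0, 1, 2, 3, 4, 5, 6, 8, 10),
                 n = c(408, 51, 103, 112, 35, 17, 7, 6, 1))

# ifelse() on a column returns a plain vector, not a data frame.
v <- ifelse(df$days > 5, "6+", df$days)
is.character(v)   # TRUE
is.data.frame(v)  # FALSE

# Piping that vector into summarise() fails: there is no data frame,
# and therefore no column n, for it to work on.
piped <- tryCatch(
  v %>% summarise(n = sum(n)),
  error = function(e) "error"
)

# Starting the pipe from the data frame works.
fixed <- df %>%
  mutate(days = ifelse(days > 5, "6+", days)) %>%
  summarise(n = sum(n), .by = days)
fixed
```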
Here comes a fast base R way.
> Map(c, df[df$days < 6, ], replace(lapply(df[df$days > 5, ], sum), 1, '6+')) |>
+ as.data.frame()
days n
1 0 408
2 1 51
3 2 103
4 3 112
5 4 35
6 5 17
7 6+ 14
Or if you like data.table.
> library(data.table)
> setDT(df)
>
> df[, days := fifelse(days > 5, '6+', as.character(days))][, .(n = sum(n)), by = days]
days n
<char> <num>
1: 0 408
2: 1 51
3: 2 103
4: 3 112
5: 4 35
6: 5 17
7: 6+ 14
You can do
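local({
  i = df$days > 5
  # data.frame() keeps n numeric; c('days' = '6+', 'n' = ...) would turn both values into character
  rbind(df[!i, ], data.frame(days = '6+', n = sum(df$n[i])))
})
or
with(df, {
  i = days > 5
  rbind(df[!i, ], data.frame(days = '6+', n = sum(n[i])))
})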
days n
1 0 408
2 1 51
3 2 103
4 3 112
5 4 35
6 5 17
7 6+ 14
where the local({ .. }) / with({ .. }) is not needed but prevents the index variable i from being created in the global environment.
Even simpler but "less" efficient.
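# data.frame() keeps n numeric; c('days' = '6+', 'n' = ...) would turn both values into character
rbind(df[df$days < 6, ], data.frame(days = '6+', n = sum(df$n[df$days > 5])))
or
with(df, rbind(df[days < 6, ], data.frame(days = '6+', n = sum(n[days > 5]))))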