r - Reformat data to summarize and collapse rows into simple table - Stack Overflow

IT技术

更新时间：2025-01-127

admin管理员组
文章数量:1125772

I have a dataset that looks something like this:

name	party	count	year	likes	retweet
Tom	R	1	2016	1357	23
Dave	R	1	2016	1881	34
Larry	D	1	2016	324	45
Tim	D	1	2016	5587	56
Rob	R	1	2016	9847	67
Sam	D	1	2016	4466	78
Tom	R	1	2017	###	##
Dave	R	1	2017	###	##
Larry	D	1	2017	###	##
Tim	D	1	2017	###	##
Rob	R	1	2017	###	##
Sam	D	1	2017	###	##
Tom	R	1	2018	###	##
Dave	R	1	2018	###	##
Larry	D	1	2018	###	##
Tim	D	1	2018	###	##
Rob	R	1	2018	###	##
Sam	D	1	2018	###	##

I have a dataset that looks something like this:

name	party	count	year	likes	retweet
Tom	R	1	2016	1357	23
Dave	R	1	2016	1881	34
Larry	D	1	2016	324	45
Tim	D	1	2016	5587	56
Rob	R	1	2016	9847	67
Sam	D	1	2016	4466	78
Tom	R	1	2017	###	##
Dave	R	1	2017	###	##
Larry	D	1	2017	###	##
Tim	D	1	2017	###	##
Rob	R	1	2017	###	##
Sam	D	1	2017	###	##
Tom	R	1	2018	###	##
Dave	R	1	2018	###	##
Larry	D	1	2018	###	##
Tim	D	1	2018	###	##
Rob	R	1	2018	###	##
Sam	D	1	2018	###	##

I have several columns that I want to collapse and sum, so that I can have a simpler output of a long dataset. I you'll notice that there are a bunch of "###" symbols, but that was just because I didn't think it was necessary to add additional information, as I believe what I provided is enough to solve my problem.

But I want to reformat the data, so it's easier to create a plot. Basically, I want the variables below. So, I want the first column to be the year (one row for each year in the dataset). The second column should be the sum of rows that were collapsed to create that row (I created the "count" variable in Table 1, so that I could sum the row value.) I then want four new columns, each representing the sum total of likes and retweets for Democrats and Republicans for that year. It should look something like this:

year	count	likes_republicans	likes_democrats	retweets_republicans	retweets_democrats
2016	6	13085	10377	124	179
2017	6	###	###	###	###
2018	6	###	###	###	###

I feel like there is a simple command in dplyr to do this type of problem, but I don't remember what it is and I feel like I really struggle to understand it and how it works. Any help would be greatly appreciated.

Share Improve this question asked Jan 9 at 3:19 lwe 3811 silver badge10 bronze badges

Df |> summarise(count = n(), likes_R = sum(likes[party == “R”]), .by = year) etc. – Jon Spring Commented Jan 9 at 3:44
Or df |> summarise(across(likes:retweet, sum), .by = c(year, party)) and tidy::pivot_wider both values by party if you like. – Jon Spring Commented Jan 9 at 3:57

Add a comment |

2 Answers 2

Sorted by: Reset to default 1

Sticking to data.table.

> library(data.table)
> dt <- melt(dt, c(1, 2, 4), c(3, 5:6)) |> 
+   dcast(year ~ party + variable, fun=sum)
> dt[, count := D_count + R_count]
> dt
Key: <year>
    year D_count D_likes D_retweet R_count R_likes R_retweet count
   <int>   <int>   <int>     <int>   <int>   <int>     <int> <int>
1:  2016       3   10377      6859       3   13085      4733     6
2:  2017       3    3766      5346       3    5944      7332     6
3:  2018       3    7426      2255       3   56041      6109     6

Data:

dt <- data.table::fread('
  name  party   count   year    likes   retweet
Tom R   1   2016    1357    23
Dave    R   1   2016    1881    34
Larry   D   1   2016    324 45
Tim D   1   2016    5587    56
Rob R   1   2016    9847    4676
Sam D   1   2016    4466    6758
Tom R   1   2017    3456    2543
Dave    R   1   2017    2345    3546
Larry   D   1   2017    675 1423
Tim D   1   2017    2345    3567
Rob R   1   2017    143 1243
Sam D   1   2017    746 356
Tom R   1   2018    134 1423
Dave    R   1   2018    54673   563
Larry   D   1   2018    5467    1243
Tim D   1   2018    1423    365
Rob R   1   2018    1234    4123
Sam D   1   2018    536 647
')

The following solution should work with dplyr and tidyr.

It would be worth testing this with missing data and other edge cases e.g. voters of other parties.

library(dplyr)
library(tidyr) #Allows pivot_wider

# Create a minimal reprex
data = tibble(name = c("Tom","Dave","Larry","Tim","Rob","Sam"),
              party = c("R", "R", "D", "D", "R", "D"),
              count = rep(1, 6),
              year = c(rep(2016, 3), rep(2017, 3)),
              likes = c(1357, 1881, 324, 5587, 9847, 4466),
              retweet = c(23, 34, 45, 56, 67, 78))

# Create the likes data using summarise and group_by before pivot_wider
likes_data = data %>% 
  mutate(party = case_when(
    party == "D" ~ "democrats",
    party == "R" ~ "republicans"
  )) %>% 
  group_by(year) %>% 
  mutate(count = n()) %>% 
  group_by(year, count, party) %>% 
  summarise(likes = sum(likes, na.rm = TRUE)) %>% 
  ungroup() %>%
  pivot_wider(names_from = "party",
              names_prefix = "likes_",
              values_from = c(likes))
 
  
# Create the retweets data using summarise and group_by before pivot_wider
retweets_data = data %>% 
  mutate(party = case_when(
    party == "D" ~ "democrats",
    party == "R" ~ "republicans"
  )) %>% 
  group_by(year) %>% 
  mutate(count = n()) %>% 
  group_by(year, count, party) %>% 
  summarise(retweets = sum(retweet, na.rm = TRUE)) %>% 
  ungroup() %>%
  pivot_wider(names_from = "party",
              names_prefix = "retweets_",
              values_from = c(retweets))


# Join the data
summary_data = left_join(likes_data, retweets_data)
summary_data

Joining with `by = join_by(year, count)`
# A tibble: 2 × 6
   year count likes_democrats likes_republicans retweets_democrats retweets_republicans
  <dbl> <int>           <dbl>             <dbl>              <dbl>                <dbl>
1  2016     3             324              3238                 45                   57
2  2017     3           10053              9847                134                   67

本文标签： rReformat data to summarize and collapse rows into simple tableStack Overflow

版权声明：本文标题：r - Reformat data to summarize and collapse rows into simple table - Stack Overflow 内容由网友自发贡献，该文观点仅代表作者本人，转载请联系作者并注明出处：http://www.betaflare.com/web/1736674845a1947124.html，本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容，一经查实，本站将立刻删除。

编程频道|软件玩家 - 软件改变生活！

r - Reformat data to summarize and collapse rows into simple table - Stack Overflow

2 Answers 2

更多相关文章