I wish to loop through my data frame and change the value of one of the cells to avoid duplicates.
myDf
name day score
joe monday 16
joe monday 16
harry wednesday 88
harry thursday 55
james tuesday 10
will monday 10
harry wednesday 88
joe tuesday 16
joe monday 16
Some of these rows are duplicates. To make them unique while keeping all the data, I want to paste an attempt number after the score, like so:
mydf_new
name day score
joe monday 16 (a1)
joe monday 16 (a2)
harry wednesday 88 (a1)
harry thursday 55
james tuesday 10
will monday 10
harry wednesday 88 (a2)
joe tuesday 16
joe monday 16 (a3)
The main issue I'm having is how to loop through my data and group all the rows that are not unique. I have managed to identify them using
dupSax = sax = which(duplicated(c(paste(myDf$name, myDf$day, myDf$score)))|duplicated(c(paste(myDf$name, myDf$day, myDf$score)), fromLast = TRUE))
but when I write my loop I'm not sure how to grab all the related rows together rather than going row by row. I also appreciate that a loop might not be the easiest way to do this.
- Related questions: How to add sequential values to identified duplicates before first character?; Disambiguate non-unique elements in a character vector; Add a unique identifier to the same column value in R data frame. – Gusbourne
3 Answers
I have created a new column (identifier) for the output so that the original column score is not changed.
We can group the data by name and day; if a group has more than one row, we create the sequence a1, a2, ... and paste it onto the score value.
library(dplyr)  # the .by argument needs dplyr >= 1.1.0
myDf |>
  mutate(identifier = if (n() > 1) paste0(score, " (a", row_number(), ")")
                      else as.character(score),
         .by = c(name, day))
# name day score identifier
#1 joe monday 16 16 (a1)
#2 joe monday 16 16 (a2)
#3 harry wednesday 88 88 (a1)
#4 harry thursday 55 55
#5 james tuesday 10 10
#6 will monday 10 10
#7 harry wednesday 88 88 (a2)
#8 joe tuesday 16 16
#9 joe monday 16 16 (a3)
And the same logic translated to base R:
transform(myDf, identifier = ave(score, name, day, FUN = \(x)
  if (length(x) > 1) paste0(x, " (a", seq_along(x), ")") else x))
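The question identifies duplicates by name, day and score together, and wants the suffix written into score itself. A minimal base R sketch of that variant follows. It assumes the sample data is typed in as below with plain character and integer columns. The helper names key, idx, attempt, size and myDf_new are only illustrative.

```r
# Sample data from the question, typed in by hand (column types are an assumption).
myDf <- data.frame(
  name  = c("joe", "joe", "harry", "harry", "james", "will", "harry", "joe", "joe"),
  day   = c("monday", "monday", "wednesday", "thursday", "tuesday", "monday",
            "wednesday", "tuesday", "monday"),
  score = c(16L, 16L, 88L, 55L, 10L, 10L, 88L, 16L, 16L)
)

# One key per row; rows that share a key are duplicates of each other.
key <- paste(myDf$name, myDf$day, myDf$score, sep = "|")
idx <- seq_along(key)

attempt <- ave(idx, key, FUN = seq_along)  # running count 1, 2, 3, ... within each key
size    <- ave(idx, key, FUN = length)     # how many rows share the key

# Only rows whose key occurs more than once get the " (aN)" suffix.
myDf_new <- myDf
myDf_new$score <- ifelse(size > 1,
                         paste0(myDf$score, " (a", attempt, ")"),
                         as.character(myDf$score))
print(myDf_new)
```

No explicit loop is needed here: ave() does the grouping, so every set of related rows is numbered in one call. Note that score becomes a character column as a result.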
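Your proposed output seems suboptimal, because pasting the attempt number into score turns a numeric column into text. I would keep the attempt number in its own column, something like this: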
library(data.table)
DT <- fread("name day score
joe monday 16
joe monday 16
harry wednesday 88
harry thursday 55
james tuesday 10
will monday 10
harry wednesday 88
joe tuesday 16
joe monday 16")
#setDT(DT) #if data is in a data.frame
DT[, attempt := seq_len(.N), by = .(name, day)]
setorder(DT, name, day, attempt)
print(DT)
# name day score attempt
# <char> <char> <int> <int>
#1: harry thursday 55 1
#2: harry wednesday 88 1
#3: harry wednesday 88 2
#4: james tuesday 10 1
#5: joe monday 16 1
#6: joe monday 16 2
#7: joe monday 16 3
#8: joe tuesday 16 1
#9: will monday 10 1
Make the rows unique by combining the group id with a consecutive number within each group:
library(dplyr)
myDf %>%
  mutate(grp = paste0(cur_group_id(), "_", row_number()), .by = c(name, day))
Output:
name day score grp
1 joe monday 16 1_1
2 joe monday 16 1_2
3 harry wednesday 88 2_1
4 harry thursday 55 3_1
5 james tuesday 10 4_1
6 will monday 10 5_1
7 harry wednesday 88 2_2
8 joe tuesday 16 6_1
9 joe monday 16 1_3
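If dplyr is not available, the same kind of label can probably be built in base R. This is only a sketch under the assumption that group ids should follow order of first appearance, as in the output above. The names key, group_id, within and out are illustrative, and the data is typed in from the question.

```r
# Sample data from the question, typed in by hand (column types are an assumption).
myDf <- data.frame(
  name  = c("joe", "joe", "harry", "harry", "james", "will", "harry", "joe", "joe"),
  day   = c("monday", "monday", "wednesday", "thursday", "tuesday", "monday",
            "wednesday", "tuesday", "monday"),
  score = c(16L, 16L, 88L, 55L, 10L, 10L, 88L, 16L, 16L)
)

# Grouping key built from name and day only, as in the dplyr call.
key <- paste(myDf$name, myDf$day, sep = "|")

# Group id: position of each key among the distinct keys, in order of first appearance.
group_id <- match(key, unique(key))

# Consecutive number within each group.
within <- ave(seq_along(key), key, FUN = seq_along)

out <- myDf
out$grp <- paste0(group_id, "_", within)
print(out)
```

match() against unique() numbers the groups in the order they first occur, which is why joe/monday gets id 1 and joe/tuesday gets id 6.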