r - Looping through a data frame whilst grouping rows - Stack Overflow

Updated: 2025-01-127

I wish to loop through my data frame and change the value of one of the cells to avoid duplicates.

myDf
name     day        score
joe      monday     16
joe      monday     16
harry    wednesday  88
harry    thursday   55
james    tuesday    10
will     monday     10
harry    wednesday  88
joe      tuesday    16
joe      monday     16

Here I have duplicate rows, so to make them unique while keeping the data I wish to paste an attempt number after the score, like so:

mydf_new
name     day        score
joe      monday     16 (a1)
joe      monday     16 (a2)
harry    wednesday  88 (a1)
harry    thursday   55
james    tuesday    10
will     monday     10
harry    wednesday  88 (a2)
joe      tuesday    16
joe      monday     16 (a3)

The main issue I'm having is how to loop through my data and group all rows that are not unique. I have managed to identify them using

dupSax = sax = which(duplicated(c(paste(myDf$name, myDf$day, myDf$score)))|duplicated(c(paste(myDf$name, myDf$day, myDf$score)), fromLast = TRUE))

but when I write my loop I'm not sure how to grab all the rows that belong together rather than going row by row. I also appreciate this might not be the easiest way to do this.

Asked 2 days ago by Joe

Related questions: "How to add sequential values to identified duplicates before first character?"; "Disambiguate non-unique elements in a character vector"; "Add a unique identifier to the same column value in R data frame" – Gusbourne, commented 2 days ago

3 Answers

I have created a new column (identifier) for the output so that the original column score is not changed.

We can group the data by name and day. If a group has more than one row, we create a sequence a1, a2, ... and paste it onto the score value; otherwise the score is kept as it is.

library(dplyr)

myDf |>
  mutate(identifier = if(n() > 1) paste0(score, " (a", row_number(), ")") 
         else as.character(score), .by = c(name, day))

#   name       day score identifier
#1   joe    monday    16    16 (a1)
#2   joe    monday    16    16 (a2)
#3 harry wednesday    88    88 (a1)
#4 harry  thursday    55         55
#5 james   tuesday    10         10
#6  will    monday    10         10
#7 harry wednesday    88    88 (a2)
#8   joe   tuesday    16         16
#9   joe    monday    16    16 (a3)

The same logic translated to base R:

transform(myDf, identifier = ave(score, name, day, FUN = \(x) 
                if(length(x) > 1) paste0(x, " (a", seq_along(x), ")") else x))
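
For anyone who wants to run the base R version end to end, here is a minimal sketch that rebuilds the question's data first. It departs from the code above in two ways, both of which are my assumptions rather than part of the answer: score is added to the grouping (on the reading that rows only count as duplicates when name, day and score all match), and score is converted to character up front so the result type is explicit. The helper name label_attempts is made up for illustration.

```r
# Rebuild the example data from the question
myDf <- data.frame(
  name  = c("joe", "joe", "harry", "harry", "james", "will", "harry", "joe", "joe"),
  day   = c("monday", "monday", "wednesday", "thursday", "tuesday",
            "monday", "wednesday", "tuesday", "monday"),
  score = c(16, 16, 88, 55, 10, 10, 88, 16, 16)
)

# Hypothetical helper: append " (a1)", " (a2)", ... only when a group
# contains more than one row; single rows are returned unchanged
label_attempts <- function(x) {
  if (length(x) > 1) paste0(x, " (a", seq_along(x), ")") else x
}

# ave() applies the helper within each name/day/score group and puts the
# results back in the original row order
myDf_new <- transform(
  myDf,
  identifier = ave(as.character(score), name, day, score, FUN = label_attempts)
)
myDf_new
```

No explicit loop is needed: ave() does the grouping, so the rows that belong together are handled as one vector inside the helper.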

Your proposed output seems very suboptimal. I would do something like this:

library(data.table)
DT <- fread("myDf
name     day        score
joe      monday     16
joe      monday     16
harry    wednesday  88
harry    thursday   55
james    tuesday    10
will     monday     10
harry    wednesday  88
joe      tuesday    16
joe      monday     16")

#setDT(DT) #if data is in a data.frame

DT[, attempt := seq_len(.N), by = .(name, day)]
setorder(DT, name, day, attempt)
print(DT)
#     name       day score attempt
#   <char>    <char> <int>   <int>
#1:  harry  thursday    55       1
#2:  harry wednesday    88       1
#3:  harry wednesday    88       2
#4:  james   tuesday    10       1
#5:    joe    monday    16       1
#6:    joe    monday    16       2
#7:    joe    monday    16       3
#8:    joe   tuesday    16       1
#9:   will    monday    10       1
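
If the original row order matters, a possible variation is to skip setorder() and, if wanted, derive the label from the question out of the attempt column. This is only a sketch: the column name label is my own choice, and the table is built with data.table() instead of fread() purely to keep the example self-contained.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
DT <- data.table(
  name  = c("joe", "joe", "harry", "harry", "james", "will", "harry", "joe", "joe"),
  day   = c("monday", "monday", "wednesday", "thursday", "tuesday",
            "monday", "wednesday", "tuesday", "monday"),
  score = c(16L, 16L, 88L, 55L, 10L, 10L, 88L, 16L, 16L)
)

# Number the attempts within each name/day group; without setorder()
# the rows stay in their original order
DT[, attempt := seq_len(.N), by = .(name, day)]

# Optional: build the "(aN)" label, but only for groups with more than one row
DT[, label := if (.N > 1) paste0(score, " (a", attempt, ")") else as.character(score),
   by = .(name, day)]
print(DT)
```

Keeping attempt as its own integer column means score stays numeric and can still be summarised or filtered later.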

Make the rows unique by combining the group id and a consecutive number within each group:

library(dplyr)

myDf %>% 
  mutate(grp = paste0(cur_group_id(), "_", row_number()), .by = c(name, day))

Output:

   name       day score grp
1   joe    monday    16 1_1
2   joe    monday    16 1_2
3 harry wednesday    88 2_1
4 harry  thursday    55 3_1
5 james   tuesday    10 4_1
6  will    monday    10 5_1
7 harry wednesday    88 2_2
8   joe   tuesday    16 6_1
9   joe    monday    16 1_3
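
To show what the two pieces of that id are doing, here is a rough base R sketch of the same idea. It is not the code from the answer: the variable names key, group_id and within_id are made up, and it assumes that pasting name and day together with a space gives an unambiguous key for this data.

```r
# Rebuild the example data from the question
myDf <- data.frame(
  name  = c("joe", "joe", "harry", "harry", "james", "will", "harry", "joe", "joe"),
  day   = c("monday", "monday", "wednesday", "thursday", "tuesday",
            "monday", "wednesday", "tuesday", "monday"),
  score = c(16, 16, 88, 55, 10, 10, 88, 16, 16)
)

# One key per name/day combination
key <- paste(myDf$name, myDf$day)

# Group id, numbered in order of first appearance
group_id <- match(key, unique(key))

# Running count of rows within each group
within_id <- ave(seq_along(key), key, FUN = seq_along)

myDf$grp <- paste0(group_id, "_", within_id)
myDf
```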
