I wish to loop through my data frame and change the value of one of the cells to avoid duplicates.

myDf
name     day        score
joe      monday     16
joe      monday     16
harry    wednesday  88
harry    thursday   55
james    tuesday    10
will     monday     10
harry    wednesday  88
joe      tuesday    16
joe      monday     16

Here I have duplicate rows. To make them unique while keeping the data, I wish to paste an attempt number after the score, like so:

mydf_new
name     day        score
joe      monday     16 (a1)
joe      monday     16 (a2)
harry    wednesday  88 (a1)
harry    thursday   55
james    tuesday    10
will     monday     10
harry    wednesday  88 (a2)
joe      tuesday    16
joe      monday     16 (a3)

The main issue I'm having is how to loop through my data and group all the rows that are not unique. I have managed to identify them using

dupSax = sax = which(duplicated(c(paste(myDf$name, myDf$day, myDf$score)))|duplicated(c(paste(myDf$name, myDf$day, myDf$score)), fromLast = TRUE))

but when I write my loop I'm not sure how to grab all the related rows together rather than going row by row. I also appreciate that this might not be the easiest way to do it.

Asked 2 days ago by JoeJoe

Comment (Gusbourne, 2 days ago): How to add sequential values to identified duplicates before first character?; Disambiguate non-unique elements in a character vector; Add a unique identifier to the same column value in R data frame

3 Answers

I have created a new column (identifier) for the output so that the original score column is not changed.

We can group the data by name and day. If a group has more than one row, we build the sequence a1, a2, ... and paste it onto the score value; otherwise the score is kept as it is. Note that the .by argument needs dplyr 1.1.0 or later.

library(dplyr)

myDf |>
  mutate(identifier = if (n() > 1) paste0(score, " (a", row_number(), ")")
         else as.character(score), .by = c(name, day))

#   name       day score identifier
#1   joe    monday    16    16 (a1)
#2   joe    monday    16    16 (a2)
#3 harry wednesday    88    88 (a1)
#4 harry  thursday    55         55
#5 james   tuesday    10         10
#6  will    monday    10         10
#7 harry wednesday    88    88 (a2)
#8   joe   tuesday    16         16
#9   joe    monday    16    16 (a3)

The same logic translated to base R (the \(x) shorthand needs R 4.1 or later):

transform(myDf, identifier = ave(score, name, day, FUN = \(x)
                if (length(x) > 1) paste0(x, " (a", seq_along(x), ")") else x))
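
For anyone who wants to stay closer to the question's own approach, here is a minimal base R sketch that avoids an explicit loop. It assumes, as the question's duplicated() call does, that rows count as duplicates only when name, day and score all match (the code above groups on name and day only, which gives the same result for this data). The names is_dup and attempt are mine, purely for illustration.

```r
# Rebuild the question's data so the sketch is self-contained.
myDf <- data.frame(
  name  = c("joe", "joe", "harry", "harry", "james", "will", "harry", "joe", "joe"),
  day   = c("monday", "monday", "wednesday", "thursday", "tuesday", "monday",
            "wednesday", "tuesday", "monday"),
  score = c(16, 16, 88, 55, 10, 10, 88, 16, 16)
)

# TRUE for every row that has at least one identical twin (same idea as the
# duplicated(...) | duplicated(..., fromLast = TRUE) test in the question).
is_dup <- duplicated(myDf) | duplicated(myDf, fromLast = TRUE)

# Running count of each row within its (name, day, score) group.
attempt <- ave(seq_len(nrow(myDf)), myDf$name, myDf$day, myDf$score,
               FUN = seq_along)

# Append the attempt number only where the row is duplicated.
myDf$identifier <- ifelse(is_dup,
                          paste0(myDf$score, " (a", attempt, ")"),
                          as.character(myDf$score))
print(myDf)
```

The point is that ave() hands each group to FUN as a whole, so all related rows are numbered together and no row-by-row bookkeeping is needed.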

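Your proposed output seems suboptimal: pasting the attempt number into score turns a numeric column into text. I would keep the attempt number in its own column, something like this: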

library(data.table)
DT <- fread("name     day        score
joe      monday     16
joe      monday     16
harry    wednesday  88
harry    thursday   55
james    tuesday    10
will     monday     10
harry    wednesday  88
joe      tuesday    16
joe      monday     16")

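# setDT(DT)  # if the data is already in a data.frame

# Number the rows within each (name, day) group.
DT[, attempt := seq_len(.N), by = .(name, day)]
# Sort so that the attempts of one group sit next to each other.
setorder(DT, name, day, attempt)
print(DT)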
#     name       day score attempt
#   <char>    <char> <int>   <int>
#1:  harry  thursday    55       1
#2:  harry wednesday    88       1
#3:  harry wednesday    88       2
#4:  james   tuesday    10       1
#5:    joe    monday    16       1
#6:    joe    monday    16       2
#7:    joe    monday    16       3
#8:    joe   tuesday    16       1
#9:   will    monday    10       1
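
To illustrate why a separate column can be more convenient, here is a small base R sketch (no data.table needed). It is only an illustration under my own naming: mean_by_name and label are hypothetical helpers, not part of the code above. The score column stays numeric, and the "(aN)" label from the question can still be built on demand.

```r
# Rebuild the question's data so the sketch is self-contained.
myDf <- data.frame(
  name  = c("joe", "joe", "harry", "harry", "james", "will", "harry", "joe", "joe"),
  day   = c("monday", "monday", "wednesday", "thursday", "tuesday", "monday",
            "wednesday", "tuesday", "monday"),
  score = c(16, 16, 88, 55, 10, 10, 88, 16, 16)
)

# Attempt number within each (name, day) group, stored in its own column.
myDf$attempt <- ave(seq_len(nrow(myDf)), myDf$name, myDf$day, FUN = seq_along)

# score is still numeric, so summaries work without parsing strings.
mean_by_name <- tapply(myDf$score, myDf$name, mean)

# The display label can be generated only when it is actually needed.
label <- paste0(myDf$score, " (a", myDf$attempt, ")")

print(mean_by_name)
print(label)
```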

Make the rows unique by combining the group ID with a consecutive number within each group:

library(dplyr)

myDf %>% 
  mutate(grp = paste0(cur_group_id(), "_", row_number()), .by = c(name, day))

Output:

   name       day score grp
1   joe    monday    16 1_1
2   joe    monday    16 1_2
3 harry wednesday    88 2_1
4 harry  thursday    55 3_1
5 james   tuesday    10 4_1
6  will    monday    10 5_1
7 harry wednesday    88 2_2
8   joe   tuesday    16 6_1
9   joe    monday    16 1_3
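
If dplyr is not available, the same kind of ID can be sketched in base R, along with a quick check that it really is unique. This is a sketch under my own assumptions: key, grp_id and within_grp are illustrative names, and match(key, unique(key)) is used here to number the groups in order of first appearance, which matches the output shown above.

```r
# Rebuild the question's data so the sketch is self-contained.
myDf <- data.frame(
  name  = c("joe", "joe", "harry", "harry", "james", "will", "harry", "joe", "joe"),
  day   = c("monday", "monday", "wednesday", "thursday", "tuesday", "monday",
            "wednesday", "tuesday", "monday"),
  score = c(16, 16, 88, 55, 10, 10, 88, 16, 16)
)

# One key per row; rows sharing a key belong to the same group.
key <- paste(myDf$name, myDf$day)

# Group number in order of first appearance.
grp_id <- match(key, unique(key))

# Position of each row within its group.
within_grp <- ave(seq_along(key), key, FUN = seq_along)

myDf$grp <- paste0(grp_id, "_", within_grp)

# anyDuplicated() returns 0 when every value is unique.
stopifnot(anyDuplicated(myDf$grp) == 0)
print(myDf)
```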
