I have last names (left) and first names (right) separated by a comma.
The last names often (but not always) contain duplicates separated by an underscore. How can I remove the duplicates and, for compound last names, replace the underscore with a space?
First names never contain duplicates but always begin with an underscore, which I would like to replace with a space. Compound first names are also separated by an underscore, which I would likewise like to replace with a space.
I have thousands of lines and I'm struggling to find a solution.
Thanks for your help.
Input data:
> dat0
name_ko
1 BLA_BLA,_BLIM
2 CLO_CLO,_SPITCH_SPLOTCH
3 BAD_BOY,_GOOD
4 GOOD_BOY,_BAD_GIRL
Desired output:
> dat1
name_ok
1 BLA, BLIM
2 CLO, SPITCH SPLOTCH
3 BAD BOY, GOOD
4 GOOD BOY, BAD GIRL
Data:
name_ko <- c(
"BLA_BLA,_BLIM",
"CLO_CLO,_SPITCH_SPLOTCH",
"BAD_BOY,_GOOD",
"GOOD_BOY,_BAD_GIRL")
dat0 <- data.frame(name_ko)
asked Mar 13 at 22:27 by denis
1 Answer
You can try
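# Inner gsub() collapses "X_X" into "X"; outer gsub() turns the remaining underscores into spaces
name_ok <- gsub("_", " ", gsub("(\\b\\w+)_(\\1)", "\\1", name_ko))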
"BLA, BLIM"
"CLO, SPITCH SPLOTCH"
"BAD BOY, GOOD"
"GOOD BOY, BAD GIRL"
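To handle triplets and repeated groups, as margusl and zephryl suggested in the comments (thank you), split each last name on the underscore and keep only its unique parts: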
name_ko <- c(
"BLA_BLA,_BLIM",
"CLO_CLO,_SPITCH_SPLOTCH",
"BAD_BOY,_GOOD",
"GOOD_BOY,_BAD_GIRL",
"BAD_BAD_BAD_BOY_BOY,_GOOD",
"BAD_BOY_BAD_BOY,_GOOD"
)
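name_ok <- sapply(strsplit(name_ko, ","), function(x) {
  # Last name: split on "_" and keep each part once, in order of first appearance
  last_names <- unique(unlist(strsplit(trimws(x[1]), "_")))
  # First name: underscores become spaces; trim the space left by the leading "_"
  first_names <- trimws(gsub("_", " ", x[2]))
  paste(paste(last_names, collapse = " "), first_names, sep = ", ")
})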
"BLA, BLIM"
"CLO, SPITCH SPLOTCH"
"BAD BOY, GOOD"
"GOOD BOY, BAD GIRL"
"BAD BOY, GOOD"
"BAD BOY, GOOD"
Comments:
BAD_BAD_BAD_BOY_BOY,_GOOD ? – margusl, Mar 13 at 23:00
"BAD_BOY_BAD_BOY,_GOOD" ? – zephryl, Mar 13 at 23:03