I have a fairly basic problem with incorrectly nested parentheses in a regex, but I am having a lot of difficulty identifying which parentheses are the culprits. I am working with administrative data that contains misspellings and shortened words, and I would like to change every misspelled or shortened version of COMPANY to the complete, correctly spelled word. I have created some example data to show the problem.
library(tidyverse)  # provides tibble(), mutate(), if_else(), and the stringr functions used below
data <- tibble(respondents.long = c("COPMANY", "COMPANY", "CO", "COMP", "CO ", "COMP ", "COMPNY", "CO#"))
The code below should change shortened or misspelled versions of COMPANY to the complete, correctly spelled version. A major goal for this code is that it be easy to replicate and easy to extend, so I have annotated each subexpression of the regex with an inline comment of the form (?# ...).
data %>%
  mutate(
    respondents.long =
      # if shortened or misspelled versions of COMPANY are found, or if CO is immediately followed by a # sign
      if_else(
        str_detect(
          respondents.long,
          regex("(?: ) (?# non-capture group; matches empty space before CO)
                 CO (?# matches literal CO)
                 (?!MPANY) (?# negative lookahead; indicates that MPANY does not follow CO)
                 (?:[MPANY]+(?:(?: )|$) (?# non-capture group; first choice; CO plus any combination of letters between brackets when the entire string is followed by a space or end of line)
                 | (?# OR operator; choices)
                 (?: ) (?# non-capture; second choice CO plus empty space)
                 | (?# OR operator; choices)
                 (?=#) (?# positive lookahead; third choice # immediately following CO)
                 | (?# OR operator; choices)
                 $) (?# fourth choice; CO at the end of line)",
                # include "comments = TRUE" to allow (?#) comments on regex sub-expressions
                comments = TRUE)
        ),
        # replace those strings
        str_replace_all(
          respondents.long,
          # string to be detected
          "(?: )CO(?!MPANY)(?:[MPANY]+(?:(?: )|$)|(?: )|(?=#)|$)",
          # replacement
          " COMPANY "
        ),
        # else leave as is
        respondents.long
      )
  )
The same base code has worked for other words, so I'm certain that I'm overlooking something. I also tested the regex on regex101, and it works there.
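One possible explanation, offered as a sketch rather than a confirmed diagnosis: stringr's documentation for regex() says that with comments = TRUE, white space and comments beginning with # are ignored, and that literal spaces must be escaped. If that applies here, the unescaped # in (?=#) would start a comment that swallows the closing parenthesis, and (?: ) would no longer match a space. The snippet below is a simplified pattern, not the full one from the question, and the names pattern_commented and detected are only illustrative. It escapes the hash sign as \\# to check that idea.

```r
library(stringr)

# Assumption: in comments mode, an unescaped # starts a comment and
# unescaped whitespace is ignored, so literal # and literal spaces
# have to be written as \\# and \\  (backslash-space).
pattern_commented <- regex(
  "CO        # literal CO
   (?!MPANY) # not followed by MPANY
   (?=\\#)   # followed by a literal hash sign (escaped)
  ",
  comments = TRUE
)

# Simplified check on three of the example values
detected <- str_detect(c("CO#", "COMPANY", "CO "), pattern_commented)
print(detected)  # TRUE FALSE FALSE
```

If this behaves as expected, the same escaping could be tried in the full pattern: (?=\\#) in place of (?=#), and (?:\\ ) in place of (?: ). A tester like regex101 would not show the problem unless its free-spacing (x) flag is switched on.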