The following issue is observed only in Java and not in other regex flavors (e.g. PCRE).
I have the following regex: (?:(?<=([A-Za-z\d]))|\b)(MyString). There's a capturing group on [A-Za-z\d] in the lookbehind.
I'm trying to match the following string with it: string.MyString (through Pattern.compile(regex).matcher(input); to be precise, I'm calling replaceAll).
In PCRE, the match is MyString, and it is the second group in the match. In Java, however, I get the g in string as group 1, and MyString as group 2.
- Why does Java do that? To me this regex implies that a character matching [A-Za-z\d] should only be captured if it directly precedes MyString, which is not the case here.
- How can I avoid that and not capture this g? I want to keep the capturing group in case I have to match a string like stringMyString, in which case I do need that g as group 1.
- Looks like the Java regex engine does not reset Group 1 contents upon a failed match and once the match is found, the submatch is returned with the match. Looks like a bug to me, but it is probably related to regex specific Java functions. – Wiktor Stribiżew Commented Mar 28 at 11:54
1 Answer
There is a paragraph in the java.util.regex.Pattern docs:
The captured input associated with a group is always the subsequence that the group most recently matched. If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails. Matching the string "aba" against the expression (a(b)?)+, for example, leaves group two set to "b". All captured input is discarded at the beginning of each match.
I think this sentence explains the behavior:
- If a group is evaluated a second time because of quantification then its previously-captured value, if any, will be retained if the second evaluation fails.
And the last sentence says when a captured value is cleared:
- All captured input is discarded at the beginning of each match.
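To check this in Java itself, here is a minimal sketch. The class and method names are made up for illustration. It runs the example from the Pattern docs and then the regex from the question. In the question's case there is no quantifier, so my assumption is that the carry-over comes from the engine trying successive start positions inside a single find() call without clearing the groups in between; the docs do not spell that out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaleGroupDemo {
    // Hypothetical helper: returns "group1|group2" for the first match of regex in input.
    // An unset group is rendered as "null".
    static String firstMatchGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return m.group(1) + "|" + m.group(2);
    }

    public static void main(String[] args) {
        // Example from the Pattern docs: group 2 keeps "b" from the first iteration.
        System.out.println(firstMatchGroups("(a(b)?)+", "aba"));

        // Regex and input from the question: group 1 is "g" even though
        // "g" does not directly precede MyString.
        System.out.println(
                firstMatchGroups("(?:(?<=([A-Za-z\\d]))|\\b)(MyString)", "string.MyString"));
    }
}
```

With the behavior described in the question, the second line printed is g|MyString: the lookbehind succeeded one position earlier (right after string), the overall attempt at that position failed, and group 1 kept its value.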
So if you have this string:
string.MyString.srting.MyString
And this regex:
(?:(?<=([tr]))|\b)(MyString)
You can see that group 1 has a different value in each match, because all captured input is discarded when a new match begins.
See an example on regex101
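The same thing can be checked from Java, together with one possible way around it. This is only a sketch under my own assumptions, not something taken from the docs: the helper names are hypothetical, and the workaround simply trusts group 1 only when it ends exactly where group 2 starts, since a value left over from an earlier position cannot be adjacent to the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdjacentGroupCheck {
    // Collects the raw value of group 1 for every match (may be null if unset).
    static List<String> rawGroup1(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    // Collects group 1 for every match, but only when it directly precedes group 2.
    // Otherwise "" is recorded: group 1 is either unset or left over from an
    // earlier start position. Matcher.end(1) returns -1 when group 1 is unset.
    static List<String> adjacentGroup1(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            boolean adjacent = m.end(1) == m.start(2);
            result.add(adjacent ? m.group(1) : "");
        }
        return result;
    }

    public static void main(String[] args) {
        String answerRegex = "(?:(?<=([tr]))|\\b)(MyString)";
        String questionRegex = "(?:(?<=([A-Za-z\\d]))|\\b)(MyString)";

        // Raw group 1 per match for the example string above.
        System.out.println(rawGroup1(answerRegex, "string.MyString.srting.MyString"));

        // With the adjacency check, the leftover "g" is ignored ...
        System.out.println(adjacentGroup1(questionRegex, "string.MyString"));
        // ... while a "g" that really precedes MyString is kept.
        System.out.println(adjacentGroup1(questionRegex, "stringMyString"));
    }
}
```

Since the question uses replaceAll, the same check would have to go into a manual find() loop with appendReplacement/appendTail (or the function-based Matcher.replaceAll in Java 9+) instead of a plain $1 in the replacement string.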