
I want to build a regex that matches every unescaped $ in a string that represents a regex.

In this case, a character is unescaped if it is preceded by an even number of backslashes (each pair of backslashes denotes a literal backslash, so the character after the pair is not escaped).
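The rule can be written as a small reference check. The sketch below uses Python only as an illustration, because the question does not name a host language, and `unescaped_dollar_positions` is a hypothetical helper name. It counts the run of backslashes directly before each `$` and keeps the `$` when that run is even. This gives a result to compare regex output against.

```python
from typing import List


def unescaped_dollar_positions(s: str) -> List[int]:
    """Return the index of every '$' preceded by an even number of backslashes."""
    positions = []
    for i, ch in enumerate(s):
        if ch != "$":
            continue
        # Count the run of backslashes immediately before this '$'.
        run = 0
        j = i - 1
        while j >= 0 and s[j] == "\\":
            run += 1
            j -= 1
        # An even run (including zero) consists only of escaped backslashes,
        # so the '$' itself is not escaped.
        if run % 2 == 0:
            positions.append(i)
    return positions


# A raw string passes the backslashes through unchanged.
print(unescaped_dollar_positions(r"a$b\$c\\$"))  # [1, 8]
```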

I came up with this pattern: (?<!\\)(\\{2})*\$

Explanation: this is the closest I came to a solution, although it also matches the backslashes preceding the $. It requires an even number of backslashes directly before the $, and the lookbehind ensures that this run is not preceded by yet another backslash, which would make the count odd.
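The behaviour described here is easy to observe. The sketch below uses Python's built-in `re` module for illustration, and `QUESTION_PATTERN` is just a local name. It runs the question's pattern and prints each match. The whole match includes the backslash pairs, and the `$` is always its last character. `finditer` is used instead of `findall` because the pattern contains a capturing group, so `findall` would return the captured pair rather than the match.

```python
import re

# The pattern from the question, as a Python raw string.
QUESTION_PATTERN = re.compile(r"(?<!\\)(\\{2})*\$")

text = r"a$b\$c\\$"
for m in QUESTION_PATTERN.finditer(text):
    # group(0) is the whole match, backslash pairs included;
    # m.end() - 1 is always the index of the '$' itself.
    print(repr(m.group(0)), m.end() - 1)
```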

My issue is that I seem to need two consecutive non-consuming groups so that the total number of backslashes is required to be even without including them in the match, and this does not seem to be possible. Is there another way to do this?

Asked by Benny Brudner.
  • "but this is not possible": I don't quite understand what the problem is. – trincot
  • If the issue is that you don't want to get the backslashes in the match, then use \K just before \$. Not sure what you are asking. – trincot
  • \K should work. Notionally, you could nest the lookbehinds, but variable-length lookbehind is probably not supported: (?<=(?<!\\)(\\\\)*)\$ – jhnc
  • @CasimiretHippolyte: It deserves an answer. – anubhava
  • This has nothing to do with Java, which allows variable-length lookbehinds. It's a PCRE question. – Casimir et Hippolyte
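jhnc's doubt about variable-length lookbehind can be tested in Python's built-in `re` module. Python is used here only as an example engine, and the question itself is about PCRE. `re` only accepts fixed-width lookbehind, so the nested pattern should be rejected at compile time. The sketch below reports what happens. `NESTED` is a local name.

```python
import re

# jhnc's nested-lookbehind pattern, written as a Python raw string.
NESTED = r"(?<=(?<!\\)(\\\\)*)\$"

try:
    re.compile(NESTED)
    print("compiled")
except re.error as exc:
    # Python's re requires a fixed-width lookbehind, and (\\\\)* makes the
    # outer lookbehind variable-width, so compilation fails.
    print("rejected:", exc)
```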

3 Answers

One way to do it with PCRE is to skip every character preceded by a backslash (together with that backslash) using the (*SKIP)(*FAIL) verb sequence, and to catch the DOLLAR SIGN in another branch:

~ \\ . (*SKIP)(*F) | \$ ~xs

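Python's built-in `re` module has no backtracking-control verbs, so the sketch below substitutes a related technique rather than (*SKIP)(*F) itself. The first alternative consumes a backslash together with the character it escapes, and only the second alternative captures a `$`. Matches in which group 1 did not participate are discarded, which has the same effect as skipping them. `PATTERN` and `unescaped_dollars` are hypothetical names chosen for the example.

```python
import re

# Same idea as "\\ . (*SKIP)(*F) | \$": the first branch consumes a backslash
# plus the character it escapes, so that character can never start a match;
# only the second branch captures a '$'. re.DOTALL mirrors the s flag, so
# "." can also consume an escaped newline.
PATTERN = re.compile(r"\\.|(\$)", re.DOTALL)


def unescaped_dollars(s):
    """Return the index of every unescaped '$' in s."""
    return [m.start(1) for m in PATTERN.finditer(s) if m.group(1) is not None]


print(unescaped_dollars(r"aaa$aaa\$aaa\\$aaa\\\$"))  # [3, 14]
```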

Don't forget that to write a literal backslash in a regex pattern stored in a quoted PHP string, you have to escape it twice. The first escape is for the regex, because the backslash is a special character that forms escape sequences like \w or \$. The second is for the quoted string, because the same character also forms string escape sequences like \'. As a result, it takes four backslashes to represent a single literal backslash:

$pattern = '~ \\\\ . (*SKIP)(*F) | \$ ~xs';


A nowdoc string avoids the extra escaping required by quoted strings:

$pattern = <<<'REGEX'
~ \\ . (*SKIP)(*F) | \$ ~xs
REGEX;
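Python has the same two layers of escaping. It is used here purely as an illustration, since the answer's code is PHP. A regular string literal behaves like the single-quoted PHP string, and a raw string (r"...") plays roughly the role of the nowdoc. The sketch checks that both spellings produce the same pattern text. It also shows what goes wrong when the backslash is escaped only once.

```python
import re

# Regular string literal: each regex backslash is typed twice, so the regex
# \\. (a literal backslash followed by any character) is written "\\\\.".
quoted = "\\\\."
# Raw string literal: backslashes are kept as typed, much like the nowdoc.
raw = r"\\."
assert quoted == raw

# Escaping only once yields the regex \. which matches a literal dot instead.
under_escaped = "\\."

print(bool(re.fullmatch(raw, "\\x")))            # True: backslash, then "x"
print(bool(re.fullmatch(under_escaped, "\\x")))  # False
print(bool(re.fullmatch(under_escaped, ".")))    # True: just a dot
```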

Make the repeating group non-capturing, and add the match-reset escape \K so that only the $ is reported as the match:

/(?<!\\)(?:\\{2})*\K\$/g

Here it is on Regex101.
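Python's `re` has no \K. One workaround, sketched here under that assumption, is to keep the same pattern and report only the last character of each match, because the `$` always ends it. `unescaped_dollar_spans` is a hypothetical helper name.

```python
import re

# Python's re has no \K, so match the whole run (backslash pairs + '$')
# and report only the '$', which is always the last character of the match.
PATTERN = re.compile(r"(?<!\\)(?:\\{2})*\$")


def unescaped_dollar_spans(s):
    """Return (start, end) spans covering just the unescaped '$' characters."""
    return [(m.end() - 1, m.end()) for m in PATTERN.finditer(s)]


print(unescaped_dollar_spans(r"aaa$aaa\$aaa\\$aaa\\\$"))  # [(3, 4), (14, 15)]
```

The pattern consumes only backslash pairs and the `$`, so a run such as `$$` still produces two separate matches.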

If you cannot use the \K and (*SKIP)(*FAIL) syntax, as in Python (and many other regex flavors), you could instead capture the text in front of the $ and write it back in the replacement.

Logic: Group 1 ($1) consumes and captures the characters that must come before a match. These are one character that is not a backslash, followed by zero or an even number of literal backslashes (\). Group 2 ($2) consumes and captures the unescaped $ itself (\$ in the pattern).

REGEX PATTERN:

([^\\](?:\\\\)*)(\$)

REPLACEMENT STRING:

$1

Regex Demo: https://regex101/r/qq0Bug/2

TEST STRING:

aaa$aaa\$aaa\\$aaa\\\$
\$ \\$ \\\$ \\\\$ \\\\\$

RESULT:

aaaaaa\$aaa\\aaa\\\$
\$ \\ \\\$ \\\\ \\\\\$

MATCHES AND GROUPS:

MATCH 1:  2-4   a$
GROUP 1:  2-3   a
GROUP 2:  3-4   $

MATCH 2:  11-15 a\\$
GROUP 1:  11-14 a\\
GROUP 2:  14-15 $

MATCH 3:  25-29  \\$
GROUP 1:  25-28  \\
GROUP 2:  28-29 $

MATCH 4:  34-40  \\\\$
GROUP 1:  34-39  \\\\
GROUP 2:  39-40 $

REGEX NOTES:

  • ( Begin capture group 1, referred to by $1 in the replacement string.
    • [^\\] Negated character class [^...]. Match any character that is not a literal backslash \.
    • (?:\\\\)* Non-capture group (?:...). Match two consecutive literal backslash characters \\ 0 or more (*) times.
  • ) End group 1
  • (\$) Group 2 ($2): match a literal $.

Tags: escaping, regex (Stack Overflow question: "Regex to match all unescaped '$' in a regex string")