So I understand that [^A-Za-z] would match any character that's not a letter.

Is there any way to do this with a group? For example: (?^:&amp;) would match any sequence of characters that is not the sequence &amp;

NOTE: as Mark Reed pointed out, matching an empty string would be pointless, since an empty string is itself a sequence of characters that is not &amp;. I would therefore like the regex to match as many characters as possible.

FOR EXAMPLE:

In Ben &amp; Jerry's the matches would be "Ben " and " Jerry's" (note that the whitespace after Ben and before Jerry's is captured too).

NOTE: if possible, please do not use look-behinds, because I will be using the regex in a JS script, and JavaScript does not support look-behinds.

asked Apr 20, 2016 at 20:06 by user3186555 (edited May 23, 2017 at 12:09)
  • @anubhava fixed it. Sorry, I got behinds and aheads mixed up – user3186555 Commented Apr 20, 2016 at 20:13
  • It will be much easier to split the input by &amp; – anubhava Commented Apr 20, 2016 at 20:17
  • Unfortunately, while splitting may seem simple, in my script it will make things more complicated. My script's objective is to bullet-proof regexes where the string-to-be-matched will not contain any &, but only &amp;s. It's a bit complicated after that point, but a split will not work, sadly. – user3186555 Commented Apr 20, 2016 at 20:20
  • @anubhava I need a solution that fixes the regex, and not the script – user3186555 Commented Apr 20, 2016 at 20:22
  • Negation is tricky for general regular expressions. After all, the empty string is "a sequence of characters that is not &amp;". I think what you want is "a sequence of as many characters as possible that does not include &amp;". – Mark Reed Commented Apr 20, 2016 at 20:33

3 Answers

What you need is a regex with two alternatives: the first matches the &amp; entity itself, and the second captures everything else into Group 1 using a tempered greedy token (or an unrolled version for better performance, if you only have 2 or 3):

&amp;|((?:(?!&amp;)[\s\S])+)

See the regex demo (an unrolled version: &amp;|([^&]*(?:&(?!amp;)[^&]*)*))

The pattern:

  • &amp; - matches the &amp; entity
  • | - or
  • ((?:(?!&amp;)[\s\S])+) - matches and captures into Group 1 any chunk of text (1+ characters) in which no character is the starting point of an &amp; sequence. Since this is for JS, you need [\s\S] (or [^]) to match any character including a newline. Otherwise, use . instead (if you only intend to match within lines).

var re = /&amp;|((?:(?!&amp;)[\s\S])+)/g;
var str = 'abc Ben &amp; Jerry\'s    foobar ssss  sss  sss &\n\n\nsssss&sssss     &\n\nsssss&sssss     &sssss\n&sssss&\n&&';
var res = [];
var m;                                 // Declare the match variable instead of leaking a global

while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) {    // Guard only necessary for the
        re.lastIndex++;                // unrolled pattern (as it can match an empty string)
    }
    if (m[1] !== undefined) {          // Group 1 is unset when the &amp; alternative matched
        res.push(m[1]);                // Only collect the captured texts
    }
}
document.body.innerHTML = "<pre>BEFORE:<br/>" + str.replace(/&/g, '&amp;') + "</pre>";
document.body.innerHTML += "<pre>AFTER:<br/>" + res.join("") + "</pre>";
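
For anyone running this outside a browser, here is a minimal sketch that compares the tempered pattern with the unrolled one on a short string. The helper name chunks and the sample input are made up for illustration; only the two patterns come from the answer.

```javascript
// Tempered greedy token: consume one character at a time unless "&amp;" starts here.
var tempered = /&amp;|((?:(?!&amp;)[\s\S])+)/g;
// Unrolled variant: runs of non-& characters, plus any & not followed by "amp;".
var unrolled = /&amp;|([^&]*(?:&(?!amp;)[^&]*)*)/g;

// Hypothetical helper: collect every non-empty Group 1 capture.
function chunks(re, str) {
    var out = [];
    var m;
    re.lastIndex = 0;                  // start from the beginning on every call
    while ((m = re.exec(str)) !== null) {
        if (m.index === re.lastIndex) {
            re.lastIndex++;            // step past an empty match (unrolled pattern only)
        }
        if (m[1]) {                    // skip the &amp; alternative and empty captures
            out.push(m[1]);
        }
    }
    return out;
}

var sample = "Ben &amp; Jerry's & friends";
console.log(chunks(tempered, sample)); // [ 'Ben ', " Jerry's & friends" ]
console.log(chunks(unrolled, sample)); // [ 'Ben ', " Jerry's & friends" ]
```

Note that the bare & before "friends" stays inside the second chunk, since only the full &amp; sequence acts as a delimiter.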

Easy:

(.*?)(?:&amp;)|((?!&amp;).*)$

Demo

Explanation:

  1. (.*?): lazily (non-greedily) match any characters, up to the next &amp;.
  2. (?:&amp;): ?: makes this a non-capturing group, i.e. a group whose value you don't want to keep.
  3. ((?!&amp;).*)$: capture the rest of the string, provided it does not start with &amp;.
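
A rough sketch of how the two capture groups of this pattern might be read in JS, assuming single-line input (the . here does not match newlines). The function name pieces is hypothetical.

```javascript
var re = /(.*?)(?:&amp;)|((?!&amp;).*)$/g;

// Hypothetical helper: take Group 1 (text before an &amp;) or Group 2 (the tail).
function pieces(str) {
    var out = [];
    var m;
    re.lastIndex = 0;
    while ((m = re.exec(str)) !== null) {
        if (m.index === re.lastIndex) {
            re.lastIndex++;            // the second alternative can match an empty string
        }
        var text = m[1] !== undefined ? m[1] : m[2];
        if (text) {                    // drop empty pieces
            out.push(text);
        }
    }
    return out;
}

console.log(pieces("Ben &amp; Jerry's")); // [ 'Ben ', " Jerry's" ]
```

The empty-match guard matters here: once the tail has been consumed, the second alternative still matches an empty string at the end of the input.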

See Randal’s Rule.

Randal's Rule

Randal Schwartz (author of Learning Perl) says:

Use capturing when you know what you want to keep.

Use split when you know what you want to throw away.

var s = "Ben &amp; Jerry's";
var a = s.split(/&amp;/);
document.body.innerHTML = "<pre>[" + a.join("][") + "]</pre>";
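
One thing to watch for with split: a leading delimiter or two adjacent delimiters produce empty strings. A small sketch, with a made-up input, that filters them out so that only non-empty sequences remain:

```javascript
// Illustrative input with a leading &amp; and two adjacent &amp; entities.
var input = "&amp;Ben &amp;&amp; Jerry's";

// Splitting on the literal string keeps empty pieces between delimiters.
var raw = input.split("&amp;");

// Keep only the non-empty pieces.
var nonEmpty = raw.filter(function (part) {
    return part.length > 0;
});

console.log(raw);      // [ '', 'Ben ', '', " Jerry's" ]
console.log(nonEmpty); // [ 'Ben ', " Jerry's" ]
```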

To show how much work the negative look-ahead (?!...) saves us, here is the equivalent regex, written without look-ahead, that matches a string not containing the sequence &amp;:

^([^&]|&+[^&a]|(&+a)+([^&m]|&+[^&a])|(&+a)+m((&+a)+m)*([^&p]|&+[^&a]|(&+a)+([^&m]|&+[^&a]))|(&+a)+m((&+a)+m)*p((&+a)+m((&+a)+m)*p)*([^&;]|&+[^&a]|(&+a)+([^&m]|&+[^&a])|(&+a)+m((&+a)+m)*([^&p]|&+[^&a]|(&+a)+([^&m]|&+[^&a]))))*(&+|(&+a)+(&+)?|(&+a)+m((&+a)+m)*(&+|(&+a)+(&+)?)?|(&+a)+m((&+a)+m)*p((&+a)+m((&+a)+m)*p)*(&+|(&+a)+(&+)?|(&+a)+m((&+a)+m)*(&+|(&+a)+(&+)?)?)?)?$
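
For comparison, the look-ahead form of the same test can be sketched in one short pattern. The function name lacksEntity and the sample strings are illustrative, and the cross-check against indexOf is only there to show the two agree on these inputs.

```javascript
// Look-ahead version: no position in the string may begin the sequence "&amp;".
var noEntity = /^(?:(?!&amp;)[\s\S])*$/;

// Hypothetical helper wrapping the test.
function lacksEntity(str) {
    return noEntity.test(str);
}

var samples = ["", "Ben & Jerry's", "Ben &amp; Jerry's", "&am", "&&amp"];
samples.forEach(function (s) {
    // Print the regex result next to a plain substring search.
    console.log(JSON.stringify(s), lacksEntity(s), s.indexOf("&amp;") === -1);
});
```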
