To my knowledge, [ab] and (a|b) should be equivalent in purpose when trying to match against a set of characters. Now, look at these two regexes:

/^(\s|\u00A0)+|(\s|\u00A0)+$/g
/^[\s\u00A0]+|[\s\u00A0]+$/g

They should both match whitespace at the beginning and end of a string (see the section on Polyfill here for more info on the regex itself). With square brackets everything works well, but when you switch to parentheses, even the simplest of strings causes the browser to run seemingly indefinitely. This happens on the latest Chrome and Firefox.

This jsfiddle demonstrates this:

a ="a                                                               b";

// Doesn't work
// alert(a.replace(/^(\s|\u00A0)+|(\s|\u00A0)+$/g,''));
// Works
alert(a.replace(/^[\s\u00A0]+|[\s\u00A0]+$/g,''));

Is this a crazy quirk with the browser's implementation of the regex engine or is there something else about the regex's algorithm which causes this?

Asked Jun 10, 2015 at 15:18 by Parham; edited Jul 1, 2015 at 4:39 by nhahtdh.
  • 3 This question has been asked a lot. A quick search turned up this: stackoverflow.com/questions/22132450/… – FLXN Commented Jun 10, 2015 at 15:24
  • 2 @FLXN but the non-class regexp shouldn't freeze Chrome/Firefox for several minutes, right? This is more than just "using | is slower than character groups". – Ahmed Fasih Commented Jun 10, 2015 at 15:30
  • 1 Check out regex101.com/r/jI0oA2/1 vs regex101.com/r/aW7kA7/1 especially the debugger which shows you why the bad one takes so long. On average, each extra space in the test string adds ~20 steps in the regexp engine for the bad case. For the good case, each space adds ~5 steps. – Ahmed Fasih Commented Jun 10, 2015 at 15:52
  • 1 the ( and ) create addressable groupings, [ and ] do not, so (...)+ creates many groups, while [..]+ creates one. even if no matches are found, the first expression is a lot more work to figure that out. – Les Commented Jun 10, 2015 at 16:15
  • @FLXN: As Ahmed Fasih pointed out, I am wondering why this crashed the browsers and if this was a bug or something with how the regexs are expanded. That SO question does shed light on this matter together with Ahmed's links to regex101. – Parham Commented Jun 10, 2015 at 16:51

1 Answer

The problem you are seeing is called catastrophic backtracking, as explained here.

First of all, let me simplify and clarify your test case:

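const a = Array(31).join("\u00a0") + "b";  // A string with 30 consecutive \u00a0, then "b"
const s = Date.now();
const t = a.replace(/^(\s|\u00A0)+$/g, '');  // Never matches: the trailing "b" blocks $
console.log(Date.now() - s, a.length);  // Elapsed milliseconds and string length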

The problem lies in the second part of your expression, (\s|\u00A0)+$, which the test case above reduces to ^(\s|\u00A0)+$. Note that \s matches a number of whitespace characters, including \u00A0 itself. This means that both \s and \u00A0 match each of the 30 \u00A0 characters.
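The overlap between the two alternatives is easy to check directly. This is only a small sketch; the variable names are made up for illustration.

```javascript
// \s already includes the no-break space, so the two alternatives overlap.
const nbsp = "\u00A0";
const matchedByClass = /\s/.test(nbsp);        // first alternative: \s
const matchedByLiteral = /\u00A0/.test(nbsp);  // second alternative: \u00A0
console.log(matchedByClass, matchedByLiteral); // true true
```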

Therefore, if you try to match the string with /(\s|\u00A0)+/, there are 2^30 different ways to match the run of 30 whitespace characters, because each character can be matched by either alternative. Once the regular expression matcher has matched those 30 characters, it tries to match the end of the string ($) and fails, so it backtracks and ends up trying all 2^30 combinations.
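To see the doubling without waiting minutes, here is a simplified model of a backtracking matcher for (\s|\u00A0)+$. It is not how any real engine is implemented; countEndChecks and iterate are hypothetical names, and the model only counts how often the matcher reaches the $ test and fails.

```javascript
// Simplified model (an assumption, not a real engine): count how many times
// a naive backtracking matcher tests `$` for (\s|\u00A0)+$ on n no-break
// spaces followed by "b".
function countEndChecks(n) {
  const text = "\u00A0".repeat(n) + "b";
  const alternatives = [/^\s$/, /^\u00A0$/];
  let endChecks = 0;

  // Try one more iteration of the group, starting at index `pos`.
  function iterate(pos) {
    for (const alt of alternatives) {
      if (pos < text.length && alt.test(text[pos])) {
        // Greedy `+`: attempt another iteration first, then fall back to `$`.
        if (iterate(pos + 1)) return true;
        endChecks++;
        if (pos + 1 === text.length) return true; // `$` succeeds only at the end
      }
    }
    return false; // both alternatives exhausted: backtrack
  }

  iterate(0);
  return endChecks;
}

for (const n of [5, 10, 15]) {
  console.log(n, countEndChecks(n)); // 62, 2046, 65534
}
```

In this model the count is 2^(n+1) - 2, so each extra \u00A0 roughly doubles the work, which is consistent with the 2^30 estimate above.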

Your original string (the one in the jsfiddle; the one posted on Stack Overflow has already been "normalized" to plain spaces) is a \u00a0 \u00a0 ... \u00a0 b with roughly 30 \u00a0 characters, so the browser needs roughly 2^30 steps to complete. It does not hang the browser, but it will take a few minutes to complete.
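For contrast, the character-class form from the question gives the matcher only one way to consume each character, so there is no exponential blow-up on the same input. A minimal sketch, where trimWhitespace is a hypothetical helper name:

```javascript
// Character-class version from the question: each character can be matched
// in only one way, so backtracking stays cheap.
function trimWhitespace(str) {
  return str.replace(/^[\s\u00A0]+|[\s\u00A0]+$/g, "");
}

const input = "a" + "\u00A0".repeat(30) + "b";
const padded = "\u00A0 " + input + " \u00A0";

console.log(trimWhitespace(input) === input);  // true: nothing to trim
console.log(trimWhitespace(padded) === input); // true: both ends trimmed
```

Since \s already covers \u00A0 in current JavaScript engines, [\s\u00A0] could presumably be shortened to \s there; the explicit \u00A0 only matters for older engines whose \s missed it.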

Tags: Why does this JavaScript regex crash in the browser - Stack Overflow