To my knowledge, [ab] and (a|b) should be equivalent in purpose when trying to match against a set of characters. Now, look at two regexes:
/^(\s|\u00A0)+|(\s|\u00A0)+$/g
/^[\s\u00A0]+|[\s\u00A0]+$/g
They should both match whitespace at the beginning and end of a string (see the section on Polyfill here for more info on the regex itself). With square brackets everything works well, but when you switch to parentheses, even the simplest of strings causes the browser to run seemingly indefinitely. This happens on the latest Chrome and Firefox.
This jsfiddle demonstrates this:
a ="a b";
// Doesn't work
// alert(a.replace(/^(\s|\u00A0)+|(\s|\u00A0)+$/g,''));
// Works
alert(a.replace(/^[\s\u00A0]+|[\s\u00A0]+$/g,''));
Is this a crazy quirk with the browser's implementation of the regex engine or is there something else about the regex's algorithm which causes this?
Asked Jun 10, 2015 at 15:18 by Parham; edited Jul 1, 2015 at 4:39 by nhahtdh.
Comments:
- This question has been asked a lot. A quick search turned up this: stackoverflow.com/questions/22132450/… – FLXN, Jun 10, 2015 at 15:24
- @FLXN but the non-class regexp shouldn't freeze Chrome/Firefox for several minutes, right? This is more than just "using | is slower than character groups". – Ahmed Fasih, Jun 10, 2015 at 15:30
- Check out regex101.com/r/jI0oA2/1 vs regex101.com/r/aW7kA7/1, especially the debugger, which shows you why the bad one takes so long. On average, each extra space in the test string adds ~20 steps in the regexp engine for the bad case. For the good case, each space adds ~5 steps. – Ahmed Fasih, Jun 10, 2015 at 15:52
- The ( and ) create addressable groupings, [ and ] do not, so (...)+ creates many groups, while [..]+ creates one. Even if no matches are found, the first expression is a lot more work to figure that out. – Les, Jun 10, 2015 at 16:15
- @FLXN: As Ahmed Fasih pointed out, I am wondering why this crashed the browsers and whether this was a bug or something to do with how the regexes are expanded. That SO question does shed light on this matter, together with Ahmed's links to regex101. – Parham, Jun 10, 2015 at 16:51
1 Answer
The problem you are seeing is called catastrophic backtracking, as explained here.
First of all, let me simplify and clarify your test case:
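var a = Array(31).join("\u00a0") + "b"; // 30 consecutive \u00a0, then "b" (Array(n).join(s) yields n-1 copies of s)
var s = Date.now();
var t = a.replace(/^(\s|\u00A0)+$/g, ''); // warning: exponential backtracking, may run for minutes
console.log(Date.now() - s, a.length); // elapsed milliseconds, string length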
What's happening is in the second part of the expression: ^(\s|\u00A0)+$. Note that \s matches a number of whitespace characters, including \u00A0 itself. This means that both \s and \u00A0 match each of the 30 \u00A0 characters.
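The overlap between the two branches can be checked directly. This is only a minimal sketch, and the variable names are illustrative:

```javascript
// The no-break space U+00A0 belongs to the \s class in JavaScript.
const nbsp = "\u00A0";

const matchedByClass = /^\s$/.test(nbsp);        // the \s branch
const matchedByLiteral = /^\u00A0$/.test(nbsp);  // the literal \u00A0 branch

// Both branches accept the same character, so a backtracking engine
// has two ways to consume every U+00A0 it meets.
console.log(matchedByClass, matchedByLiteral); // true true
```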
Therefore, if you try to match the string with /(\s|\u00A0)+/, you will find that each of the 2^30 different combinations of 30-character whitespace patterns results in a match. When the regular expression matcher has matched the first 30 characters, it tries to match the end of the string ($) and fails, so it backtracks and ends up trying all 2^30 combinations.
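To get a feel for the growth, here is a toy model of that backtracking. It is a simplified sketch, not how Chrome's or Firefox's regex engine is actually implemented, and countFailedEndChecks is a hypothetical name. It assumes both branches match each of the first n characters and that the following character is "b", so every attempt to match $ fails.

```javascript
// Toy model of a backtracking matcher for ^(X|Y)+$ where X and Y both
// match every one of the first n characters, and character n is "b".
// It counts how often the matcher reaches "$" and fails.
function countFailedEndChecks(n) {
  let failures = 0;

  function matchGroup(pos) {
    if (pos >= n) return;   // next character is "b": neither branch matches
    for (let branch = 0; branch < 2; branch++) {
      matchGroup(pos + 1);  // greedy: try another repetition first
      failures++;           // then try "$" at pos + 1, which always fails here
    }
  }

  matchGroup(0);
  return failures;
}

for (const n of [1, 2, 4, 8, 16]) {
  console.log(n, countFailedEndChecks(n)); // grows as 2^(n+1) - 2
}
```

In this model the count roughly doubles with every extra whitespace character, which is the exponential behaviour described above.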
Your original string (the one in the jsfiddle; the one on Stack Overflow is already "normalized" to all spaces) is a \u00a0 \u00a0 ... \u00a0 b with roughly 30 \u00a0 characters, so it took the browser on the order of 2^30 steps to complete. It does not hang the browser, but it will take a few minutes to complete.
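One way to avoid the exponential case is to make sure no two branches can match the same character. The character-class form from the question does this, and since \s already covers \u00A0, the class can be reduced to plain \s. This is a sketch that assumes the goal is simply to trim whitespace; trimWhitespace is a hypothetical helper name.

```javascript
// Hypothetical helper: trims leading and trailing whitespace,
// including U+00A0, without overlapping alternatives.
function trimWhitespace(str) {
  return str.replace(/^\s+|\s+$/g, "");
}

// 30 no-break spaces on each side of "a b".
const padded = Array(31).join("\u00A0") + "a b" + Array(31).join("\u00A0");

console.log(JSON.stringify(trimWhitespace(padded)));    // "a b"
console.log(trimWhitespace(padded) === padded.trim());  // true
```

Where it is available, the built-in String.prototype.trim gives the same result for this input and avoids a hand-written regex altogether.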