admin管理员组文章数量:1195311
Im using this regex to get the contents of a tag in a file.
var regex = new RegExp("<tag:main>((?:.|\\s)*)</tag:main>");
This causes the v8 engine to hang indefinitely.
Now, if I use new RegExp("<tag:main>([\s\S]*)</tag:main>")
, all is good.
Anyone have an idea why the first one takes too long?
Im using this regex to get the contents of a tag in a file.
var regex = new RegExp("<tag:main>((?:.|\\s)*)</tag:main>");
This causes the v8 engine to hang indefinitely.
Now, if I use new RegExp("<tag:main>([\s\S]*)</tag:main>")
, all is good.
Anyone have an idea why the first one takes too long?
Share Improve this question asked Mar 9, 2010 at 9:25 Heinrich Lee YuHeinrich Lee Yu 1,47215 silver badges31 bronze badges 3 |3 Answers
Reset to default 21This catastrophically backtracks on long sequences of spaces that occur after the last closing </tag:main>
tag. Consider the case where the subject string ends with 100 spaces. First it matches them all with the .
on the left of the alternation. That fails because there's no closing tag, so it tries matching the last character with the \s
instead. That fails too, so it tries matching the second-to-last space as a \s
and the last space as a .
. That fails (still no closing tag) so it tries the last space as a \s
. When that fails it matches the third-to-last space as a \s
and tries all 4 ways to match the last two spaces. When that fails it tries the fourth-to-last space as a \s
and all 8 ways on the last 3 spaces. Then 16, 32 etc. The universe ends before it gets to the 100th-to-last space.
Different VMs have different reactions to regexp matches that take forever because of catastrophic backtracking. Some will simply report 'no match'. In V8 it's like writing any other infinite or near-infinite loop.
Using non-greedy *
will do what you want (you want to stop at the first </tag:main>
, not the last), but will still do catastrophic backtracking for long strings of spaces where the closing sequence is missing.
Making sure the same characters in the inner bracket can't match both sides of the alternation will reduce the problem from an exponential one to one that is linear in the length of the string. Use a character class instead of an alternation or put \n
on the right hand side of the alternation bar. \n
is disjoint with .
so if you hit a long sequence of spaces the regexp engine doesn't try all left-right-left etc. combinations before terminating.
I presume that it is catastrophically back tracking.
I think that part of the issue may well be that the dot and \s are not mutually exclusive.
If I change your expression to
<tag:main>((?:.|[\r\n])*)</tag:main>
and run it in the Regex Buddy debugger it fails a lot quicker in the event that the test string is not a match.
Instead of (?:.|\s)*
, you can use [^]*
to match any character including various forms of newline.
There's no alternation, so no risk of catastrophic backtracking.
本文标签: Javascript regex hangs (using v8)Stack Overflow
版权声明:本文标题:Javascript regex hangs (using v8) - Stack Overflow 内容由网友自发贡献,该文观点仅代表作者本人, 转载请联系作者并注明出处:http://www.betaflare.com/web/1738518585a2091290.html, 本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容,一经查实,本站将立刻删除。
var regex = new RegExp("<tag:main>((?:.|\\s)*?)</tag:main>");
. Your regexp might cause issues if there are multiple tag elements in the document. – Andy E Commented Mar 9, 2010 at 9:42