I'm new to regular expressions.
Everything I post below is a simplified example from my code.

I have a string, let's say test_1,some_2,foo,bar_4, that I want to transform into title: test (1) title: some (2) title: foo () title: bar (4)

What I have now is (which works):

var test = "test_1,some_2,foo,bar_4,";
console.log(test.replace(/(.*?)(?:_(\d))?,/g, "title: $1 ($2)\n"));

which outputs:

title: test (1)
title: some (2)
title: foo ()
title: bar (4)

To tidy things up, I want to get rid of the comma after the last item. The list will then look like test_1,some_2,foo,bar_4 (no comma after bar_4).

So the new code:

var test = "test_1,some_2,foo,bar_4";
console.log(test.replace(/(.*?)(?:_(\d))?(?:,|$)/g, "title: $1 ($2)\n"));

outputs something wrong. There's an extra empty match at the end:

title: test (1)
title: some (2)
title: foo ()
title: bar (4)
title:  ()

My questions are: Why does this happen? How can I fix it? Are there any possible improvements to the current regex?

demo jsFiddle

asked Dec 15, 2012 at 12:10 by CronosS, edited Dec 15, 2012 at 13:42 by Alexander
  • There is an empty match because all parts of your regex are optional: .*? can match 0 characters, (?:...)? is an optional group, and with your last change you made the comma optional. – melpomene Commented Dec 15, 2012 at 12:16
  • Change to this: /(.+?)(?:_(\d))?(?:,|$)/ – nhahtdh Commented Dec 15, 2012 at 12:16

3 Answers

You are getting that last false-positive match because your regular expression is matching empty strings:

"".replace(/(.*?)(?:_(\d))?(?:,|$)/g, "title: '$1' ('$2') ");

title: '' ('') 

So, in your case, once all characters have been consumed, the pattern matches one more time: an empty string at the end of the input.

You can prevent this by making the first group non-optional. It is written as if it could be empty, but in your data it never actually is.

/(.*?)(?:_(\d))?(?:,|$)/g
 --^^--

For example,

var str = "test_1,some_2,foo,bar_4";
str.replace(/([a-z]+)(?:_(\d))?(?:,|$)/gi, "title: '$1' ('$2') ");

title: 'test' ('1') title: 'some' ('2') title: 'foo' ('') title: 'bar' ('4') 

That is,

  • ([a-z]+): matches at least one alphabetical character, so the group can no longer be empty, and
  • gi: the g flag replaces every match and the i flag makes the match case-insensitive.
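The same fix can be sketched as a small helper. The name formatTitles is hypothetical (it does not appear in the original code), and the pattern assumes that item names contain only ASCII letters and that the optional suffix is a single digit:

```javascript
// Hypothetical helper: formats a list such as "test_1,some_2,foo,bar_4".
// Assumption: names are ASCII letters only; the optional suffix is "_" plus one digit.
function formatTitles(list) {
  // ([a-z]+) needs at least one letter, so the pattern cannot match an empty string.
  return list.replace(/([a-z]+)(?:_(\d))?(?:,|$)/gi, "title: $1 ($2)\n");
}

console.log(formatTitles("test_1,some_2,foo,bar_4"));
```

Because the trailing comma is optional in this pattern, the helper should give the same result with or without a comma after the last item. If names can contain other characters, the .+? variant suggested in the comments is a possible alternative.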

As the simplest solution, you can just append a trailing comma to the original string before applying the regular expression.
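A minimal sketch of that idea, reusing the pattern from the question that requires a comma after every item. The variable names are illustrative only:

```javascript
const list = "test_1,some_2,foo,bar_4";

// Append a comma only if it is missing, so every item ends with ","
// and the original comma-terminated pattern can be reused unchanged.
const normalized = list.endsWith(",") ? list : list + ",";

const formatted = normalized.replace(/(.*?)(?:_(\d))?,/g, "title: $1 ($2)\n");
console.log(formatted);
```

Note that an empty input would become "," under this approach and still produce one empty title, so it may be worth checking for an empty list first.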

Your problem is that your pattern matches not only what you want but also empty strings:

(.*?)  # matches any string (including an empty one) not containing \n
(?:_(\d))?  # an optional group
(?:,|$)  # matches a comma or the end of the string

So when the regex engine tests the end of your string against your pattern, it sees that:

  • the first group matches because an empty string is being processed
  • the second group matches because it is optional
  • the third group matches because the end of the string is being processed

so the whole pattern matches and you get an extra match. You can see this clearly in the console using the string's match method (here s holds the input string test_1,some_2,foo,bar_4):

> s.match(/(.*?)(?:_(\d))?(?:,|$)/g)
  ["test_1,", "some_2,", "foo,", "bar_4", ""]
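To see where the extra match comes from, it can help to list each match together with its start position. This is only an illustrative sketch using String.prototype.matchAll; the variable names are assumptions:

```javascript
const s = "test_1,some_2,foo,bar_4";
const pattern = /(.*?)(?:_(\d))?(?:,|$)/g;

// Collect every match along with the index at which it starts.
const matches = [...s.matchAll(pattern)].map((m) => ({
  text: m[0],
  index: m.index,
}));

console.log(matches);
console.log(s.length);
```

The last entry is an empty string whose index equals s.length: after "bar_4" has been consumed, the engine tries once more at the very end of the input, and the pattern succeeds there without consuming anything.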

You have at least two options for dealing with the problem:

  • change the first group of your pattern so that it cannot match the empty string but still fits your needs (this depends on the strings you have to process)
  • leave your regex untouched and post-process the string returned by replace to remove the unwanted part

The first option is the elegant one. The second can be easily achieved with an extra line of code:

> var result = s.replace(/(.*?)(?:_(\d))?(?:,|$)/g, "title: $1 ($2) ");
> result = result.slice(0, result.lastIndexOf("title"));
  "title: test (1) title: some (2) title: foo () title: bar (4) "
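Another way to leave the regex untouched, offered here as a sketch and not as part of the original answer, is to pass a replacer function and return nothing for the zero-length match:

```javascript
const s = "test_1,some_2,foo,bar_4";

const result = s.replace(/(.*?)(?:_(\d))?(?:,|$)/g, (match, name, digit) => {
  // The unwanted extra match is the only zero-length one; drop it.
  if (match === "") return "";
  // An optional group that did not participate is passed as undefined.
  const suffix = digit === undefined ? "" : digit;
  return "title: " + name + " (" + suffix + ") ";
});

console.log(result);
```

This avoids searching the output for the last "title", which could misfire if an item name itself contained that word.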

Tags: javascript | Regular expression matches an extra empty group | Stack Overflow