

I'm trying to do a replace on the following string prototype: "I&amp;lsquo;m singing &amp;amp; dancing in the rain." The following regular expression matches the instance properly, but also captures the character following the instance of &amp;. "(&amp;)[#?a-zA-Z0-9;]" captures the following string from the above prototype: "&amp;l".

How can I limit it to only capture the &amp;?

Edit: I should add that I don't want to match "&amp;" by itself.


Asked by sholsinger on Nov 19, 2009 at 16:12; edited by sholsinger on Nov 19, 2009 at 16:22.

5 Answers


Look for (this copes with named, decimal, and hexadecimal entities):

&amp;([A-Za-z]+|#x[\dA-Fa-f]+|#\d+);

Replace with:

&$1;
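A minimal JavaScript sketch of that find-and-replace, using String.prototype.replace with the global flag. The names DOUBLE_ENCODED_ENTITY and undoDoubleEncoding are illustrative and do not come from the answer. The code removes one level of encoding from double-encoded entities and leaves the resulting single-encoded entities for an HTML parser to decode.

```javascript
// Matches "&amp;" followed by a named, hexadecimal or decimal entity body and ";".
// Group 1 captures the body, e.g. "lsquo", "#x2018" or "#8216".
const DOUBLE_ENCODED_ENTITY = /&amp;([A-Za-z]+|#x[\dA-Fa-f]+|#\d+);/g;

// Hypothetical helper: strips one level of encoding from double-encoded entities.
function undoDoubleEncoding(text) {
  // "$1" re-inserts the captured entity body after a plain "&".
  return text.replace(DOUBLE_ENCODED_ENTITY, "&$1;");
}

console.log(undoDoubleEncoding("I&amp;lsquo;m singing &amp;amp; dancing in the rain."));
// -> I&lsquo;m singing &amp; dancing in the rain.
```

A bare &amp; with no name or number and ";" after it (as in "salt &amp; pepper") does not match, so it stays encoded.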

Be warned: this has a real chance of going wrong. I recommend using an HTML parser to decode the text; if it was double-encoded, you can decode it twice. HTML and regex don't play well together, even on a small scale.

Since you are in JavaScript, I expect you are in a browser. If so, you have a nice DOM parser at hand: create a new element, assign the string to its innerHTML property, and read back its text value. Done.
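Below is a browser-only sketch of that recipe. The function names are hypothetical. It uses a common variant of the approach: the element is a <textarea> rather than a generic element, and the result is read from its value. A textarea's content is parsed as text, so entities get decoded but any tags in the string stay literal text and are never turned into live elements. The code needs the DOM `document` global, so plain Node.js will not run it unless a DOM implementation such as jsdom is supplied.

```javascript
// Browser-only: relies on the DOM `document` global.
function decodeHtmlEntities(text) {
  // Inside a <textarea>, entities are decoded but tags are not parsed as markup.
  const textarea = document.createElement("textarea");
  textarea.innerHTML = text;
  // .value returns the decoded plain text.
  return textarea.value;
}

// For double-encoded input, decode twice: "&amp;lsquo;" -> "&lsquo;" -> "\u2018".
function decodeDoubleEncoded(text) {
  return decodeHtmlEntities(decodeHtmlEntities(text));
}

// In a browser:
// decodeDoubleEncoded("I&amp;lsquo;m singing &amp;amp; dancing in the rain.")
// returns "I\u2018m singing & dancing in the rain."
```

Decoding twice only makes sense if you know the input really is double-encoded. Otherwise a legitimately escaped "&amp;lt;" in the text would be turned into "<".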

I gather that you want to match &amp;, but only if it is followed by an alphanumeric character or certain punctuation. That calls for a lookahead. This regular expression should match what you want without capturing or consuming any additional characters:

(&amp;)(?=[#?a-zA-Z0-9;])
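In JavaScript, the lookahead version can drive a replacement directly. In this sketch (the constant names are illustrative), each matched &amp; is replaced with "&". On the prototype string the effect is the same as with the first answer's pattern.

```javascript
// Matches "&amp;" only when the next character could start an entity;
// the lookahead checks that character without consuming it.
const AMP_BEFORE_ENTITY = /(&amp;)(?=[#?a-zA-Z0-9;])/g;

const ampSample = "I&amp;lsquo;m singing &amp;amp; dancing in the rain.";

// Every match is exactly "&amp;"; the following character is not included.
console.log(ampSample.match(AMP_BEFORE_ENTITY));

// Replacing only the matched "&amp;" leaves the entity name in place.
console.log(ampSample.replace(AMP_BEFORE_ENTITY, "&"));
// -> I&lsquo;m singing &amp; dancing in the rain.
```

The lookahead inspects just one character, so it does not check for a complete entity. A string like "AT&amp;T" is also matched and becomes "AT&T".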

Actually, you're matching the string &amp;l, but only the &amp; is captured. The character class after the capture group matches one additional character, so that character ends up in the overall match but not in the captured group.
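RegExp.prototype.exec makes the difference visible. Index 0 of its result is the whole match, and index 1 is the first capture group. The variable names are illustrative.

```javascript
// The question's pattern: a capture group followed by a one-character class.
const originalPattern = /(&amp;)[#?a-zA-Z0-9;]/;

const firstMatch = originalPattern.exec("I&amp;lsquo;m singing");
console.log(firstMatch[0]); // -> &amp;l  (whole match: group plus the class character)
console.log(firstMatch[1]); // -> &amp;   (group 1: only what is inside the parentheses)
```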

But your original regex is a little flawed to begin with anyway. A (not optimal) replacement might be:

&amp;(#[0-9]+|#x[0-9a-zA-Z]+|[a-zA-Z]+);

which will match the complete entity or character reference. The leading &amp; is matched as a literal, and group 1 captures the entity name or number that follows it.
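To see what that pattern captures, the sketch below collects group 1 for every match using String.prototype.matchAll, which requires the /g flag. The helper name entityBodies is hypothetical.

```javascript
// The answer's pattern: literal "&amp;", then a decimal, hex or named entity body.
const ENTITY_AFTER_AMP = /&amp;(#[0-9]+|#x[0-9a-zA-Z]+|[a-zA-Z]+);/g;

// Hypothetical helper: returns the captured body (group 1) of each match.
function entityBodies(text) {
  return [...text.matchAll(ENTITY_AFTER_AMP)].map((m) => m[1]);
}

console.log(entityBodies("&amp;#8216;, &amp;#x2018; and &amp;lsquo;"));
// -> [ '#8216', '#x2018', 'lsquo' ]
```

Part of what makes it "not optimal" is that #x[0-9a-zA-Z]+ also accepts non-hex letters, so "&amp;#xZZ;" matches too. The first answer's #x[\dA-Fa-f]+ is stricter.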

If you only want to match &amp;, why did you include the character class [#?a-zA-Z0-9;] as well?

In English, your expression would be: "Match &amp; followed by a single character that is #, ?, a lowercase letter, an uppercase letter, a digit, or ;".
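You can check that reading by testing single characters against the same class. In this sketch, ^ and $ are added only so that the whole test string must be one character, and the constant name is illustrative. Inside a character class, "?" is an ordinary literal character.

```javascript
// Anchored so the entire test string must be exactly ONE character from the class.
const CHAR_CLASS = /^[#?a-zA-Z0-9;]$/;

for (const ch of ["#", "?", "q", "Q", "7", ";", " ", "&"]) {
  // Prints each character and whether the class accepts it.
  console.log(JSON.stringify(ch), CHAR_CLASS.test(ch));
}

// A two-character string fails: the class consumes exactly one character.
console.log(CHAR_CLASS.test("ls"));
```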

Just use (&amp;).

You probably meant:

"&amp;([#a-zA-Z0-9]+;)"

Tags: javascript. Original Stack Overflow title: "Regex To Match &amp;entity; or &amp;#0-9; And Capture &amp;"