I'm trying to extract the text between paragraph tags using a RegExp in JavaScript, but it doesn't work...

My pattern:

<p>(.*?)</p>

Subject:

<p> My content. </p> <img src="https://encrypted-tbn3.gstatic./images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>

Result:

My content

What I want:

My content. Second sentence.
asked Feb 19, 2013 at 23:48 by tonymx227
  • 3 Don't parse HTML with RegEx – gilly3 Commented Feb 19, 2013 at 23:51
  • 1 You can get the body of <p> tags just fine with regex (despite the warnings against parsing generally with it), but if you're using JavaScript there's no need to since you have document.getElementsByTagName("p"). – Reinstate Monica -- notmaynard Commented Feb 19, 2013 at 23:58
  • @iamnotmaynard - document.getElementsByTagName() is a DOM method. It is only available to JavaScript because the browser provides it. With node.js, there is no browser, and node.js does not natively parse HTML into a DOM. You can't assume that, just because you are using the JavaScript language, a browser DOM is available. A DOM can be made available to node.js if such a package is installed, such as jsdom. – gilly3 Commented Feb 20, 2013 at 0:06
  • @gilly3 Ah, I see. Was not aware of that. – Reinstate Monica -- notmaynard Commented Feb 20, 2013 at 0:07
  • @gilly3, hoh no... Not that easy generic answer again -_-. Using regex for what he wants is perfectly fine. – Jean-Philippe Leclerc Commented Feb 20, 2013 at 0:45

2 Answers


Before ES2020, JavaScript had no "capture all group matches" function (analogous to PHP's preg_match_all), but you can cheat by using .replace:

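var matches = [];
// html is assumed to hold the markup to search.
// With the g flag, replace() calls the callback once per match; the string it returns is discarded here.
html.replace(/<p>(.*?)<\/p>/g, function (match, text) {
    // match is the entire match, text is the first capture group
    matches.push(text);
});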
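
Since ES2020, String.prototype.matchAll returns an iterator over every match together with its capture groups, so the replace trick is not needed where that method is available. A minimal sketch, assuming such an environment; the helper name extractParagraphs and the sample string are made up for illustration:

```javascript
// Hypothetical helper: collect the inner text of every <p>...</p> pair.
function extractParagraphs(html) {
    // matchAll requires the g flag; each yielded match is an array
    // whose element [1] is the first capture group.
    return Array.from(html.matchAll(/<p>(.*?)<\/p>/g), function (m) {
        return m[1];
    });
}

// Illustrative input, not the exact string from the question.
var sample = '<p> My content. </p> <img src="x.png"> <p> Second sentence. </p>';
console.log(extractParagraphs(sample));
```

The captured strings keep their surrounding spaces, just as with the replace version.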

To get more than one match of a pattern, add the global flag g.
With the g flag, the match method returns only the full matches and drops the capture groups (), but the exec method keeps them. See MDN exec.

var m,
    rex = /<p>(.*?)<\/p>/g,
    str = '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic./images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';

while ( ( m = rex.exec( str ) ) != null ) {
    console.log( m[1] );
}

//  My content. 
//  Second sentence. 
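
To end up with the single string the question asks for, the captured pieces can be trimmed and joined. A sketch built on the same exec loop; the function name joinParagraphs and the shortened img tag are illustrative assumptions:

```javascript
// Hypothetical helper: gather every <p> body, trim it, and join with spaces.
function joinParagraphs(str) {
    var rex = /<p>(.*?)<\/p>/g,
        parts = [],
        m;

    // exec advances rex.lastIndex on each call because of the g flag.
    while ((m = rex.exec(str)) !== null) {
        parts.push(m[1].trim());
    }
    return parts.join(' ');
}

console.log(joinParagraphs('<p> My content. </p> <img src="x.png"> <p> Second sentence. </p>'));
// My content. Second sentence.
```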

If a paragraph may contain newlines, use [\s\S], meaning match any whitespace or non-whitespace character, instead of the dot (.), which does not match line breaks.
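
The difference shows up as soon as a paragraph spans two lines. A small sketch, assuming a made-up input string and a hypothetical helper named captures:

```javascript
// Hypothetical helper: return the first capture group of every match.
function captures(rex, str) {
    var out = [],
        m;
    while ((m = rex.exec(str)) !== null) {
        out.push(m[1]);
    }
    return out;
}

// Illustrative input: the first paragraph contains a newline.
var multiline = '<p>First line\nsecond line</p>\n<p>Other</p>';

// The dot stops at the newline, so the first paragraph is skipped.
var withDot = captures(/<p>(.*?)<\/p>/g, multiline);

// [\s\S] matches any character, including the newline.
var withAny = captures(/<p>([\s\S]*?)<\/p>/g, multiline);

console.log(withDot.length, withAny.length);
// 1 2
```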

Note that this kind of regex will fail on nested paragraphs as it will match up to the first closing tag.
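
A quick illustration of that failure mode, using a made-up nested string (the variable names are illustrative only):

```javascript
// Illustrative input with one paragraph nested inside another.
var nested = '<p>outer <p>inner</p> tail</p>';
var rex = /<p>(.*?)<\/p>/g;
var found = [];
var m;

while ((m = rex.exec(nested)) !== null) {
    found.push(m[1]);
}

// The lazy match starts at the outer <p> and stops at the first </p>,
// so the capture contains the inner opening tag and " tail" is lost.
console.log(found);
```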
