I'm trying to extract the text between paragraph tags using a RegExp in JavaScript, but it doesn't work.
My pattern:
<p>(.*?)</p>
Subject:
<p> My content. </p> <img src="https://encrypted-tbn3.gstatic./images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>
Result:
My content
What I want:
My content. Second sentence.
asked Feb 19, 2013 at 23:48 by tonymx227
- 3 Don't parse HTML with RegEx – gilly3 Commented Feb 19, 2013 at 23:51
- 1 You can get the body of <p> tags just fine with regex (despite the warnings against parsing generally with it), but if you're using JavaScript there's no need to since you have document.getElementsByTagName("p"). – Reinstate Monica -- notmaynard Commented Feb 19, 2013 at 23:58
- @iamnotmaynard - document.getElementsByTagName() is a DOM method. It is only available to JavaScript because the browser provides it. With node.js, there is no browser, and node.js does not natively parse HTML into a DOM. You can't assume that, just because you are using the JavaScript language, a browser DOM is available. A DOM can be made available to node.js if such a package is installed, such as jsdom. – gilly3 Commented Feb 20, 2013 at 0:06
- @gilly3 Ah, I see. Was not aware of that. – Reinstate Monica -- notmaynard Commented Feb 20, 2013 at 0:07
- @gilly3, hoh no... Not that easy generic answer again -_-. Using regex for what he wants is perfectly fine. – Jean-Philippe Leclerc Commented Feb 20, 2013 at 0:45
2 Answers
When this answer was written, JavaScript had no "capture all group matches" function (analogous to PHP's preg_match_all), but you can cheat by using .replace:
var matches = [];
// html is assumed to hold the subject string
html.replace(/<p>(.*?)<\/p>/g, function (match, inner) {
    // match is the entire match; inner is the first capture group
    matches.push(inner);
});
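In engines that support ES2020 or later, String.prototype.matchAll returns the capture groups for every match, so the .replace trick is no longer needed. A minimal sketch, assuming such an engine is available; the helper name extractParagraphs and the placeholder image src are made up for illustration and are not part of the answer above:

```javascript
// Hypothetical helper: returns the trimmed text of every <p>...</p> pair.
function extractParagraphs(html) {
    // matchAll requires the g flag and yields one match array per occurrence
    return Array.from(html.matchAll(/<p>(.*?)<\/p>/g), function (m) {
        return m[1].trim(); // m[1] is the first capture group
    });
}

// Shortened subject string with a placeholder image source
var subject = '<p> My content. </p> <img src="x.png"> <p> Second sentence. </p>';
console.log(extractParagraphs(subject).join(" "));
// My content. Second sentence.
```

Joining the trimmed captures with a single space gives the output the question asks for.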
To get more than one match of a pattern the global flag g is added.
The match method ignores capture groups () when matching globally, but the exec method does not. See MDN exec.
var m,
rex = /<p>(.*?)<\/p>/g,
str = '<p> My content. </p> <img src="https://encrypted-tbn3.gstatic./images?q=tbn:ANd9GcTJ9ylGJ4SDyl49VGh9Q9an2vruuMip-VIIEG38DgGM3GvxEi_H"> <p> Second sentence. </p>';
while ( ( m = rex.exec( str ) ) != null ) {
console.log( m[1] );
}
// My content.
// Second sentence.
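The difference between match and exec under the g flag can be seen side by side. A small sketch using a shortened subject string (the variable names sample, full and inner are made up for illustration):

```javascript
var sample = "<p>one</p><p>two</p>";

// With g, match returns only the full matches; the capture groups are dropped
var full = sample.match(/<p>(.*?)<\/p>/g);
console.log(full); // [ '<p>one</p>', '<p>two</p>' ]

// exec keeps the groups and advances rex.lastIndex on each call
var rex = /<p>(.*?)<\/p>/g, m, inner = [];
while ((m = rex.exec(sample)) !== null) {
    inner.push(m[1]);
}
console.log(inner); // [ 'one', 'two' ]
```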
If there may be newlines between the paragraphs, use [\s\S], meaning match any space or non-space character, instead of the dot (.).
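To illustrate the newline case, here is a short sketch comparing the two character classes on a made-up multi-line string (the names multiline, dotRe, anyRe and collect are hypothetical):

```javascript
var multiline = "<p>first\nline</p>\n<p>second</p>";

// . does not match \n, so the first paragraph is skipped
var dotRe = /<p>(.*?)<\/p>/g;
// [\s\S] matches any character, including newlines
var anyRe = /<p>([\s\S]*?)<\/p>/g;

// Collects the first capture group of every match
function collect(re, text) {
    var m, out = [];
    while ((m = re.exec(text)) !== null) {
        out.push(m[1]);
    }
    return out;
}

console.log(collect(dotRe, multiline)); // [ 'second' ]
console.log(collect(anyRe, multiline)); // [ 'first\nline', 'second' ]
```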
Note that this kind of regex will fail on nested paragraphs as it will match up to the first closing tag.
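A quick sketch of that failure mode, using a made-up nested string (nested and firstMatch are hypothetical names):

```javascript
var nested = "<p>outer <p>inner</p> tail</p>";

// The lazy group stops at the first </p>, which belongs to the inner paragraph
var firstMatch = /<p>(.*?)<\/p>/.exec(nested);
console.log(firstMatch[1]); // outer <p>inner
```

The capture contains a stray opening tag and the text after the inner paragraph is lost, which is why a DOM parser is the safer choice when nesting is possible.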