javascript - Regular expression to get link text - Stack Overflow

I'm stumped! I've googled and read and read and read and I'm sure there is something really dumb that I'm doing wrong. This is from a Greasemonkey script that I can't for the life of me get to initiate AND perform correctly. I'm trying to match this:

<a href="/browse/post/SOMETHING/">**SOMETHING** (1111)</a>

Here's what I'm using:

var titleRegex = new RegExp("<a href=\"/browse/post/\d*/\">(.*) \(");

I'm sure I'm missing some kind of escape characters? But I just can't figure it out so that Firefox doesn't error out.

I generated the regexp using http://regexpal.com/. In the Firefox error console I receive "unterminated parenthetical".

asked Dec 27, 2011 at 21:36 by spazzed; edited Dec 27, 2011 at 22:01 by Brock Adams

stackoverflow.com/questions/1732348/… – asawyer Commented Dec 27, 2011 at 21:38
For ease of reading I always prefer a literal regex, e.g. "here is a string".match(/match me/i) – tomfumb Commented Dec 27, 2011 at 21:48
I'd be curious to learn more about using an XML parser to accomplish something like this. I'm basically trying to modify an existing script to accomplish what I need it to do. Do you have a good example of a Greasemonkey script that does things like this the right way? – spazzed Commented Dec 27, 2011 at 22:03

3 Answers

When building a regex from a string instead of a regex literal, you need to double the backslashes.
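To see why the doubling matters, here is a minimal sketch (the variable names are illustrative and not from the question). In a JavaScript string literal, \d and \( are not recognised escapes, so the backslash is silently dropped before RegExp ever sees the pattern. The stray ( then opens a group that is never closed, which is what the "unterminated parenthetical" error is complaining about.

```javascript
// The string parser drops backslashes in front of characters it does not treat as escapes.
var single = "\d*/ \(";      // the actual string is: d*/ (
var doubled = "\\d*/ \\(";   // the actual string is: \d*/ \(

console.log(single);   // d*/ (
console.log(doubled);  // \d*/ \(

// With single backslashes the pattern ends in an unclosed group,
// so the RegExp constructor throws a SyntaxError.
var error = null;
try {
    new RegExp("(.*) \(");
} catch (e) {
    error = e;
}
console.log(error instanceof SyntaxError); // true

// Doubling the backslash yields the same pattern as the literal form.
var fromString = new RegExp("(.*) \\(");
var fromLiteral = /(.*) \(/;
console.log(fromString.source === fromLiteral.source); // true
```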

Then, \d* only matches digits. I'm assuming that SOMETHING is just a placeholder, but if that were to contain anything but digits, it would fail.

Also, you should be using (.*?) (lazy) instead of (.*) (greedy), or you might be matching too much. Perhaps ([^(]*) would be even better.
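A small sketch of the difference, using a made-up link whose title itself contains parentheses (the sample string is an assumption, not taken from the page being scraped):

```javascript
// Hypothetical input: the title contains its own parenthesised part.
var html = '<a href="/browse/post/123/">Title (draft) (1111)</a>';

var greedy  = /">(.*) \(/.exec(html)[1];     // backtracks from the end, stops at the LAST " ("
var lazy    = /">(.*?) \(/.exec(html)[1];    // expands from the start, stops at the FIRST " ("
var negated = /">([^(]*) \(/.exec(html)[1];  // can never run past a "(" at all

console.log(greedy);   // Title (draft)
console.log(lazy);     // Title
console.log(negated);  // Title
```

Which result is the right one depends on the real data: if titles can legitimately contain parentheses and the count is always the last group, the greedy form may be the one you want.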

Hard to say, though, without knowing more about the actual text you're trying to match.

All in all:

var titleRegex = new RegExp("<a href=\"/browse/post/\\d*/\">([^(]*) \\(");
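A quick check of this pattern against sample markup. The sample links are invented for illustration; the second one shows the \d* caveat mentioned above, and the [^/]* variant is only one possible relaxation, based on an assumption about what the real URLs look like.

```javascript
var titleRegex = new RegExp("<a href=\"/browse/post/\\d*/\">([^(]*) \\(");

// Hypothetical sample links, not taken from the real page.
var numeric = '<a href="/browse/post/42/">Some Title (1111)</a>';
var named   = '<a href="/browse/post/some-title/">Some Title (1111)</a>';

var m = titleRegex.exec(numeric);
console.log(m && m[1]);               // Some Title
console.log(titleRegex.exec(named));  // null, because \d* cannot match "some-title"

// Possible relaxation (an assumption): accept any run of non-slash characters as the ID.
var looseRegex = new RegExp("<a href=\"/browse/post/[^/]*/\">([^(]*) \\(");
console.log(looseRegex.exec(named)[1]); // Some Title
```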

Here's a simple fix:

/href=\".*?\">(.*?)\(/
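For illustration, here is how that literal behaves on an invented sample link. The capture keeps the space in front of the parenthesis, so trimming the result is a reasonable extra step (it is an addition in this sketch, not part of the pattern itself).

```javascript
var linkRegex = /href=\".*?\">(.*?)\(/;

// Hypothetical sample link.
var html = '<a href="/browse/post/42/">Some Title (1111)</a>';

var match = linkRegex.exec(html);
var title = match ? match[1].trim() : null;

console.log(JSON.stringify(match[1])); // "Some Title "
console.log(title);                    // Some Title
```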

The general idea is to take a string of HTML, parse it into a document (a tree of DOM elements), then traverse it to extract information.

If the link was:

<a href="/browse/post/something/"><b>something</b> else</a>

First traverse the tree to find the anchor tag, then:

anchor.textContent // returns "something else"

It is simple to extract the text from an element, even when there are other elements in the tree below which also contain text. This is also more robust than the regex example. Say someone added a class attribute to the anchor: the regex in the accepted answer would no longer match the anchor tag, but a traversal-based solution would still work.
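That fragility is easy to reproduce. In this sketch the markup and the class name are made up, and the pattern is the string-built one that anchors on the literal text `<a href="/browse/post/`.

```javascript
var titleRegex = new RegExp("<a href=\"/browse/post/\\d*/\">([^(]*) \\(");

// Hypothetical markup: the same link, with and without a class attribute.
var plain  = '<a href="/browse/post/42/">Some Title (1111)</a>';
var styled = '<a class="post" href="/browse/post/42/">Some Title (1111)</a>';

console.log(titleRegex.test(plain));   // true
console.log(titleRegex.test(styled));  // false: "<a href=" no longer appears verbatim
```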

In the simple case, you can create a div, then set the innerHTML to your HTML string, then traverse it:

var html = '<p><a href="/browse/post/">Lorem</a></p> <p>Ipsum</p>';
var div = document.createElement("div");
div.innerHTML = html;
var anchors = div.getElementsByTagName("a");
for (var i = 0; i < anchors.length; i++) {
    console.log(anchors[i].textContent);
}

A more sophisticated version of this is packaged in the jQuery(string) function.

var html = '<div><p><a href="/browse/post/">Lorem</a></p> <p>Ipsum</p></div>';
jQuery(html).find("a").each(function() {
    console.log(jQuery(this).text());
});

Live example: http://jsfiddle.net/ygcFM/
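Since a Greasemonkey script already runs inside the loaded page, one option is to skip the HTML string entirely and query the live document. The sketch below rests on several assumptions: the helper name linkTitles, the /browse/post/ prefix filter, and the stripping of a trailing "(1111)"-style count are guesses about what the script needs. querySelectorAll and textContent are standard DOM APIs.

```javascript
// Collect the text of every matching <a> under `root`.
// `root` is anything exposing querySelectorAll, e.g. `document` or an element.
// `hrefPrefix` (hypothetical parameter) limits the result to links whose href starts with it.
function linkTitles(root, hrefPrefix) {
    var selector = hrefPrefix ? 'a[href^="' + hrefPrefix + '"]' : "a";
    return Array.prototype.map.call(root.querySelectorAll(selector), function (anchor) {
        // textContent flattens nested markup such as <b>; then drop a trailing "(1111)" count.
        return anchor.textContent.replace(/\s*\(\d+\)\s*$/, "").trim();
    });
}

// In a browser or user script, run it against the live page.
if (typeof document !== "undefined") {
    console.log(linkTitles(document, "/browse/post/"));
}
```

Because the selector looks only at the href attribute, extra attributes such as class or a different attribute order do not affect the result.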
