I'm stumped! I've googled and read and read and read and I'm sure there is something really dumb that I'm doing wrong. This is from a Greasemonkey script that I can't for the life of me get to initiate AND perform correctly. I'm trying to match this:

<a href="/browse/post/SOMETHING/">**SOMETHING** (1111)</a>

Here's what I'm using:

var titleRegex = new RegExp("<a href=\"/browse/post/\d*/\">(.*) \(");

I'm sure I'm missing some kind of escape characters? But I just can't figure it out so that Firefox doesn't error out.

I generate the regexp using http://regexpal./, but in the Firefox error console I get "unterminated parenthetical".

Edited Dec 27, 2011 at 22:01 by Brock Adams; asked Dec 27, 2011 at 21:36 by spazzed.
  • stackoverflow./questions/1732348/… – asawyer, commented Dec 27, 2011 at 21:38
  • For ease of reading I always prefer literal regex, e.g. "here is a string".match(/match me/i) – tomfumb, commented Dec 27, 2011 at 21:48
  • I'd be curious to learn more about using an XML parser to accomplish something like this. I'm basically trying to modify an existing script to accomplish what I need it to do. Do you have a good example of a Greasemonkey script that does things like this the right way? – spazzed, commented Dec 27, 2011 at 22:03

3 Answers

When building a regex from a string instead of a regex literal, you need to double the backslashes.
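To see why Firefox complains, compare the string the question's code hands to RegExp with the doubled-backslash version. In a JavaScript string literal, \d and \( are not recognized escapes, so the backslash is silently dropped before RegExp ever sees the pattern, and the leftover ( opens a group that is never closed. A minimal sketch (tryCompile is a hypothetical helper for illustration, not part of any API):

```javascript
// What the question's code passes to RegExp: "\d" becomes "d" and "\(" becomes "(".
var broken = "<a href=\"/browse/post/\d*/\">(.*) \(";
// Doubling the backslashes keeps them in the string, so RegExp sees \d and \(.
var doubled = "<a href=\"/browse/post/\\d*/\">(.*) \\(";

console.log(broken);  // <a href="/browse/post/d*/">(.*) (
console.log(doubled); // <a href="/browse/post/\d*/">(.*) \(

// Hypothetical helper: returns the compiled RegExp, or the error it threw.
function tryCompile(source) {
  try {
    return new RegExp(source);
  } catch (e) {
    return e;
  }
}

console.log(tryCompile(broken) instanceof SyntaxError); // true (unterminated group)
console.log(tryCompile(doubled) instanceof RegExp);     // true
```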

Second, \d* matches only digits. I'm assuming SOMETHING is just a placeholder, but if the real value contains anything other than digits, the match will fail.
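A quick check of that assumption: the doubled-backslash prefix matches a numeric path but not a word path. The sample hrefs below are made up for illustration, and [^/]* (any run of characters other than a slash) is just one possible replacement for \d*, not something from the original post.

```javascript
// Hypothetical sample links: one numeric post id, one word slug.
var numeric = '<a href="/browse/post/1111/">Title (1111)</a>';
var slug = '<a href="/browse/post/some-title/">Title (1111)</a>';

// \d* accepts digits only.
var digitsOnly = new RegExp("<a href=\"/browse/post/\\d*/\">");
// [^/]* accepts any path segment that contains no slash.
var anySegment = new RegExp("<a href=\"/browse/post/[^/]*/\">");

console.log(digitsOnly.test(numeric)); // true
console.log(digitsOnly.test(slug));    // false
console.log(anySegment.test(slug));    // true
```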

Also, you should use (.*?) (lazy) instead of (.*) (greedy); otherwise the match can run on to the last " (" in the input instead of stopping at the first one. Perhaps ([^(]*) would be even better.

Hard to say, though, without knowing more about the actual text you're trying to match.
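The difference shows up as soon as a line contains more than one link. In the sketch below the two-link input is invented for illustration; the three patterns share the same doubled-backslash prefix and differ only in the capture group.

```javascript
// Hypothetical input with two matching links on one line.
var line = '<a href="/browse/post/1/">Foo (1)</a> <a href="/browse/post/2/">Bar (2)</a>';

var prefix = "<a href=\"/browse/post/\\d*/\">";
var greedy = new RegExp(prefix + "(.*) \\(");     // backtracks from the end: stops at the LAST " ("
var lazy = new RegExp(prefix + "(.*?) \\(");      // grows one char at a time: stops at the FIRST " ("
var negated = new RegExp(prefix + "([^(]*) \\("); // can never cross a "(" at all

console.log(greedy.exec(line)[1]);  // Foo (1)</a> <a href="/browse/post/2/">Bar
console.log(lazy.exec(line)[1]);    // Foo
console.log(negated.exec(line)[1]); // Foo
```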

All in all:

var titleRegex = new RegExp("<a href=\"/browse/post/\\d*/\">([^(]*) \\(");
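A usage sketch for the combined pattern, with a made-up numeric link standing in for the question's markup. Following tomfumb's comment, the same pattern is also written as a regex literal: there each backslash is written only once, but every / inside the pattern has to be escaped.

```javascript
var titleRegex = new RegExp("<a href=\"/browse/post/\\d*/\">([^(]*) \\(");
// Equivalent regex literal: single backslashes, but "/" must be written as "\/".
var titleLiteral = /<a href="\/browse\/post\/\d*\/">([^(]*) \(/;

// Hypothetical sample markup (numeric post id, as \d* requires).
var html = '<a href="/browse/post/1111/">Some title (1111)</a>';

var match = titleRegex.exec(html);
if (match) {
  console.log(match[1]);                   // Some title
  console.log(titleLiteral.exec(html)[1]); // Some title
}
```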

Here's a simple fix:

/href=\".*?\">(.*?)\(/
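One detail about this shorter pattern: there is no space before \(, so the lazy group keeps the space that precedes the parenthesis. A sketch, again with an invented sample link, that trims it off:

```javascript
var simple = /href=\".*?\">(.*?)\(/;
// Hypothetical sample markup.
var html = '<a href="/browse/post/1111/">Some title (1111)</a>';

var m = simple.exec(html);
// m[1] is "Some title " (with a trailing space); trim() removes it.
var title = m ? m[1].trim() : null;
console.log(title); // Some title
```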

The general idea is to take a string of HTML, parse it into a document (a tree of DOM elements), and then traverse that tree to extract information.

If the link was:

<a href="/browse/post/something/"><b>something</b> else</a>

First traverse the tree to find the anchor tag, then:

anchor.textContent // returns "something else"

Extracting the text from an element is simple, even when the element contains other elements that also hold text. This is also more robust than the regex approach: if someone added a class attribute to the anchor, the regex in the accepted answer would no longer match the anchor tag, but a traversal-based solution would still work.
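The regex half of that claim is easy to check: an attribute placed before href is enough to break the accepted answer's pattern, because the pattern spells out the literal text `<a href=`. The class name below is arbitrary. A lookup such as getElementsByTagName("a") selects by tag name, so attribute order does not affect it.

```javascript
var titleRegex = new RegExp("<a href=\"/browse/post/\\d*/\">([^(]*) \\(");

var plain = '<a href="/browse/post/1111/">Some title (1111)</a>';
// Same link with an extra (arbitrary) class attribute in front of href.
var withClass = '<a class="post-link" href="/browse/post/1111/">Some title (1111)</a>';

console.log(titleRegex.test(plain));     // true
console.log(titleRegex.test(withClass)); // false: the pattern expects href right after "<a "
```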

In the simple case, you can create a div, then set the innerHTML to your HTML string, then traverse it:

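// Sample markup: one anchor wrapping two paragraphs of text.
var html = '<a href="/browse/post/"><p>Lorem</p> <p>Ipsum</p></a>';
var div = document.createElement("div");
// Assigning innerHTML makes the browser parse the string into DOM nodes.
div.innerHTML = html;
var anchors = div.getElementsByTagName("a");
for (var i = 0; i < anchors.length; i++) {
    // textContent concatenates the text of every descendant node.
    console.log(anchors[i].textContent); // "Lorem Ipsum"
}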

A more sophisticated version of this is packaged in the jQuery(string) function.

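// The outer <div> matters: .find() only searches descendants of the parsed top-level element.
var html = '<div><a href="/browse/post/"><p>Lorem</p> <p>Ipsum</p></a></div>';
jQuery(html).find("a").each(function() {
    // .text() returns the combined text of the element and all its descendants.
    console.log(jQuery(this).text());
});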

Live example: http://jsfiddle/ygcFM/

Tags: javascript, Regular expression to get link text, Stack Overflow