This is how I get the tags of a body of text.

var tags = body.match(/#([a-z0-9]+)/gi);

However, if the sentence is:

The brown #fox jumped over &#8216; fence.

The regex above will treat "8216" (from the HTML entity &#8216;) as a tag, which I do not want. I only want "fox" as a tag.

Note: I just want a basic regex solution.

Asked May 17, 2012 at 4:30 by TIMEX (edited May 17, 2012 at 4:36)
  • Note that not all hashtags are ASCII. Example: twitter.com/#!/search/%23%E4%BB%8A%E6%97%A5%E3%81%AF – icktoofay Commented May 17, 2012 at 4:37
  • @icktoofay how would I use regex to handle all UTF-8 chars? – TIMEX Commented May 17, 2012 at 4:38
  • Unfortunately, it's a little tricky. JavaScript leaves some Unicode processing to you. You probably want to see Twitter's official JavaScript library for text processing to see how it matches hashtags. (Spoiler: It's complex.) – icktoofay Commented May 17, 2012 at 5:40
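
For the non-ASCII case raised in these comments, engines that support Unicode property escapes (the `u` flag, ES2018 and later) allow the `[a-z0-9]` class to be swapped for letter and number properties. The following is only a sketch and not how Twitter's library does it; the function name `extractUnicodeTags` and the rule that `#` must start the string or follow whitespace are assumptions made for illustration.

```javascript
// Hypothetical helper: extract hashtags that may contain non-ASCII letters.
// Assumes an engine with Unicode property escapes (the `u` flag, ES2018+).
function extractUnicodeTags(body) {
  // (?:^|\s) : the tag must start the string or follow whitespace,
  //            so the "#" inside an entity such as &#8216; is skipped
  // \p{L}    : any Unicode letter; \p{N} : any Unicode number
  var pattern = /(?:^|\s)#([\p{L}\p{N}_]+)/gu;
  var tags = [];
  var match;
  while ((match = pattern.exec(body)) !== null) {
    tags.push(match[1]); // capture group 1 is the tag without "#"
  }
  return tags;
}

console.log(extractUnicodeTags("The brown #fox jumped over &#8216; fence. #今日は"));
// [ 'fox', '今日は' ]
```

Scripts that rely on combining marks would likely need `\p{M}` added to the character class as well.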

4 Answers


Try this one:

/(^#|\s#)([a-z0-9]+)/gi

LIVE DEMO: http://jsfiddle.net/DerekL/NpjyR/

or this:

/(^#|[^&]#)([a-z0-9]+)/gi   // this excludes every "#" that directly follows "&"
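
Both patterns in this answer capture the character before `#` as part of group 1, so the full match includes it and the tag text itself ends up in group 2. A minimal sketch of pulling out just the tags, assuming a hypothetical helper named `extractTags`:

```javascript
// Hypothetical helper: collect capture group 2 (the tag text) for each match.
function extractTags(body, pattern) {
  var tags = [];
  var match;
  pattern.lastIndex = 0; // global regexes keep state between exec() calls
  while ((match = pattern.exec(body)) !== null) {
    tags.push(match[2]);
  }
  return tags;
}

var sentence = "The brown #fox jumped over &#8216; fence.";
console.log(extractTags(sentence, /(^#|\s#)([a-z0-9]+)/gi));   // [ 'fox' ]
console.log(extractTags(sentence, /(^#|[^&]#)([a-z0-9]+)/gi)); // [ 'fox' ]
```

The two patterns differ on input such as `word#tag`: the whitespace version skips it, while the `[^&]` version accepts it.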

Assuming you have access to the DOM, you could use the DOM to decode the HTML and then match on the text content:

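// Let the browser decode HTML entities by parsing body into a detached element
var temp = document.createElement('div');
temp.innerHTML = body;
// textContent holds the decoded text, so &#8216; is a single character here
var tags = temp.textContent.match(/#([a-z0-9]+)/gi);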

Try this one:

#([a-z0-9]+)\b(?!;)
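
A quick check of this pattern against the sentence from the question, as a sketch. Because `match` with the `g` flag returns whole matches rather than capture groups, the leading `#` is stripped in a separate step; the variable names are illustrative.

```javascript
var body = "The brown #fox jumped over &#8216; fence.";

// \b(?!;) rejects a run of characters that ends right before a semicolon,
// which is how the entity &#8216; is filtered out.
var matches = body.match(/#([a-z0-9]+)\b(?!;)/gi) || []; // match() returns null when nothing matches
var tags = matches.map(function (m) { return m.slice(1); }); // drop the leading "#"

console.log(tags); // [ 'fox' ]
```

One trade-off: a real tag written directly before a semicolon, as in `#cats;`, is also skipped by this pattern.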