We are working on a project where we want users to be able to use both emoji syntax (like :smile:, :heart:, :confused:, :stuck_out_tongue:) as well as normal emoticons (like :), <3, :/, :p).
I'm having trouble with the emoticon syntax because sometimes those character sequences will occur in:
- normal strings or URLs - http://example.com
- within the emoji syntax - :pencil:
How can I find these emoticon character sequences but not when other characters are near them?
The entire regex I'm using for all the emoticons is huge, so here's a trimmed-down version:
(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)
You can play with a demo of it in action here: http://regexr.com/3a8o5
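To make the problem concrete, here is a minimal sketch that runs the trimmed-down pattern against the two problem cases. The sample strings and the variable names are made up for illustration; only the pattern comes from the question.

```javascript
// Trimmed-down pattern from the question, with the global flag added
// so that every occurrence is reported.
const naive = /(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)/g;

// Hypothetical sample inputs.
const samples = [
  'nice work :)',
  'see http://example.com',
  'grab a :pencil:',
];

for (const text of samples) {
  const found = text.match(naive) || [];
  console.log(JSON.stringify(text), '->', found);
}
// The second and third samples report ":/" and ":p" even though
// neither string contains a real emoticon.
```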
Asked Jan 21, 2015 at 21:21 by Chris Barr

4 Answers
Match emoji first (to take care of the :pencil: example) and then check for a terminating whitespace or newline:
(\:\w+\:|\<[\/\\]?3|[\(\)\\\D|\*\$][\-\^]?[\:\;\=]|[\:\;\=B8][\-\^]?[3DOPp\@\$\*\\\)\(\/\|])(?=\s|[\!\.\?]|$)
This regex matches the following (preferring emoji) returning the match in matching group 1:
:( :) :P :p :O :3 :| :/ :\ :$ :* :@
:-( :-) :-P :-p :-O :-3 :-| :-/ :-\ :-$ :-* :-@
:^( :^) :^P :^p :^O :^3 :^| :^/ :^\ :^$ :^* :^@
): (: $: *:
)-: (-: $-: *-:
)^: (^: $^: *^:
<3 </3 <\3
:smile: :hug: :pencil:
It also supports terminal punctuation as a delimiter in addition to white space.
You can see more details and test it here: https://regex101.com/r/aM3cU7/4
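As a rough sketch of how this pattern could be used from JavaScript, the snippet below collects group 1 of every match. The helper name findTokens and the sample strings are assumptions for illustration. One thing worth checking before relying on the pattern: the \D inside the third character class matches any non-digit character, so an ordinary letter followed by a colon and a space is also reported. Whether that was intended is not clear from the answer.

```javascript
// Pattern copied verbatim from the answer; only the g flag is added
// so that matchAll can walk over every occurrence.
const emojiOrEmoticon = /(\:\w+\:|\<[\/\\]?3|[\(\)\\\D|\*\$][\-\^]?[\:\;\=]|[\:\;\=B8][\-\^]?[3DOPp\@\$\*\\\)\(\/\|])(?=\s|[\!\.\?]|$)/g;

// findTokens is a hypothetical helper name, not part of the answer.
function findTokens(text) {
  return Array.from(text.matchAll(emojiOrEmoticon), (m) => m[1]);
}

console.log(findTokens('I <3 this :pencil: :) see http://example.com'));
// -> [ '<3', ':pencil:', ':)' ]

console.log(findTokens('Note: done'));
// -> [ 'e:' ]  (the \D in the third character class matches any non-digit)
```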
Make a positive look-ahead for a space
([\:\<]-?[)(|\\/pP3D])(?:(?=\s))
 |     | |            |
 |     | |            |-> look-ahead for the separating space
 |     | |-> last part of the emoticon
 |     |-> it may have a `-` or not
 |-> first part of the emoticon
Since you're using JavaScript, you can also do this without look-arounds by capturing the trailing delimiter in its own group:
/([\:\<]-?[)(|\\/pP3D])(\s|$)/g.exec('hi :) ;D');
Then just splice() the last entry off the resulting array (it is most probably a space), or simply read the emoticon from index 1.
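A minimal sketch of that capture-group approach, looping with exec() and keeping only group 1. The function name collectEmoticons and the inputs are hypothetical, and the character class here includes both ( and ) so that :( is covered as well.

```javascript
// collectEmoticons is a hypothetical helper name.
function collectEmoticons(text) {
  // Emoticon in group 1, trailing delimiter (whitespace or end) in group 2.
  // The regex is created per call so lastIndex state is never shared.
  const withDelimiter = /([\:\<]-?[)(|\\/pP3D])(\s|$)/g;
  const result = [];
  let m;
  while ((m = withDelimiter.exec(text)) !== null) {
    result.push(m[1]); // drop the delimiter, keep the emoticon only
  }
  return result;
}

console.log(collectEmoticons('hi :) ;D'));                  // -> [ ':)' ]
console.log(collectEmoticons('see http://example.com <3')); // -> [ '<3' ]
console.log(collectEmoticons(':pencil: :p'));               // -> [ ':p' ]
```

Note that nothing is checked in front of the emoticon, so a sequence glued to the end of a word, such as a:), is still reported.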
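I assume these emoticons will commonly be used with spaces before and after. In that case \s might be what you're looking for, as it represents a whitespace character.
Your regex would then become
\s+(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)\s
Note that this requires actual whitespace on both sides, so an emoticon at the very start or end of the string is not matched.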
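The sketch below compares the whitespace-delimited pattern with a hypothetical variant that also accepts the start and end of the string as delimiters. The variant, the helper names, and the sample text are assumptions for illustration, not part of the answer.

```javascript
// Pattern from the answer, with the g flag added.
const spaced = /\s+(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)\s/g;

// Hypothetical variant: start of string counts as a leading delimiter,
// and the trailing delimiter is checked with a look-ahead so it is not consumed.
const anchored = /(^|\s)(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?=\s|$)/g;

function emoticonsSpaced(text) {
  return Array.from(text.matchAll(spaced), (m) => m[1]);
}

function emoticonsAnchored(text) {
  return Array.from(text.matchAll(anchored), (m) => m[2]);
}

const sample = ':) hello :/ http://example.com <3';
console.log(emoticonsSpaced(sample));   // -> [ ':/' ]
console.log(emoticonsAnchored(sample)); // -> [ ':)', ':/', '<3' ]
```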
You want regex look-arounds regarding spacing. Another answer here suggested a positive look-ahead, though I'd go double-negative:
(?<!\S)(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)
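JavaScript engines that predate ES2018 don't support (?<!pattern), but look-behind can be mimicked:
// $1 holds a non-whitespace character directly before the emoticon, if any;
// when it is present, the match is returned unchanged instead of being replaced.
test_string.replace(/(\S)?(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/g,
  function($0, $1) { return $1 ? $0 : replacement_text; });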
All I did was prefix your code with (?<!\S) in front and suffix it with (?!\S) in back. The prefix ensures you do not follow a non-whitespace character, so the only valid leading entries are spaces or nothing (start of line). The suffix does the same thing, ensuring you are not followed by a non-whitespace character. See also this more thorough regex walk-through.
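For comparison, here is a sketch of both variants as small functions: the mimicked look-behind, and the native (?<!\S) form, which assumes an engine with ES2018 look-behind support. The function names, the placeholder text, and the sample string are hypothetical.

```javascript
// Mimicked look-behind: capture an optional non-whitespace character
// in front of the emoticon and leave the match alone when it is present.
function replaceEmoticonsMimic(text, replacementText) {
  const pattern = /(\S)?(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/g;
  return text.replace(pattern, (match, before) => (before ? match : replacementText));
}

// Native look-behind (assumes ES2018 support in the engine).
function replaceEmoticonsNative(text, replacementText) {
  const pattern = /(?<!\S)(\:\)|\:\(|<3|\:\/|\:-\/|\:\||\:p)(?!\S)/g;
  // A function is used so that "$" in replacementText is not interpreted.
  return text.replace(pattern, () => replacementText);
}

const input = 'go to http://example.com :) a:) :pencil: :p';
console.log(replaceEmoticonsMimic(input, '[E]'));
// -> go to http://example.com [E] a:) :pencil: [E]
console.log(replaceEmoticonsNative(input, '[E]'));
// -> go to http://example.com [E] a:) :pencil: [E]
```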
One of the comments on the question itself suggested \b (word boundary) markers. I don't recommend these. In fact, this suggestion would do the opposite of what you want; \b:/ will indeed match http:// since there is a word boundary between the p and the :. This kind of reasoning would suggest \B (not a word boundary), e.g. \B:/\B. This is more portable (it works with pretty much all regex parsers while look-arounds do not), and you can choose it in that case, but I prefer the look-arounds.
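The boundary behaviour is easy to check directly. The sketch below uses made-up sample strings. It also shows a limitation of \B that follows from the same reasoning: for an emoticon that ends in a letter, such as :p, there is a word boundary after the p when whitespace or the end of the string follows, so \B:p\B rejects a standalone :p and accepts the one inside :pencil:.

```javascript
// Each entry records whether the pattern finds a match in the sample string.
const checks = {
  // \b succeeds between "p" and ":" in a URL, which is the unwanted match.
  wordBoundaryInUrl: /\b:\//.test('http://example.com'),
  // The pattern from the comment fails on an ordinary, space-delimited smiley.
  wordBoundarySmiley: /\b:\)\b/.test('hi :)'),
  // \B rejects the URL and accepts a space-delimited ":/".
  nonBoundaryInUrl: /\B:\/\B/.test('http://example.com'),
  nonBoundarySpaced: /\B:\/\B/.test('so so :/ today'),
  // Emoticons ending in a letter behave the other way round with \B.
  nonBoundaryTongue: /\B:p\B/.test('lol :p'),
  nonBoundaryPencil: /\B:p\B/.test(':pencil:'),
};

console.log(checks);
// wordBoundaryInUrl: true, wordBoundarySmiley: false,
// nonBoundaryInUrl: false, nonBoundarySpaced: true,
// nonBoundaryTongue: false, nonBoundaryPencil: true
```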
Comment on the question, by elclanrs (Jan 21, 2015 at 21:24): /\b:\)\b/