regex - Javascript unicode (greek) regular expressions - Stack Overflow


I would like to use the regular expression new RegExp("\\b" + pat + "\\b") on Greek text, but the \b metacharacter only treats ASCII letters, digits and the underscore as word characters.

I tried the XRegExp library, but I didn't manage to solve the issue.

Any suggestions would be greatly appreciated.
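A minimal JavaScript sketch reproducing the problem, using a sample Greek sentence of our own (text, pat and the other variable names are illustrative). It also shows a separate pitfall: in a JavaScript string literal "\b" is a backspace escape, so the backslash has to be doubled before RegExp sees a word boundary at all.

```javascript
// Illustrative sample data, not from the question.
const text = "ο λόγος είναι σαφής";
const pat = "λόγος";

// In a string literal "\b" is the backspace character U+0008, so this
// pattern searches for literal backspaces around the word.
const backspaceRe = new RegExp("\b" + pat + "\b");

// Doubling the backslash gives RegExp a real \b, but only [A-Za-z0-9_]
// count as word characters, so there is no boundary between " " and "λ".
const asciiRe = new RegExp("\\b" + pat + "\\b");

// The same construction works as expected for an ASCII word.
const englishRe = new RegExp("\\b" + "logos" + "\\b");

console.log(backspaceRe.test(text));         // false
console.log(asciiRe.test(text));             // false
console.log(englishRe.test("the logos is")); // true
```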


asked Apr 13, 2011 at 13:33 by kylito, edited Sep 1, 2012 at 5:22 by slevithan

Possible duplicate of utf-8 word boundary regex in javascript – mplungjan commented Apr 13, 2011 at 13:40
Did you use the Unicode plugin for XRegExp? – R. Martinho Fernandes commented Apr 13, 2011 at 13:40
JavaScript does not support Unicode, even though it is the dominant character set on the web. Use a language that does, preferably one that meets at least the Level 1 requirements for basic Unicode regular expression support. – tchrist commented Apr 13, 2011 at 13:58
@Martinho, as I explain in my answer, the XRegExp plugin does not fix \b to work according to the requirements of The Unicode Standard. It cannot be implemented correctly using only Unicode general categories, and even its approximation is mind-bending: (?:(?<=\w)(?!\w)|(?<!\w)(?=\w)). You would have to replace \w with [\pL\pM\p{Nd}\p{Pc}] wherever it occurs there, and you couldn't, because JavaScript cannot do standard lookbehinds. So that plugin cannot solve this problem. – tchrist commented Apr 13, 2011 at 14:41
@Tim: Because the ECMA standard, and almost all implementations, have dragged their feet for so long that they are easily more than a decade out of date, I can think of no alternative to offloading more of the heavy lifting to server-side back-end processing. The ICU regex library and Perl are both Level 1 (plus) compliant with the Unicode Standard, so either will work fine with Unicode. PHP, Ruby 1.9 and Python (in that order) also go substantially further than JavaScript towards compliance, and would at least allow for what the OP wants. Sorry there's no good news. – tchrist commented Apr 13, 2011 at 14:46
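These comments date from 2011. Engines implementing ECMAScript 2018 or later support both lookbehind assertions and \p{…} property escapes under the u flag, so the approximation described above can now be written natively. A minimal sketch, where WORD, BOUNDARY, escapeRegExp and unicodeWordRegExp are hypothetical names; as the comment stresses, this is still only an approximation built from general categories, not the Unicode Standard's word-boundary algorithm.

```javascript
// Word characters approximated as letters, combining marks, decimal digits
// and connector punctuation (the class suggested in the comment).
const WORD = "[\\p{L}\\p{M}\\p{Nd}\\p{Pc}]";

// \b written as (?:(?<=\w)(?!\w)|(?<!\w)(?=\w)) with \w replaced by WORD.
const BOUNDARY = `(?:(?<=${WORD})(?!${WORD})|(?<!${WORD})(?=${WORD}))`;

// Escape regex syntax characters so `pat` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: like new RegExp("\\b" + pat + "\\b"), but with the
// Unicode-aware boundary above. Needs an ES2018+ engine (lookbehind, \p{...}).
function unicodeWordRegExp(pat, flags = "") {
  const f = flags.includes("u") ? flags : flags + "u"; // \p{...} requires u
  return new RegExp(BOUNDARY + escapeRegExp(pat) + BOUNDARY, f);
}

console.log(unicodeWordRegExp("λόγος").test("ο λόγος είναι σαφής")); // true
console.log(unicodeWordRegExp("λόγ").test("ο λόγος"));               // false
```

The lookbehind/lookahead pair never consumes characters, so it behaves like \b in replacements and global searches.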


2 Answers


I think this will help answer your question:

<script src="xregexp.js"></script>
<script src="xregexp-unicode-base.js"></script>
<script>
    var unicodeWord = XRegExp("^\\p{L}+$");

    unicodeWord.test("Русский"); // true
    unicodeWord.test("日本語"); // true
    unicodeWord.test("العربية"); // true
</script>

<!-- \p{L} is included in the base script, but other categories, scripts,
and blocks require token packages -->
<script src="xregexp-unicode-scripts.js"></script>
<script>
    XRegExp("^\\p{Katakana}+$").test("カタカナ"); // true
</script>

Please refer to the following location: http://xregexp.com/plugins/
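Since ECMAScript 2018, JavaScript supports Unicode property escapes natively with the u flag, so in current engines the checks above no longer need the XRegExp plugin. A sketch of the native equivalents, with a Greek sample word added by us; note that native script names take a Script= (or sc=) prefix instead of XRegExp's bare \p{Katakana}.

```javascript
// Native equivalent of XRegExp("^\\p{L}+$"): the u flag enables \p{...}.
const unicodeWord = /^\p{L}+$/u;

console.log(unicodeWord.test("Русский"));  // true
console.log(unicodeWord.test("日本語"));   // true
console.log(unicodeWord.test("العربية"));  // true
console.log(unicodeWord.test("Ελληνικά")); // true

// Scripts are written with an explicit Script= prefix.
console.log(/^\p{Script=Katakana}+$/u.test("カタカナ")); // true
console.log(/^\p{Script=Greek}+$/u.test("Ελληνικά"));   // true
```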

So the answer is simply that you cannot use JavaScript's native mechanisms, or any library built on them, to match words the way you want. As you already noted, \b matches at word boundaries, and a word boundary is defined in terms of word characters. In JavaScript (and in fact in some other regex implementations), the word characters are a-z, A-Z, 0-9 and _. Many other languages, however, implement the \b metacharacter differently from JavaScript.
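Native \b cannot be reused, but a common pre-ES2018 workaround (sketched here, not taken from either answer) is to spell out the word characters yourself and consume the preceding character in a capture group, since older engines have no lookbehind. This assumes Greek text and treats the Greek and Coptic (U+0370..U+03FF) and Greek Extended (U+1F00..U+1FFF) blocks as extra word characters; GREEK_WORD and greekWordRegExp are hypothetical names, and the ranges also include a few Greek punctuation signs, so this is approximate.

```javascript
// Assumption: word characters = ASCII \w plus the Greek and Coptic and
// Greek Extended blocks. Approximate: these blocks also hold some punctuation.
var GREEK_WORD = "A-Za-z0-9_\\u0370-\\u03FF\\u1F00-\\u1FFF";

// Escape regex syntax characters so `pat` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: emulate \b around `pat` without lookbehind.
// Group 1 consumes the preceding non-word character (or nothing at the start);
// the lookahead checks the following character without consuming it.
function greekWordRegExp(pat, flags) {
  return new RegExp(
    "(^|[^" + GREEK_WORD + "])(" + escapeRegExp(pat) + ")(?![" + GREEK_WORD + "])",
    flags
  );
}

var m = greekWordRegExp("λόγος").exec("ο λόγος είναι σαφής");
console.log(m && m[2]);                              // "λόγος"
console.log(greekWordRegExp("λόγ").test("ο λόγος")); // false

// When replacing, put group 1 back so the preceding character is kept.
console.log("ο λόγος είναι".replace(greekWordRegExp("λόγος", "g"), "$1<b>$2</b>"));
// "ο <b>λόγος</b> είναι"
```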

The answer "JavaScript does not support Unicode" is a bit too easy and in fact completely wrong. JavaScript just doesn't use Unicode for its character classes. If JavaScript didn't support Unicode, you couldn't even use Unicode characters in string literals, and of course that is possible in JavaScript.
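A short illustration of that distinction, using a sample word of our own choosing: Greek text is ordinary string data, but the predefined \w class, which shares \b's definition of a word character, covers only ASCII.

```javascript
// String literals hold any Unicode text; literal characters and \u escapes
// produce the same string.
const word = "λόγος";
console.log(word === "\u03BB\u03CC\u03B3\u03BF\u03C2"); // true
console.log(word.length); // 5 (UTF-16 code units)

// The predefined class \w, however, is just [A-Za-z0-9_].
console.log(/\w/.test("λ"));        // false
console.log(/^\w+$/.test(word));    // false
console.log(/^\w+$/.test("logos")); // true
```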

According to the ECMA-262 standard (ECMAScript), Section 15.10.2.6:

[...] The production Assertion :: \b evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following:

1. Let e be x's endIndex.
2. Call IsWordChar(e-1) and let a be the Boolean result.
3. Call IsWordChar(e) and let b be the Boolean result.
4. If a is true and b is false, return true.
5. If a is false and b is true, return true.
6. Return false. [...]

The abstract operation IsWordChar takes an integer parameter e and performs the following:

1. If e == -1 or e == InputLength, return false.
2. Let c be the character Input[e].
3. If c is one of the sixty-three characters below, return true.
   a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _
4. Return false.
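The two spec algorithms quoted above can be transcribed almost line for line into JavaScript, which makes it easy to see why no boundary exists next to Greek letters. isWordChar and isWordBoundary are hypothetical names mirroring the spec's IsWordChar and the \b AssertionTester; the sketch covers only the behaviour quoted here (no u flag).

```javascript
// The sixty-three characters listed in IsWordChar.
const WORD_CHARS = new Set(
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
);

// IsWordChar(e): positions just outside the input are never word characters.
function isWordChar(input, e) {
  if (e === -1 || e === input.length) return false;
  return WORD_CHARS.has(input[e]);
}

// The \b assertion at endIndex e: a boundary exists when exactly one of the
// characters before and after position e is a word character.
function isWordBoundary(input, e) {
  const a = isWordChar(input, e - 1);
  const b = isWordChar(input, e);
  return a !== b;
}

console.log(WORD_CHARS.size);              // 63
console.log(isWordBoundary("a b", 1));     // true  ("a" then " ")
console.log(isWordBoundary("ο λόγος", 2)); // false (" " then "λ")
```

Checking the transcription against the engine's own /\b/ at every position of a string should give identical results.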

This shows that \b relies on the IsWordChar operation to decide whether the characters on either side of the current position are word characters, and the definition of IsWordChar lists exactly which characters return true.

In my opinion, this has nothing to do with the character set being used. It is neither ASCII- nor Unicode-compliant here; it's just these 63 characters.
