I'm trying to strip a string so that only word characters remain. For anything using the Latin alphabet, I can manage it quite easily with:
str = str.replace(/\W/g, '').replace(/[0-9]/g, '');
(I think I probably don't need both replace calls, but I'm very new to regular expressions and not sure what I'm doing.)
However, this also strips out non-Latin characters such as Chinese or Arabic.
How would I write a function that does this for any script?
var strOne = "test!(£)98* string";
var strTwo = "你好,325!# 世界";
cleanUp(strOne); // Output: "test string"
cleanUp(strTwo); // Output: "你好 世界"
(In case anyone is wondering, the Chinese is me running "hello world" through an online translator.)
On a library note, I don't know if it's relevant, but I'm using Dojo and would like to avoid jQuery if possible.
asked Sep 6, 2013 at 10:18 by Emma
I suppose you can look into Unicode ranges like in this SO post or another link, or you can convert the characters to English using a plugin and then clean up. The only plugin I can think of for now is a jQuery one, though :) Google's translate plugin. – user2587132, Sep 6, 2013 at 10:31
To merge both replaces you can use |. The pipe character means this or that (this|that), so in your case the regex would be /\W|[0-9]/g. – Ron, Sep 6, 2013 at 10:43
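A minimal sketch of the merge suggested in the comment above, for comparison with the two-call version from the question. The function names are hypothetical and exist only for this illustration. Note that both versions also remove spaces, and both keep the underscore, because \w includes it.

```javascript
// Hypothetical helper names, used only to compare the two forms.

// Two passes, as in the question.
function stripTwoPasses(str) {
  return str.replace(/\W/g, '').replace(/[0-9]/g, '');
}

// One pass, using alternation as suggested in the comment.
// The character class /[\W\d]/g would match the same characters.
function stripOnePass(str) {
  return str.replace(/\W|[0-9]/g, '');
}

var sample = "test!(£)98* string";
console.log(stripTwoPasses(sample)); // "teststring"
console.log(stripOnePass(sample));   // "teststring"
```

Neither form helps with non-Latin text, since \W still matches every character outside [a-zA-Z0-9_].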
2 Answers
You need a regex pattern using Unicode character properties, namely \P{Letter}.
Unfortunately, at the time of writing the native JS regex engine does not support these constructs (cf. the MDN docs). However, there is (at least) one third-party library, XRegExp, which includes a JS plugin adding the support.
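code sample:
// Assumes XRegExp is loaded together with its Unicode plugin.
var regex, str;
str = "whatever";
// The 'g' flag makes the replacement apply to every non-letter, not only the first match.
// Note that \P{Letter} also matches whitespace, so spaces are removed as well.
regex = XRegExp('\\P{Letter}', 'g');
str = XRegExp.replace(str, regex, '');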
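Since this answer was written, ES2018 added Unicode property escapes to native JavaScript regular expressions, available when the u flag is set. The following is a minimal sketch of the cleanUp function from the question on that basis, assuming a runtime that supports \p{L}. Keeping whitespace and collapsing runs of it are choices made here to match the expected output in the question, not part of the original answer.

```javascript
// Assumption: the engine supports ES2018 Unicode property escapes (\p{...} with the u flag).
function cleanUp(str) {
  return str
    .replace(/[^\p{L}\s]+/gu, '') // drop everything that is neither a letter nor whitespace
    .replace(/\s+/g, ' ')         // collapse runs of whitespace left behind
    .trim();                      // remove leading and trailing whitespace
}

var strOne = "test!(£)98* string";
var strTwo = "你好,325!# 世界";

console.log(cleanUp(strOne)); // "test string"
console.log(cleanUp(strTwo)); // "你好 世界"
```

Scripts that rely on combining marks, such as Arabic with vowel diacritics, would probably need \p{M} added to the kept set as well.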
\W is equivalent to [^a-zA-Z_0-9]. Instead, you need to list all the characters that you want to strip out:
str = str.replace(/[put the characters you want to get rid of here]*/g, '');