I need some help replacing all non-word characters in a string.
For example, (stadtbezirkspräsident' should become stadtbezirkspräsident.
The regex should work for all languages, which makes it tricky, because I have no idea how to match characters like ñ or œ. I tried solving this with
string = string.replace(/[&\/\\#,+()$~%.'":*?<>\-_{}]/g, ' '); // "-" escaped: unescaped, ">-_" is a range that also matches A-Z
but there are still too many special characters, like Ø, left over.
Is there perhaps a general selector for this, or has anybody solved this problem before?
Edited Nov 8, 2012 at 18:19 by Tim Pietzcker; asked Nov 3, 2012 at 13:53 by BeMoreDifferent.
2 comments:
- Ø is a letter in various languages (Danish, for example) :) – Dominik Honnef, Nov 3, 2012 at 13:57
- Similar: this question. JavaScript regex doesn't have any native Unicode-aware matchers. – ben author, Nov 3, 2012 at 14:07
3 Answers
If you have to define all the Unicode ranges yourself, it's going to be a lot of work.
It might make more sense to use Steven Levithan's XRegExp package with its Unicode add-ons and use its Unicode property shortcuts:
var regex = new XRegExp("\\P{L}+", "g"); // \P{L}: any non-letter (requires the Unicode Base add-on)
string = XRegExp.replace(string, regex, "");
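The same `\P{L}` idea no longer strictly needs a library: since ECMAScript 2018 (well after this 2012 thread), native JavaScript regexes support Unicode property escapes when the `u` flag is set. A minimal sketch, assuming a runtime with ES2018 support (a current browser or Node.js 10+); the function name `stripNonLetters` is hypothetical:

```javascript
// Remove every run of characters that are not Unicode letters.
// \P{L} means "not in the Letter category"; the "u" flag is required,
// otherwise \P{...} is not recognized as a property escape.
function stripNonLetters(input) {
  return input.replace(/\P{L}+/gu, "");
}

console.log(stripNonLetters("(stadtbezirkspräsident'")); // stadtbezirkspräsident
console.log(stripNonLetters("Bundespräsident / ß+ð/ə¿α!")); // Bundespräsidentßðəα
```

As with the XRegExp version, digits and underscores are removed as well, because they are not letters.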
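Try this trick:
str = str.replace(/(?!\w)[\x00-\xBF]/g, ''); // removes non-word characters from U+0000 to U+00BF; À (U+00C0) and everything above is kept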
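The trick deletes everything in the ASCII and Latin-1 range that is not a word character and leaves everything above that range alone, on the assumption that it is a letter. A small sketch to see both sides of that assumption; the function name `stripLowNonWord` is hypothetical, and the upper bound U+00BF is chosen so that À (U+00C0), the first Latin-1 letter, is kept:

```javascript
// (?!\w) rejects ASCII word characters [A-Za-z0-9_];
// [\x00-\xBF] then matches any remaining character from U+0000 to U+00BF.
// Anything at or above U+00C0 is left untouched, letter or not.
function stripLowNonWord(input) {
  return input.replace(/(?!\w)[\x00-\xBF]/g, "");
}

console.log(stripLowNonWord("(stadtbezirkspräsident'")); // stadtbezirkspräsident
console.log(stripLowNonWord("3 × 4 ÷ 2 “ok”"));          // 3×4÷2“ok”
```

Unlike `\P{L}`, this keeps digits and underscores (they match `\w`), and it also keeps non-letters above U+00BF such as ×, ÷ and typographic quotes.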
This is more of a comment on Tim Pietzcker's answer, but presenting code in comments is awkward. Here's a simple example of using the XRegExp package:
<p id=orig>Bundespräsident / ß+ð/ə¿α!</p>
<p id=new></p>
<script src="http://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-min.js">
</script>
<script src="http://xregexp.com/addons/unicode/unicode-base.js">
</script>
<script>
var regex = new XRegExp("\\P{L}+", "g");
var string = document.getElementById('orig').innerHTML;
string = XRegExp.replace(string, regex, "");
document.getElementById('new').innerHTML = string;
</script>
For production use, you would probably want to download copies of the base package and the Unicode add-on and serve them from your own server.
Note: The code checks for characters that are not classified as letters (alphabetic) in Unicode. I suppose this corresponds to what you mean by “word character”, though words in a natural language may contain hyphens, apostrophes, and other non-letters.
Beware that new characters are added to Unicode over time, and the category of an existing character might (rarely) change. The package has been well maintained, though; it corresponds to Unicode 6.1 (version 6.2 is out, but it adds no new letters).
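Regarding the note that real words may contain hyphens and apostrophes: one alternative (not part of this answer) is to extract whole words instead of deleting non-letters. A minimal sketch, assuming native ES2018 Unicode property escapes rather than XRegExp; `WORD` and `extractWords` are hypothetical names. It also accepts combining marks (`\p{M}`), because a decomposed "ä" is "a" followed by U+0308 COMBINING DIAERESIS, which `\P{L}` on its own would strip:

```javascript
// A "word" here: letters (with any combining marks), optionally followed by
// groups of a single ' or - plus more letters. So "Jean-Luc" and "doesn't"
// stay whole, while a leading "(" or a trailing "'" is not included.
const WORD = /[\p{L}\p{M}]+(?:['-][\p{L}\p{M}]+)*/gu;

function extractWords(input) {
  // With the g flag, match() returns all matches, or null if there are none.
  return input.match(WORD) || [];
}

console.log(extractWords("(stadtbezirkspräsident'").join(" | "));       // stadtbezirkspräsident
console.log(extractWords("Jean-Luc doesn't -know- ?*+#").join(" | ")); // Jean-Luc | doesn't | know
```

If input may arrive decomposed, calling `input.normalize("NFC")` first is another option; it recombines sequences such as "a" + U+0308 into a single precomposed letter where Unicode defines one.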
Source: javascript - Replace all non-word characters like ?*+# - Stack Overflow (mirrored at http://www.betaflare.com/web/1743697553a2523760.html).