I need some help replacing all non-word characters in a string.

For example, (stadtbezirkspräsident' should become stadtbezirkspräsident.

This regex should work for all languages, which makes it tricky, because I have no idea how to match characters like ñ or œ. I tried solving this with

string.replace(/[&\/\\#,+()$~%.'":*?<>-_{}]/g,' ');

but there are still too many special characters like Ø left.

Perhaps there is a general selector for this, or has anybody solved this problem before?

Asked Nov 3, 2012 at 13:53 by BeMoreDifferent. Edited Nov 8, 2012 at 18:19 by Tim Pietzcker.
  • Ø is a letter in various languages (Danish, for example) :) – Dominik Honnef Commented Nov 3, 2012 at 13:57
  • similar: this question. javascript regex doesn't have any native unicode-aware matchers – ben author Commented Nov 3, 2012 at 14:07

3 Answers

If you have to define all the Unicode ranges yourself, it's going to be a lot of work.

It might make more sense to use Steven Levithan's XRegExp package with its Unicode add-ons and utilize its Unicode property shortcuts:

var regex = new XRegExp("\\P{L}+", "g")
string = XRegExp.replace(string, regex, "")
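
In engines that implement the ES2018 Unicode property escapes, the same `\P{L}` idea can be written without a library. The following is only a minimal sketch under that assumption; `stripNonLetters` is a hypothetical helper name and not part of XRegExp.

```javascript
// Sketch only: relies on native Unicode property escapes (ES2018, `u` flag).
// `stripNonLetters` is an illustrative name, not an XRegExp function.
function stripNonLetters(input) {
  // Normalize first so that "a" + combining diaeresis becomes one precomposed
  // letter where such a letter exists, instead of losing the mark to \P{L}.
  return input.normalize("NFC").replace(/\P{L}+/gu, "");
}

console.log(stripNonLetters("(stadtbezirkspräsident'")); // stadtbezirkspräsident
```

Like the XRegExp version, this removes digits, whitespace and punctuation alike, so it suits single words better than whole sentences.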

Try this trick:

// The range stops at \xBF because \xC0 (À) is already a letter.
str.replace(/(?!\w)[\x00-\xBF]/g, '')
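
To make the behaviour of this range trick concrete, here is a small sketch. It assumes the range ends at `\xBF`, just below À (`\xC0`), so that no Latin-1 letter falls inside it; `stripLowNonWord` is a hypothetical name.

```javascript
// Sketch of the range trick. Assumption: the range stops at \xBF, just below
// À (\xC0), so that no Latin-1 letter is inside it.
// `stripLowNonWord` is an illustrative name.
function stripLowNonWord(str) {
  // (?!\w) spares ASCII letters, digits and "_"; every other character
  // up to \xBF is removed, including whitespace.
  return str.replace(/(?!\w)[\x00-\xBF]/g, "");
}

console.log(stripLowNonWord("(stadtbezirkspräsident'")); // stadtbezirkspräsident
console.log(stripLowNonWord("Ørsted & Señor!"));          // ØrstedSeñor
console.log(stripLowNonWord("“2×3”"));                    // “2×3”
```

The last call shows the limit of the approach: symbols above `\xBF`, such as × or curly quotes, are left in place, so this is a heuristic rather than a real letter test.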

This is more of a comment on Tim Pietzcker’s answer, but presenting code in comments is awkward... Here’s a simple example of using the XRegExp package:

<p id=orig>Bundespräsident / ß+ð/ə¿α!</p>
<p id=new></p>
<script src="http://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-min.js">
</script>
<script src="http://xregexp.com/addons/unicode/unicode-base.js">
</script>
<script>
var regex = new XRegExp("\\P{L}+", "g");
var string = document.getElementById('orig').innerHTML;
string = XRegExp.replace(string, regex, "");
document.getElementById('new').innerHTML = string;
</script>

For production use, you would probably want to download a specific version of the base package and the Unicode plug-in and serve them from your own server.

Note: The code checks for characters that are not classified as letters (alphabetic) in Unicode. I suppose this corresponds to what you mean by “word character”, though words in a natural language may contain hyphens, apostrophes, and other non-letters.
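
If hyphens and apostrophes inside words should survive, the set of allowed characters can be widened. The sketch below is one possible reading of “word character”, not a definitive one: it assumes an engine with ES2018 Unicode property escapes, and both the allowed set and the name `keepWordCharacters` are illustrative choices.

```javascript
// Sketch: treat letters, combining marks, digits, "'" and "-" as word characters.
// The allowed set is an assumption; adjust it to the text being processed.
// `keepWordCharacters` is an illustrative name.
function keepWordCharacters(input) {
  return input
    .replace(/[^\p{L}\p{M}\p{N}'-]+/gu, " ") // each run of other characters becomes one space
    .trim();
}

console.log(keepWordCharacters("l'état-major, 2012!")); // l'état-major 2012
```

Replacing with a space rather than an empty string keeps neighbouring words apart, which matches the replacement used in the question.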

Beware that characters are added to Unicode, and the category of a character might (rarely) change. The package has been maintained well, though; it corresponds to Unicode 6.1 (version 6.2 is out, but it has no new letters).
