
I have this string with text mixed together with tabs, spaces, CR/LF and maybe more special characters as well.

How can I go about cleaning the string so that only the words are left?

I tried the obvious:

var txtArr = dirtyString.split(" ");

This does give some results, but not good enough. Anything separated by a tab, a CR/LF, or any other character that is neither a letter, a number, nor a plain space shows up glued to the neighboring word. As a result I get fewer words than there should be, and some of them are wrong.

So, I'm a bit stuck. There's probably a regex trick to use for stuff like this. I'd appreciate some input. Thanks.
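To make the problem concrete, here is a small sketch using a made-up input string (the sample text is an assumption, not from the original post). Splitting on a literal space only breaks the string at space characters, so words separated by a tab or a CR/LF pair come back as one piece.

```javascript
// Hypothetical sample input: words separated by a space, a tab, and a CR/LF pair.
const dirtyString = "alpha beta\tgamma\r\ndelta";

// Splitting on a literal space only breaks the string at space characters.
const naive = dirtyString.split(" ");

console.log(naive.length); // 2
console.log(JSON.stringify(naive)); // ["alpha","beta\tgamma\r\ndelta"]
```

Four words go in, but only two pieces come out, which is exactly the "fewer words than there should be" symptom.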

asked Jan 27, 2014 at 9:24 by Adergaard
  • You want to split the string into an array? – Itay Commented Jan 27, 2014 at 9:27
  • Well, that's not really important. I only want to keep what I know are words. Whether I have them in an array or in a string separated by commas, spaces, or whatever is, as I said, not the main issue here. – Adergaard Commented Jan 27, 2014 at 9:28
  • Do you want to replace them with an empty string ""? You can use the replace function with a regex. – Md. Yusuf Commented Jan 27, 2014 at 9:30

3 Answers

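An easy solution to your problem is pattern matching. Try:
var txtArr = dirtyString.trim().split(/\s+/);

\s matches any whitespace character (space, tab, CR, LF, and so on). The + makes a whole run of whitespace count as one separator, so a CR/LF pair or two consecutive spaces does not leave an empty string in the result. trim() removes leading and trailing whitespace, which would otherwise produce an empty first or last element.
\S is not used here, but it is worth mentioning: it matches any character that is not whitespace.
\w is not used here either: it matches a word character (an ASCII letter, a digit, or an underscore).
The /g (global) flag is not needed with split(), which always splits at every match. In replace() it makes the replacement apply to all matches instead of only the first one.
Read more about JavaScript's regular expression methods and usage in the JavaScript documentation.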
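A minimal sketch of the whitespace-splitting idea, assuming "words" simply means runs of non-whitespace characters. It contrasts a single \s with \s+ on a made-up string: a single \s splits CR and LF separately and leaves an empty string between them. The helper name splitWords is hypothetical.

```javascript
// Hypothetical helper: split on any run of whitespace (spaces, tabs, CR, LF, ...).
function splitWords(dirtyString) {
  const trimmed = dirtyString.trim();
  // "".split(/\s+/) would return [""], so handle the empty case explicitly.
  if (trimmed === "") {
    return [];
  }
  return trimmed.split(/\s+/);
}

const sample = "  alpha beta\tgamma\r\ndelta  ";

// A single \s treats CR and LF as two separators, leaving "" between them.
console.log(JSON.stringify(sample.trim().split(/\s/))); // ["alpha","beta","gamma","","delta"]

// \s+ treats the whole run of whitespace as one separator.
console.log(JSON.stringify(splitWords(sample))); // ["alpha","beta","gamma","delta"]
```

Punctuation is not removed by this approach: "hello, world" would give "hello," as a word, so combine it with one of the replace() patterns if that matters.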

Try this

var dirtyString = "avcbn n@nb @#$%^&*()";
// Remove every character that is not an ASCII letter or a space
alert(dirtyString.replace(/[^a-zA-Z ]/g, ""));
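One thing to be aware of with [^a-zA-Z ]: tabs and line breaks fall inside the negated class, so they are deleted rather than turned into spaces, and two words separated only by a tab end up glued together. A minimal variant, assuming you want ASCII letters only, replaces each run of other characters with a single space and then splits. The sample string and the helper name lettersOnlyWords are hypothetical.

```javascript
const tabbed = "foo\tbar baz";

// The pattern from the answer: the tab is deleted, so "foo" and "bar" merge.
console.log(tabbed.replace(/[^a-zA-Z ]/g, "")); // foobar baz

// Hypothetical variant: turn every run of non-letters into one space, then split.
function lettersOnlyWords(text) {
  const spaced = text.replace(/[^a-zA-Z]+/g, " ").trim();
  return spaced === "" ? [] : spaced.split(" ");
}

console.log(JSON.stringify(lettersOnlyWords(tabbed))); // ["foo","bar","baz"]
console.log(JSON.stringify(lettersOnlyWords("avcbn n@nb @#$%^&*()"))); // ["avcbn","n","nb"]
```

Note the design difference: this variant treats "@" in "n@nb" as a separator, whereas the original pattern joins the pieces into "nnb".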

Try this:

var str="agsah gfdhgfh fgdhfd";

alert(str.replace(/\s/g,''));

\s matches any whitespace character: space, tab, and line breaks such as CR and LF.

g (global) makes replace() remove all occurrences, not just the first.
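To see what the g flag changes, here is a small comparison on a made-up string containing a space and a tab: without g, replace() only touches the first match; with g, every whitespace character is removed. Removing all whitespace also joins the words together, so this is useful for stripping whitespace rather than for separating words.

```javascript
const str = "agsah gfdhgfh\tfgdhfd";

// Without the g flag, only the first whitespace character (the space) is removed.
console.log(JSON.stringify(str.replace(/\s/, ""))); // "agsahgfdhgfh\tfgdhfd"

// With the g flag, every whitespace character (the space and the tab) is removed.
console.log(str.replace(/\s/g, "")); // agsahgfdhgfhfgdhfd
```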

If you want to remove every character that is not a word character (a basic Latin letter, a digit, or an underscore), you can use \W instead of \s.

var str="agsah gfdhgfh fgdhfd";

alert(str.replace(/\W/g,''));
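Because the space itself is a non-word character, /\W/g removes the spaces too and returns the words joined into one string. If the goal is a list of words, a closely related approach is to match runs of word characters with \w+ and String.prototype.match(). A minimal sketch with a hypothetical input string; keep in mind that \w also counts digits and the underscore as word characters and does not match accented letters.

```javascript
const mixed = "agsah gfdhgfh\tfgdhfd\r\nfoo_1, bar!";

// \W removes every non-word character, including the separators between words.
console.log(mixed.replace(/\W/g, "")); // agsahgfdhgfhfgdhfdfoo_1bar

// \w+ with the g flag returns every run of word characters.
// match() returns null when nothing matches, so fall back to an empty array.
const words = mixed.match(/\w+/g) || [];
console.log(JSON.stringify(words)); // ["agsah","gfdhgfh","fgdhfd","foo_1","bar"]
```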

For more about regular expressions, see the JavaScript regular expression documentation.

Tags: jQuery | JavaScript clean string so only words exist | Stack Overflow