How do you split a long piece of text into separate lines? Why does this return line1 twice?

/^(.*?)$/mg.exec('line1\r\nline2\r\n');

["line1", "line1"]

I turned on the multi-line modifier to make ^ and $ match beginning and end of lines. I also turned on the global modifier to capture all lines.

I wish to use a regex split and not String.split because I'll be dealing with both Linux \n and Windows \r\n line endings.


Asked Feb 17, 2011 at 21:17 by JoJo

7 Answers

arrayOfLines = lineString.match(/[^\r\n]+/g);

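As Tim said, line1 appears twice because it is both the entire match and the first capture. Even with the global modifier, regex.exec(string) returns only one match per call (it records its position in lastIndex, so repeated calls step through the remaining matches), whereas string.match(regex) honours the global modifier and returns all matches at once.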
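To make the exec versus match difference concrete, here is a small sketch. The sample string and the variable names (sample, lineRe, viaExec, viaMatch) are made up for illustration; it uses the [^\r\n]+ pattern from above because that pattern can never match an empty string, which keeps the exec loop from stalling.

```javascript
// Sample input with Windows line endings (an assumption for illustration).
const sample = 'line1\r\nline2\r\n';

// exec() with the g flag: one match per call, resuming from lineRe.lastIndex.
const lineRe = /[^\r\n]+/g;
const viaExec = [];
let m;
while ((m = lineRe.exec(sample)) !== null) {
  viaExec.push(m[0]); // m[0] is the whole match
}

// match() with the g flag: every match returned in a single call.
const viaMatch = sample.match(/[^\r\n]+/g);

console.log(viaExec);  // [ 'line1', 'line2' ]
console.log(viaMatch); // [ 'line1', 'line2' ]
```

Both give the same array here; match is just the shorter way to write it when you only need the matched text.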

Use

result = subject.split(/\r?\n/);

Your regex returns line1 twice because line1 is both the entire match and the contents of the first capturing group.
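A quick way to see both points, using the string from the question (the variable names input, first and parts are just for illustration):

```javascript
const input = 'line1\r\nline2\r\n';

// exec() returns an array-like result: index 0 is the whole match,
// index 1 is the first capturing group. Here they happen to be identical.
const first = /^(.*?)$/mg.exec(input);
console.log(first[0]); // 'line1' (entire match)
console.log(first[1]); // 'line1' (capture group 1)

// Splitting on an optional \r followed by \n handles both line-ending styles.
const parts = input.split(/\r?\n/);
console.log(parts); // [ 'line1', 'line2', '' ]
```

Note the empty string at the end: the input finishes with a line break, so split reports an empty final line. Drop it yourself if you don't want it.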

I am assuming the following all count as newlines:

  1. \r followed by \n
  2. \n followed by \r
  3. \n present alone
  4. \r present alone

Please use

var re = /\r\n|\n\r|\n|\r/g;

arrayOfLines = lineString.replace(re, "\n").split("\n");

for an array of all lines, including the empty ones,

or

arrayOfLines = lineString.match(/[^\r\n]+/g);

for an array of the non-empty lines only.
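Here is how the two variants behave on a string that mixes all four conventions. The sample string and the names mixed, allLines and nonEmptyLines are invented for this sketch. One thing to watch for: match returns null rather than an empty array when nothing matches, so the sketch adds a fallback.

```javascript
// Mixed endings: \r\n, \n\r, \n\n (an empty line), and a lone \r.
const mixed = 'a\r\nb\n\rc\n\nd\re';

// Variant 1: normalise every newline form to \n, then split.
// Empty lines are kept.
const newlineRe = /\r\n|\n\r|\n|\r/g;
const allLines = mixed.replace(newlineRe, '\n').split('\n');
console.log(allLines); // [ 'a', 'b', 'c', '', 'd', 'e' ]

// Variant 2: collect runs of non-newline characters.
// Empty lines are skipped; "|| []" guards against the null that
// match() returns when there is no match at all.
const nonEmptyLines = mixed.match(/[^\r\n]+/g) || [];
console.log(nonEmptyLines); // [ 'a', 'b', 'c', 'd', 'e' ]
```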

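An even simpler regex that handles all line-ending combinations, even mixed in the same file, and drops empty lines between content as well (a line break at the very start or end of the text still leaves an empty string at that end of the array):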

var lines = text.split(/[\r\n]+/g);

With whitespace trimming:

var lines = text.trim().split(/\s*[\r\n]+\s*/g);
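A small sketch of both forms on a made-up input (the string and the names messy, collapsed and trimmed are only for illustration). The g flag is left off here because split already splits at every match with or without it.

```javascript
// Made-up input: padded lines, a blank line between each, trailing newline.
const messy = '  first \r\n\r\n second\n\nthird  \n';

// Runs of \r and \n collapse into one separator, so inner blank lines vanish.
// The trailing newline still produces an empty last element.
const collapsed = messy.split(/[\r\n]+/);
console.log(collapsed); // [ '  first ', ' second', 'third  ', '' ]

// trim() removes the outer whitespace first, and \s* on each side of the
// separator swallows the padding around every line break.
const trimmed = messy.trim().split(/\s*[\r\n]+\s*/);
console.log(trimmed); // [ 'first', 'second', 'third' ]
```

If you use the first form and want no empty strings at all, something like collapsed.filter(Boolean) would remove them.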

Unicode Compliant Line Splitting

Unicode® Technical Standard #18 defines what constitutes line boundaries. That same section also gives a regular expression to match all line boundaries. Using that regex, we can define the following JS function that splits a given string at any line boundary (preserving empty lines as well as leading and trailing whitespace):

const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/)

I don't understand why the negative look-ahead part, (?!\r\n), is necessary, but that is what the Unicode document suggests.
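For what it's worth, here is the function applied to a few made-up inputs (the sample strings and the names unicodeLines and withBlank are mine, not from the standard). The character class range \n-\r covers U+000A through U+000D, so vertical tab and form feed count as boundaries too.

```javascript
// Same function as above, repeated so the snippet runs on its own.
const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/);

// CRLF, LINE SEPARATOR (U+2028), NEXT LINE (U+0085), and a final LF.
const unicodeLines = splitLines('a\r\nb\u2028c\x85d\n');
console.log(unicodeLines); // [ 'a', 'b', 'c', 'd', '' ]

// Empty lines and surrounding whitespace are preserved as-is.
const withBlank = splitLines(' x\n\ny ');
console.log(withBlank); // [ ' x', '', 'y ' ]
```

As with the other split-based answers, a trailing line break shows up as an empty last element.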
