Now that many browsers support reading local files with HTML5's FileReader, this opens the door to websites which go beyond 'database front-ends' into scripts which can do something useful with local data without having to send it up to a server first.

Pre-processing images and video before upload aside, one big application of FileReader would be loading data from some kind of on-disk table (CSV, TSV, whatever) into the browser for manipulation - perhaps for plotting or analysis in D3.js or creating landscapes in WebGL.

Problem is, most examples out there on StackOverflow and other sites use FileReader's .readAsText() method, which reads the whole file into RAM before returning a result.

javascript: how to parse a FileReader object line by line

To read a file without loading all of the data into RAM, one would need to call .readAsArrayBuffer() on slices of the file, and this SO post is the closest I can get to a good answer:

filereader api on big files

However, it's a bit too specific to that particular problem, and in all honesty, I could try for days to make the solution more general and come out empty-handed, because I didn't understand the significance of the chunk sizes or why Uint8Array is used. A solution to the more general problem of reading a file line-by-line using a user-definable line separator (ideally with .split(), since that also accepts a regex), and then doing something per line (like printing it with console.log), would be ideal.
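To make the request concrete, here is a minimal sketch of the "read a chunk, split, carry the remainder" idea. It is not taken from either linked post. It assumes the promise-based Blob.slice() and Blob.arrayBuffer() rather than FileReader events, and the names readLines, separator and chunkSize are hypothetical. The chunk size only bounds how many bytes are held at once, so it does not change the output.

```typescript
// Hypothetical helper: yields lines from a Blob/File one chunk at a time.
async function* readLines(
  blob: Blob,
  separator: string | RegExp = /\r?\n/,
  chunkSize = 64 * 1024, // bytes per read; an arbitrary default
): AsyncGenerator<string> {
  const decoder = new TextDecoder('utf-8');
  let remainder = ''; // text after the last separator seen so far

  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // Only this slice of the file is read into memory
    const buffer = await blob.slice(offset, offset + chunkSize).arrayBuffer();
    // stream: true buffers a multi-byte character that straddles two chunks
    remainder += decoder.decode(new Uint8Array(buffer), { stream: true });

    const parts = remainder.split(separator);
    // The last part may be an incomplete line, so keep it for the next chunk
    remainder = parts.pop() ?? '';
    yield* parts;
  }

  // Flush the decoder and emit whatever is left as the final line
  remainder += decoder.decode();
  if (remainder.length > 0) {
    yield remainder;
  }
}
```

It could be consumed with `for await (const line of readLines(file)) console.log(line);`. One caveat of this sketch: a separator pattern that can match a prefix of a longer separator (for example /\r\n|\r|\n/) may split a CR/LF pair that falls on a chunk boundary into two breaks, which the default /\r?\n/ avoids.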

Asked May 25, 2015 at 14:34 by J.J; edited May 23, 2017 at 10:31 by Community Bot. 5 comments:
  • "A solution to the more general problem of reading a file line-by-line using a user-definable line separator (ideally with .split() since that also accept regex)" if you can use split, you have already loaded the whole file... – n00dl3 Commented May 25, 2015 at 14:43
  • Not if you split on chunks as you read it :) Say, read 1Mb, split, process lines, with the remainder add another 1Mb, rinse repeat :) – J.J Commented May 25, 2015 at 14:45
  • The reason you use Uint8Array (or Buffer in node.js) is because the file may be binary and javascript strings can't handle binary data (for example, the byte 0x00 - otherwise known as the nul terminator (yes, that's nul with one "l")) – slebetman Commented May 25, 2015 at 15:50
  • I'll give two points here. First on the use of Uint8Array. Remember that a file is a sequence of bytes, not characters, as a string is. As a result, the file is read in chunks of bytes (using Uint8Array), not chunks of characters (that's why the answer says it "assumes the input is ASCII"). To convert to characters, the character encoding needs to be known (nowadays it can be assumed UTF-8 unless otherwise given by metadata) and a character decoder such as the TextDecoder class used. – Peter O. Commented May 25, 2015 at 15:51
  • Next on line separators. In general there are only two or three possible choices for line separators: LF, CR/LF and CR. Of these, the first two are the most common (the first in Linux, the second in Windows). Other choices would be exceedingly unusual. Therefore, it would be better to create a line reader that handles the most common choices for line separators, making it unnecessary to specify the line separator manually. – Peter O. Commented May 25, 2015 at 15:51
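The bytes-versus-characters point from the comments can be illustrated with a short sketch. It assumes UTF-8 input, and the sample string and the split position are made up for the demonstration: decoding each chunk on its own corrupts a character that straddles the boundary, while a single TextDecoder used with stream: true holds the incomplete byte until the next chunk arrives.

```typescript
// 'ï' takes two bytes in UTF-8, so 'naïve' is six bytes long
const bytes = new TextEncoder().encode('naïve');
const first = bytes.slice(0, 3); // 'n', 'a' and the first byte of 'ï'
const second = bytes.slice(3);   // the second byte of 'ï', then 'v', 'e'

// A fresh decoder per chunk cannot join the two halves of 'ï';
// each half is replaced by U+FFFD (the replacement character)
const broken = new TextDecoder('utf-8').decode(first)
  + new TextDecoder('utf-8').decode(second);

// A single decoder with stream: true buffers the dangling byte
const decoder = new TextDecoder('utf-8');
const intact = decoder.decode(first, { stream: true })
  + decoder.decode(second, { stream: true })
  + decoder.decode(); // final call without stream flushes the decoder

console.log(broken === 'naïve', intact === 'naïve');
```

This is why chunk boundaries can be chosen freely in terms of bytes, as long as the same decoder instance sees every chunk in order.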

2 Answers

I've made a LineReader class at the following Gist URL. As I mentioned in a comment, it's unusual to use other line separators than LF, CR/LF and maybe CR. Thus, my code only considers LF and CR/LF as line separators.

https://gist.github.com/peteroupc/b79a42fffe07c2a87c28

Example:

new LineReader(file).readLines(function(line){
 console.log(line);
});

Here is an adapted TypeScript class version of the code from Peter O.

export class BufferedFileLineReader {
  bufferOffset = 0;
  callback: (line: string) => void = () => undefined;
  currentLine = '';
  decodeOptions: TextDecodeOptions = { 'stream': true };
  decoder = new TextDecoder('utf-8', { 'ignoreBOM': true });
  endCallback: () => void = () => undefined;
  lastBuffer: Uint8Array | undefined;
  offset = 0;
  omittedCR = false;
  reader = new FileReader();
  sawCR = false;

  readonly _error = (event: Event): void => {
    throw event;
  };

  readonly _readFromView = (dataArray: Uint8Array, offset: number): void => {
    for (let i = offset; i < dataArray.length; i++) {
      // Treats LF and CRLF as line breaks
      if (dataArray[i] == 0x0A) {
        // Line feed read
        const lineEnd = (this.sawCR ? i - 1 : i);
        if (lineEnd > 0) {
          this.currentLine += this.decoder.decode(dataArray.slice(this.bufferOffset, lineEnd), this.decodeOptions);
        }
        this.callback(this.currentLine);
        this.decoder.decode(new Uint8Array([]));
        this.currentLine = '';
        this.sawCR = false;
        this.bufferOffset = i + 1;
        this.lastBuffer = dataArray;
      } else if (dataArray[i] == 0x0D) {
        if (this.omittedCR) {
          this.currentLine += '\r';
        }
        this.sawCR = true;
      } else if (this.sawCR) {
        if (this.omittedCR) {
          this.currentLine += '\r';
        }
        this.sawCR = false;
      }
      this.omittedCR = false;
    }

    if (this.bufferOffset != dataArray.length) {
      // Decode the end of the line if no current line was reached
      const lineEnd = (this.sawCR ? dataArray.length - 1 : dataArray.length);
      if (lineEnd > 0) {
        this.currentLine += this.decoder.decode(dataArray.slice(this.bufferOffset, lineEnd), this.decodeOptions);
      }
      this.omittedCR = this.sawCR;
    }
  };

  readonly _viewLoaded = (): void => {
    if (!this.reader.result) {
      // Nothing was read: signal completion once and stop here
      this.endCallback();
      return;
    }

    const dataArray = new Uint8Array(this.reader.result as ArrayBuffer);
    if (dataArray.length > 0) {
      this.bufferOffset = 0;
      this._readFromView(dataArray, 0);
      this.offset += dataArray.length;
      // Use the same chunk size as the first read in readLines()
      const s = this.file.slice(this.offset, this.offset + 8192);
      this.reader.readAsArrayBuffer(s);
    } else {
      // End of file: flush any bytes still buffered in the decoder, then emit the last line
      this.currentLine += this.decoder.decode();
      if (this.currentLine.length > 0) {
        this.callback(this.currentLine);
      }
      this.currentLine = '';
      this.sawCR = false;
      this.endCallback();
    }
  }

  constructor(private file: File) {
    this.reader.addEventListener('load', this._viewLoaded);
    this.reader.addEventListener('error', this._error);
  }

  public readLines(callback: (line: string) => void, endCallback: () => void) {
    this.callback = callback;
    this.endCallback = endCallback;
    const slice = this.file.slice(this.offset, this.offset + 8192);
    this.reader.readAsArrayBuffer(slice);
  }
}

Thanks again Peter O for the wonderful answer.

Tags: javascript. Title: Read FileReader object line-by-line without loading the whole file into RAM (Stack Overflow)