Situation

Say the user may upload a file on a web page. The file is usually large (80 MB at minimum, possibly far more) and of a specific type, for example PDF.

Since these files are huge, we don't want to waste bandwidth unnecessarily by uploading a file only to discover that its type is wrong. We therefore want to make sure, on the client side, that the file is indeed a PDF, and only THEN send it.

Fortunately, the PDF file format has a 5-byte magic number, equal to 25 50 44 46 2D.

(This is only an example; it could be any file format. What matters is that the format can be identified by its magic bytes, which we consider good enough verification here. The question may also be relevant to other cases, so please treat the PDF example solely as one practical illustration of the problem.)

Hence my question: How would I read the first 5 bytes of the file, or more generally, the first N bytes of a file?

You wouldn't want to read the full file, since it can be huge and the client's hard drive might be slow. You really only need to read those five bytes, and only if they are correct will you read the rest of the file to send it to the server.

If there isn't a way, are there any workarounds or ongoing proposals for such a feature?

What I've tried

The FileReader API allows reading a file into an array buffer (see this answer and the docs):

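let reader = new FileReader();

reader.onload = function() {
  // this.result holds the entire file contents as an ArrayBuffer
  let arrayBuffer = this.result,
    array = new Uint8Array(arrayBuffer),
    binaryString = String.fromCharCode.apply(null, array);

  console.log(binaryString);
};
// `this` is the <input type="file"> element (inside its change handler)
reader.readAsArrayBuffer(this.files[0]);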

This, however, reads the whole file.

Similar questions that do not give a solution to my question

  • The accepted answer to "Can you read line by line in javascript?" relies on some external API.
  • "Is there a way to get specific parts of a file with FileReader in JavaScript?" asked about reading according to a specific character repeated across the file, which is obviously not doable without reading the entire file first.
  • All NodeJS-related questions: I'm doing this in a web browser, so any non-browser proposal isn't a solution.

Comments

(Responding to significant comments here, since comments are meant to be temporary)

Does slicing of the actual file itself help? – @C3roe

It gives me the expected result, but what guarantees that it only reads the first n bytes, rather than reading everything and then slicing? Are there any implementation details about this in the standards? MDN states: "a new Blob object which contains data from a subset of the blob on which it's called.", implying there was a full blob to slice from in the first place.

Asked 23 hours ago by RedStoneMatt, edited 22 hours ago
  • 2 Does slicing of the actual file itself help? stackoverflow/a/24845020/1427878 – C3roe Commented 23 hours ago
  • 2 See this question: you should investigate whether your "first five bytes" approach, regardless of how you do it, is good enough to distinguish PDF from non-PDF files. – Pointy Commented 23 hours ago
  • 1 "however reads the whole file" - but still on the client side, so covers your requirement of not uploading. – fdomn-m Commented 22 hours ago
  • 1 @Pointy "whether your "first five bytes" approach, regardless of how you do it, is good enough to distinguish PDF from non-PDF files." No. Anybody can write those bytes to a file, followed by, for example, Opus audio packets, images, whatever, because File objects, which inherit from Blob, allow for this: new Blob([PDF_bytes, audio_bytes, image_bytes, any_other_bytes]). Generally, the accept attribute can be used, but that doesn't guarantee anything either about the file the user decides to select or its actual underlying bytes. – guest271314 Commented 22 hours ago
  • 1 Be careful with documentation on MDN - or anywhere else for that matter. In particular documentation on MDN can be wrong. – guest271314 Commented 22 hours ago

3 Answers

In client-side JavaScript, a File object represents a file on the local file system. It does not immediately read the file into the browser's memory. Reading would be an asynchronous operation, whereas a File can be constructed synchronously.

File is a subclass of Blob whose slice method produces another Blob (again, synchronously) that represents a subset of contiguous bytes from the file (again, without reading them).

Actually reading the contents of the file or of a slice requires invoking the asynchronous text method (or the bytes or arrayBuffer methods), or using a ReadableStream obtained via the stream method. All of these methods are asynchronous.

One can therefore use small slices of a file

  • to read only a small part of it (this is the OP's use case, see the snippet below) or
  • to process the file in small chunks (see How can I read chunks from stream in JavaScript?; the memory allocation graph there reflects the size of the chunks, not the size of the entire file).

async function sniff(file) {
  console.log(await file.slice(0, 5).text());
}
<input type="file" onchange="sniff(this.files[0])">
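The second use case in the list above, processing a file in small chunks, could be sketched as follows. This is a minimal sketch, not a definitive implementation: readInChunks and countBytes are hypothetical names, and the 64 KiB default chunk size is an arbitrary assumption. It relies only on Blob.slice and Blob.arrayBuffer, so it works on a File as well.

```javascript
// Hypothetical helper: yields the blob's contents as successive Uint8Array chunks.
async function* readInChunks(blob, chunkSize = 64 * 1024) {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() is synchronous and returns a Blob for this byte range;
    // the actual read happens in arrayBuffer().
    const chunk = blob.slice(offset, offset + chunkSize);
    yield new Uint8Array(await chunk.arrayBuffer());
  }
}

// Example use: count the bytes while holding only one chunk at a time.
async function countBytes(blob) {
  let total = 0;
  for await (const chunk of readInChunks(blob)) {
    total += chunk.length;
  }
  return total;
}
```

In a page, countBytes(this.files[0]) could be called from the input's change handler in the same way as sniff above.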

There is no problem in using the HTML tag <input type="file">. It is only a pointer to the actual local file.

The key, reading only a slice, has already been mentioned, but I believe it is worth showing every step under the question's constraints:

Get the file in the file input's change event listener:

const file = event.target.files[0];

Then slice the file and pass the slice to your reader:

const firstFiveBytes = file.slice(0, 5); // Creates a small blob
reader.readAsArrayBuffer(firstFiveBytes); // Reads only 5 bytes

Then test the result against your magic bytes (they may or may not spell a readable string):

const expected = [0x25, 0x50, 0x44, 0x46, 0x2D]; // or whatever you want to check.
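The comparison itself is left implicit in the steps above. One way it might look is sketched below; matchesMagic is a hypothetical helper name, and the commented-out wiring assumes the same reader variable as in the earlier steps, with onload assigned before readAsArrayBuffer is called.

```javascript
// Hypothetical helper: true if the buffer starts with exactly the expected bytes.
function matchesMagic(arrayBuffer, expected) {
  const actual = new Uint8Array(arrayBuffer);
  if (actual.length < expected.length) return false; // file shorter than the signature
  return expected.every((byte, i) => actual[i] === byte);
}

// Assumed wiring with the reader from the steps above (browser only):
// reader.onload = () => {
//   const isPdf = matchesMagic(reader.result, [0x25, 0x50, 0x44, 0x46, 0x2D]);
//   console.log(isPdf ? "looks like a PDF" : "not a PDF");
// };
```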

I hope this helps.

Not possible using HTML <input type="file"> alone. The entire file is always read.

It is possible using a fetch() request with the Range header set, combined with one of the following: a Web extension that lets the user fetch() the file: protocol; a local server from which to fetch() the file with the Range header; or Native Messaging with an extension, in which case you can do whatever you want using the local application on the user's machine.

Now, if you just want to read the first N bytes from the full File object:
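As a rough illustration of the Range-header idea, assuming the file is reachable at some URL (for instance through a local server, which is an assumption and not something the page itself provides), a request for the first n bytes might be sketched like this. fetchFirstBytes is a hypothetical helper, and the fetchImpl parameter exists only so the sketch can be exercised without a network.

```javascript
// Hypothetical helper: requests only the first n bytes of a resource.
async function fetchFirstBytes(url, n, fetchImpl = fetch) {
  const response = await fetchImpl(url, {
    headers: { Range: `bytes=0-${n - 1}` }, // byte ranges are inclusive
  });
  const body = new Uint8Array(await response.arrayBuffer());
  // 206 Partial Content means the range was honored; a server that ignores
  // Range answers 200 with the full body, so truncate defensively.
  return response.status === 206 ? body : body.slice(0, n);
}
```

Whether only n bytes actually travel depends on the server honoring the Range header; the truncation branch merely keeps the return value consistent when it does not.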

let bytes = file.slice(0, 5);

Tags: javascript, How do I read the first N bytes of a file from an HTML File input, Stack Overflow