How do I read the first N bytes of a file from an HTML File input?
Situation
Say the user uploads a file on a web page. The file is usually large (at least 80 MB, often much more) and must be of a specific type, for example PDF.
Since these files are huge, we don't want to waste bandwidth uploading one only to find out that its type is wrong. We therefore want to check on the client side that the file really is a PDF, and only THEN send it.
Fortunately, the PDF format starts with a 5-byte magic number, 25 50 44 46 2D (the ASCII string %PDF-).
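As a quick check of those byte values, decoding them as ASCII gives the signature that every PDF file begins with. A minimal sketch (the constant name PDF_MAGIC is purely illustrative):

```javascript
// The PDF magic number as raw byte values (hypothetical constant name)
const PDF_MAGIC = Uint8Array.of(0x25, 0x50, 0x44, 0x46, 0x2d);

// Decode each byte as a character code to see the text it represents
const asText = String.fromCharCode(...PDF_MAGIC);
console.log(asText); // prints %PDF-
```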
(This is only an example; it could be any file format. What matters is that the format can be identified by its magic bytes, which we consider a good enough check here. The question also applies beyond this case, so please treat PDF solely as a practical example of the problem.)
Hence my question: how do I read the first 5 bytes of a file, or more generally, the first N bytes?
I don't want to read the whole file, since it can be huge and the client's hard drive might be slow. I only need those five bytes, and only if they match will I read the rest of the file to send it to the server.
If this isn't possible, are there any workarounds or ongoing proposals for such a feature?
What I've tried
The FileReader API can read a file into an ArrayBuffer (see this answer and the docs):
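// Runs inside an <input type="file"> change handler, where `this` is the input element
let reader = new FileReader();
reader.onload = function() {
  let arrayBuffer = this.result,                              // ArrayBuffer holding the entire file
      array = new Uint8Array(arrayBuffer),
      binaryString = String.fromCharCode.apply(null, array);  // can throw a RangeError on very large arrays
  console.log(binaryString);
};
reader.readAsArrayBuffer(this.files[0]);                      // reads the whole file
This, however, reads the whole file.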
Similar questions that don't answer mine
- The accepted answer to "Can you read line by line in javascript?" relies on an external API.
- "Is there a way to get specific parts of a file with FileReader in JavaScript?" asks to split the file on a character repeated across it, which obviously can't be done without reading the entire file first.
- All Node.js-related questions: I'm doing this in a web browser, so non-browser solutions don't apply.
Comments
(Responding to significant comments here, since comments are meant to be temporary)
Does slicing the actual file itself help? https://stackoverflow/a/24845020/1427878 – @C3roe
It gives me the expected result, but what guarantees that it only reads the first n bytes, rather than reading the whole file and then slicing it? Do the standards say anything about this implementation detail? MDN states that slice() returns "a new Blob object which contains data from a subset of the blob on which it's called", implying there was a full blob to slice from in the first place.
3 Answers
In client-side JavaScript, a File object represents a file on the local file system. It does not immediately read the file into the browser's memory. Reading is an asynchronous operation, whereas a File can be constructed synchronously.
File is a subclass of Blob, whose slice method produces another Blob (again, synchronously) that represents a subset of contiguous bytes from the file (again, without reading them).
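To illustrate that slicing is synchronous and yields another Blob rather than bytes, here is a minimal sketch that runs in a browser or in Node 18+ (where Blob is a global). An in-memory Blob stands in for the user's File; that substitution is an assumption made only so the example is runnable:

```javascript
// In-memory stand-in for a large user-selected File (hypothetical data)
const blob = new Blob(["%PDF-1.7\n", "x".repeat(1024 * 1024)], {
  type: "application/pdf",
});

// slice() returns a new Blob synchronously (no Promise, no callback).
// It describes the byte range [0, 5); getting the bytes still needs an async read.
const head = blob.slice(0, 5);

console.log(head instanceof Blob); // prints true
console.log(head.size);            // prints 5
console.log(blob.size);            // prints 1048585
```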
Actually reading the contents of the file or of a slice requires calling the asynchronous text method (or the bytes or arrayBuffer methods), or using a ReadableStream obtained via the stream method. All of these introduce asynchrony.
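Both reading routes can be sketched as small helpers. The function names readFirstBytes and readFirstBytesViaStream are hypothetical; the calls they use (slice, arrayBuffer, stream, getReader) are standard Blob and ReadableStream methods available in browsers and Node 18+:

```javascript
// Read only the first n bytes of a Blob/File via arrayBuffer()
async function readFirstBytes(blob, n) {
  // slice() clamps the end to blob.size, so files shorter than n are fine
  const head = blob.slice(0, n);
  // arrayBuffer() is the asynchronous step that actually reads the bytes
  return new Uint8Array(await head.arrayBuffer());
}

// Same result, but read through the ReadableStream returned by stream()
async function readFirstBytesViaStream(blob, n) {
  const reader = blob.slice(0, n).stream().getReader();
  const chunks = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read(); // value is a Uint8Array chunk
    if (done) break;
    chunks.push(value);
    total += value.length;
  }
  // Concatenate the chunks into a single Uint8Array
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```

For a 5-byte slice the arrayBuffer() route is simpler; the stream route is mainly useful when the slice is large enough that you want to handle it piece by piece.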
One can therefore use small slices of a file
- to read only a small part of it (this is the OP's use case, see the snippet below) or
- to process the file in small chunks (see How can I read chunks from stream in JavaScript?, the memory allocation graph reflects the size of the chunks, not the size of the entire file).
async function sniff(file) {
  console.log(await file.slice(0, 5).text());
}
<input type="file" onchange="sniff(this.files[0])">
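For the second use case (processing the file in small chunks), a sketch using repeated slice() calls instead of a ReadableStream might look like this. The helper name forEachChunk and the 1 MiB default chunk size are assumptions for illustration:

```javascript
// Visit a Blob/File in fixed-size chunks; the loop reads one chunk at a time
// and waits for the callback before reading the next one.
async function forEachChunk(blob, onChunk, chunkSize = 1024 * 1024) {
  if (!(chunkSize > 0)) throw new RangeError("chunkSize must be positive");
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // Each slice is read independently; its end is clamped to blob.size
    const bytes = new Uint8Array(
      await blob.slice(offset, offset + chunkSize).arrayBuffer()
    );
    await onChunk(bytes, offset);
  }
}
```

For example, `await forEachChunk(file, (bytes, offset) => console.log(offset, bytes.length))` logs each chunk's offset and size without ever materializing the whole file in one buffer.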
There is no problem in using the HTML <input type="file"> tag: the File it provides is just a pointer to the actual local file.
The key to reading only a slice is already given in the previous question, but I believe it is worth walking through all the constraints of this question:
Get the file in the input's change event listener:
const file = event.target.files[0];
Then slice it and read only the slice with your reader:
const firstFiveBytes = file.slice(0, 5); // Creates a small blob
reader.readAsArrayBuffer(firstFiveBytes); // Reads only 5 bytes
Then test the bytes against your magic signature (given here as numbers, but it doesn't have to be):
const expected = [0x25, 0x50, 0x44, 0x46, 0x2D]; // or whatever you want to check.
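Putting these steps together, a self-contained check could look like the sketch below. Note that it swaps FileReader for Blob.arrayBuffer(), which returns a Promise and is easier to await; the helper name startsWithMagic is hypothetical:

```javascript
// Check whether a Blob/File starts with the given magic bytes
// (hypothetical helper; defaults to the PDF signature %PDF-).
async function startsWithMagic(file, expected = [0x25, 0x50, 0x44, 0x46, 0x2d]) {
  // Read only as many bytes as the signature needs
  const head = new Uint8Array(
    await file.slice(0, expected.length).arrayBuffer()
  );
  // A file shorter than the signature cannot match
  if (head.length < expected.length) return false;
  return expected.every((byte, i) => head[i] === byte);
}
```

Inside the change listener you would call `await startsWithMagic(file)` and start the upload only when it resolves to true.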
Trust this helps.
Not possible using the HTML <input type="file"> alone. The entire file is always read.
It is possible using a fetch() request with the Range header set, combined with one of the following:
- a Web extension that can fetch() from the file: protocol;
- a local server that you fetch() the file from with a Range header;
- Native Messaging with an extension, in which case you can do whatever you want using the local application on the user's machine.
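A sketch of the Range-header approach, assuming the file is reachable at a URL whose server honors Range requests (for example the local server mentioned above). The names firstBytesRange and fetchFirstBytes are hypothetical; a 206 Partial Content status means the server returned only the requested range:

```javascript
// Build an HTTP Range header value for the first n bytes.
// Byte ranges are inclusive, so bytes 0..n-1 are written "bytes=0-(n-1)".
function firstBytesRange(n) {
  if (!Number.isInteger(n) || n < 1) throw new RangeError("n must be a positive integer");
  return `bytes=0-${n - 1}`;
}

// Fetch only the first n bytes from a server that supports Range requests
async function fetchFirstBytes(url, n) {
  const response = await fetch(url, { headers: { Range: firstBytesRange(n) } });
  if (response.status !== 206) {
    // A 200 means the Range header was ignored and the whole file is coming;
    // cancel the body instead of downloading it
    await response.body?.cancel();
    throw new Error(`expected 206 Partial Content, got ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```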
Now, if you just want the first N bytes of the full File object:
let bytes = file.slice(0, 5);
File objects, which inherit from Blob, allow for this: new Blob([PDF_bytes, audio_bytes, image_bytes, any_other_bytes]). Generally, the accept attribute can be used, but that doesn't guarantee anything either about the file the user decides to select or its actual underlying bytes. – guest271314
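To illustrate the comment's point that a declared type says nothing about the content, here is a minimal sketch (runnable in browsers and Node 18+) of a Blob labelled application/pdf whose bytes are actually the PNG signature. The variable names are illustrative:

```javascript
// A Blob (and therefore a File) can carry any MIME type; the type is just
// metadata. These bytes are the 8-byte PNG signature, not a PDF header.
const fake = new Blob(
  [new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])],
  { type: "application/pdf" }
);

async function main() {
  // Only the magic bytes reveal what the content really is
  const head = new Uint8Array(await fake.slice(0, 5).arrayBuffer());
  const looksLikePdf = [0x25, 0x50, 0x44, 0x46, 0x2d].every((b, i) => head[i] === b);
  console.log(fake.type);    // prints application/pdf
  console.log(looksLikePdf); // prints false
  return looksLikePdf;
}

main();
```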