I'm running a Flask (Python) web application that zips up a number of files and serves them to the user based on the user's filters.

The issue: after the user clicks download, the backend pulls all the files and starts creating the zip, which can take minutes. The user has no way of knowing whether it has hung.

I decided that streaming the zip file as it's being created gets the file to the user quicker, and it also lets the user know that the web app is working on it. The problem is that, in order to use the browser's download section (the little pop-up or the download page with progress bars), you need to provide the Content-Length header, but we don't know the size of the zip file because it hasn't finished being created yet. I've tried my best to estimate the final size of the zip file, and I thought it would be easy because my zip is just ZIP_STORED, but there is internal zip structure that I'm not able to measure accurately. The browser just ends up rejecting the download with ERR_CONTENT_LENGTH_MISMATCH.

I can provide a Server-Sent Events (SSE) route to make my own progress bar by tracking the number of bytes sent and exposing it on a separate /progress route, but I really had my heart set on using the browser's download section, and it's a point of pride for me at this point. I could also just not stream it: use SSE to report progress while the zip is being created, then send it with a Content-Length header once it's finished. That's not quite as nice as I'd like it to be, though.
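The byte-counting idea behind the /progress route can be sketched independently of Flask. This is a minimal illustration rather than the asker's code: `track_progress` and `sse_message` are hypothetical helper names, and the shared `progress` dict is assumed to be readable by whatever handler serves the SSE stream.

```python
import json


def track_progress(chunks, progress):
    """Pass chunks through unchanged while counting the bytes sent so far."""
    for chunk in chunks:
        progress["sent"] += len(chunk)
        yield chunk


def sse_message(progress):
    """Format the current counters as one Server-Sent Events message."""
    # An SSE message is a "data:" line terminated by a blank line.
    return f"data: {json.dumps(progress)}\n\n"


if __name__ == "__main__":
    progress = {"sent": 0}
    for _ in track_progress([b"abc", b"de"], progress):
        pass
    print(sse_message(progress), end="")
```

Wrapping the zip generator in `track_progress` before handing it to the response keeps the counting out of the zip code itself.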

def calculate_total_size(filenames):
    total_size = 0
    for file in filenames:
        matching_filepath, _ = get_file_info(file)
        if matching_filepath:
            total_size += os.path.getsize(matching_filepath)

    # Add overhead for ZIP file structure (22 bytes per file + 22 bytes for the central directory)
    total_size += 22 * (len(filenames) + 1)
    return total_size

def generate_file_entries(filenames):
    for file in filenames:
        matching_filepath, filename = get_file_info(file)
        if matching_filepath:
            file_stat = os.stat(matching_filepath)
            modified_at = datetime.utcfromtimestamp(file_stat.st_mtime)
            with open(matching_filepath, 'rb') as f:
                chunk = f.read()
                if isinstance(chunk, bytes):  # Ensure only bytes are yielded
                    yield filename, modified_at, 0o600, ZIP_64, [chunk]
                else:
                    print(f"Unexpected data type for file contents: {type(chunk)}")
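For reference, the size of a ZIP_STORED archive is predictable once the exact layout is fixed. The sketch below assumes the archive is written by Python's standard `zipfile` module to an unseekable stream (so every entry is followed by a 16-byte data descriptor), with no ZIP64 records, comments, or extra fields. `predict_streamed_zip_size` and `CountingSink` are hypothetical names. Other writers, in particular any library that emits ZIP64 structures, lay out entries differently, so these constants would not carry over unchanged.

```python
import zipfile

LOCAL_HEADER = 30      # fixed part of each local file header
DATA_DESCRIPTOR = 16   # signature, CRC, compressed size, size (non-ZIP64)
CENTRAL_HEADER = 46    # fixed part of each central directory entry
END_RECORD = 22        # end-of-central-directory record, no comment


def predict_streamed_zip_size(entries):
    """entries: iterable of (archive_name, size_in_bytes) pairs."""
    total = END_RECORD
    for name, size in entries:
        name_len = len(name.encode("utf-8"))
        total += LOCAL_HEADER + name_len + size + DATA_DESCRIPTOR
        total += CENTRAL_HEADER + name_len
    return total


class CountingSink:
    """Write-only sink without tell()/seek(), so zipfile streams to it."""

    def __init__(self):
        self.size = 0

    def write(self, data):
        self.size += len(data)
        return len(data)

    def flush(self):
        pass


def actual_streamed_zip_size(files):
    """files: dict of archive_name -> bytes; returns the bytes zipfile wrote."""
    sink = CountingSink()
    with zipfile.ZipFile(sink, "w", zipfile.ZIP_STORED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return sink.size


if __name__ == "__main__":
    files = {"a.txt": b"hello", "reports/b.bin": b"\x00" * 1000}
    predicted = predict_streamed_zip_size((n, len(d)) for n, d in files.items())
    print(predicted, actual_streamed_zip_size(files))
```

Comparing the prediction against a real run, as the `__main__` block does, is a cheap way to confirm the formula for whichever writer is actually in use before trusting it for a Content-Length header.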

Asked Jan 30 at 18:03 by john stamos

1 Answer

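Have you tried a transfer method that does not require the full content length up front? If the Flask response is built from a generator and no Content-Length header is set, the body can be sent without declaring its size (typically with chunked transfer encoding over HTTP/1.1). The browser still shows the download, just without a total size or percentage. Something like this: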

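from flask import Flask, Response
import zipfile
import io

app = Flask(__name__)

class ChunkBuffer(io.RawIOBase):
    """Write-only, unseekable sink: zipfile then writes each entry sequentially."""

    def __init__(self):
        super().__init__()
        self._chunks = []

    def writable(self):
        return True

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def pop(self):
        """Return everything written since the last call and clear the buffer."""
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data

def stream_zip():
    buffer = ChunkBuffer()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as zip_file:
        files = {"file1.txt": "Hello, World!", "file2.txt": "Flask Streaming!"}
        for filename, content in files.items():
            zip_file.writestr(filename, content)
            yield buffer.pop()  # Yield only the bytes written for this entry
    # Closing the ZipFile writes the central directory; send that too
    yield buffer.pop()

@app.route('/download')
def download():
    # No Content-Length header: the total size is not known up front
    return Response(stream_zip(), mimetype="application/zip", headers={
        "Content-Disposition": "attachment; filename=download.zip"
    })

if __name__ == "__main__":
    app.run(debug=True)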
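The example above zips in-memory strings, while the question deals with files on disk. A possible adaptation is sketched here, assuming Python's standard `zipfile` writing to an unseekable sink; `ChunkSink` and `stream_zip_from_disk` are hypothetical names, and each file is read in blocks so it never has to fit in memory at once.

```python
import io
import os
import zipfile


class ChunkSink(io.RawIOBase):
    """Hypothetical write-only, unseekable sink that collects written bytes."""

    def __init__(self):
        super().__init__()
        self._chunks = []

    def writable(self):
        return True

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def pop(self):
        """Return everything written since the last call and clear the sink."""
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data


def stream_zip_from_disk(paths, chunk_size=64 * 1024):
    """Yield the bytes of a stored ZIP while reading each file block by block."""
    sink = ChunkSink()
    with zipfile.ZipFile(sink, "w", zipfile.ZIP_STORED) as archive:
        for path in paths:
            info = zipfile.ZipInfo.from_file(path, arcname=os.path.basename(path))
            with open(path, "rb") as src, archive.open(info, "w") as dest:
                while block := src.read(chunk_size):
                    dest.write(block)
                    yield sink.pop()
    # The central directory is written when the archive is closed.
    yield sink.pop()
```

The generator can be passed to `Response(...)` in place of `stream_zip()`. Entries are stored under their base names here, so paths with the same file name in different directories would collide.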

Below is a more minimal example of streaming a response:

from flask import Flask, Response

app = Flask(__name__)

def stream_data():
    for i in range(10):
        yield f"Chunk {i}\n".encode()  # Simulate file content

@app.route('/download')
def download():
    # Streamed from a generator, so no Content-Length is set
    return Response(stream_data(), mimetype="application/octet-stream", headers={
        "Content-Disposition": "attachment; filename=streamed.txt"
    })
