I have a task to download around 16K+ files (max size is 1 GB each) from given URLs to a location. The files are of different formats such as pdf, ppt, doc, docx, zip, jpg, iso, etc. So I had written the piece of code below, which

  1. sometimes downloads the file, but sometimes only a 26 KB file is downloaded.
  2. Also sometimes fails with the error message "[Errno 10054] An existing connection was forcibly closed by the remote host".
import requests

# `sheet` (openpyxl worksheet), `save_path`, `login_url` and `login_data` are defined elsewhere
def download_file(s):
    for row in sheet.iter_rows(min_row=2):
        try:
            url = row[6].value  # reading the URL from Excel
            # Send GET request to the URL
            response = s.get(url)
            if response.status_code == 200:
                with open(save_path, 'wb') as file:
                    file.write(response.content)
        except Exception as e:
            print(f"Error: {e}")


if __name__ == "__main__":
    with requests.session() as s:
        res = s.post(login_url, data=login_data)
        download_file(s)

I tried an alternative approach using shutil and downloading in chunks, but the issue is still observed. I referred to solutions from here and here.

import shutil

# Approach 1: copy the raw response stream to disk
with requests.get(url, stream=True) as r:
    with open(local_filename, 'wb') as f:
        shutil.copyfileobj(r.raw, f)

# Approach 2: iterate over 2 MB chunks
response = requests.get(url, stream=True)
with open(book_name, 'wb') as f:
    for chunk in response.iter_content(1024 * 1024 * 2):
        f.write(chunk)

3 Answers


Streaming and adding retry logic should resolve the issue you are facing; refer to the following code sample:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def download_file(url, local_filename, session):
    try:
        with session.get(url, stream=True, timeout=10) as response:
            response.raise_for_status()  # Raise an error on bad status codes
            with open(local_filename, 'wb') as f:
                for chunk in response.iter_content(chunk_size=8192):  # 8KB chunks
                    if chunk:  # filter out keep-alive new chunks
                        f.write(chunk)
        print(f"Downloaded {local_filename} successfully.")
    except Exception as e:
        print(f"Error downloading {local_filename}: {e}")

def create_session_with_retries():
    session = requests.Session()
    # Configure retries: 5 attempts with exponential backoff
    retries = Retry(
        total=5,
        backoff_factor=0.3,
        status_forcelist=[500, 502, 503, 504],
        allowed_methods=["GET", "POST"]
    )
    adapter = HTTPAdapter(max_retries=retries)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

if __name__ == "__main__":
    # Example URL and filename
    url = "https://example/largefile.zip"
    local_filename = "largefile.zip"
    
    # Create a session with retries enabled
    session = create_session_with_retries()
    
    # Download the file
    download_file(url, local_filename, session)
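
For the scenario in the question (logging in first, then looping over URLs in an Excel sheet), the same retry-enabled session can be reused. Below is a minimal sketch of how the two functions above could plug into that loop; the workbook name, login_url, login_data, and the way local_filename is derived from the URL are placeholders/assumptions, not taken from the original code:

import os
from openpyxl import load_workbook

login_url = "https://example.com/login"          # placeholder, not the real login URL
login_data = {"user": "...", "password": "..."}  # placeholder credentials
sheet = load_workbook("files.xlsx").active       # assumption: the Excel file from the question

if __name__ == "__main__":
    session = create_session_with_retries()
    session.post(login_url, data=login_data)  # authenticate once, then reuse the session
    for row in sheet.iter_rows(min_row=2):
        url = row[6].value
        if not url:
            continue
        # assumption: use the last path segment of the URL as the file name
        local_filename = os.path.basename(url)
        download_file(url, local_filename, session)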

I am not sure, but you could maybe try 'Session' instead of 'session' to keep the connection alive, as in the docs: https://requests.readthedocs.io/en/latest/user/advanced/#keep-alive

if __name__ == "__main__":
    with requests.Session() as s:
        res = s.post(login_url, data=login_data)
        download_file(s)

Keep-Alive, and thus Session, is your friend, as mentioned by Mark.

Though, check the server's HTTP response headers: if it is sending back Connection: close, it may ignore the client's keep-alive regardless. In that case, catching the exception and retrying may be the best approach.
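
A minimal sketch of that idea (assuming a requests session and a url, as in the earlier answers): it logs when the server sends Connection: close, and retries on connection errors. The attempt count and backoff are arbitrary choices, not requirements.

import time
import requests

def download_with_retry(session, url, local_filename, attempts=3):
    for attempt in range(1, attempts + 1):
        try:
            with session.get(url, stream=True, timeout=30) as response:
                response.raise_for_status()
                # If the server asks to close the connection, client keep-alive will not help.
                if response.headers.get("Connection", "").lower() == "close":
                    print("Server sent 'Connection: close'; keep-alive is ignored.")
                with open(local_filename, "wb") as f:
                    for chunk in response.iter_content(chunk_size=8192):
                        f.write(chunk)
            return True
        except requests.exceptions.RequestException as e:
            print(f"Attempt {attempt} failed: {e}")
            time.sleep(2 * attempt)  # simple linear backoff before retrying
    return False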
