I am using the following function to download files from S3 using boto3 and tqdm:

import sys
from pathlib import Path

import boto3
from tqdm import tqdm


def download_data(*, folder_path, Bucket, Prefix):
    """
    Download the data into a folder from S3 bucket.

    Parameters
    ----------
    folder_path : str or Path
        The path to the destination folder.
    Bucket : str
        The bucket from which we want to download a dataset.
    Prefix : str
        The dataset in S3 that we want to download.

    Returns
    -------
    None
    """
    folder_path = Path(folder_path)
    folder_path.mkdir(parents=True, exist_ok=True)  # Ensures folder exists

    try:
        client = boto3.client('s3')
        objects = client.list_objects_v2(Bucket=Bucket, Prefix=f"{Prefix}/")

        Keys = [obj['Key'] for obj in objects.get('Contents', []) if obj['Key'].endswith('.csv')]

        if not Keys:
            print(f"No files found in {Prefix}.")  # Report and return None, as documented
            return

        for Key in Keys:
            Filename = folder_path / Path(Key).name

            file_size = client.head_object(Bucket=Bucket, Key=Key)['ContentLength']
            size_MB = file_size/(1024**2)
            with tqdm(total=file_size, desc=f'Downloading {Path(Key).name} [{size_MB:.2f} MB]', unit='B',
                      unit_scale=True, leave=True, bar_format='{l_bar}{bar}|{rate_fmt}') as pbar:
                def progress_callback(bytes_transferred):
                    pbar.update(bytes_transferred)
                client.download_file(Bucket=Bucket, Key=Key, Filename=Filename, Callback=progress_callback)
            if (len(Keys) == 1) or (Key == Keys[-1]):  # How to avoid this and still get correctly formatted output text and progress bar
                pbar.write("")  # Can we get proper formatting with print instead of this?
            sys.stdout.flush()  # Can we do away with this?
            pbar.write(f"Download path: {str(Filename)}")
        print(f"{Prefix}: Download completed successfully!\n"+"--"*40)
        sys.stdout.flush()  # Can we do away with this?
    except Exception as e:
        print(f"Error: {e}. Download did not complete successfully!")
        sys.stdout.flush()  # Can we do away with this?

I have commented the troublesome parts of the code above. If I do not write the code this way, the output is badly formatted (missing newlines, as shown in the image). The following image illustrates the issue and the desired result:

Can anyone help me clean up my code and get rid of the 'hacks' while still getting the desired output?
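
For reference, here is a minimal sketch of the kind of cleanup I have in mind. It is not a confirmed fix. It assumes the formatting problems come from tqdm drawing on stderr (its default) while print writes to stdout, so it passes the same stream to both via tqdm's `file` argument. The function name `download_csvs`, the `stream` parameter, and passing `client` in as an argument are hypothetical choices made for illustration; it also reads the size from the `Size` field of each `list_objects_v2` entry rather than calling `head_object`.

```python
import sys
from pathlib import Path

from tqdm import tqdm


def download_csvs(client, *, folder_path, Bucket, Prefix, stream=sys.stdout):
    """Download every .csv under Prefix, writing bars and messages to one stream."""
    folder_path = Path(folder_path)
    folder_path.mkdir(parents=True, exist_ok=True)

    objects = client.list_objects_v2(Bucket=Bucket, Prefix=f"{Prefix}/")
    # Each listed object already carries its size in bytes under 'Size'.
    files = [(obj["Key"], obj["Size"]) for obj in objects.get("Contents", [])
             if obj["Key"].endswith(".csv")]

    for key, size in files:
        target = folder_path / Path(key).name
        desc = f"Downloading {target.name} [{size / 1024**2:.2f} MB]"
        # file=stream keeps the bar on the same stream as the prints below.
        with tqdm(total=size, desc=desc, unit="B", unit_scale=True,
                  file=stream) as pbar:
            # The callback receives the bytes in each chunk; pbar.update adds them.
            client.download_file(Bucket=Bucket, Key=key, Filename=str(target),
                                 Callback=pbar.update)
        # The bar is closed at this point and has ended its own line.
        print(f"Download path: {target}", file=stream)

    print(f"{Prefix}: downloaded {len(files)} file(s).", file=stream)
    return [key for key, _ in files]
```

It would be called as `download_csvs(boto3.client('s3'), folder_path=..., Bucket=..., Prefix=...)`. Whether this removes the need for the `sys.stdout.flush()` calls in my environment is exactly what I am unsure about.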