I'm looking to use pytest-recording in tests that connect to S3 and download data from it.

I import all the functions from the script I'm testing. The tests use prod env vars, but only to download and read data from S3 (never to upload). In a REPL, the exact same code works fine: I can connect to my S3 instance and read and download the data on it.

Now, in my test suite, I keep getting an IncompleteReadError when running pytest --record-mode=once. This happens regardless of whether I delete existing cassettes or not.
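
One way to inspect what actually got recorded is to scan a parsed cassette for responses whose stored body is shorter than their Content-Length header. The following is only a minimal sketch. It assumes the default vcrpy YAML layout (a top-level `interactions` list whose entries carry `request.method`, `response.headers` and `response.body.string`); the function name `find_short_bodies` and the sample data are hypothetical, and compressed or binary bodies may need extra handling.

```python
from typing import Any, Dict, List, Tuple


def find_short_bodies(cassette: Dict[str, Any]) -> List[Tuple[str, int, int]]:
    """Return (uri, expected_bytes, recorded_bytes) for suspicious responses.

    `cassette` is the dict obtained by loading a cassette file, e.g. with
    yaml.safe_load (assumed layout, see the paragraph above).
    """
    suspicious = []
    for interaction in cassette.get("interactions", []):
        request = interaction.get("request", {})
        response = interaction.get("response", {})

        # HEAD responses legitimately have a Content-Length but no body.
        if (request.get("method") or "").upper() == "HEAD":
            continue

        # Header names may differ in case; values are stored as lists.
        headers = {k.lower(): v for k, v in response.get("headers", {}).items()}
        values = headers.get("content-length")
        if not values:
            continue
        expected = int(values[0])

        body = (response.get("body") or {}).get("string") or b""
        if isinstance(body, str):
            body = body.encode("utf-8")
        if len(body) < expected:
            suspicious.append((request.get("uri", "?"), expected, len(body)))
    return suspicious


# Hypothetical, hand-written cassette data for illustration only.
sample = {
    "interactions": [
        {
            "request": {"method": "GET", "uri": "https://example.invalid/bucket/a.csv"},
            "response": {
                "headers": {"Content-Length": ["11"]},
                "body": {"string": "a,b\n1,2\n3,4"},
            },
        },
        {
            "request": {"method": "GET", "uri": "https://example.invalid/bucket/b.csv"},
            "response": {
                "headers": {"content-length": ["6163243"]},
                "body": {"string": ""},
            },
        },
    ]
}

print(find_short_bodies(sample))
```

An empty result would suggest the cassette bodies are complete and the problem lies in how they are replayed; a non-empty one would point at the recording step.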

Here are the original functions I'm testing:

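from typing import List, Tuple

import pandas as pd
import s3fs

# logger, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are defined elsewhere in the script


def s3_connect_get_files(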
    validated_target_params: dict,
) -> Tuple[s3fs.S3FileSystem, List[str]]:
    """
    connects to our s3 instance, returning
    our bucket s3fs object and a list of the files
    in the target infile directory specified in
    our data model for the target.

    returns:
        - Tuple(the_bucket (s3fs obj), files (list) )

    raises:
        - FileNotFoundError if we can't find the dir
        on our s3
    """
    try:
        the_bucket = s3fs.S3FileSystem(
            key=AWS_ACCESS_KEY_ID,
            secret=AWS_SECRET_ACCESS_KEY,
            client_kwargs={"endpoint_url": validated_target_params["endpoint_url"]},
        )
        files = the_bucket.glob(
            f"{validated_target_params['in_path']}/"
            f"{validated_target_params['glob_pattern']}"
        )
        return the_bucket, files
    except FileNotFoundError as e:
        logger.error("could not connect to s3. check credentials!")
        logger.error(f"original error type: {type(e).__name__}")
        logger.error(f"original error message: {e}")
        raise

def read_files(
    the_bucket: s3fs.S3FileSystem, files: List[str], validated_target_params: dict
) -> dict:
    """reads all files into memory

    raises:
        - NotImplementedError; if we encounter a reader
        type we haven't defined yet.
    """
    records = {}

    logger.info("Now reading data to be validated and de-duped.")
    for file in files:
        if (
            validated_target_params["reader"].value == "pandas"
        ):  # we need to call value as we're using an Enum
            try:
                df = pd.read_csv(the_bucket.open(file))
                file_last_modified = the_bucket.info(file).get("LastModified")

                df["file_last_modified"] = file_last_modified

                records[file] = {
                    "data": df,
                    "last_modified_at": file_last_modified,
                }
                logger.info(f"Loaded {file} with {len(df)} rows using pandas")
            except Exception as e:
                logger.error(f"original error type: {type(e).__name__}")
                logger.error(f"original error message: {e}")
                raise

        else:
            raise NotImplementedError(
                f"{validated_target_params['reader']} is not yet implemented as a reader."
            )

    return records

And here are the corresponding tests:

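import pandas as pd
import pytest
from s3fs import S3FileSystem

# s3_connect_get_files and read_files are imported from the script under test


@pytest.mark.vcr()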
def test_s3_connect_get_files(validated_target_params) -> None:
    """test s3 connection and file retrieval"""
    the_bucket, files = s3_connect_get_files(validated_target_params)
    assert isinstance(the_bucket, S3FileSystem)
    assert isinstance(files, list)


@pytest.mark.vcr()
def test_read_files_pandas(validated_target_params) -> None:
    """test reading files using pandas"""
    the_bucket, files = s3_connect_get_files(validated_target_params)
    records = read_files(the_bucket, files, validated_target_params)
    assert isinstance(records, dict)
    assert len(records) == len(files)
    assert all(isinstance(df["data"], pd.DataFrame) for df in records.values())

If I comment out test_read_files_pandas, it runs fine and all tests pass. If I keep it in, it inevitably fails, like this:

E           botocore.exceptions.IncompleteReadError: 0 read, but total bytes expected is 6163243.

.venv/lib/python3.11/site-packages/aiobotocore/response.py:125: IncompleteReadError
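
For context, the message reads like a content-length check: the stream handed back fewer bytes than the response headers promised. Below is a simplified, stdlib-only sketch of that kind of check. The class `SketchStreamingBody` and the exception `SketchIncompleteReadError` are hypothetical stand-ins, not botocore's or aiobotocore's actual code.

```python
import io


class SketchIncompleteReadError(Exception):
    """Hypothetical stand-in for botocore's IncompleteReadError."""

    def __init__(self, actual_bytes: int, expected_bytes: int) -> None:
        self.actual_bytes = actual_bytes
        self.expected_bytes = expected_bytes
        super().__init__(
            f"{actual_bytes} read, but total bytes expected is {expected_bytes}."
        )


class SketchStreamingBody:
    """Wraps a raw byte stream and verifies the advertised length on read."""

    def __init__(self, raw: io.BufferedIOBase, content_length: int) -> None:
        self._raw = raw
        self._content_length = content_length
        self._amount_read = 0

    def read(self) -> bytes:
        chunk = self._raw.read()
        self._amount_read += len(chunk)
        # Fewer (or more) bytes than the header promised counts as incomplete.
        if self._amount_read != self._content_length:
            raise SketchIncompleteReadError(self._amount_read, self._content_length)
        return chunk


# A body that matches its advertised length reads normally.
complete = SketchStreamingBody(io.BytesIO(b"a,b\n1,2\n"), content_length=8)
print(complete.read())

# An empty body with a large advertised length fails like the error above.
empty = SketchStreamingBody(io.BytesIO(b""), content_length=6163243)
try:
    empty.read()
except SketchIncompleteReadError as exc:
    print(exc)  # 0 read, but total bytes expected is 6163243.
```

If a replayed response kept the original Content-Length header but delivered an empty body, a check like this would fail in the same way. Whether that is what happens in my case is an assumption I have not confirmed.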

I am new to pytest-recording, and to be honest, not the best test writer ever. So, I do apologise if I've made a stupid mistake, and would greatly appreciate any pointers as to how to either get these tests to pass, or modify my functions.
