python - pytest-recording/VCR for S3: IncompleteReadError (but only sometimes?) - Stack Overflow

I'm looking to use pytest-recording in tests that connect to S3 and download data from it.

I import all the functions from the script I'm testing. The tests use prod env vars, but only to test downloading and reading data from S3 (not uploading). In a REPL, the exact same code works fine: I can connect to my S3 instance and read and download the data on it.

Now, in my test suite, I keep getting an IncompleteReadError when running pytest --record-mode=once. This happens regardless of whether I delete existing cassettes or not.
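
Since the tests record against production credentials, it may be worth keeping signed headers out of the cassettes. pytest-recording reads VCR options from a `vcr_config` fixture; the following is a minimal sketch for a `conftest.py`, assuming module scope and that these two headers are the sensitive ones in this setup (the helper name `cassette_settings` is hypothetical). It is not expected to change the IncompleteReadError by itself.

```python
import pytest


def cassette_settings() -> dict:
    """VCR options shared by every test marked with @pytest.mark.vcr."""
    return {
        # Assumption: these headers carry the request signature / session token.
        "filter_headers": ["authorization", "x-amz-security-token"],
    }


@pytest.fixture(scope="module")
def vcr_config() -> dict:
    # pytest-recording looks up a fixture with this exact name.
    return cassette_settings()
```

Cassettes recorded before adding the fixture would still contain the original headers, so they would need to be deleted and re-recorded.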

Here are the original functions I'm testing:

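import logging
from typing import List, Tuple

import pandas as pd
import s3fs

# Assumption: logger is a standard-library logger; the question does not show its setup.
logger = logging.getLogger(__name__)

# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are defined elsewhere in the script
# (not shown in the question).


def s3_connect_get_files(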
    validated_target_params: dict,
) -> Tuple[s3fs.S3FileSystem, List[str]]:
    """
    connects to our s3 instance, returning
    our bucket s3fs object and a list of the files
    in the target infile directory specified in
    our data model for the target.

    returns:
        - Tuple(the_bucket (s3fs obj), files (list) )

    raises:
        - FileNotFoundError if we can't find the dir
        on our s3
    """
    try:
        the_bucket = s3fs.S3FileSystem(
            key=AWS_ACCESS_KEY_ID,
            secret=AWS_SECRET_ACCESS_KEY,
            client_kwargs={"endpoint_url": validated_target_params["endpoint_url"]},
        )
        files = the_bucket.glob(
            f"{validated_target_params['in_path']}/"
            f"{validated_target_params['glob_pattern']}"
        )
        return the_bucket, files
    except FileNotFoundError as e:
        logger.error("could not connect to s3. check credentials!")
        logger.error(f"original error type: {type(e).__name__}")
        logger.error(f"original error message: {e}")
        raise

def read_files(
    the_bucket: s3fs.S3FileSystem, files: List[str], validated_target_params: dict
) -> dict:
    """reads all files into memory

    raises:
        - NotImplementedError; if we encounter a reader
        type we haven't defined yet.
    """
    records = {}

    logger.info("Now reading data to be validated and de-duped.")
    for file in files:
        if (
            validated_target_params["reader"].value == "pandas"
        ):  # we need to call value as we're using an Enum
            try:
                df = pd.read_csv(the_bucket.open(file))
                file_last_modified = the_bucket.info(file).get("LastModified")

                df["file_last_modified"] = file_last_modified

                records[file] = {
                    "data": df,
                    "last_modified_at": file_last_modified,
                }
                logger.info(f"Loaded {file} with {len(df)} rows using pandas")
            except Exception as e:
                logger.error(f"original error type: {type(e).__name__}")
                logger.error(f"original error message: {e}")
                raise

        else:
            raise NotImplementedError(
                f"{validated_target_params['reader']} is not yet implemented as a reader."
            )

    return records

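And here are the corresponding tests:

import pandas as pd
import pytest
from s3fs import S3FileSystem

# s3_connect_get_files and read_files are imported from the script under test.


@pytest.mark.vcr()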
def test_s3_connect_get_files(validated_target_params) -> None:
    """test s3 connection and file retrieval"""
    the_bucket, files = s3_connect_get_files(validated_target_params)
    assert isinstance(the_bucket, S3FileSystem)
    assert isinstance(files, list)


@pytest.mark.vcr()
def test_read_files_pandas(validated_target_params) -> None:
    """test reading files using pandas"""
    the_bucket, files = s3_connect_get_files(validated_target_params)
    records = read_files(the_bucket, files, validated_target_params)
    assert isinstance(records, dict)
    assert len(records) == len(files)
    assert all(isinstance(df["data"], pd.DataFrame) for df in records.values())

If I comment out test_read_files_pandas, it runs fine and all tests pass. If I keep it in, it inevitably fails, like this:

E           botocore.exceptions.IncompleteReadError: 0 read, but total bytes expected is 6163243.

.venv/lib/python3.11/site-packages/aiobotocore/response.py:125: IncompleteReadError
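
The message reads as though the response declared a body of 6163243 bytes but the stream yielded none. As a simplified illustration of that kind of length check (this is not aiobotocore's actual code, and `IncompleteRead` and `read_checked` are hypothetical names), the logic can be sketched as:

```python
import io


class IncompleteRead(Exception):
    """Stand-in for botocore.exceptions.IncompleteReadError (hypothetical)."""

    def __init__(self, actual: int, expected: int) -> None:
        super().__init__(f"{actual} read, but total bytes expected is {expected}.")
        self.actual = actual
        self.expected = expected


def read_checked(stream: io.BufferedIOBase, expected: int) -> bytes:
    """Read the whole stream and compare its size with the declared length."""
    data = stream.read()
    if len(data) != expected:
        # Fewer (or more) bytes arrived than the declared length promised.
        raise IncompleteRead(len(data), expected)
    return data


# A body that matches its declared length passes through unchanged.
payload = read_checked(io.BytesIO(b"a,b\n1,2\n"), expected=8)

# An empty body with a large declared length reproduces the shape of the error.
try:
    read_checked(io.BytesIO(b""), expected=6163243)
except IncompleteRead as exc:
    message = str(exc)
```

One thing that may be worth checking, as an assumption rather than a diagnosis, is whether the cassette recorded for the file download stores the response headers together with an empty body.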

I am new to pytest-recording, and to be honest, not the best test writer ever. So, I do apologise if I've made a stupid mistake, and would greatly appreciate any pointers as to how to either get these tests to pass, or modify my functions.
