I'm looking to use pytest-recording in my tests that involve connecting to and downloading data from S3.
I import all the functions from the script I'm testing. This is using prod env vars, but only to test downloading and reading data from S3 (not uploading). In a REPL, the exact same code works fine -- I can connect to my S3 instance, read and download the data that is on it.
Now, in my test suite, I keep getting an IncompleteReadError when running pytest --record-mode=once. This happens regardless of whether I delete existing cassettes or not.
Here are the original functions I'm testing:
def s3_connect_get_files(
    validated_target_params: dict,
) -> Tuple[s3fs.S3FileSystem, List[str]]:
    """
    connects to our s3 instance, returning
    our bucket s3fs object and a list of the files
    in the target infile directory specified in
    our data model for the target.

    returns:
        - Tuple(the_bucket (s3fs obj), files (list) )
    raises:
        - FileNotFoundError if we can't find the dir
          on our s3
    """
    try:
        the_bucket = s3fs.S3FileSystem(
            key=AWS_ACCESS_KEY_ID,
            secret=AWS_SECRET_ACCESS_KEY,
            client_kwargs={"endpoint_url": validated_target_params["endpoint_url"]},
        )
        files = the_bucket.glob(
            f"{validated_target_params['in_path']}/"
            f"{validated_target_params['glob_pattern']}"
        )
        return the_bucket, files
    except FileNotFoundError as e:
        logger.error("could not connect to s3. check credentials!")
        logger.error(f"original error type: {type(e).__name__}")
        logger.error(f"original error message: {e}")
        raise

def read_files(
    the_bucket: s3fs.S3FileSystem, files: List[str], validated_target_params: dict
) -> dict:
    """reads all files into memory

    raises:
        - NotImplementedError; if we encounter a reader
          type we haven't defined yet.
    """
    records = {}
    logger.info("Now reading data to be validated and de-duped.")
    for file in files:
        if (
            validated_target_params["reader"].value == "pandas"
        ):  # we need to call value as we're using an Enum
            try:
                df = pd.read_csv(the_bucket.open(file))
                file_last_modified = the_bucket.info(file).get("LastModified")
                df["file_last_modified"] = file_last_modified
                records[file] = {
                    "data": df,
                    "last_modified_at": file_last_modified,
                }
                logger.info(f"Loaded {file} with {len(df)} rows using pandas")
            except Exception as e:
                logger.error(f"original error type: {type(e).__name__}")
                logger.error(f"original error message: {e}")
                raise
        else:
            raise NotImplementedError(
                f"{validated_target_params['reader']} is not yet implemented as a reader."
            )
    return records
And here are the corresponding tests:
@pytest.mark.vcr()
def test_s3_connect_get_files(validated_target_params) -> None:
    """test s3 connection and file retrieval"""
    the_bucket, files = s3_connect_get_files(validated_target_params)
    assert isinstance(the_bucket, S3FileSystem)
    assert isinstance(files, list)


@pytest.mark.vcr()
def test_read_files_pandas(validated_target_params) -> None:
    """test reading files using pandas"""
    the_bucket, files = s3_connect_get_files(validated_target_params)
    records = read_files(the_bucket, files, validated_target_params)
    assert isinstance(records, dict)
    assert len(records) == len(files)
    assert all(isinstance(df["data"], pd.DataFrame) for df in records.values())
If I comment out test_read_files_pandas, it runs fine and all tests pass. If I keep it in, it inevitably fails, like this:
E botocore.exceptions.IncompleteReadError: 0 read, but total bytes expected is 6163243.
.venv/lib/python3.11/site-packages/aiobotocore/response.py:125: IncompleteReadError
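As far as I can tell, the message compares the number of bytes actually read from the response body against the length the response declared, so "0 read" would mean the body came back empty while the headers still promised 6163243 bytes. Here is a minimal stdlib-only sketch of that kind of check. It is an illustration, not the botocore or aiobotocore implementation, and the names `IncompleteRead`, `CheckedBody` and `describe_read` are hypothetical:

```python
import io


class IncompleteRead(Exception):
    """Hypothetical stand-in for botocore.exceptions.IncompleteReadError."""

    def __init__(self, actual_bytes: int, expected_bytes: int) -> None:
        super().__init__(
            f"{actual_bytes} read, but total bytes expected is {expected_bytes}."
        )
        self.actual_bytes = actual_bytes
        self.expected_bytes = expected_bytes


class CheckedBody:
    """Wraps a byte stream and checks it against a declared content length."""

    def __init__(self, raw: io.BytesIO, content_length: int) -> None:
        self._raw = raw
        self._content_length = content_length
        self._amount_read = 0

    def read(self) -> bytes:
        # Read everything that is available, then compare the running total
        # with the length that was declared up front.
        chunk = self._raw.read()
        self._amount_read += len(chunk)
        if self._amount_read != self._content_length:
            raise IncompleteRead(self._amount_read, self._content_length)
        return chunk


def describe_read(payload: bytes, declared_length: int) -> str:
    """Return "ok" or the error message for a payload/length pair."""
    try:
        CheckedBody(io.BytesIO(payload), declared_length).read()
    except IncompleteRead as exc:
        return str(exc)
    return "ok"


# An empty body with a large declared length reproduces the shape of my error.
print(describe_read(b"", 6163243))
```

If that reading is right, the question becomes why the body is empty under recording when the same call returns data in a REPL.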
I am new to pytest-recording, and to be honest, not the best test writer ever. So, I do apologise if I've made a stupid mistake, and would greatly appreciate any pointers as to how to either get these tests to pass, or modify my functions.