I have a Python script that basically looks like this:

import mypackage

# this function always generates the same pandas.DataFrame
df = mypackage.create_the_dataframe()

# write the DataFrame to xlsx and csv
df.to_excel("the_dataframe_as.xlsx", index=False, engine="openpyxl")
df.to_csv("the_dataframe_as.csv", index=False)

I was trying to write a test for the create_the_dataframe function, so I checked the hashes of the resulting xlsx and csv files. Between two runs of the script, the hash and file size of the xlsx file change, while the hash of the csv file stays the same.

Although I can live with this, I am very curious to understand why this is the case.

asked Mar 27 at 8:56 by d4tm4x, edited Mar 27 at 9:58
  • Have you tried changing the settings of the "to_excel" method? Setting the engine explicitly (instead of leaving it blank) might work: pandas.pydata./docs/reference/api/… – user24758287 Commented Mar 27 at 9:15
  • I pinned it to engine="openpyxl" with the result being the same. So this seems more like an openpyxl topic. I'll update the question. – d4tm4x Commented Mar 27 at 9:57

1 Answer

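XLSX files contain metadata, such as the creation timestamp, that changes with every newly written file. An XLSX file is a ZIP archive of XML parts, so a different timestamp also changes the compressed bytes and can change the file size. Plain-text CSV files contain no such variable metadata, so their contents are determined entirely by the data.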
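To illustrate the effect without depending on openpyxl, here is a minimal standard-library sketch. It builds two small ZIP archives that only mimic an XLSX layout: the helper names `make_archive` and `content_digest`, the XML snippets, and the timestamps are all made up for the example, and treating `docProps/core.xml` as the only part that varies between runs is an assumption, not something the question confirms. The raw hashes differ, while a hash over the remaining members matches.

```python
import hashlib
import io
import zipfile


def make_archive(created: str, mtime: tuple) -> bytes:
    """Build a tiny ZIP that mimics an XLSX layout (stand-in, not real openpyxl output)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        members = (
            # Same "sheet data" in every archive.
            ("xl/worksheets/sheet1.xml", "<sheetData><c><v>42</v></c></sheetData>"),
            # Metadata part carrying a timestamp that varies per write.
            ("docProps/core.xml", f"<created>{created}</created>"),
        )
        for name, data in members:
            # The ZIP entry header also stores a modification time.
            info = zipfile.ZipInfo(name, date_time=mtime)
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
    return buf.getvalue()


def content_digest(raw: bytes, skip=("docProps/core.xml",)) -> str:
    """Hash member names and decompressed bytes, ignoring the skipped metadata parts."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        for name in sorted(zf.namelist()):
            if name in skip:
                continue
            digest.update(name.encode("utf-8"))
            digest.update(zf.read(name))
    return digest.hexdigest()


# Two "runs" with identical data but different (made-up) timestamps.
first = make_archive("2000-01-01T00:00:00Z", (2000, 1, 1, 0, 0, 0))
second = make_archive("2000-01-01T00:05:00Z", (2000, 1, 1, 0, 5, 0))

raw_hashes_equal = hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest()
content_hashes_equal = content_digest(first) == content_digest(second)
print(raw_hashes_equal, content_hashes_equal)
```

For testing create_the_dataframe itself, comparing the DataFrame directly (for example with pandas.testing.assert_frame_equal) avoids file hashes altogether.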

Tags: python, Nondeterministic behaviour of openpyxl, Stack Overflow