I want to decompress raw data from a file in an exotic format, but I know that the compression method is the same one used in ZIP files (PKZIP).
The PK\03\04 signature is missing from the file. The data that follows more or less fits the PKZIP local file header spec:
https://docs.fileformat/compression/zip/
- 2 bytes - version = 0x0014 (I don't know if it's meaningful)
- 2 bytes - flags = 0
- 2 bytes - compression method = 0x0008 ("deflated" according to the ZIP docs)
- 4 bytes - random (where the modification time and date should be)
- 4 bytes - random (where the CRC32 should be)
- 4 bytes - valid compressed size
- 4 bytes - valid uncompressed size
- 2 bytes - file name length = 0x14
- 2 bytes - extra field length = 0
- 20 bytes - file name (random bytes)
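The layout listed above can be read with Python's struct module. This is only a minimal sketch, assuming the fields are little-endian as in a regular ZIP local file header and that the buffer starts right where the missing signature would have ended. The names HEADER, Header and parse_header are hypothetical and chosen for illustration.

```python
import struct
from collections import namedtuple

# Hypothetical names. The layout mirrors the field list above: a ZIP local
# file header without its 4-byte signature (assumed little-endian).
HEADER = struct.Struct("<HHHIIIIHH")  # 26 bytes, no padding

Header = namedtuple(
    "Header",
    "version flags method mod_time_date crc32 "
    "comp_size uncomp_size name_len extra_len",
)


def parse_header(buf):
    """Return (header, offset of the first compressed byte)."""
    header = Header._make(HEADER.unpack_from(buf, 0))
    # The file name and the extra field sit between the fixed part
    # of the header and the compressed data.
    data_start = HEADER.size + header.name_len + header.extra_len
    return header, data_start
```

With the offset and the comp_size field, the compressed bytes would be buf[data_start:data_start + header.comp_size], which leaves out both the file name and whatever follows the compressed data.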
Then comes the raw compressed data, followed by the End Record, which looks damaged in a similar way. After adding the signature and valid file name characters and saving the buffer to a file, I was able to decompress it with 7-Zip. It showed an error dialog but produced an uncompressed file, which contained the expected data.
I know that there is always one compressed file and the compression method is fixed. The file name is not important, so I guess it should be possible to process only the compressed data bytes after the header, ignoring the End Record as well.
Which Python package provides such functionality?
I want to ignore the ZIP headers and pass only the compressed data buffer to some function in Python (possibly specifying the compression method and some flags), and get the uncompressed data buffer back. No CRC check, no file names.
Asked Nov 19, 2024 at 16:37 by PanJanek; edited Nov 19, 2024 at 18:08 by Mark Adler.
1 Answer
If the compression method is 8, then you can use Python's zlib module, passing wbits=-15 to either zlib.decompress() or zlib.decompressobj().
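A minimal sketch of both calls, assuming the buffer passed in starts at the first byte of the deflate data (that is, after the header and file name). A negative wbits value tells zlib to expect a raw deflate stream with no zlib or gzip wrapper. The function names inflate_raw and inflate_raw_with_trailer are hypothetical.

```python
import zlib


def inflate_raw(compressed):
    """Decompress a raw deflate stream (no zlib/gzip wrapper, no ZIP headers)."""
    # wbits=-15: raw deflate with the maximum 32 KiB window.
    return zlib.decompress(compressed, wbits=-15)


def inflate_raw_with_trailer(buf):
    """Decompress raw deflate data that may be followed by other bytes.

    Returns (data, leftover), where leftover is whatever came after
    the end of the deflate stream.
    """
    d = zlib.decompressobj(wbits=-15)
    data = d.decompress(buf) + d.flush()
    return data, d.unused_data
```

The decompressobj() variant may be the more convenient one when the exact compressed size is not trusted: the deflate stream marks its own end, and the bytes after it (here, the damaged End Record) end up in unused_data.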
zlib.crc32() in that same module. – Mark Adler, commented Nov 19, 2024 at 18:48
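For completeness, a small sketch of how zlib.crc32() could be used, should the stored CRC field turn out to be meaningful in some files. The question describes that field as random, so this is an optional check under that assumption; the name crc_matches is hypothetical.

```python
import zlib


def crc_matches(data, stored_crc):
    """Compare the CRC-32 of decompressed data with a stored 32-bit value."""
    # The mask keeps the result an unsigned 32-bit integer.
    return (zlib.crc32(data) & 0xFFFFFFFF) == stored_crc
```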