I want to decompress raw data from a file in an exotic format, but I know that the compression method is the same one used in a ZIP file (PKZIP).

In the file, the PK\03\04 signature is missing. Apart from that, the data more or less fits the PKZIP local file header spec:

https://docs.fileformat/compression/zip/

  1. 2 bytes - version = 0x0014 (I don't know if it's meaningful)
  2. 2 bytes - flags = 0
  3. 2 bytes - compression method = 0x0008 ("deflated" according to the ZIP docs)
  4. 4 bytes - random-looking (should be the modification time and date)
  5. 4 bytes - random-looking (should be the CRC32)
  6. 4 bytes - valid compressed size
  7. 4 bytes - valid uncompressed size
  8. 2 bytes - file name length = 0x14
  9. 2 bytes - extra field length = 0
  10. 20 bytes - file name, random-looking

Then the raw compressed data, and after that the End Record that looks damaged in a similar way. After adding the signature and valid file name characters and saving the buffer to a file, I was able to decompress it with 7zip. It showed an error dialog, but produced an uncompressed file. The resulting file contained the expected data.

I know that there is always one compressed file and the compression method is fixed. The file name is not important, so I guess it should be possible to process only the compressed data bytes after the header, ignoring the End Record as well.
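To make the layout concrete, here is a minimal sketch of how the payload could be located with the standard `struct` module. It assumes the fields are little-endian as in a regular ZIP local file header (26 bytes once the 4-byte signature is dropped); `locate_payload` and its return shape are hypothetical names chosen for illustration, not part of any library.

```python
import struct

# Assumption: a ZIP local file header minus the 4-byte "PK\x03\x04" signature,
# i.e. 26 bytes of little-endian fixed fields.
HEADER = struct.Struct("<HHHIIIIHH")


def locate_payload(buf):
    """Return (offset, compressed_size, uncompressed_size) of the raw data."""
    (version, flags, method, _mod_time_date, _crc32,
     comp_size, uncomp_size, name_len, extra_len) = HEADER.unpack_from(buf, 0)
    if method != 8:
        # Only method 8 ("deflated") is expected in this format.
        raise ValueError(f"unexpected compression method {method}")
    # The compressed bytes start right after the file name and extra field.
    offset = HEADER.size + name_len + extra_len
    return offset, comp_size, uncomp_size
```

The compressed bytes would then be `buf[offset:offset + comp_size]`, which leaves the End Record out entirely.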

Which Python package provides such functionality?

I want to ignore the ZIP headers and pass only the compressed data buffer to some function in Python (possibly specifying the compression method and some flags), and get the uncompressed data buffer back. No CRC check, no file names.

Asked Nov 19, 2024 at 16:37 by PanJanek; edited Nov 19, 2024 at 18:08 by Mark Adler.
  • I recommend that you nevertheless do a CRC check. You can use zlib.crc32() in that same module. – Mark Adler, commented Nov 19, 2024 at 18:48
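A rough sketch of the check suggested in this comment, assuming the container stores a usable CRC32 somewhere (the question reports that the header field looks random, so the comparison may not be meaningful for this particular format). `crc_matches` and `stored_crc` are hypothetical names.

```python
import zlib


def crc_matches(data, stored_crc):
    """Compare the CRC32 of the decompressed bytes with the stored value."""
    # In Python 3, zlib.crc32() returns an unsigned 32-bit integer.
    return zlib.crc32(data) == (stored_crc & 0xFFFFFFFF)
```

If no trustworthy CRC is available, comparing `len(data)` with the uncompressed size from the header is a weaker but still useful sanity check.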

1 Answer

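If the compression method is 8 (deflate), then you can use Python's zlib module, passing wbits=-15 to either zlib.decompress() or zlib.decompressobj(). A negative wbits value tells zlib to expect a raw deflate stream with no zlib header or trailer.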
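A minimal sketch of that approach, assuming `payload` starts at the first byte of the compressed data (after the header and file name). `inflate_raw` is a hypothetical helper name; it uses `zlib.decompressobj()` so that any bytes following the deflate stream, such as the End Record, are set aside rather than decoded.

```python
import zlib


def inflate_raw(payload):
    """Decompress a raw deflate stream, ignoring anything after its end."""
    # wbits=-15: raw deflate (no zlib or gzip wrapper), 32 KiB window.
    d = zlib.decompressobj(wbits=-15)
    out = d.decompress(payload) + d.flush()
    if not d.eof:
        # The end-of-stream marker was never reached.
        raise ValueError("truncated deflate stream")
    # Bytes after the end of the stream are left in d.unused_data.
    return out
```

With the header values from the question, this could be called as `inflate_raw(buf[offset:])` or on the exact `compressed size` slice; either way the file name and CRC fields are never consulted.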