python - Maintain file size in pdf conversion - Stack Overflow

Last updated: 2025-01-31

I need to convert a PDF into a list of images, manipulate them, and then convert the result back into a PDF. However, I cannot get back to the original file size of the PDF while keeping similar quality. This is the function I am using:

import cv2
import fitz  # PyMuPDF
import numpy as np

def pdf_to_images_and_back(pdf_path: str, zoom=2):
    # Scale factor applied on both axes when rasterizing (zoom 1 = 72 dpi).
    matrix = fitz.Matrix(zoom, zoom)

    new_pdf_document = fitz.open()
    with fitz.open(pdf_path) as pdf_document:
        for page_number in range(len(pdf_document)):
            # Render the page to a pixmap and view its samples as a NumPy array.
            page = pdf_document.load_page(page_number)
            pix = page.get_pixmap(matrix=matrix)
            img = np.frombuffer(pix.samples, dtype=np.uint8).reshape(
                pix.height, pix.width, pix.n
            )
            if pix.n == 3:
                # OpenCV expects BGR channel order.
                img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
            cv2.imwrite(f"test_im_{page_number}.png", img)
            # Place the PNG in a rectangle matching the source page size (in points).
            page = new_pdf_document.new_page()
            rect = fitz.Rect(0, 0, img.shape[1] / zoom, img.shape[0] / zoom)
            page.insert_image(rect, filename=f"test_im_{page_number}.png")

    new_pdf_document.save("output_pdf.pdf", deflate=True, garbage=3)

With a zoom of 1, the output is comparable in size to the original PDF, but the quality is much worse. With a zoom of 2, the quality stays about the same, but the file size almost doubles. What could I do differently to preserve both quality and size?
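
To make the trade-off concrete, here is a rough sketch of how the rasterized page grows with the zoom factor. It assumes a hypothetical US Letter page of 612 x 792 points and 3-channel RGB output; `rendered_size` is an illustrative helper, not part of PyMuPDF, and the pixel dimensions are approximations of what the renderer would produce.

```python
def rendered_size(width_pt: float, height_pt: float, zoom: float, channels: int = 3):
    """Approximate pixel dimensions and raw (uncompressed) byte count of a page
    rasterized with a zoom x zoom scaling matrix.

    Illustrative helper only; the page size passed in below is an assumption.
    """
    width_px = round(width_pt * zoom)
    height_px = round(height_pt * zoom)
    raw_bytes = width_px * height_px * channels
    return width_px, height_px, raw_bytes


# Hypothetical US Letter page: 612 x 792 points (1 point = 1/72 inch).
for zoom in (1, 2):
    w, h, raw = rendered_size(612, 792, zoom)
    print(f"zoom={zoom}: {w}x{h} px, about {72 * zoom} dpi, {raw / 1e6:.1f} MB raw RGB")
```

Doubling the zoom doubles each dimension, so the raw pixel data grows by a factor of four. The stored size depends on how well the PNG data compresses, so it need not grow by the same factor, which is in line with the roughly twofold increase I see.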

Appreciate any insights or hints!
