I need to convert a PDF into a list of images, manipulate those images, and then convert them back into a PDF. However, I cannot get the output back to the original file size while keeping a similar quality. This is the function I am using:

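import cv2
import fitz  # PyMuPDF
import numpy as np

def pdf_to_images_and_back(pdf_path: str, zoom=2):
    # Scale factor used when rasterizing: zoom=1 is 72 dpi, zoom=2 is 144 dpi
    matrix = fitz.Matrix(zoom, zoom)

    new_pdf_document = fitz.open()
    with fitz.open(pdf_path) as pdf_document:
        for page_number in range(len(pdf_document)):
            page = pdf_document.load_page(page_number)
            pix = page.get_pixmap(matrix=matrix)
            # View the raw samples as a (height, width, channels) array
            img = np.frombuffer(pix.samples, dtype=np.uint8).reshape(
                pix.height, pix.width, pix.n
            )
            if pix.n == 3:
                # OpenCV expects BGR channel order when writing
                img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
            # (the image manipulation would happen here)
            cv2.imwrite(f"test_im_{page_number}.png", img)
            # Size the output page in points; new_page() would otherwise default to A4
            rect = fitz.Rect(0, 0, img.shape[1] / zoom, img.shape[0] / zoom)
            new_page = new_pdf_document.new_page(width=rect.width, height=rect.height)
            new_page.insert_image(rect, filename=f"test_im_{page_number}.png")

    new_pdf_document.save("output_pdf.pdf", deflate=True, garbage=3)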

With a zoom of 1, the output is comparable in size to the original PDF, but the quality is much worse. With a zoom of 2, the quality stays about the same, but the file size almost doubles. What could I do differently to maintain both quality and size?
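For context on the numbers above, here is a rough sketch of how the raster grows with zoom. It assumes an A4 page of 595 x 842 points and 3 channels per pixel purely as an example; `raster_size` is a hypothetical helper, not part of PyMuPDF, and the exact rounding the library applies may differ slightly. It only counts uncompressed pixels, so it says nothing definitive about the final PNG or PDF size.

```python
def raster_size(width_pt: float, height_pt: float, zoom: float, channels: int = 3):
    """Approximate (width_px, height_px, raw_bytes) for a page rendered at `zoom`.

    Hypothetical helper for illustration: page dimensions are in PDF points
    (1 pt = 1/72 inch), so zoom=1 corresponds to 72 dpi.
    """
    width_px = round(width_pt * zoom)
    height_px = round(height_pt * zoom)
    return width_px, height_px, width_px * height_px * channels


# Example page size (A4 in points); an assumption, not taken from my actual file.
A4 = (595, 842)

for zoom in (1, 2):
    w, h, raw = raster_size(*A4, zoom)
    print(f"zoom={zoom}: {w}x{h} px at {72 * zoom} dpi, {raw / 1e6:.1f} MB uncompressed")
```

Doubling the zoom quadruples the pixel count, so the data handed to the PNG encoder grows much faster than the zoom factor itself suggests.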

Appreciate any insights or hints!

Tags: python · Maintain file size in PDF conversion · Stack Overflow