I need to convert a PDF into a list of images, do some manipulation on them, and then convert them back into a PDF. It turns out that I cannot get the output back to the original file size while keeping similar quality. This is the function I am using:
import cv2
import fitz  # PyMuPDF
import numpy as np


def pdf_to_images_and_back(pdf_path: str, zoom: float = 2):
    # zoom=1 renders at 72 DPI, zoom=2 at 144 DPI
    matrix = fitz.Matrix(zoom, zoom)
    new_pdf_document = fitz.open()
    with fitz.open(pdf_path) as pdf_document:
        for page_number in range(len(pdf_document)):
            page = pdf_document.load_page(page_number)
            # Render the page to a raster image
            pix = page.get_pixmap(matrix=matrix)
            img = np.frombuffer(pix.samples, dtype=np.uint8).reshape(
                pix.height, pix.width, pix.n
            )
            # PyMuPDF gives RGB samples; OpenCV expects BGR
            if pix.n == 3:
                img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
            # (image manipulation happens here)
            cv2.imwrite(f"test_im_{page_number}.png", img)
            # Match the original page size; new_page() defaults to A4
            new_page = new_pdf_document.new_page(
                width=page.rect.width, height=page.rect.height
            )
            # Map the pixel size back to PDF points
            rect = fitz.Rect(0, 0, img.shape[1] / zoom, img.shape[0] / zoom)
            new_page.insert_image(rect, filename=f"test_im_{page_number}.png")
    new_pdf_document.save("output_pdf.pdf", deflate=True, garbage=3)
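The `rect` in the loop converts the pixmap's pixel size back into PDF points by dividing by `zoom`. The sketch below shows that round trip with plain arithmetic. It assumes PyMuPDF's 72 DPI baseline at `zoom=1` and assumes fractional pixel sizes are rounded up (PyMuPDF's own rounding may differ by a pixel); both helper names are hypothetical.

```python
import math


def rendered_pixel_size(width_pt: float, height_pt: float, zoom: float) -> tuple[int, int]:
    """Approximate pixmap size for a page rendered with fitz.Matrix(zoom, zoom).

    Assumption: one PDF point becomes `zoom` pixels and fractional sizes are
    rounded up; the real PyMuPDF result may differ by one pixel.
    """
    return math.ceil(width_pt * zoom), math.ceil(height_pt * zoom)


def placement_rect_size(width_px: int, height_px: int, zoom: float) -> tuple[float, float]:
    """Size in points of the rect used to place the image on the new page."""
    return width_px / zoom, height_px / zoom


# A4 page (595 x 842 pt) rendered at zoom=2, then mapped back to points
w_px, h_px = rendered_pixel_size(595, 842, 2)
print(w_px, h_px)                          # 1190 1684
print(placement_rect_size(w_px, h_px, 2))  # (595.0, 842.0)
```

With a non-integer zoom, the rounding can make the placement rect a fraction of a point larger than the original page, which is one reason to size the new page from the original page's `rect` rather than from the image.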
With a zoom of 1, the output is about the same size as the original PDF, but the quality is much worse. With a zoom of 2, the quality stays about the same, but the file size almost doubles. What could I do differently to keep both the quality and the file size?
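Since the pages are stored as lossless PNG images, the amount of image data grows with the pixel count, and the pixel count grows with the square of `zoom`. The sketch below computes the uncompressed size of one A4 page at a few zoom values. It assumes 3 bytes per pixel (RGB, no alpha, which matches `get_pixmap`'s default) and PyMuPDF's 72 DPI baseline; the function name is hypothetical. Compressed sizes on disk are smaller and need not scale by the same factor.

```python
import math

BASE_DPI = 72  # PyMuPDF renders at 72 DPI when zoom == 1


def raw_rgb_bytes(width_pt: float, height_pt: float, zoom: float, channels: int = 3) -> int:
    """Uncompressed size of a page rendered at `zoom` (assumes RGB, no alpha)."""
    width_px = math.ceil(width_pt * zoom)
    height_px = math.ceil(height_pt * zoom)
    return width_px * height_px * channels


a4 = (595, 842)  # A4 page size in PDF points
base = raw_rgb_bytes(*a4, zoom=1)
for zoom in (1, 1.5, 2):
    size = raw_rgb_bytes(*a4, zoom=zoom)
    print(f"zoom={zoom}: {zoom * BASE_DPI:g} DPI, {size / 1e6:.2f} MB raw, {size / base:.2f}x")
# zoom=1: 72 DPI, 1.50 MB raw, 1.00x
# zoom=1.5: 108 DPI, 3.38 MB raw, 2.25x
# zoom=2: 144 DPI, 6.01 MB raw, 4.00x
```

Going from zoom 1 to zoom 2 quadruples the raw pixel data, so the on-disk growth reported above is already smaller than the raw growth; a fractional zoom such as 1.5 is an intermediate point between the two settings tried.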
Appreciate any insights or hints!
Title: python - Maintain file size in pdf conversion - Stack Overflow (source: http://www.betaflare.com/web/1738271777a2072299.html)