How to remove inline images from a pdf (to reduce the file size) #4622
Replies: 2 comments
-
Interesting question!
You can use a rather hacky approach to remove inline images, namely by "editing" the page's contents stream. The following loops through each page's contents stream and removes the data between the "BI " and " EI" markers of every inline image:

import pymupdf

doc = pymupdf.open("input.pdf")
for page in doc:
    page.clean_contents(sanitize=False)  # recommended to ensure a predictable stream formatting
    xref = page.get_contents()[0]  # there is only one now because of the clean
    contents = bytearray(doc.xref_stream(xref))  # modifiable version
    p0 = 0  # search position for b"BI " (begin of inline image)
    modified = False  # switch showing we encountered an image
    while True:
        p0 = contents.find(b"BI ", p0)  # look for image start
        if p0 < 0:
            break
        p1 = contents.find(b" EI", p0)  # look for end of image
        if p1 < 0:  # should not happen!
            print("stream format error - stopping")
            break
        contents[p0 + 3 : p1] = b""  # remove image content (slice assignment)
        modified = True
        p0 += 3  # step behind "BI " marker
    if modified:
        doc.update_stream(xref, contents)
doc.ez_save("output.pdf")  # write the result; the output name is just an example

I hope it works! Didn't test it.
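As a variation on the stream-editing idea, here is a minimal sketch that works on raw bytes only (no PyMuPDF needed), so it can be tried on a hand-made sample. It removes the whole BI ... ID ... EI operator rather than only the data in between. The regular expression, the helper name `strip_inline_images` and the sample stream are assumptions for illustration. The pattern is naive: it would end a match early if the binary image data happened to contain a whitespace-delimited EI.

```python
import re
from typing import Tuple

# Matches one inline image: "BI" <dict> "ID" <data> "EI", where each keyword
# is delimited by whitespace (or the start/end of the stream).
INLINE_IMAGE = re.compile(rb"(?<![^\s])BI\s.*?\sID\s.*?\sEI(?![^\s])", re.DOTALL)


def strip_inline_images(stream: bytes) -> Tuple[bytes, int]:
    """Return the stream without inline images, plus the number removed."""
    return INLINE_IMAGE.subn(b"", stream)


# Hand-made sample content stream (an assumption, not taken from a real PDF).
sample = (
    b"q\n"
    b"10 0 0 10 100 200 cm\n"
    b"BI /W 2 /H 2 /CS /G /BPC 8 ID \x00\x7f\x7f\xff EI\n"
    b"Q\n"
    b"BT /F1 12 Tf (BI is not an image here) Tj ET\n"
)

cleaned, removed = strip_inline_images(sample)
print(removed, len(sample), len(cleaned))
```

In a real document the cleaned bytes would still have to be written back, for example with doc.update_stream(xref, cleaned) as in the loop above.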
-
You can also try to use redactions for image removal:

import pymupdf

doc = pymupdf.open("input.pdf")
for page in doc:
    page.add_redact_annot(page.rect)
    page.apply_redactions(
        # remove images
        images=pymupdf.PDF_REDACT_IMAGE_REMOVE,
        # don't touch graphics
        graphics=pymupdf.PDF_REDACT_LINE_ART_NONE,
        # don't touch text
        text=pymupdf.PDF_REDACT_TEXT_NONE,
    )
doc.ez_save("images_stripped.pdf")

I think this removes all images - whether or not inline.
-
I am able to extract most images with page.get_images(), and we can use the xrefs to delete those images (e.g. to reduce the overall PDF file size).
We can also access inline images using page.get_text("dict")["blocks"] (the blocks where block["type"] == 1). How would we also delete these images from the PDF data to reduce the file size?
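To make the block filter concrete, here is a small sketch that picks the image blocks out of a dictionary shaped like the result of page.get_text("dict") and adds up their sizes. The dictionary below is a hand-written stand-in, not real PyMuPDF output, and the helper name `image_blocks` is made up for this example.

```python
def image_blocks(page_dict: dict) -> list:
    """Return the image blocks (type 1) of a get_text("dict")-style result."""
    return [b for b in page_dict.get("blocks", []) if b.get("type") == 1]


# Hypothetical stand-in for page.get_text("dict"); only a few keys are shown.
sample_page = {
    "width": 595.0,
    "height": 842.0,
    "blocks": [
        {"type": 0, "bbox": (72, 72, 300, 90), "lines": []},  # text block
        {"type": 1, "bbox": (72, 100, 172, 200), "ext": "png",
         "width": 100, "height": 100, "size": 2048},  # image block
        {"type": 1, "bbox": (200, 100, 232, 132), "ext": "jpeg",
         "width": 32, "height": 32, "size": 512},  # image block
    ],
}

images = image_blocks(sample_page)
total_bytes = sum(b["size"] for b in images)
print(len(images), total_bytes)
```

Note that these blocks only describe the images shown on the page; they do not by themselves say whether an image is inline or stored under an xref.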