Dealing with Embedded Files
Jorj X. McKie edited this page Mar 6, 2018
·
7 revisions
Since v1.11.0, which is based on MuPDF v1.11, PyMuPDF can deal with embedded files.
This feature (part of the PDF format since PDF 1.4) allows attaching arbitrary data or files to PDF documents. With PyMuPDF, such embedded data can be added, deleted, extracted and modified.
Here is a script that packs a bunch of files into a new PDF.
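import os
import fitz  # PyMuPDF

sdir = "D:/Jorj/Wissen/Spektrum"  # folder containing the files to embed
flist = sorted(os.listdir(sdir))
doc = fitz.open()  # new, empty PDF
page = doc.newPage()  # single page that lists the embedded files
text = ["This file contains the following documents:", ""]
for f in flist:
    # only embed PDFs whose names start with "sdw_2017"
    if not (f.endswith(".pdf") and f.startswith("sdw_2017")):
        continue
    text.append(f)
    with open(os.path.join(sdir, f), "rb") as fh:
        buffer = fh.read()
    doc.embeddedFileAdd(buffer, f)  # embed the content under its file name
page.insertText(fitz.Point(50, 100), text)  # write the list onto the page
doc.save("embedded.pdf", garbage=4, deflate=True)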
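The file selection step of the script can also be tried on its own, without PyMuPDF. The following is only a sketch using the standard library: `select_files` is a hypothetical helper name, and the demo file names are made up. It returns the matching names together with their contents, which is what the embedding loop above consumes.

```python
import os
import tempfile


def select_files(folder, prefix="sdw_2017", suffix=".pdf"):
    """Return (name, content) pairs for matching files in folder, sorted by name.

    Hypothetical helper: it mirrors the filter used in the script above.
    """
    selected = []
    for name in sorted(os.listdir(folder)):
        if not (name.startswith(prefix) and name.endswith(suffix)):
            continue
        path = os.path.join(folder, name)
        if not os.path.isfile(path):  # skip sub-folders with matching names
            continue
        with open(path, "rb") as fh:
            selected.append((name, fh.read()))
    return selected


# Demonstration on a temporary folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sdw_2017_02.pdf", "sdw_2017_01.pdf", "notes.txt"):
        with open(os.path.join(tmp, name), "wb") as fh:
            fh.write(name.encode())
    demo = select_files(tmp)
    names = [n for n, _ in demo]
```

Each `(name, content)` pair could then be handed to the embedding call used in the script.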
Most PDF viewers will offer to display the files embedded in the resulting embedded.pdf.