Dealing with Embedded Files
The recipe further below can now be implemented much more easily using the new (v1.16.8) command line interface of PyMuPDF. Also note that MuPDF no longer supports embedded files after its version 1.14, so PyMuPDF is unique in that respect. The following commands of the fitz module deal with embedded files:
- python -m fitz embed-info: list embedded files (long and short versions)
- python -m fitz embed-add: insert a new file
- python -m fitz embed-del: delete a file
- python -m fitz embed-upd: update a file
- python -m fitz embed-extract: extract data of a file
- python -m fitz embed-copy: copy embedded files between two PDFs
Please see the documentation for details or request help (-h) from the command line.
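If these commands are to be driven from a script, a thin wrapper can assemble the argument list. The following is only a sketch: `embed_argv` and `show_help` are hypothetical helper names, and the only things taken from the command line interface are the six subcommand names listed above and the `-h` switch. Any further arguments should be looked up in the help output first.

```python
import subprocess
import sys

# Subcommands exactly as listed above; nothing else is assumed about the CLI.
EMBED_COMMANDS = (
    "embed-info", "embed-add", "embed-del",
    "embed-upd", "embed-extract", "embed-copy",
)


def embed_argv(command, *args):
    """Build the argument list for `python -m fitz <command> ...` (hypothetical helper)."""
    if command not in EMBED_COMMANDS:
        raise ValueError(f"unknown embedded-file command: {command!r}")
    # sys.executable makes sure the same interpreter (and PyMuPDF install) is used
    return [sys.executable, "-m", "fitz", command, *args]


def show_help(command):
    """Run `<command> -h` and return the help text printed by the CLI.

    Calling this requires an installed PyMuPDF that ships the command line interface.
    """
    result = subprocess.run(embed_argv(command, "-h"), capture_output=True, text=True)
    return result.stdout
```

For example, `print(show_help("embed-add"))` would display the options of the insert command before any real call is composed.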
PyMuPDF has been able to deal with embedded files since its v1.11.0, which is based on MuPDF v1.11.
This feature (PDF 1.4 format) allows attaching arbitrary data or files to PDF documents. With PyMuPDF, such embedded data can be added, deleted, extracted and modified.
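As a rough illustration of adding, modifying and deleting, the sketch below wraps the corresponding document methods in two small helpers. It assumes the snake_case method names of recent PyMuPDF releases (`embfile_names`, `embfile_add`, `embfile_upd`, `embfile_del`); releases contemporary with this page used camelCase names such as `embeddedFileAdd`. The helper names `put_embedded` and `drop_embedded` are hypothetical.

```python
def put_embedded(doc, name, data):
    """Store `data` (bytes) under `name`: update the entry if it exists, else add it.

    `doc` is assumed to be an open PyMuPDF PDF document.
    """
    if name in doc.embfile_names():
        doc.embfile_upd(name, buffer=data)  # modify the existing entry
        return "updated"
    doc.embfile_add(name, data)  # create a new entry
    return "added"


def drop_embedded(doc, name):
    """Delete the entry `name` if present; return True when something was removed."""
    if name not in doc.embfile_names():
        return False
    doc.embfile_del(name)
    return True


# Usage (requires PyMuPDF):
#   import fitz
#   doc = fitz.open("embedded.pdf")
#   put_embedded(doc, "notes.txt", b"some text")
#   doc.save("embedded-new.pdf")
```

Checking the name list first keeps the add and update cases apart, so the caller does not need to know whether an entry already exists.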
Here is a script that packs a bunch of files into a new PDF.
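import os

import fitz  # PyMuPDF

sdir = "D:/Jorj/Wissen/Spektrum"  # directory containing the files to embed
flist = sorted(os.listdir(sdir))

doc = fitz.open()  # new, empty PDF
page = doc.newPage()  # one page that lists the embedded files
text = ["This file contains the following documents:", ""]

for f in flist:
    # only embed the PDFs whose names start with "sdw_2017"
    if not (f.endswith(".pdf") and f.startswith("sdw_2017")):
        continue
    text.append(f)
    with open(os.path.join(sdir, f), "rb") as infile:
        buffer = infile.read()
    # keyword arguments keep the call independent of positional order
    doc.embeddedFileAdd(name=f, buffer=buffer)

# newer PyMuPDF releases spell these methods new_page, embfile_add, insert_text
page.insertText(fitz.Point(50, 100), text)
doc.save("embedded.pdf", garbage=4, deflate=True)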
Most PDF viewers will offer to display an embedded file.
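The reverse direction, pulling the embedded files back out with PyMuPDF rather than a viewer, might look like the sketch below. It assumes the snake_case method names of recent PyMuPDF releases (`embfile_names`, `embfile_get`); `extract_embedded` and the output directory name are hypothetical.

```python
import os


def extract_embedded(doc, out_dir):
    """Write every embedded file of `doc` into `out_dir` and return the paths written.

    `doc` is assumed to be an open PyMuPDF PDF document.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for name in doc.embfile_names():
        data = doc.embfile_get(name)  # content of the entry as bytes
        # basename() guards against entry names that contain directory parts
        target = os.path.join(out_dir, os.path.basename(name))
        with open(target, "wb") as out:
            out.write(data)
        written.append(target)
    return written


# Usage (requires PyMuPDF):
#   import fitz
#   doc = fitz.open("embedded.pdf")
#   print(extract_embedded(doc, "extracted"))
```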