Dealing with Embedded Files

Jorj X. McKie edited this page Nov 19, 2019 · 7 revisions

The recipe further down this page can now be implemented much more easily using the new (v1.16.8) command line interface of PyMuPDF. Also note that MuPDF itself no longer supports embedded files after its version 1.14, so PyMuPDF is unique in that respect. The following commands of the fitz module deal with embedded files:

  • python -m fitz embed-info - list embedded files (long and short versions)
  • python -m fitz embed-add - insert a new file
  • python -m fitz embed-del - delete a file
  • python -m fitz embed-upd - update a file
  • python -m fitz embed-extract - extract data of a file
  • python -m fitz embed-copy - copy embedded files between two PDFs

Please see the documentation for details or request help (-h) from the command line.
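
The commands above can also be driven from a Python script. The following is only a minimal sketch, assuming PyMuPDF v1.16.8 or later is installed for the running interpreter. The helper names build_command and run_command are hypothetical and not part of PyMuPDF. Any arguments after the subcommand are passed through unchanged, so check them against the help output (-h) first.

```python
import subprocess
import sys

# Subcommands listed on this page for the command line interface of the fitz module.
EMBED_COMMANDS = (
    "embed-info", "embed-add", "embed-del",
    "embed-upd", "embed-extract", "embed-copy",
)


def build_command(subcommand, *args):
    """Return the argument list for `python -m fitz <subcommand> ...` (hypothetical helper)."""
    if subcommand not in EMBED_COMMANDS:
        raise ValueError(f"unknown embedded-file subcommand: {subcommand!r}")
    # sys.executable makes sure the same interpreter (and PyMuPDF install) is used.
    return [sys.executable, "-m", "fitz", subcommand, *args]


def run_command(subcommand, *args):
    """Run the command and return its standard output as text (hypothetical helper)."""
    result = subprocess.run(
        build_command(subcommand, *args),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, print(run_command("embed-info", "-h")) should show the help text of the embed-info command, provided the command line interface is available.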


Since MuPDF v1.11, PyMuPDF (as of its v1.11.0) can deal with embedded files.

This feature (PDF 1.4 format) allows attaching arbitrary data or files to PDF documents. With PyMuPDF, such embedded data can be added, deleted, extracted and modified.

Here is a script that packs a bunch of files into a new PDF.

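import os
import fitz  # PyMuPDF

sdir = "D:/Jorj/Wissen/Spektrum"  # directory containing the files to embed

flist = sorted(os.listdir(sdir))
doc = fitz.open()  # create a new, empty PDF
page = doc.newPage()  # this page will list the names of the embedded files
text = ["This file contains the following documents:", ""]
for f in flist:
    # only embed the PDFs whose names start with "sdw_2017"
    if not (f.endswith(".pdf") and f.startswith("sdw_2017")):
        continue
    text.append(f)
    # read the file content; the with block closes the file handle afterwards
    with open(os.path.join(sdir, f), "rb") as infile:
        buffer = infile.read()
    doc.embeddedFileAdd(buffer, f)  # attach the content under the file's name

page.insertText(fitz.Point(50, 100), text)
doc.save("embedded.pdf", garbage=4, deflate=True)  # clean up and compress the output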

Most PDF viewers will offer to display an embedded file:

[Screenshot: resulting PDF]
