Inserting Pages from other PDFs

Jorj X. McKie edited this page Jan 17, 2019 · 10 revisions

Method insertPDF()

Method fitz.Document.insertPDF() allows you to insert page ranges from another PDF document. Usage looks like this:

doc1 = fitz.open("file1.pdf") # must be a PDF
doc2 = fitz.open("file2.pdf") # must be a PDF
doc1.insertPDF(doc2,          # cannot be the same object as doc1
               from_page=n,   # first page to copy, default: 0
               to_page=m,     # last page to copy, default: last page
               start_at=k,    # target location in doc1, default: at end
               rotate=deg,    # rotate copied pages
               links=True)    # also copy links & annotations

All parameters except doc2 are optional.

Remarks

This makes the functionality of the MuPDF CLI tool mutool merge available in Python. In technical PDF terms, the following keys are copied for every page object: /Contents, /Resources, /MediaBox, /CropBox, /BleedBox, /TrimBox, /ArtBox, /Rotate, /UserUnit, /Annots.

Bookmarks (outlines) of doc2 are not copied, but the TOC structure of doc1 remains intact after the copy operation.

In PyMuPDF we have extended the copy scope in the following way:

  1. Annotations (and Links) are copied if they point to pages in the copy range or to some outside resource.
  2. Optionally rotate copied pages.
  3. doc1 and doc2 must not be the same object, but may be the same file (opened twice under different objects)
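
The rule in item 1 can be sketched as a small predicate. This is only a model of the rule as stated above, not PyMuPDF code: the function name and the dictionary keys "external" and "page" are hypothetical and chosen for illustration.

```python
def link_is_copied(link, from_page, to_page):
    """Apply the rule from item 1 to one link (hypothetical dict layout)."""
    if link.get("external"):                  # assumed flag: points outside the PDF
        return True
    low, high = sorted((from_page, to_page))  # also covers back-to-front ranges
    return low <= link["page"] <= high        # internal target inside the copy range


print(link_is_copied({"page": 12}, 10, 20))
print(link_is_copied({"page": 3}, 10, 20))
print(link_is_copied({"external": True, "uri": "https://example.com"}, 10, 20))
```

A link whose target page lies outside the copied range would have nowhere to point in the result, which is presumably why such links are dropped.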

Obviously, from_page may equal to_page - then only one page is copied.

Less obvious: if you specify from_page > to_page (!), then the same range is copied, but back to front.
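
To make the ordering rules concrete, here is a minimal sketch that models them with plain Python lists instead of real documents. The helper names copy_order and simulate_insert are hypothetical, None stands in for "parameter omitted", and rejecting out-of-range page numbers is an assumption of this sketch; the real method may handle such values differently.

```python
def copy_order(page_count, from_page=None, to_page=None):
    """Return the 0-based source page numbers in the order they are copied."""
    first = 0 if from_page is None else from_page          # default: first page
    last = page_count - 1 if to_page is None else to_page  # default: last page
    # Assumption of this sketch: out-of-range values are simply rejected.
    if not (0 <= first < page_count and 0 <= last < page_count):
        raise ValueError("page numbers must lie inside the source document")
    step = 1 if first <= last else -1                      # from_page > to_page: reversed
    return list(range(first, last + step, step))


def simulate_insert(target, source, from_page=None, to_page=None, start_at=None):
    """Model the page order of the target after the insert, using lists."""
    copied = [source[n] for n in copy_order(len(source), from_page, to_page)]
    position = len(target) if start_at is None else start_at  # default: at end
    return target[:position] + copied + target[position:]


print(copy_order(30, 10, 20))   # forward range
print(copy_order(30, 20, 10))   # same pages, back to front
print(simulate_insert(["a0", "a1"], ["b0", "b1", "b2"],
                      from_page=2, to_page=1, start_at=0))
```

The last call models copying two pages in reversed order and placing them in front of the existing pages.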

It is quite easy to create a joined table of contents (TOC) when concatenating complete files - see below. For a more sophisticated solution look at this example. It can join arbitrary page ranges of PDF files together with their respective TOC pieces.

Examples

This will concatenate two PDFs, also joining their tables of contents:

len1 = len(doc1)                      # number of doc1 pages
toc1 = doc1.getToC(False)             # full TOC of doc1
toc2 = doc2.getToC(False)             # full TOC of doc2
for bm in toc2:                       # bookmarks of doc2 ...
    bm[2] += len1                     # need increased page numbers
toc = toc1 + toc2                     # concatenate full TOCs
doc1.insertPDF(doc2)                  # concatenate PDFs
doc1.setToC(toc)                      # new TOC
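
The page number shift can also be tried without any PDF files. The following sketch uses hand-written sample entries of the form [level, title, page]; the helper name join_tocs and the sample data are made up for illustration. Unlike the loop above, it leaves the input lists unchanged.

```python
def join_tocs(toc1, toc2, len1):
    """Return a combined TOC; page numbers of toc2 are shifted by len1."""
    shifted = []
    for entry in toc2:
        entry = list(entry)   # copy, so the caller's toc2 stays unchanged
        entry[2] += len1      # index 2 holds the page number
        shifted.append(entry)
    return [list(e) for e in toc1] + shifted


# Hypothetical sample data: doc1 is assumed to have 5 pages.
toc1 = [[1, "Intro", 1], [2, "Scope", 2]]
toc2 = [[1, "Appendix", 1], [2, "Tables", 3]]
joined = join_tocs(toc1, toc2, len1=5)
for level, title, page in joined:
    print(level, title, page)
```

Each entry of the second document ends up pointing len1 pages further back, which matches its new position after concatenation.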

Copy pages 10 to 20 (0-based page numbers) of doc2 in reversed order, rotated, and insert them in front of the doc1 pages:

doc1.insertPDF(doc2, from_page=20, to_page=10,
               start_at=0, rotate=-90)

This snippet creates a new PDF from the last page of each of several input files. Please especially note how we specify those last pages:

>>> import fitz
>>> flist = ["1.pdf", "2.pdf", "3.pdf", "4.pdf"]
>>> doc = fitz.open()                 # new, empty PDF
>>> for f in flist:
...     x = fitz.open(f)
...     last = len(x) - 1             # 0-based number of the last page
...     doc.insertPDF(x, from_page=last, to_page=last, rotate=90)
...     x.close()
...
>>> doc.save("out.pdf", deflate=True, garbage=3)