
Inserting Pages from other PDFs

Jorj X. McKie edited this page Aug 4, 2016 · 10 revisions

Method insertPDF()

Method fitz.Document.insertPDF() allows you to insert page ranges from another PDF document. Usage looks like this:

import fitz

doc1 = fitz.open("file1.pdf") # must be a PDF
doc2 = fitz.open("file2.pdf") # must be a PDF
doc1.insertPDF(doc2,
               from_page=n,   # first page to be copied, defaults to 0
               to_page=m,     # last page to be copied, defaults to doc2.pageCount-1
               start_at=k,    # first page number in doc1, defaults to doc1.pageCount
               rotate=deg,    # rotate copied pages by multiples of 90 degrees
               links=True)    # also copy links of copied pages
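The defaults noted in the comments above can be modelled in a few lines of plain Python. This is only a sketch of the documented behaviour, not PyMuPDF code: the helper name `resolve_defaults` and the use of `None` for "not specified" are assumptions made for illustration.

```python
def resolve_defaults(src_count, dst_count,
                     from_page=None, to_page=None, start_at=None):
    """Model of the documented defaults (hypothetical helper, not part of PyMuPDF).

    src_count: number of pages in the source (doc2)
    dst_count: number of pages in the target (doc1)
    """
    if from_page is None:
        from_page = 0                # first page of doc2
    if to_page is None:
        to_page = src_count - 1      # last page of doc2
    if start_at is None:
        start_at = dst_count         # i.e. append after the last page of doc1
    return from_page, to_page, start_at


# doc2 has 5 pages, doc1 has 3 pages, nothing specified:
print(resolve_defaults(5, 3))
```

With no arguments given, all of doc2 is selected and placed after the last page of doc1, which matches the plain concatenation case.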

Remarks

This makes the functionality of the MuPDF CLI tool mutool merge available in Python. In technical PDF terms, the following objects are copied for every page: /Contents, /Resources, /MediaBox, /CropBox, /BleedBox, /TrimBox, /ArtBox, /Rotate, /UserUnit.

Annotations are not copied, because they may reference pages outside the copy range.

Likewise, bookmarks / outlines of doc2 are not copied, but the TOC structure of doc1 is kept intact by the copy operation.

In PyMuPDF we have extended the copy scope in the following way:

  1. Links are copied if they point to pages in the copied range or to some outside resource.
  2. The table of contents (i.e. bookmarks) is copied if complete documents are joined (see examples).
  3. Copied pages can optionally be rotated.
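The rule for links in item 1 can be read as a simple predicate. The sketch below is an illustration only: the dictionary layout (`"kind"` and `"page"` keys, with the values `"internal"` and `"external"`) and the function name `link_is_copied` are assumptions, not PyMuPDF's actual link representation.

```python
def link_is_copied(link, from_page, to_page):
    """Return True if a link would be kept under the rule in item 1.

    `link` is a hypothetical record: {"kind": "internal", "page": n}
    for a jump inside the source PDF, or {"kind": "external", ...}
    for anything pointing to an outside resource.
    """
    if link["kind"] == "external":
        return True                          # outside resources are always kept
    lo, hi = sorted((from_page, to_page))    # range may be given back to front
    return lo <= link["page"] <= hi          # target must lie in the copied range


# Copying pages 10 to 20:
print(link_is_copied({"kind": "internal", "page": 15}, 10, 20))  # target is copied too
print(link_is_copied({"kind": "internal", "page": 3}, 10, 20))   # target is left behind
```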

Obviously, from_page may equal to_page - then only one page is copied.

Less obvious: if you specify from_page > to_page (!), then the same range is copied, but back to front.
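To make the ordering concrete, here is a small model that uses Python lists in place of PDF documents. It is a sketch of the behaviour described above, not of the implementation; `simulate_insert` is a hypothetical name and does not exist in PyMuPDF.

```python
def simulate_insert(dst, src, from_page, to_page, start_at):
    """Model the resulting page order with lists standing in for documents.

    dst, src: lists of page labels for doc1 and doc2
    from_page, to_page: inclusive page range in src (may be back to front)
    start_at: index in dst at which the first copied page appears
    """
    if from_page <= to_page:
        pages = src[from_page:to_page + 1]
    else:
        # same range, but copied back to front
        pages = src[to_page:from_page + 1][::-1]
    return dst[:start_at] + pages + dst[start_at:]


doc1 = ["a0", "a1"]
doc2 = ["b0", "b1", "b2", "b3"]

print(simulate_insert(doc1, doc2, 1, 3, 2))  # forward range, appended
print(simulate_insert(doc1, doc2, 3, 1, 0))  # reversed range, in front
print(simulate_insert(doc1, doc2, 2, 2, 1))  # single page, in the middle
```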

Examples

This will concatenate two PDFs, including their tables of contents:

doc1.insertPDF(doc2)

Copy pages 10 to 20 of doc2, rotated by -90 degrees and in reverse order, in front of the pages of doc1:

doc1.insertPDF(doc2, from_page = 20, to_page = 10,
               start_at = 0, rotate = -90)