Inserting Pages from other PDFs
The method fitz.Document.insertPDF() inserts page ranges from another PDF document. Usage looks like this:
import fitz
# n, m, k and deg below are placeholders for your own values
doc1 = fitz.open("file1.pdf")      # must be a PDF
doc2 = fitz.open("file2.pdf")      # must be a PDF
doc1.insertPDF(doc2,               # cannot be the same object as doc1
               from_page=n,        # first page to copy, default: 0
               to_page=m,          # last page to copy, default: last page
               start_at=k,         # target location in doc1, default: at end
               rotate=deg,         # rotate copied pages
               links=True)         # also copy links & annotations
Except for doc2, all parameters are optional.
This makes the functionality of the MuPDF CLI tool mutool merge available to Python. In technical PDF terms, the following keys are copied for every page object: /Contents, /Resources, /MediaBox, /CropBox, /BleedBox, /TrimBox, /ArtBox, /Rotate, /UserUnit, /Annots.
Bookmarks (outlines) of doc2 are not copied. The TOC structure of doc1, however, remains intact after the copy operation.
In PyMuPDF we have extended the copy scope in the following way:
- Annotations (and Links) are copied if they point to pages in the copy range or to some outside resource.
- Optionally rotate copied pages.
- doc1 and doc2 must not be the same object, but may be the same file (opened twice under different objects).
Obviously, from_page may equal to_page, in which case only one page is copied.
Less obvious: if you specify from_page > to_page (!), then the same range is copied, but back to front.
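The ordering rule described above can be modelled in a few lines of plain Python. This is only an illustrative sketch of the documented behaviour, not PyMuPDF's implementation; the function name copied_page_order is hypothetical and it assumes both page numbers are valid 0-based indices.

```python
def copied_page_order(from_page, to_page):
    """Return the source page numbers in the order they would be copied.

    Hypothetical helper: models the rule that from_page > to_page
    copies the same range back to front.
    """
    if from_page <= to_page:
        # Normal case: ascending, both ends inclusive.
        return list(range(from_page, to_page + 1))
    # Reversed case: descending, both ends inclusive.
    return list(range(from_page, to_page - 1, -1))


forward = copied_page_order(10, 12)
backward = copied_page_order(12, 10)
single = copied_page_order(5, 5)
```

Here forward and backward contain the same page numbers in opposite order, and single contains just one page.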
It is quite easy to create a joined table of contents (TOC) when concatenating complete files, see below. For a more sophisticated solution look at this example. It can join arbitrary ranges of PDF files together with their respective TOC pieces.
This will concatenate two PDFs, also joining their tables of contents:
len1 = len(doc1)               # number of doc1 pages
toc1 = doc1.getToC(False)      # full TOC of doc1
toc2 = doc2.getToC(False)      # full TOC of doc2
for bm in toc2:                # bookmarks of doc2 ...
    bm[2] += len1              # need increased page numbers
toc = toc1 + toc2              # concatenate full TOC's
doc1.insertPDF(doc2)           # concatenate PDFs
doc1.setToC(toc)               # new TOC
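The page-number shift in the snippet above can also be written as a small pure function that leaves both input lists untouched. This is a minimal sketch: the name merged_toc is hypothetical, and it assumes TOC entries are lists of the form [level, title, page, ...] with 1-based page numbers. Leaving entries with a page number below 1 unshifted is a further assumption (such entries are taken to have no target inside the document).

```python
def merged_toc(toc1, toc2, page_offset):
    """Return a new TOC: toc1 followed by toc2 with shifted page numbers.

    Hypothetical helper; entries are [level, title, page, ...].
    Neither input list is modified.
    """
    merged = [list(entry) for entry in toc1]
    for level, title, page, *rest in toc2:
        if page > 0:
            # Assumption: only entries pointing into the document are shifted.
            page += page_offset
        # Any extra items (such as a destination dict) are carried over as-is.
        merged.append([level, title, page, *rest])
    return merged


toc_a = [[1, "Part A", 1], [2, "A.1", 2]]
toc_b = [[1, "Part B", 1], [2, "External", -1]]
joined = merged_toc(toc_a, toc_b, page_offset=3)
```

With the documents from the snippet above, the result of merged_toc(toc1, toc2, len1) could be passed to doc1.setToC() after the insertPDF call.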
Copy pages 10 to 20 from some PDF, but rotated, in reversed order and in front of the doc1 pages:
doc1.insertPDF(doc2, from_page=20, to_page=10,
               start_at=0, rotate=-90)
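To see where the copied pages end up, the call above can be simulated with ordinary lists standing in for documents. This is a rough model under stated assumptions, not the library's code: the function pages_after_insert is hypothetical, pages are represented by string labels, rotation is ignored, and a negative start_at is taken to mean "append at the end".

```python
def pages_after_insert(target, source, from_page, to_page, start_at=-1):
    """Model the page sequence of the target after an insert.

    Hypothetical helper: target and source are lists of page labels.
    """
    # Walk the source range forwards or backwards, both ends inclusive.
    step = 1 if from_page <= to_page else -1
    copied = [source[i] for i in range(from_page, to_page + step, step)]
    if start_at < 0:
        # Assumption: a negative start_at appends to the end of the target.
        start_at = len(target)
    # The first copied page becomes page number start_at of the result.
    return target[:start_at] + copied + target[start_at:]


doc1_pages = ["a0", "a1"]
doc2_pages = [f"b{i}" for i in range(25)]
result = pages_after_insert(doc1_pages, doc2_pages, 20, 10, start_at=0)
```

In this model, result starts with the eleven copied pages in descending order, followed by the original doc1 pages.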
This snippet will create a new PDF from the last page of a bunch of input files. Please especially note how we specify those last pages:
>>> import fitz
>>> flist = ["1.pdf", "2.pdf", "3.pdf", "4.pdf"]
>>> doc = fitz.open()                # new empty PDF
>>> for f in flist:
...     x = fitz.open(f)
...     doc.insertPDF(x, from_page=len(x)-1, to_page=len(x)-1, rotate=90)
...     x.close()
...
>>> doc.save("out.pdf", deflate=True, garbage=3)