Releases: pymupdf/PyMuPDF
Interesting new features and several fixes
Fixes:
Implemented enhancement requests:
- #855, which allows font subsetting using the fontTools package.
- #870, which allows the convert_to_pdf method also for PDF documents.
- #843, Document.tobytes() (formerly Document.write()) now also supports linearized output. Plus several extensions / improvements around supporting Python file objects.
- Added new methods to quickly determine whether a PDF has annotations or links.
- Extended the Document.scrub() method with a new parameter, which allows page thumbnails to be removed as well.
- Added methods to directly query and set values in PDF objects, without the need to manipulate PDF object sources in an unwieldy way: see methods Document.xref_set_key() / Document.xref_get_key().
Continued the process of changing the naming convention for class methods and attributes to "snake_case". As announced before, this is a tedious, error-prone process, and requires special care to maintain good backward compatibility for existing scripts.
In future versions - probably in step with MuPDF v1.19.0 - we will remove definitions of old names, but a method for re-activating old aliases will remain available.
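The general idea of keeping an old camelCase name alive next to its snake_case replacement can be sketched as follows. This is a hypothetical illustration in plain Python: the Page class and the deprecated_alias helper are made up for this example and do not show how PyMuPDF implements its aliases.

```python
import warnings


def deprecated_alias(new_name):
    """Build a method that forwards to `new_name` and emits a warning."""

    def forwarder(self, *args, **kwargs):
        warnings.warn(
            f"use '{new_name}' instead", DeprecationWarning, stacklevel=2
        )
        return getattr(self, new_name)(*args, **kwargs)

    return forwarder


class Page:
    """Hypothetical stand-in class, not PyMuPDF's Page."""

    def get_text(self, option="text"):
        return f"<{option} output>"

    # Old camelCase name kept as an alias for existing scripts.
    getText = deprecated_alias("get_text")


page = Page()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_result = page.getText("words")  # old name still works, but warns
new_result = page.get_text("words")     # new name, no warning
```

Existing scripts keep running under the old name, while the warning points their authors to the new one before the alias is removed.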
Bug Fixes and some new features
The recent introduction of "Discussions" by GitHub has been very motivating for our users.
Based on their feedback, several enhancements have been implemented.
Here is a selection:
- Most Python functions now have typing / annotation support.
- For PDF table-of-contents items, colors are now supported (reading and writing)
- PDF page label support for reading and writing
- Support personalized tagging of new annotations, fields and links for easier selection of relevant objects.
There are also a number of fixes - please consult the documentation.
Minor fixes, improved font metrics handling
Font metrics handling has been improved: text box writing now observes the relevant font properties when determining line heights.
In the course of this work, a new option has been introduced, which allows getting text bboxes (glyphs, spans, text search quads, etc.) that wrap only the text itself more tightly, as opposed to always returning line-height bboxes.
Fixes:
Better Optional Content support
Introducing PDF Optional Content
New features for text searching and more
This resolves
and removes the hit_max parameter from text searching. In addition, hyphenated words around line breaks are still found.
The use of the clip parameter in text searches and text extractions now only includes characters whose bboxes are fully contained in the clip rectangle.
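The containment rule can be illustrated without PyMuPDF. The following is a minimal sketch in plain Python: the Rect type, the chars_in_clip helper and the character boxes are hypothetical, and treating a bbox that touches the clip border as contained is an assumption of this example.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, other: "Rect") -> bool:
        """True if `other` lies completely inside this rectangle."""
        return (
            self.x0 <= other.x0
            and self.y0 <= other.y0
            and other.x1 <= self.x1
            and other.y1 <= self.y1
        )


def chars_in_clip(chars, clip):
    """Keep only characters whose bbox is fully contained in `clip`."""
    return "".join(c for c, bbox in chars if clip.contains(bbox))


# Hypothetical character boxes, each 10 units wide, on one text line.
chars = [(c, Rect(10 * i, 0, 10 * i + 10, 12)) for i, c in enumerate("PyMuPDF")]
clip = Rect(15, 0, 50, 12)
selected = chars_in_clip(chars, clip)
print(selected)
```

The "y" starts to the left of the clip and the "D" starts to the right of it, so both are dropped even though the "y" partly overlaps the clip.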
Important fixes, some improvements for drawing extraction
Support MuPDF v1.18.0
This version fixes the following issues:
- #519 - method Page.cleanContents() should no longer destroy the PDF page's appearance. In earlier versions, this upstream bug occurred in rare cases.
- #675 - unsuccessful storage allocations (e.g. for extremely large pixmaps) could occasionally lead to interpreter crashes. This should now always be prevented (fingers crossed).
- #668 - the specification of line dashes in PDF is now correctly documented.
- #669 - fixed a major cause of memory leakage in method Document.insertPDF.
The following new features or improvements are included:
- Text extraction method Page.getText() now also works for annotations: Annot.getText().
- Text from within a rectangle can now be extracted directly via Page.getTextbox(rect). This may make extra scripts unnecessary in many cases.
- When applying redactions on PDF pages, the handling of images can now be finely controlled via a new parameter.
- The DPI (resolution) of PNG images created from pixmaps is now automatically set from the Pixmap.xres and Pixmap.yres values.