Summary: PyMuPDF is a Python binding for the PDF rendering library MuPDF
11
11
Description:
12
-
Release date: October 18, 2020
12
+
Release date: October 7, 2020
13
13
14
14
Authors
15
15
=======
@@ -20,7 +20,7 @@ Description:
20
20
Introduction
21
21
============
22
22
23
-
This is **version 1.18.1 of PyMuPDF**, a Python binding for `MuPDF <http://mupdf.com/>`_ - "a lightweight PDF and XPS viewer".
23
+
This is **version 1.18.2 of PyMuPDF**, a Python binding for `MuPDF <http://mupdf.com/>`_ - "a lightweight PDF and XPS viewer".
24
24
25
25
MuPDF can access files in PDF, XPS, OpenXPS, epub, comic and fiction book formats, and it is known for its top performance and high rendering quality.
@@ -14,7 +14,7 @@ On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016: [ - "a lightweight PDF, XPS, and E-book viewer".
17
+
This is **version 1.18.2 of PyMuPDF**, a Python binding with support for [MuPDF 1.18.*](http://mupdf.com/) - "a lightweight PDF, XPS, and E-book viewer".
18
18
19
19
MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.
docs/annot.rst (+1 -1)
@@ -90,7 +90,7 @@ There is a parent-child relationship between an annotation and its page. If the
90
90
91
91
*(New in 1.18.0)*
92
92
93
-
Retrieves the content of the annotation in a variety of formats -- much like the same method for :ref:`Page`.. This currently only delivers relevant data for annotation types 'FreeText' and 'Stamp'. Other type will return an empty string (or equivalent objects).
93
+
Retrieves the content of the annotation in a variety of formats -- much like the same method for :ref:`Page`. This currently only delivers relevant data for annotation types 'FreeText' and 'Stamp'. Other types return an empty string (or equivalent objects).
94
94
95
95
:arg str opt: the desired format - one of the following values. Please note that this method works exactly like the same-named method of :ref:`Page`.
docs/changes.rst (+16 -1)
@@ -1,6 +1,21 @@
1
1
Change Logs
2
2
===============
3
3
4
+
Changes in Version 1.18.2
5
+
---------------------------
6
+
This version contains some interesting improvements for text searching: all search hits are now returned, because the **hit_max** parameter is obsolete. In addition, the new **clip** parameter restricts the search to a given area. Searching now also detects hyphenation at line breaks and therefore finds hyphenated words.
7
+
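To illustrate the hyphenation handling mentioned above, here is a minimal sketch that joins lines whose last word ends in a hyphen, so that a plain substring search finds the full word. This is not MuPDF's implementation; ``dehyphenate`` is a hypothetical helper and the sample text is invented.

```python
def dehyphenate(lines):
    """Join text lines, gluing words that were split by a trailing hyphen."""
    parts = []
    for line in lines:
        line = line.strip()
        if parts and parts[-1].endswith("-"):
            # Previous line ended in a hyphen: drop it and append directly.
            parts[-1] = parts[-1][:-1] + line
        else:
            parts.append(line)
    return " ".join(parts)


# Invented sample: "hyphenated" is broken across two lines.
text = dehyphenate(["This is a hyphen-", "ated word."])
found = "hyphenated" in text
```

A naive join like this also merges genuine hyphens at line ends (for example "well-" followed by "known"), so a real search needs a more careful rule.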
8
+
* **Fixed** issue `#575 <https://github.com/pymupdf/PyMuPDF/issues/575>`_: if using ``quads=False`` in text searching, then overlapping rectangles on the same line are joined. Previously, parts of the search string, which belonged to different "marked content" items, each generated their own rectangle -- just as if occurring on separate lines.
9
+
* **Added** :attr:`Document.isRepaired`, which is true if the PDF was repaired on open.
10
+
* **Added** :meth:`Document.setXmlMetadata` which either updates or creates PDF XML metadata. Implements issue `#691 <https://github.com/pymupdf/PyMuPDF/issues/691>`_.
11
+
* **Added** :meth:`Document.getXmlMetadata` which returns PDF XML metadata.
12
+
* **Changed** creation of PDF documents: they will now always carry a PDF identification (``/ID`` field) in the document trailer. Implements issue `#691 <https://github.com/pymupdf/PyMuPDF/issues/691>`_.
13
+
* **Changed** :meth:`Page.searchFor`: a new parameter ``clip`` is accepted to restrict the search to this rectangle. Correspondingly, the attribute :attr:`TextPage.rect` is now respected by :meth:`TextPage.search`.
14
+
* **Changed** parameter ``hit_max`` in :meth:`Page.searchFor` and :meth:`TextPage.search` is now obsolete: methods will return all hits.
15
+
* **Changed** character **selection criteria** in :meth:`Page.getText`: a character is now considered to be part of a ``clip`` if its bbox is fully contained. Before this, a non-empty intersection was sufficient.
16
+
* **Changed** :meth:`Document.scrub` to support a new option ``redact_images``. This addresses issue `#697 <https://github.com/pymupdf/PyMuPDF/issues/697>`_.
17
+
18
+
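Two of the changes listed above are geometric and can be sketched in plain Python: the stricter ``clip`` criterion (full containment instead of non-empty intersection) and the joining of overlapping hit rectangles on one line. This is an illustration only, not PyMuPDF's code. Rectangles are assumed to be ``(x0, y0, x1, y1)`` tuples, and the helper names ``intersects``, ``contains`` and ``join_on_line`` are hypothetical.

```python
def intersects(a, b):
    """True if rectangles a and b share a non-empty area (old clip criterion)."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def contains(clip, bbox):
    """True if bbox lies fully inside clip (criterion described for 1.18.2)."""
    return (clip[0] <= bbox[0] and clip[1] <= bbox[1]
            and bbox[2] <= clip[2] and bbox[3] <= clip[3])


def join_on_line(rects):
    """Merge overlapping rectangles; all input is assumed to be on one line."""
    merged = []
    for r in sorted(rects):  # left to right by x0
        if merged and intersects(merged[-1], r):
            m = merged[-1]
            # Replace the last rectangle by the union of both.
            merged[-1] = (min(m[0], r[0]), min(m[1], r[1]),
                          max(m[2], r[2]), max(m[3], r[3]))
        else:
            merged.append(r)
    return merged


clip = (0, 0, 100, 20)
char_bbox = (95, 5, 105, 15)  # sticks out of the clip on the right
hits = join_on_line([(10, 0, 50, 12), (48, 0, 90, 12), (200, 0, 240, 12)])
```

With these invented values, the character intersects the clip but is not contained in it, so it would be selected under the old rule and skipped under the new one.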
4
19
Changes in Version 1.18.1
5
20
---------------------------
6
21
* **Fixed** issue `#692 <https://github.com/pymupdf/PyMuPDF/issues/692>`_. PyMuPDF now detects and recovers from more cyclic resource dependencies in PDF pages and for the first time reports them in the MuPDF warnings store.
@@ -11,7 +26,7 @@ Changes in Version 1.18.1
11
26
12
27
Changes in Version 1.18.0
13
28
---------------------------
14
-
This is first PyMuPDF version supporting MuPDF v1.18. The goal here is on extending PyMuPDF's own functionality -- apart from bug fixing. Subsequent PyMuPDF patches may address features new in MuPDF.
29
+
This is the first PyMuPDF version supporting MuPDF v1.18. The focus here is on extending PyMuPDF's own functionality -- apart from bug fixing. Subsequent PyMuPDF patches may address features new in MuPDF.
15
30
16
31
* **Fixed** issue `#519 <https://github.com/pymupdf/PyMuPDF/issues/519>`_. This upstream bug occurred occasionally for some pages only and seems to be fixed now: page layout should no longer be ruined in these cases.
docs/document.rst (+30 -5)
@@ -49,6 +49,7 @@ For details on **embedded files** refer to Appendix 3.
49
49
:meth:`Document.getSigFlags` PDF only: determine signature state
50
50
:meth:`Document.getToC` create a table of contents
51
51
:meth:`Document.getTOC` alias of previous
52
+
:meth:`Document.getXmlMetadata` PDF only: read the XML metadata
52
53
:meth:`Document.insertPage` PDF only: insert a new page
53
54
:meth:`Document.insertPDF` PDF only: insert pages from another PDF
54
55
:meth:`Document.layout` re-paginate the document (if supported)
@@ -76,6 +77,7 @@ For details on **embedded files** refer to Appendix 3.
76
77
:meth:`Document.setTOC_item` PDF only: change a single TOC item
77
78
:meth:`Document.setToC` PDF only: set the table of contents (TOC)
78
79
:meth:`Document.setTOC` PDF only: alias of previous
80
+
:meth:`Document.setXmlMetadata` PDF only: create or update document XML metadata
79
81
:meth:`Document.updateObject` PDF only: replace object source
80
82
:meth:`Document.updateStream` PDF only: replace stream source
81
83
:meth:`Document.write` PDF only: writes document to memory
@@ -90,6 +92,7 @@ For details on **embedded files** refer to Appendix 3.
90
92
:attr:`Document.isFormPDF` is this a Form PDF?
91
93
:attr:`Document.isPDF` is this a PDF?
92
94
:attr:`Document.isReflowable` is this a reflowable document?
95
+
:attr:`Document.isRepaired` PDF only: has this PDF been repaired during open?
93
96
:attr:`Document.lastLocation` (chapter, pno) of last page
94
97
:attr:`Document.metadata` metadata
95
98
:attr:`Document.name` filename of document
@@ -513,10 +516,23 @@ For details on **embedded files** refer to Appendix 3.
513
516
514
517
.. method:: setMetadata(m)
515
518
516
-
PDF only: Sets or updates the metadata of the document as specified in *m*, a Python dictionary. As with :meth:`select`, these changes become permanent only when you save the document. Incremental save is supported.
519
+
PDF only: Sets or updates the metadata of the document as specified in *m*, a Python dictionary.
517
520
518
521
:arg dict m: A dictionary with the same keys as *metadata* (see below). All keys are optional. A PDF's format and encryption method cannot be set or changed and will be ignored. If any value should not contain data, do not specify its key or set the value to *None*. If you use *{}* all metadata information will be cleared to the string *"none"*. If you want to selectively change only some values, modify a copy of *doc.metadata* and use it as the argument. Arbitrary unicode values are possible if specified as UTF-8-encoded.
519
522
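The advice to modify a copy of *doc.metadata* can be sketched as follows. A plain dictionary with invented values stands in for *doc.metadata* so the snippet runs without a PDF; ``changed_metadata`` is a hypothetical helper, not part of PyMuPDF.

```python
def changed_metadata(current, **changes):
    """Return a copy of a metadata dict with selected values replaced."""
    new = dict(current)  # copy, so the original dictionary stays untouched
    new.update(changes)
    return new


# Stand-in for doc.metadata (invented values).
meta = {"title": "Old title", "author": "Jane Doe", "subject": None}
new_meta = changed_metadata(meta, title="New title")
# With a real document one would then call: doc.setMetadata(new_meta)
```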
523
+
.. method:: getXmlMetadata()
524
+
525
+
PDF only: Get the document XML metadata.
526
+
527
+
:rtype: str
528
+
:returns: XML metadata of the document. Empty string if not present or not a PDF.
529
+
530
+
.. method:: setXmlMetadata(xml)
531
+
532
+
PDF only: Sets or updates XML metadata of the document.
533
+
534
+
:arg str xml: the new XML metadata. This should be valid XML; however, the method performs no checking and accepts any string.
535
+
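Because the method does no checking, a caller may want to test the string for well-formedness first. The following sketch uses only the Python standard library. ``is_well_formed`` is a hypothetical helper and the sample packet is a made-up minimal example, not a complete XMP document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml)
    except ET.ParseError:
        return False
    return True


# Invented minimal sample, for illustration only.
good = (
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>'
    "</x:xmpmeta>"
)
bad = "<x:xmpmeta>"  # unclosed element with an undeclared prefix

# With a real document one might then write:
# if is_well_formed(good): doc.setXmlMetadata(good)
```

This only checks XML syntax; it does not verify that the content is meaningful PDF metadata.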
520
536
.. method:: setToC(toc, collapse=1)
521
537
522
538
.. method:: setTOC(toc, collapse=1)
@@ -529,7 +545,7 @@ For details on **embedded files** refer to Appendix 3.
529
545
530
546
:arg sequence toc:
531
547
532
-
A Python sequence (list or tuple) with **all bookmark entries** that should form the new table of contents. Output variants of :meth:`getToC` are acceptable. To completely remove the table of contents specify an empty sequence or None. Each item must be a list with the following format.
548
+
A list or tuple with **all bookmark entries** that should form the new table of contents. Output variants of :meth:`getToC` are acceptable. To completely remove the table of contents specify an empty sequence or None. Each item must be a list with the following format.
533
549
534
550
* [lvl, title, page [, dest]] where
535
551
@@ -592,7 +608,7 @@ For details on **embedded files** refer to Appendix 3.
592
608
593
609
Check whether the document can be saved incrementally. Use it to choose the right option without encountering exceptions.
PDF only: *(New in v1.16.14)* Remove potentially sensitive data from the PDF. This function is inspired by the similar "Sanitize" function in Adobe Acrobat products. The process is configurable by a number of options, which are all *True* by default.
598
614
@@ -603,6 +619,7 @@ For details on **embedded files** refer to Appendix 3.
Search for "text" on page number "pno". Works exactly like the corresponding :meth:`Page.searchFor`. Any integer -inf < pno < pageCount is acceptable.
670
687
@@ -1054,7 +1071,7 @@ For details on **embedded files** refer to Appendix 3.
1054
1071
1055
1072
*False* if this is not a PDF or has no form fields, otherwise the number of root form fields (fields with no ancestors).
1056
1073
1057
-
Changed in version 1.16.4 Returns the total number of (root) form fields.
1074
+
*(Changed in version 1.16.4)* Returns the total number of (root) form fields.
1058
1075
1059
1076
:type: bool,int
1060
1077
@@ -1064,6 +1081,14 @@ For details on **embedded files** refer to Appendix 3.
1064
1081
1065
1082
:type: bool
1066
1083
1084
+
.. attribute:: isRepaired
1085
+
1086
+
*(New in v1.18.2)*
1087
+
1088
+
*True* if the PDF has been repaired during open (because of major structure issues). Always *False* for non-PDF documents. If true, more details have been stored in ``TOOLS.mupdf_warnings()``, and :meth:`Document.can_save_incrementally` will return *False*.
1089
+
1090
+
:type: bool
1091
+
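A script can use this attribute to report why an incremental save is unavailable. The sketch below assumes only that the document object exposes ``isRepaired`` and ``can_save_incrementally()`` as described in this section. ``incremental_save_blocker`` is a hypothetical helper, and simple stand-in objects replace real documents so the snippet runs without a PDF.

```python
from types import SimpleNamespace


def incremental_save_blocker(doc):
    """Return a reason why incremental save is not possible, or None."""
    if doc.isRepaired:
        return "PDF was repaired on open"
    if not doc.can_save_incrementally():
        return "document cannot be saved incrementally"
    return None


# Stand-ins for Document objects (invented for illustration).
repaired = SimpleNamespace(isRepaired=True,
                           can_save_incrementally=lambda: False)
healthy = SimpleNamespace(isRepaired=False,
                          can_save_incrementally=lambda: True)

reason_repaired = incremental_save_blocker(repaired)
reason_healthy = incremental_save_blocker(healthy)
```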
1067
1092
.. attribute:: needsPass
1068
1093
1069
1094
Indicates whether the document is password-protected against access. This indicator remains unchanged -- **even after the document has been authenticated**. Precludes incremental saves if true.
1. Updating an annotation, which has been created by some other software, via a PyMuPDF script.
1862
-
2. Creating an annotation with PyMuPDF and later changing it using some other PDF application.
1859
+
1. **Updating** an annotation with PyMuPDF which was created by some other software.
1860
+
2. **Creating** an annotation with PyMuPDF and later changing it with some other software.
1863
1861
1864
-
In both cases you may experience unintended changes like a different annotation icon or text font, the fill color or line dashing have disappeared, line end symbols have changed their size or even have disappeared too, etc.
1862
+
In both cases you may experience unintended changes, such as a different annotation icon or text font, a missing fill color or line dashing, or line end symbols that have changed their size or disappeared altogether.
1865
1863
1866
1864
Cause
1867
1865
^^^^^^
1868
-
Annotation maintenance is handled differently by each PDF maintenance application (if it is supported at all). For any given PDF application, some annotation types may not be supported at all or only partly, or some details may be handled in a different way than with another application.
1866
+
Annotation maintenance is handled differently by each PDF maintenance application. Some annotation types may not be supported at all or only partly, and some details may be handled differently than in another application. **There is no standard.**
1869
1867
1870
1868
Almost always a PDF application also comes with its own icons (file attachments, sticky notes and stamps) and its own set of supported text fonts. For example:
1871
1869
1872
1870
* (Py-) MuPDF only supports these 5 basic fonts for 'FreeText' annotations: Helvetica, Times-Roman, Courier, ZapfDingbats and Symbol -- no italics / no bold variations. When changing a 'FreeText' annotation created by some other app, its font will probably be neither recognized nor accepted, and will be replaced by Helvetica.
1873
1871
1874
-
* PyMuPDF fully supports the PDF text markers, but these types cannot be updated with Adobe Acrobat Reader.
1872
+
* PyMuPDF supports all PDF text markers (highlight, underline, strikeout, squiggly), but these types cannot be updated with Adobe Acrobat Reader.
1875
1873
1876
1874
In most cases there also exists limited support for line dashing which causes existing dashes to be replaced by straight lines. For example: