Commit 071c56d

upload v1.18.2
1 parent 1cc6d70 commit 071c56d

18 files changed

+581
-178
lines changed

PKG-INFO

+3-3
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
Metadata-Version: 1.1
22
Name: PyMuPDF
3-
Version: 1.18.1
3+
Version: 1.18.2
44
Author: Jorj McKie
55
Author-email: [email protected]
66
Maintainer: Jorj McKie
@@ -9,7 +9,7 @@ Home-page: https://github.com/pymupdf/PyMuPDF
99
Download-url: https://github.com/pymupdf/PyMuPDF
1010
Summary: PyMuPDF is a Python binding for the PDF rendering library MuPDF
1111
Description:
12-
Release date: October 18, 2020
12+
Release date: October 27, 2020
1313

1414
Authors
1515
=======
@@ -20,7 +20,7 @@ Description:
2020
Introduction
2121
============
2222

23-
This is **version 1.18.1 of PyMuPDF**, a Python binding for `MuPDF <http://mupdf.com/>`_ - "a lightweight PDF and XPS viewer".
23+
This is **version 1.18.2 of PyMuPDF**, a Python binding for `MuPDF <http://mupdf.com/>`_ - "a lightweight PDF and XPS viewer".
2424

2525
MuPDF can access files in PDF, XPS, OpenXPS, epub, comic and fiction book formats, and it is known for both, its top performance and high rendering quality.
2626

README.md

+3-3
@@ -1,8 +1,8 @@
1-
# PyMuPDF 1.18.1
1+
# PyMuPDF 1.18.2
22

33
![logo](https://github.com/pymupdf/PyMuPDF/blob/master/demo/pymupdf.jpg)
44

5-
Release date: October 18, 2020
5+
Release date: October 27, 2020
66

77
**Travis-CI:** [![Build Status](https://travis-ci.org/JorjMcKie/py-mupdf.svg?branch=master)](https://travis-ci.org/JorjMcKie/py-mupdf)
88

@@ -14,7 +14,7 @@ On **[PyPI](https://pypi.org/project/PyMuPDF)** since August 2016: [![](https://
1414

1515
# Introduction
1616

17-
This is **version 1.18.1 of PyMuPDF**, a Python binding with support for [MuPDF 1.18.*](http://mupdf.com/) - "a lightweight PDF, XPS, and E-book viewer".
17+
This is **version 1.18.2 of PyMuPDF**, a Python binding with support for [MuPDF 1.18.*](http://mupdf.com/) - "a lightweight PDF, XPS, and E-book viewer".
1818

1919
MuPDF can access files in PDF, XPS, OpenXPS, CBZ, EPUB and FB2 (e-books) formats, and it is known for its top performance and high rendering quality.
2020

docs/annot.rst

+1-1
@@ -90,7 +90,7 @@ There is a parent-child relationship between an annotation and its page. If the
9090

9191
*(New in 1.18.0)*
9292

93-
Retrieves the content of the annotation in a variety of formats -- much like the same method for :ref:`Page`.. This currently only delivers relevant data for annotation types 'FreeText' and 'Stamp'. Other type will return an empty string (or equivalent objects).
93+
Retrieves the content of the annotation in a variety of formats -- much like the same method for :ref:`Page`. This currently only delivers relevant data for annotation types 'FreeText' and 'Stamp'. Other types return an empty string (or equivalent objects).
9494

9595
:arg str opt: the desired format - one of the following values. Please note that this method works exactly like the same-named method of :ref:`Page`.
9696

docs/changes.rst

+16-1
@@ -1,6 +1,21 @@
11
Change Logs
22
===============
33

4+
Changes in Version 1.18.2
5+
---------------------------
6+
This version contains some interesting improvements for text searching: any number of search hits is now returned thanks to the removal of the **hit_max** parameter. In addition, the new **clip** parameter allows restricting the search area. Searching now also detects hyphenation at line breaks and therefore finds hyphenated words.
7+
8+
* **Fixed** issue `#575 <https://github.com/pymupdf/PyMuPDF/issues/575>`_: if using ``quads=False`` in text searching, then overlapping rectangles on the same line are joined. Previously, parts of the search string, which belonged to different "marked content" items, each generated their own rectangle -- just as if occurring on separate lines.
9+
* **Added** :attr:`Document.isRepaired`, which is true if the PDF was repaired on open.
10+
* **Added** :meth:`Document.setXmlMetadata` which either updates or creates PDF XML metadata. Implements issue `#691 <https://github.com/pymupdf/PyMuPDF/issues/691>`_.
11+
* **Added** :meth:`Document.getXmlMetadata` which returns PDF XML metadata.
12+
* **Changed** creation of PDF documents: they will now always carry a PDF identification (``/ID`` field) in the document trailer. Implements issue `#691 <https://github.com/pymupdf/PyMuPDF/issues/691>`_.
13+
* **Changed** :meth:`Page.searchFor`: a new parameter ``clip`` is accepted to restrict the search to this rectangle. Correspondingly, the attribute :attr:`TextPage.rect` is now respected by :meth:`TextPage.search`.
14+
* **Changed** parameter ``hit_max`` in :meth:`Page.searchFor` and :meth:`TextPage.search` is now obsolete: methods will return all hits.
15+
* **Changed** character **selection criteria** in :meth:`Page.getText`: a character is now considered to be part of a ``clip`` if its bbox is fully contained. Before this, a non-empty intersection was sufficient.
16+
* **Changed** :meth:`Document.scrub` to support a new option ``redact_images``. This addresses issue `#697 <https://github.com/pymupdf/PyMuPDF/issues/697>`_.
17+
18+
419
Changes in Version 1.18.1
520
---------------------------
621
* **Fixed** issue `#692 <https://github.com/pymupdf/PyMuPDF/issues/692>`_. PyMuPDF now detects and recovers from more cyclic resource dependencies in PDF pages and for the first time reports them in the MuPDF warnings store.
@@ -11,7 +26,7 @@ Changes in Version 1.18.1
1126

1227
Changes in Version 1.18.0
1328
---------------------------
14-
This is first PyMuPDF version supporting MuPDF v1.18. The goal here is on extending PyMuPDF's own functionality -- apart from bug fixing. Subsequent PyMuPDF patches may address features new in MuPDF.
29+
This is the first PyMuPDF version supporting MuPDF v1.18. The focus here is on extending PyMuPDF's own functionality -- apart from bug fixing. Subsequent PyMuPDF patches may address features new in MuPDF.
1530

1631
* **Fixed** issue `#519 <https://github.com/pymupdf/PyMuPDF/issues/519>`_. This upstream bug occurred occasionally for some pages only and seems to be fixed now: page layout should no longer be ruined in these cases.
1732
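
The changed selection criterion for ``clip`` in the 1.18.2 change log (full containment of a character's bbox instead of a non-empty intersection) can be illustrated with a small, library-independent sketch. The ``Rect`` tuple and the two helper functions below are hypothetical names chosen for this illustration; they are not part of PyMuPDF and do not reproduce its internal implementation.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle (hypothetical helper, not fitz.Rect)."""
    x0: float
    y0: float
    x1: float
    y1: float


def intersects(clip: Rect, bbox: Rect) -> bool:
    """Old criterion: clip and bbox share a non-empty area."""
    return (
        min(clip.x1, bbox.x1) > max(clip.x0, bbox.x0)
        and min(clip.y1, bbox.y1) > max(clip.y0, bbox.y0)
    )


def contains(clip: Rect, bbox: Rect) -> bool:
    """New criterion: bbox lies fully inside clip."""
    return (
        clip.x0 <= bbox.x0
        and clip.y0 <= bbox.y0
        and bbox.x1 <= clip.x1
        and bbox.y1 <= clip.y1
    )


clip = Rect(0, 0, 100, 20)
inside = Rect(10, 5, 18, 15)        # fully within the clip
straddling = Rect(95, 5, 105, 15)   # crosses the right edge of the clip
outside = Rect(120, 5, 130, 15)     # no overlap at all
```

Under these assumptions, a character such as ``straddling`` would have been selected by the old intersection test but is excluded by the containment test, which is the behavior change the change log describes.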

docs/conf.py

+1-1
@@ -40,7 +40,7 @@
4040
# built documents.
4141
#
4242
# The full version, including alpha/beta/rc tags.
43-
release = "1.18.1"
43+
release = "1.18.2"
4444

4545
# The short X.Y version
4646
version = release

docs/document.rst

+30-5
@@ -49,6 +49,7 @@ For details on **embedded files** refer to Appendix 3.
4949
:meth:`Document.getSigFlags` PDF only: determine signature state
5050
:meth:`Document.getToC` create a table of contents
5151
:meth:`Document.getTOC` alias of previous
52+
:meth:`Document.getXmlMetadata` PDF only: read the XML metadata
5253
:meth:`Document.insertPage` PDF only: insert a new page
5354
:meth:`Document.insertPDF` PDF only: insert pages from another PDF
5455
:meth:`Document.layout` re-paginate the document (if supported)
@@ -76,6 +77,7 @@ For details on **embedded files** refer to Appendix 3.
7677
:meth:`Document.setTOC_item` PDF only: change a single TOC item
7778
:meth:`Document.setToC` PDF only: set the table of contents (TOC)
7879
:meth:`Document.setTOC` PDF only: alias of previous
80+
:meth:`Document.setXmlMetadata` PDF only: create or update document XML metadata
7981
:meth:`Document.updateObject` PDF only: replace object source
8082
:meth:`Document.updateStream` PDF only: replace stream source
8183
:meth:`Document.write` PDF only: writes document to memory
@@ -90,6 +92,7 @@ For details on **embedded files** refer to Appendix 3.
9092
:attr:`Document.isFormPDF` is this a Form PDF?
9193
:attr:`Document.isPDF` is this a PDF?
9294
:attr:`Document.isReflowable` is this a reflowable document?
95+
:attr:`Document.isRepaired` PDF only: has this PDF been repaired during open?
9396
:attr:`Document.lastLocation` (chapter, pno) of last page
9497
:attr:`Document.metadata` metadata
9598
:attr:`Document.name` filename of document
@@ -513,10 +516,23 @@ For details on **embedded files** refer to Appendix 3.
513516

514517
.. method:: setMetadata(m)
515518

516-
PDF only: Sets or updates the metadata of the document as specified in *m*, a Python dictionary. As with :meth:`select`, these changes become permanent only when you save the document. Incremental save is supported.
519+
PDF only: Sets or updates the metadata of the document as specified in *m*, a Python dictionary.
517520

518521
:arg dict m: A dictionary with the same keys as *metadata* (see below). All keys are optional. A PDF's format and encryption method cannot be set or changed and will be ignored. If any value should not contain data, do not specify its key or set the value to *None*. If you use *{}* all metadata information will be cleared to the string *"none"*. If you want to selectively change only some values, modify a copy of *doc.metadata* and use it as the argument. Arbitrary unicode values are possible if specified as UTF-8-encoded.
519522

523+
.. method:: getXmlMetadata()
524+
525+
PDF only: Get the document XML metadata.
526+
527+
:rtype: str
528+
:returns: XML metadata of the document. Empty string if not present or not a PDF.
529+
530+
.. method:: setXmlMetadata(xml)
531+
532+
PDF only: Sets or updates XML metadata of the document.
533+
534+
:arg str xml: the new XML metadata. Should be XML syntax, however no checking is done by this method and any string is accepted.
535+
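
Because ``setXmlMetadata`` accepts any string without checking it, a caller may want to verify well-formedness first. The following is a minimal sketch under that assumption: the wrapper name ``set_checked_xml_metadata`` is hypothetical, and ``doc`` is assumed to be any object offering the ``setXmlMetadata(xml)`` method documented above. The check uses Python's standard ``xml.etree.ElementTree`` parser and only tests well-formedness, not conformance to any metadata schema.

```python
import xml.etree.ElementTree as ET


def set_checked_xml_metadata(doc, xml: str) -> None:
    """Store *xml* via doc.setXmlMetadata() only if it is well-formed XML.

    Hypothetical wrapper: 'doc' is assumed to provide setXmlMetadata(xml).
    """
    try:
        ET.fromstring(xml)  # raises ParseError if the string is not well-formed
    except ET.ParseError as exc:
        raise ValueError(f"XML metadata is not well-formed: {exc}") from exc
    doc.setXmlMetadata(xml)
```

Rejecting malformed input before the call keeps the document unchanged in the error case, which seems preferable to storing a string that other PDF software may fail to parse.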
520536
.. method:: setToC(toc, collapse=1)
521537

522538
.. method:: setTOC(toc, collapse=1)
@@ -529,7 +545,7 @@ For details on **embedded files** refer to Appendix 3.
529545

530546
:arg sequence toc:
531547

532-
A Python sequence (list or tuple) with **all bookmark entries** that should form the new table of contents. Output variants of :meth:`getToC` are acceptable. To completely remove the table of contents specify an empty sequence or None. Each item must be a list with the following format.
548+
A list or tuple with **all bookmark entries** that should form the new table of contents. Output variants of :meth:`getToC` are acceptable. To completely remove the table of contents specify an empty sequence or None. Each item must be a list with the following format.
533549

534550
* [lvl, title, page [, dest]] where
535551

@@ -592,7 +608,7 @@ For details on **embedded files** refer to Appendix 3.
592608

593609
Check whether the document can be saved incrementally. Use it to choose the right option without encountering exceptions.
594610

595-
.. method:: scrub(attached_files=True, clean_pages=True, embedded_files=True, hidden_text=True, javascript=True, metadata=True, redactions=True, remove_links=True, reset_fields=True, reset_responses=True, xml_metadata=True)
611+
.. method:: scrub(attached_files=True, clean_pages=True, embedded_files=True, hidden_text=True, javascript=True, metadata=True, redactions=True, redact_images=0, remove_links=True, reset_fields=True, reset_responses=True, xml_metadata=True)
596612

597613
PDF only: *(New in v1.16.14)* Remove potentially sensitive data from the PDF. This function is inspired by the similar "Sanitize" function in Adobe Acrobat products. The process is configurable by a number of options, which are all *True* by default except ``redact_images``.
598614

@@ -603,6 +619,7 @@ For details on **embedded files** refer to Appendix 3.
603619
:arg bool javascript: Remove JavaScript sources.
604620
:arg bool metadata: Remove PDF standard metadata.
605621
:arg bool redactions: Apply redaction annotations.
622+
:arg int redact_images: how to handle images if applying redactions. One of 0 (ignore), 1 (blank out overlaps) or 2 (remove).
606623
:arg bool remove_links: Remove all links.
607624
:arg bool reset_fields: Reset all form fields to their defaults.
608625
:arg bool reset_responses: Remove all responses from all annotations.
@@ -664,7 +681,7 @@ For details on **embedded files** refer to Appendix 3.
664681
:rtype: bytes
665682
:returns: a bytes object containing the complete document.
666683

667-
.. method:: searchPageFor(pno, text, hit_max=16, quads=False)
684+
.. method:: searchPageFor(pno, text, quads=False)
668685

669686
Search for "text" on page number "pno". Works exactly like the corresponding :meth:`Page.searchFor`. Any integer -inf < pno < pageCount is acceptable.
670687

@@ -1054,7 +1071,7 @@ For details on **embedded files** refer to Appendix 3.
10541071

10551072
*False* if this is not a PDF or has no form fields, otherwise the number of root form fields (fields with no ancestors).
10561073

1057-
Changed in version 1.16.4 Returns the total number of (root) form fields.
1074+
*(Changed in version 1.16.4)* Returns the total number of (root) form fields.
10581075

10591076
:type: bool,int
10601077

@@ -1064,6 +1081,14 @@ For details on **embedded files** refer to Appendix 3.
10641081

10651082
:type: bool
10661083

1084+
.. attribute:: isRepaired
1085+
1086+
*(New in v1.18.2)*
1087+
1088+
*True* if PDF has been repaired during open (because of major structure issues). Always *False* for non-PDF documents. If true, more details have been stored in ``TOOLS.mupdf_warnings()``, and :meth:`Document.can_save_incrementally` will return *False*.
1089+
1090+
:type: bool
1091+
10671092
.. attribute:: needsPass
10681093

10691094
Indicates whether the document is password-protected against access. This indicator remains unchanged -- **even after the document has been authenticated**. Precludes incremental saves if true.
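
The relationship between ``isRepaired`` and incremental saving described above can be sketched as a small decision helper. The function name ``pick_save_mode`` and its string return values are hypothetical and only serve the illustration; ``doc`` is assumed to expose the ``isRepaired`` attribute and the ``can_save_incrementally()`` method as documented in this version.

```python
def pick_save_mode(doc) -> str:
    """Return "incremental" or "full" for a document object.

    Hypothetical helper: relies only on the documented isRepaired attribute
    and can_save_incrementally() method of the passed-in object.
    """
    # A PDF repaired during open cannot be saved incrementally.
    if getattr(doc, "isRepaired", False):
        return "full"
    # Otherwise let the document decide (e.g. it may be encrypted or not a PDF).
    return "incremental" if doc.can_save_incrementally() else "full"
```

Checking ``isRepaired`` first is redundant according to the text above, since ``can_save_incrementally()`` already returns *False* in that case, but it makes the reason for a full save explicit.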

docs/faq.rst

+13-15
@@ -728,15 +728,12 @@ There is a standard search function to search for arbitrary text on a page: :met
728728

729729
This method has advantages and drawbacks. Pros are
730730

731-
* the search string can contain blanks and wrap across lines
732-
* upper or lower cases are treated equal
731+
* The search string can contain blanks and wrap across lines
732+
* Upper and lower case characters are treated as equal
733+
* Word hyphenation at line ends is detected and resolved
733734
* return may also be a list of :ref:`Quad` objects to precisely locate text that is **not parallel** to either axis.
734735

735-
Disadvantages:
736-
737-
* you cannot determine the number of found items beforehand: if *hit_max* items are returned you do not know whether you have missed any.
738-
739-
But you have other options::
736+
But you also have other options::
740737

741738
import sys
742739
import fitz
@@ -1580,8 +1577,9 @@ This deals with splitting up pages of a PDF in arbitrary pieces. For example, yo
15801577

15811578
# that's it, save output file
15821579
doc.save("poster-" + src.name,
1583-
garbage = 3, # eliminate duplicate objects
1584-
deflate = True) # compress stuff where possible
1580+
garbage=3, # eliminate duplicate objects
1581+
deflate=True, # compress stuff where possible
1582+
)
15851583

15861584

15871585
This shows what happens to an input page:
@@ -1652,7 +1650,7 @@ This deals with joining PDF pages to form a new PDF with pages each combining tw
16521650
spage.number) # input page number
16531651

16541652
# by all means, save new file using garbage collection and compression
1655-
doc.save("4up-" + infile, garbage = 3, deflate = True)
1653+
doc.save("4up-" + infile, garbage=3, deflate=True)
16561654

16571655
Example effect:
16581656

@@ -1858,20 +1856,20 @@ Problem
18581856
^^^^^^^^^
18591857
There are two scenarios:
18601858

1861-
1. Updating an annotation, which has been created by some other software, via a PyMuPDF script.
1862-
2. Creating an annotation with PyMuPDF and later changing it using some other PDF application.
1859+
1. **Updating** an annotation with PyMuPDF which was created by some other software.
1860+
2. **Creating** an annotation with PyMuPDF and later changing it with some other software.
18631861

1864-
In both cases you may experience unintended changes like a different annotation icon or text font, the fill color or line dashing have disappeared, line end symbols have changed their size or even have disappeared too, etc.
1862+
In both cases you may experience unintended changes, like a different annotation icon or text font, the fill color or line dashing have disappeared, line end symbols have changed their size or even have disappeared too, etc.
18651863

18661864
Cause
18671865
^^^^^^
1868-
Annotation maintenance is handled differently by each PDF maintenance application (if it is supported at all). For any given PDF application, some annotation types may not be supported at all or only partly, or some details may be handled in a different way than with another application.
1866+
Annotation maintenance is handled differently by each PDF maintenance application. Some annotation types may not be supported at all or only partly, and some details may be handled differently than in another application. **There is no standard.**
18691867

18701868
Almost always a PDF application also comes with its own icons (file attachments, sticky notes and stamps) and its own set of supported text fonts. For example:
18711869

18721870
* (Py-) MuPDF only supports these 5 basic fonts for 'FreeText' annotations: Helvetica, Times-Roman, Courier, ZapfDingbats and Symbol -- no italics / no bold variations. When changing a 'FreeText' annotation created by some other app, its font will probably not be recognized nor accepted and be replaced by Helvetica.
18731871

1874-
* PyMuPDF fully supports the PDF text markers, but these types cannot be updated with Adobe Acrobat Reader.
1872+
* PyMuPDF supports all PDF text markers (highlight, underline, strikeout, squiggly), but these types cannot be updated with Adobe Acrobat Reader.
18751873

18761874
In most cases there also exists limited support for line dashing which causes existing dashes to be replaced by straight lines. For example:
18771875
