Commit a95e492

committed
New Document method "select" and documentation changes
1 parent 82fb1fc commit a95e492

31 files changed

+1455
-1906
lines changed

doc/PyMuPDF.pdf

4.71 KB
Binary file not shown.

doc/html/.doctrees/changes.doctree

1.2 KB
Binary file not shown.

doc/html/.doctrees/document.doctree

14 KB
Binary file not shown.

doc/html/.doctrees/environment.pickle

1.47 KB
Binary file not shown.

doc/html/.doctrees/matrix.doctree

148 Bytes
Binary file not shown.

doc/html/.doctrees/pixmap.doctree

407 Bytes
Binary file not shown.

doc/html/.doctrees/rect.doctree

-441 Bytes
Binary file not shown.

doc/html/_sources/changes.txt

+5-3
Original file line number | Diff line number | Diff line change
@@ -10,8 +10,10 @@ Changes in these bindings compared to version 1.8.0 are the following:
1010
* New methods ``getRectArea()`` for both ``fitz.Rect`` and ``fitz.IRect``
1111
* Pixmaps can now be created directly from files using the new constructor ``fitz.Pixmap(filename)``. All of the following image formats covered by MuPDF are thus also supported as inputs for pixmaps: BMP, JPEG, JXR, PNG, GIF, TIFF.
1212
* The Pixmap constructor ``fitz.Pixmap(data, len(data))`` has been extended accordingly to support the above image formats as well (not just PNG as it did in version 1.8.0).
13-
* Various improvements and new members in our demo and examples collections have been applied or added. Perhaps most prominently: ``PDF_display`` now supports scrolling with the mouse wheel, and there is a new example program ``wxTableExtract`` which allows to graphically identify and extract table data in documents.
1413
* ``fitz.Rect`` objects can now be created with all possible combinations of points and coordinates.
15-
* PyMuPDF classes and methods now all contain __doc__ strings, which were mostly automatically created by SWIG. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.
14+
* PyMuPDF classes and methods now all contain __doc__ strings, most of them created by SWIG automatically. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.
1615
* A new method ``fitz.Document.getPermits()`` returns the permissions associated with the current access to the document (print, edit, annotate, copy), as a Python dictionary.
17-
* The identity matrix ``fitz.Identity`` is now **immutable**.
16+
* The identity matrix ``fitz.Identity`` is now **immutable**.
17+
* The new method ``fitz.Document.select(list)`` removes all pages from an open document that are not contained in the list.
18+
* Various improvements and new members in our demo and examples collections. Perhaps most prominently: ``PDF_display`` now supports scrolling with the mouse wheel, and there is a new example program ``wxTableExtract`` which lets you graphically identify and extract table data in documents.
19+
* ``fitz.open()`` is an alias of ``fitz.Document()``.

doc/html/_sources/document.txt

+60-3
@@ -10,9 +10,11 @@ Document
1010

1111
This class represents a document. It can be constructed from a file or from memory. See below for details.
1212

13-
=============================== ================================================
13+
Since version 1.9.0 there exists an alias ``open`` for this class.
14+
15+
=============================== =====================================================
1416
**Method / Attribute** **Short Description**
15-
=============================== ================================================
17+
=============================== =====================================================
1618
:meth:`Document.authenticate` decrypts the document
1719
:meth:`Document.loadPage` reads a page
1820
:meth:`Document.save` saves a copy of the document
@@ -21,14 +23,15 @@ This class represents a document. It can be constructed from a file or from memo
2123
:meth:`Document.getPageText` extracts the text of a page by its number
2224
:meth:`Document.getPermits` shows permissions to access the document
2325
:meth:`Document.close` closes the document
26+
:meth:`Document.select` selects pages from a document, discards the rest
2427
:attr:`Document.isClosed` has document been closed?
2528
:attr:`Document.outline` first `Outline` item
2629
:attr:`Document.name` filename of document
2730
:attr:`Document.needsPass` require password to access data?
2831
:attr:`Document.isEncrypted` is document still encrypted?
2932
:attr:`Document.pageCount` the document's number of pages
3033
:attr:`Document.metadata` the document's meta data
31-
=============================== ================================================
34+
=============================== =====================================================
3235

3336
**Class API**
3437

@@ -133,6 +136,16 @@ This class represents a document. It can be constructed from a file or from memo
133136

134137
:rtype: dict
135138

139+
.. method:: select(list)
140+
141+
Retains only those pages in the document that occur in the list. Empty lists or elements outside the range 0 to ``doc.pageCount - 1`` will cause a ``ValueError``. For more details see remarks at the bottom of this chapter. Only PDF documents are supported by this method.
142+
143+
:param `list`: A list of integers (zero-based) naming the pages to be included. Pages not occurring in the list will be deleted (from memory) and become unavailable until the document is reopened. Page numbers can occur multiple times and be in any order.
144+
145+
:type `list`: list
146+
147+
:rtype: int
148+
136149
.. method:: save(outfile, garbage=0, clean=0, deflate=0, incremental=0, ascii=0, expand=0, linear=0)
137150

138151
Saves a copy of the document under the name ``outfile``. Include path specifications as necessary. Only PDF documents are supported by this function. Internally the document may have changed. E.g. after a successful ``authenticate``, a decrypted copy will be saved, and, in addition (even without any of the optional parameters), some basic cleaning of the document data could also have occurred, e.g. broken xref tables have been corrected and incremental changes have been resolved.
@@ -223,3 +236,47 @@ This class represents a document. It can be constructed from a file or from memo
223236
Contains the number of pages of the document. May return 0 for documents with no pages.
224237

225238
:rtype: int
239+
240+
Remarks on ``select()``
241+
------------------------
242+
243+
Page numbers in the list need not be unique nor be in any particular sequence. This makes the method a versatile utility to e.g. select only even or odd pages, re-arrange a document from back to front, duplicate it, and so forth. In combination with text extraction you can also omit / include pages with no text or certain text, etc.
244+
245+
You can execute several selections in a row. The document structure will be kept updated.
246+
247+
Any of those actions will become permanent only with a ``doc.save()``. Do not forget to specify the ``garbage=3`` option to eventually reduce the resulting document's size.
248+
249+
It should also be noted that this method **preserves all links, annotations and bookmarks** that are still valid. In other words: deleting pages only deletes references that would otherwise point to nowhere.
250+
251+
Examples
252+
---------
253+
254+
Create a document copy with only pages containing text:
255+
::
256+
import fitz
257+
doc = fitz.open("any.pdf")
258+
r = list(range(doc.pageCount)) # list of all pages
259+
for i in list(range(doc.pageCount)):
260+
    txt = doc.getPageText(i) # get the page's text
261+
    if not txt: # nothing there
262+
        r.remove(i) # remove page number from list
263+
doc.select(r) # apply the list
264+
doc.save("out.pdf", garbage=3) # save the resulting PDF
265+
266+
267+
Concatenate a document with itself:
268+
::
269+
import fitz
270+
doc = fitz.open("any.pdf")
271+
r = list(range(doc.pageCount))
272+
r += r # turn PDF into a copy of itself
273+
doc.select(r)
274+
doc.save("out.pdf", garbage=3)
275+
276+
Create document copy in reverse page order:
277+
::
278+
import fitz
279+
doc = fitz.open("any.pdf")
280+
r = list(range(doc.pageCount-1, -1, -1))
281+
doc.select(r)
282+
doc.save("out.pdf", garbage=3)
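The page lists used in the examples above can be built and checked without opening a document. The following is a minimal sketch, not part of PyMuPDF: ``check_selection`` is a hypothetical helper that mirrors the documented rule (empty lists and page numbers outside 0 to ``doc.pageCount - 1`` raise a ``ValueError``), and ``page_count`` stands in for ``doc.pageCount``.

```python
def check_selection(pages, page_count):
    """Hypothetical helper (not part of PyMuPDF): mirror the documented
    rule that empty lists or out-of-range page numbers raise ValueError."""
    if not pages:
        raise ValueError("page list must not be empty")
    for p in pages:
        if not 0 <= p < page_count:
            raise ValueError("page number %d out of range" % p)
    return pages


page_count = 6                                  # stands in for doc.pageCount
even_pages = list(range(0, page_count, 2))      # zero-based pages 0, 2, 4
odd_pages = list(range(1, page_count, 2))       # zero-based pages 1, 3, 5
reverse = list(range(page_count - 1, -1, -1))   # back to front
twice = list(range(page_count)) * 2             # document followed by itself

for selection in (even_pages, odd_pages, reverse, twice):
    check_selection(selection, page_count)      # each list could be passed to doc.select()
```

Any of these lists could then be handed to ``doc.select()`` and the result written with ``doc.save("out.pdf", garbage=3)``, as in the examples above.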

doc/html/_sources/matrix.txt

+3-3
@@ -17,7 +17,7 @@ Since all points or pixels live in a two-dimensional space, one column vector of
1717
It should be noted that
1818

1919
* the below methods are just convenience functions - everything they do, can also be achieved by directly manipulating ``[a,b,c,d,e,f]``
20-
* all manipulations can be combined - you can construct a matrix that does a rotate **and** a shear **and** a scale **and** a shift, etc. in one go. If you want to do this, have a look at the **remarks** further down or at the Adobe manual.
20+
* all manipulations can be combined - you can construct a matrix that does a rotate **and** a shear **and** a scale **and** a shift, etc. in one go. If you choose to do this, however, have a look at the **remarks** further down or at the Adobe manual.
2121

2222

2323
========================= ============================
@@ -131,8 +131,8 @@ It should be noted, that
131131

132132
:type: float
133133

134-
**Remarks**
135-
134+
Remarks
135+
--------
136136
Obviously, changes of matrix properties and execution of matrix methods can be combined, i.e. executed consecutively. This is done by multiplying the respective matrices.
137137

138138
Matrix multiplications are **not commutative**, i.e. execution sequence determines the result: a **shift - rotate** does in general not equal a **rotate - shift**. So it can become quite unclear which result a transformation will yield. E.g. if you apply ``preRotate(x)`` to an arbitrary matrix ``[a,b,c,d,e,f]`` you will get matrix ``[a*cos(x)+c*sin(x), b*cos(x)+d*sin(x), -a*sin(x)+c*cos(x), -b*sin(x)+d*cos(x), e, f]``.
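The ``preRotate(x)`` formula and the non-commutativity claim can be checked numerically. The following is a minimal sketch in plain Python, not PyMuPDF code: ``concat``, ``rotation`` and ``shift`` are hypothetical helpers, ``[a,b,c,d,e,f]`` is assumed to stand for the 3x3 matrix with rows ``(a, b, 0)``, ``(c, d, 0)``, ``(e, f, 1)``, and the angle ``x`` is taken in radians for this sketch only.

```python
import math


def concat(m1, m2):
    """Matrix product m1 x m2 for matrices given as [a, b, c, d, e, f],
    read as the 3x3 matrices [[a, b, 0], [c, d, 0], [e, f, 1]]."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return [a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2]


def rotation(x):
    """Rotation matrix for angle x (radians, an assumption of this sketch)."""
    return [math.cos(x), math.sin(x), -math.sin(x), math.cos(x), 0.0, 0.0]


def shift(e, f):
    """Pure translation by (e, f)."""
    return [1.0, 0.0, 0.0, 1.0, e, f]


x = 0.3
m = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]          # arbitrary [a, b, c, d, e, f]

# Rotation multiplied from the left reproduces the formula quoted above.
pre_rotated = concat(rotation(x), m)

# The two orders of shift and rotation give different matrices.
shift_times_rot = concat(shift(10.0, 0.0), rotation(x))
rot_times_shift = concat(rotation(x), shift(10.0, 0.0))
```

Under these assumptions the two products share the same ``a, b, c, d`` but differ in ``e`` and ``f``, which is the order dependence described above.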

doc/html/_sources/pixmap.txt

+3-3
@@ -64,7 +64,7 @@ Please have a look at the **example** section to see some pixmap usage "at work"
6464
This constructor creates a (non-empty) pixmap from file ``filename``, which is assumed to contain a supported image.
6565

6666
:param `filename`: Path / name of the file. The origin of the resulting pixmap is (0,0).
67-
:type `data`: string
67+
:type `filename`: string
6868

6969
.. method:: __init__(self, data, len)
7070

@@ -244,8 +244,8 @@ Please have a look at the **example** section to see some pixmap usage "at work"
244244

245245
:rtype: bool
246246

247-
Supported Pixmap Image Types
248-
-----------------------------
247+
Supported Pixmap Construction Image Types
248+
-------------------------------------------
249249
Support includes the following file types: BMP, JPEG, GIF, TIFF, JXR, and PNG.
250250

251251
Details on Saving Images with ``writeImage()``

doc/html/_sources/rect.txt

+1-1
@@ -55,7 +55,7 @@ A rectangle is called "finite" if x0 <= x1 and y0 <= y1 is true, else "infinite"
5555
Transforms ``Rect`` with a :ref:`Matrix`.
5656

5757
:param `m`: The matrix to be used for the transformation.
58-
:param `m`: :ref:`Matrix`
58+
:type `m`: :ref:`Matrix`
5959
:rtype: :ref:`Rect`
6060

6161
.. method:: getRectArea(unit = 'pt')

doc/html/app2.html

+1-1
@@ -236,7 +236,7 @@ <h3>Navigation</h3>
236236
</div>
237237
<div class="footer" role="contentinfo">
238238
&copy; Copyright 2016, Ruikai Liu, Jorj McKie.
239-
Last updated on 26. Apr 2016.
239+
Last updated on 29. Apr 2016.
240240
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.4.1.
241241
</div>
242242
</body>

doc/html/changes.html

+5-3
@@ -85,11 +85,13 @@ <h1>Changes in Version 1.9.0<a class="headerlink" href="#changes-in-version-1-9-
8585
<li>New methods <code class="docutils literal"><span class="pre">getRectArea()</span></code> for both <code class="docutils literal"><span class="pre">fitz.Rect</span></code> and <code class="docutils literal"><span class="pre">fitz.IRect</span></code></li>
8686
<li>Pixmaps can now be created directly from files using the new constructor <code class="docutils literal"><span class="pre">fitz.Pixmap(filename)</span></code>. All of the following image formats covered by MuPDF are thus also supported as inputs for pixmaps: BMP, JPEG, JXR, PNG, GIF, TIFF.</li>
8787
<li>The Pixmap constructor <code class="docutils literal"><span class="pre">fitz.Pixmap(data,</span> <span class="pre">len(data))</span></code> has been extended accordingly to support the above image formats as well (not just PNG as it did in version 1.8.0).</li>
88-
<li>Various improvements and new members in our demo and examples collections have been applied or added. Perhaps most prominently: <code class="docutils literal"><span class="pre">PDF_display</span></code> now supports scrolling with the mouse wheel, and there is a new example program <code class="docutils literal"><span class="pre">wxTableExtract</span></code> which allows to graphically identify and extract table data in documents.</li>
8988
<li><code class="docutils literal"><span class="pre">fitz.Rect</span></code> objects can now be created with all possible combinations of points and coordinates.</li>
90-
<li>PyMuPDF classes and methods now all contain __doc__ strings, which were mostly automatically created by SWIG. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.</li>
89+
<li>PyMuPDF classes and methods now all contain __doc__ strings, most of them created by SWIG automatically. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.</li>
9190
<li>A new method <code class="docutils literal"><span class="pre">fitz.Document.getPermits()</span></code> returns the permissions associated with the current access to the document (print, edit, annotate, copy), as a Python dictionary.</li>
9291
<li>The identity matrix <code class="docutils literal"><span class="pre">fitz.Identity</span></code> is now <strong>immutable</strong>.</li>
92+
<li>The new method <code class="docutils literal"><span class="pre">fitz.Document.select(list)</span></code> removes all pages from an open document that are not contained in the list.</li>
93+
<li>Various improvements and new members in our demo and examples collections. Perhaps most prominently: <code class="docutils literal"><span class="pre">PDF_display</span></code> now supports scrolling with the mouse wheel, and there is a new example program <code class="docutils literal"><span class="pre">wxTableExtract</span></code> which lets you graphically identify and extract table data in documents.</li>
94+
<li><code class="docutils literal"><span class="pre">fitz.open()</span></code> is an alias of <code class="docutils literal"><span class="pre">fitz.Document()</span></code>.</li>
9395
</ul>
9496
</div>
9597

@@ -116,7 +118,7 @@ <h3>Navigation</h3>
116118
</div>
117119
<div class="footer" role="contentinfo">
118120
&copy; Copyright 2016, Ruikai Liu, Jorj McKie.
119-
Last updated on 27. Apr 2016.
121+
Last updated on 30. Apr 2016.
120122
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.4.1.
121123
</div>
122124
</body>

doc/html/classes.html

+1-1
@@ -123,7 +123,7 @@ <h3>Navigation</h3>
123123
</div>
124124
<div class="footer" role="contentinfo">
125125
&copy; Copyright 2016, Ruikai Liu, Jorj McKie.
126-
Last updated on 28. Apr 2016.
126+
Last updated on 30. Apr 2016.
127127
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.4.1.
128128
</div>
129129
</body>
