doc/html/_sources/changes.txt
+5-3
Original file line number
Diff line number
Diff line change
@@ -10,8 +10,10 @@ Changes in these bindings compared to version 1.8.0 are the following:
10
10
* New methods ``getRectArea()`` for both ``fitz.Rect`` and ``fitz.IRect``
11
11
* Pixmaps can now be created directly from files using the new constructor ``fitz.Pixmap(filename)``. All of the following image formats covered by MuPDF are thus also supported as inputs for pixmaps: BMP, JPEG, JXR, PNG, GIF, TIFF.
12
12
* The Pixmap constructor ``fitz.Pixmap(data, len(data))`` has been extended accordingly to support the above image formats as well (not just PNG as it did in version 1.8.0).
13
-
* Various improvements and new members in our demo and examples collections have been applied or added. Perhaps most prominently: ``PDF_display`` now supports scrolling with the mouse wheel, and there is a new example program ``wxTableExtract`` which allows to graphically identify and extract table data in documents.
14
13
* ``fitz.Rect`` objects can now be created with all possible combinations of points and coordinates.
15
-
* PyMuPDF classes and methods now all contain __doc__ strings, which were mostly automatically created by SWIG. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.
14
+
* PyMuPDF classes and methods now all contain __doc__ strings, most of them created by SWIG automatically. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.
16
15
* A new method of ``fitz.Document.getPermits()`` returns the permissions associated with the current access to the document (print, edit, annotate, copy), as a Python dictionary.
17
-
* The identity matrix ``fitz.Identity`` is now **immutable**.
16
+
* The identity matrix ``fitz.Identity`` is now **immutable**.
17
+
* The new method ``fitz.Document.select(list)`` removes all pages from an open document that are not contained in the list.
18
+
* Various improvements and new members in our demo and examples collections. Perhaps most prominently: ``PDF_display`` now supports scrolling with the mouse wheel, and there is a new example program ``wxTableExtract`` which allows to graphically identify and extract table data in documents.
19
+
* ``fitz.open()`` is an alias of ``fitz.Document()``.
@@ -133,6 +136,16 @@ This class represents a document. It can be constructed from a file or from memo
133
136
134
137
:rtype: dict
135
138
139
+
.. method:: select(list)
140
+
141
+
Retains only those pages in the document that occur in the list. Empty lists or elements outside the range 0 to ``doc.pageCount - 1`` will cause a ``ValueError``. For more details see the remarks at the bottom of this chapter. Only PDF documents are supported by this method.
142
+
143
+
:param `list`: A list of integers (zero-based) naming the pages to be included. Pages not occurring in the list will be deleted (from memory) and become unavailable until the document is reopened. Page numbers can occur multiple times and be in any order.
Saves a copy of the document under the name ``outfile``. Include path specifications as necessary. Only PDF documents are supported by this function. Internally the document may have changed. E.g. after a successful ``authenticate``, a decrypted copy will be saved, and, in addition (even without any of the optional parameters), some basic cleaning of the document data could also have occurred, e.g. broken xref tables have been corrected and incremental changes have been resolved.
@@ -223,3 +236,47 @@ This class represents a document. It can be constructed from a file or from memo
223
236
Contains the number of pages of the document. May return 0 for documents with no pages.
224
237
225
238
:rtype: int
239
+
240
+
Remarks on ``select()``
241
+
------------------------
242
+
243
+
Page numbers in the list need not be unique nor be in any particular sequence. This makes the method a versatile utility to e.g. select only even or odd pages, re-arrange a document from back to front, duplicate it, and so forth. In combination with text extraction you can also omit / include pages with no text or certain text, etc.
244
+
245
+
You can execute several selections in a row. The document structure will be kept updated.
246
+
247
+
Any of those actions will become permanent only with a ``doc.save()``. Specify the ``garbage=3`` option there to reduce the resulting document's size.
248
+
249
+
Note also that this method **preserves all links, annotations and bookmarks** that are still valid. In other words: deleting pages only deletes references that would otherwise point to nowhere.
250
+
251
+
Examples
252
+
---------
253
+
254
+
Create a document copy with only pages containing text:
255
+
::
256
+
import fitz
257
+
doc = fitz.open("any.pdf")
258
+
r = list(range(doc.pageCount)) # list of all pages
259
+
for i in list(range(doc.pageCount)):
260
+
    txt = doc.getPageText(i) # get the page's text
261
+
    if not txt: # nothing there
262
+
        r.remove(i) # remove page number from list
263
+
doc.select(r) # apply the list
264
+
doc.save("out.pdf", garbage=3) # save the resulting PDF
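The selections mentioned in the remarks on ``select()`` (even or odd pages, back to front, duplication) come down to building a list of zero-based page numbers. The following is a minimal sketch in plain Python under stated assumptions: ``page_count`` stands in for ``doc.pageCount``, and ``validate_selection`` is a hypothetical helper that only mirrors the documented rule (empty lists or out-of-range numbers raise ``ValueError``). It is not a PyMuPDF function.

```python
def validate_selection(pages, page_count):
    """Hypothetical helper: mirror the documented ValueError rule of select()."""
    if not pages or any(p < 0 or p >= page_count for p in pages):
        raise ValueError("empty list or page number out of range")
    return list(pages)


page_count = 6  # assumption: stand-in for doc.pageCount of an open PDF

even_pages = list(range(0, page_count, 2))            # every second page, starting at 0
odd_pages = list(range(1, page_count, 2))             # every second page, starting at 1
back_to_front = list(range(page_count - 1, -1, -1))   # reversed page order
duplicated = list(range(page_count)) * 2              # the whole document twice

# Every list must pass the range check before it is handed to select().
for selection in (even_pages, odd_pages, back_to_front, duplicated):
    validate_selection(selection, page_count)
```

With an open PDF, any of these lists could then be passed to ``doc.select()`` and made permanent with ``doc.save("out.pdf", garbage=3)``, as in the example above.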
Copy file name to clipboardExpand all lines: doc/html/_sources/matrix.txt
+3-3
@@ -17,7 +17,7 @@ Since all points or pixels live in a two-dimensional space, one column vector of
17
17
It should be noted that
18
18
19
19
* the methods below are just convenience functions - everything they do can also be achieved by directly manipulating ``[a,b,c,d,e,f]``
20
-
* all manipulations can be combined - you can construct a matrix that does a rotate **and** a shear **and** a scale **and** a shift, etc. in one go. If you want to do this, have a look at the **remarks** further down or at the Adobe manual.
20
+
* all manipulations can be combined - you can construct a matrix that does a rotate **and** a shear **and** a scale **and** a shift, etc. in one go. If you however choose to do this, do have a look at the **remarks** further down or at the Adobe manual.
Obviously, changes of matrix properties and execution of matrix methods can be combined, i.e. executed consecutively. This is done by multiplying the respective matrices.
137
137
138
138
Matrix multiplications are **not commutative**, i.e. the execution sequence determines the result: a **shift - rotate** does not in general equal a **rotate - shift**. So it can become quite unclear which result a transformation will yield. E.g. if you apply ``preRotate(x)`` to an arbitrary matrix ``[a,b,c,d,e,f]`` you will get matrix ``[a*cos(x)+c*sin(x), b*cos(x)+d*sin(x), -a*sin(x)+c*cos(x), -b*sin(x)+d*cos(x), e, f]``.
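The ``preRotate(x)`` formula and the effect of execution order can be checked numerically. The sketch below uses plain Python tuples rather than ``fitz.Matrix``; the helpers ``concat``, ``rotation``, ``shift``, ``pre_rotate`` and ``close`` are hypothetical names introduced for illustration. It assumes the usual PDF convention that ``[a,b,c,d,e,f]`` abbreviates a 3x3 matrix acting on row vectors, and it takes angles in radians.

```python
import math


def concat(m1, m2):
    """Product of two [a, b, c, d, e, f] matrices (assumed convention: m1 acts first)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def rotation(x):
    """Rotation by angle x (radians) as a six-element matrix."""
    return (math.cos(x), math.sin(x), -math.sin(x), math.cos(x), 0.0, 0.0)


def shift(dx, dy):
    """Translation by (dx, dy) as a six-element matrix."""
    return (1.0, 0.0, 0.0, 1.0, dx, dy)


def pre_rotate(m, x):
    """Multiply the rotation in front of m."""
    return concat(rotation(x), m)


def close(m1, m2):
    """Element-wise comparison that tolerates floating point noise."""
    return all(math.isclose(u, v, abs_tol=1e-12) for u, v in zip(m1, m2))


# Compare pre_rotate with the closed formula quoted in the text.
m = (2.0, 0.5, -1.0, 3.0, 7.0, -4.0)
x = math.radians(30)
a, b, c, d, e, f = m
expected = (
    a * math.cos(x) + c * math.sin(x),
    b * math.cos(x) + d * math.sin(x),
    -a * math.sin(x) + c * math.cos(x),
    -b * math.sin(x) + d * math.cos(x),
    e,
    f,
)
formula_holds = close(pre_rotate(m, x), expected)

# Execution order matters: shift then rotate differs from rotate then shift.
quarter = math.radians(90)
shift_then_rotate = concat(shift(10, 0), rotation(quarter))
rotate_then_shift = concat(rotation(quarter), shift(10, 0))
order_matters = not close(shift_then_rotate, rotate_then_shift)
```

Under these assumptions the two compositions share the same ``a`` to ``d`` values and differ only in the translation part ``e, f``.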
doc/html/changes.html
+5-3
@@ -85,11 +85,13 @@ <h1>Changes in Version 1.9.0<a class="headerlink" href="#changes-in-version-1-9-
85
85
<li>New methods <code class="docutils literal"><span class="pre">getRectArea()</span></code> for both <code class="docutils literal"><span class="pre">fitz.Rect</span></code> and <code class="docutils literal"><span class="pre">fitz.IRect</span></code></li>
86
86
<li>Pixmaps can now be created directly from files using the new constructor <code class="docutils literal"><span class="pre">fitz.Pixmap(filename)</span></code>. All of the following image formats covered by MuPDF are thus also supported as inputs for pixmaps: BMP, JPEG, JXR, PNG, GIF, TIFF.</li>
87
87
<li>The Pixmap constructor <code class="docutils literal"><span class="pre">fitz.Pixmap(data,</span> <span class="pre">len(data))</span></code> has been extended accordingly to support the above image formats as well (not just PNG as it did in version 1.8.0).</li>
88
-
<li>Various improvements and new members in our demo and examples collections have been applied or added. Perhaps most prominently: <code class="docutils literal"><span class="pre">PDF_display</span></code> now supports scrolling with the mouse wheel, and there is a new example program <code class="docutils literal"><span class="pre">wxTableExtract</span></code> which allows to graphically identify and extract table data in documents.</li>
89
88
<li><code class="docutils literal"><span class="pre">fitz.Rect</span></code> objects can now be created with all possible combinations of points and coordinates.</li>
90
-
<li>PyMuPDF classes and methods now all contain __doc__ strings, which were mostly automatically created by SWIG. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.</li>
89
+
<li>PyMuPDF classes and methods now all contain __doc__ strings, most of them created by SWIG automatically. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.</li>
91
90
<li>A new method of <code class="docutils literal"><span class="pre">fitz.Document.getPermits()</span></code> returns the permissions associated with the current access to the document (print, edit, annotate, copy), as a Python dictionary.</li>
92
91
<li>The identity matrix <code class="docutils literal"><span class="pre">fitz.Identity</span></code> is now <strong>immutable</strong>.</li>
92
+
<li>The new method <code class="docutils literal"><span class="pre">fitz.Document.select(list)</span></code> removes all pages from an open document that are not contained in the list.</li>
93
+
<li>Various improvements and new members in our demo and examples collections. Perhaps most prominently: <code class="docutils literal"><span class="pre">PDF_display</span></code> now supports scrolling with the mouse wheel, and there is a new example program <code class="docutils literal"><span class="pre">wxTableExtract</span></code> which allows to graphically identify and extract table data in documents.</li>
94
+
<li><code class="docutils literal"><span class="pre">fitz.open()</span></code> is an alias of <code class="docutils literal"><span class="pre">fitz.Document()</span></code>.</li>