pymupdf
diff --git a/doc/PyMuPDF.pdf b/doc/PyMuPDF.pdf
4.71 KB
diff --git a/doc/html/.doctrees/changes.doctree b/doc/html/.doctrees/changes.doctree
1.2 KB
diff --git a/doc/html/.doctrees/document.doctree b/doc/html/.doctrees/document.doctree
14 KB
diff --git a/doc/html/.doctrees/environment.pickle b/doc/html/.doctrees/environment.pickle
1.47 KB
diff --git a/doc/html/.doctrees/matrix.doctree b/doc/html/.doctrees/matrix.doctree
148 Bytes
diff --git a/doc/html/.doctrees/pixmap.doctree b/doc/html/.doctrees/pixmap.doctree
407 Bytes
diff --git a/doc/html/.doctrees/rect.doctree b/doc/html/.doctrees/rect.doctree
-441 Bytes
diff --git a/doc/html/_sources/changes.txt b/doc/html/_sources/changes.txt
+5-3
diff --git a/doc/html/_sources/document.txt b/doc/html/_sources/document.txt
+60-3
diff --git a/doc/html/_sources/matrix.txt b/doc/html/_sources/matrix.txt
+3-3
diff --git a/doc/html/_sources/pixmap.txt b/doc/html/_sources/pixmap.txt
+3-3
diff --git a/doc/html/_sources/rect.txt b/doc/html/_sources/rect.txt
+1-1
diff --git a/doc/html/app2.html b/doc/html/app2.html
+1-1
diff --git a/doc/html/changes.html b/doc/html/changes.html
+5-3
diff --git a/doc/html/classes.html b/doc/html/classes.html
+1-1
@@ -10,8 +10,10 @@ Changes in these bindings compared to version 1.8.0 are the following:
 * New methods ``getRectArea()`` for both ``fitz.Rect`` and ``fitz.IRect``
 * Pixmaps can now be created directly from files using the new constructor ``fitz.Pixmap(filename)``. All of the following image formats covered by MuPDF are thus also supported as inputs for pixmaps: BMP, JPEG, JXR, PNG, GIF, TIFF.
 * The Pixmap constructor ``fitz.Pixmap(data, len(data))`` has been extended accordingly to support the above image formats as well (not just PNG as it did in version 1.8.0).
-* Various improvements and new members in our demo and examples collections have been applied or added. Perhaps most prominently: ``PDF_display`` now supports scrolling with the mouse wheel, and there is a new example program ``wxTableExtract`` which allows to graphically identify and extract table data in documents.
 * ``fitz.Rect`` objects can now be created with all possible combinations of points and coordinates.
-* PyMuPDF classes and methods now all contain  __doc__ strings, which were mostly automatically created by SWIG. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.
+* PyMuPDF classes and methods now all contain __doc__ strings, most of them created automatically by SWIG. While the PyMuPDF documentation is certainly more detailed, this feature should help a lot when programming in Python-aware IDEs.
 * A new method of ``fitz.Document.getPermits()`` returns the permissions associated with the current access to the document (print, edit, annotate, copy), as a Python dictionary.
-* The identity matrix ``fitz.Identity`` is now **immutable**.
+* The identity matrix ``fitz.Identity`` is now **immutable**.
+* The new method ``fitz.Document.select(list)`` removes all pages from an open document that are not contained in the list.
+* Various improvements and new members in our demo and examples collections. Perhaps most prominently: ``PDF_display`` now supports scrolling with the mouse wheel, and there is a new example program ``wxTableExtract`` which lets you graphically identify and extract table data in documents.
+* ``fitz.open()`` is an alias of ``fitz.Document()``.
@@ -10,9 +10,11 @@ Document
 
 This class represents a document. It can be constructed from a file or from memory. See below for details.
 
-=============================== ================================================
+Since version 1.9.0 there exists an alias ``open`` for this class.
+
+=============================== =====================================================
 **Method / Attribute**          **Short Description**
-=============================== ================================================
+=============================== =====================================================
 :meth:`Document.authenticate`   decrypts the document
 :meth:`Document.loadPage`       reads a page
 :meth:`Document.save`           saves a copy of the document
@@ -21,14 +23,15 @@ This class represents a document. It can be constructed from a file or from memo
 :meth:`Document.getPageText`    extracts the text of a page by its number
 :meth:`Document.getPermits`     show permissions to access the document
 :meth:`Document.close`          closes the document
+:meth:`Document.select`         selects pages from a document, discards the rest
 :attr:`Document.isClosed`       has document been closed?
 :attr:`Document.outline`        first `Outline` item
 :attr:`Document.name`           filename of document
 :attr:`Document.needsPass`      require password to access data?
 :attr:`Document.isEncrypted`    is document still encrypted?
 :attr:`Document.pageCount`      the document's number of pages
 :attr:`Document.metadata`       the document's meta data
-=============================== ================================================
+=============================== =====================================================
 
 **Class API**
 
@@ -133,6 +136,16 @@ This class represents a document. It can be constructed from a file or from memo
 
       :rtype: dict
 
+    .. method:: select(list)
+
+      Retains only those pages in the document that occur in the list. Empty lists or elements outside the range 0 to ``doc.pageCount - 1`` will cause a ``ValueError``. For more details see the remarks at the bottom of this chapter. Only PDF documents are supported by this method.
+
+      :param `list`: A list of integers (zero-based) naming the pages to be included. Pages not occurring in the list will be deleted (from memory) and become unavailable until the document is reopened. Page numbers can occur multiple times and be in any order.
+
+      :type `list`: list
+
+      :rtype: int
+
     .. method:: save(outfile, garbage=0, clean=0, deflate=0, incremental=0, ascii=0, expand=0, linear=0)
 
       Saves a copy of the document under the name ``outfile``. Include path specifications as necessary. Only PDF documents are supported by this function. Internally the document may have changed. E.g. after a successful ``authenticate``, a decrypted copy will be saved, and, in addition (even without any of the optional parameters), some basic cleaning of the document data could also have occurred, e.g. broken xref tables have been corrected and incremental changes have been resolved.
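The argument rules stated for ``select(list)`` in this hunk (a non-empty list, every element within 0 to ``doc.pageCount - 1``, repeats and arbitrary order allowed) can be sketched in plain Python. The helper name ``check_page_list`` and its ``page_count`` parameter are hypothetical and not part of PyMuPDF; the sketch only mirrors the documented rules and makes no claim about how the binding checks them internally.

```python
def check_page_list(pages, page_count):
    """Raise ValueError if `pages` breaks the rules documented for select().

    Hypothetical helper for illustration only; not a PyMuPDF function.
    """
    if not pages:
        # The documentation says empty lists cause a ValueError.
        raise ValueError("page list must not be empty")
    for p in pages:
        # Valid page numbers are zero-based: 0 .. page_count - 1.
        if not 0 <= p < page_count:
            raise ValueError(
                "page number %d outside 0..%d" % (p, page_count - 1)
            )
    # Duplicates and arbitrary order are allowed, so the list is returned as is.
    return list(pages)
```

Running such a check before calling ``doc.select()`` would let a script report a bad page list with its own message instead of relying on the exception raised by the binding.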
@@ -223,3 +236,47 @@ This class represents a document. It can be constructed from a file or from memo
       Contains the number of pages of the document. May return 0 for documents with no pages.
 
       :rtype: int
+
+Remarks on ``select()``
+------------------------
+
+Page numbers in the list need not be unique, nor do they have to be in any particular order. This makes the method a versatile utility, e.g. to select only even or odd pages, re-arrange a document from back to front, duplicate it, and so forth. In combination with text extraction you can also omit or include pages with no text or with certain text.
+
+You can execute several selections in a row. The document structure will be kept updated.
+
+These changes become permanent only with a ``doc.save()``. Specify the ``garbage=3`` option when saving, which may reduce the size of the resulting document.
+
+Note also that this method **preserves all links, annotations and bookmarks** that are still valid. In other words: deleting pages only deletes those references that would otherwise point nowhere.
+
+Examples
+---------
+
+Create a document copy with only pages containing text:
+::
+ import fitz
+ doc = fitz.open("any.pdf")
+ r = list(range(doc.pageCount))                 # list of all pages
+ for i in range(doc.pageCount):                 # visit every page number
+     txt = doc.getPageText(i)                   # get the page's text
+     if not txt:                                # nothing there
+         r.remove(i)                            # remove page number from list
+ doc.select(r)                                  # apply the list
+ doc.save("out.pdf", garbage=3)                 # save the resulting PDF
+
+
+Concatenate a document with itself:
+::
+ import fitz
+ doc = fitz.open("any.pdf")
+ r = list(range(doc.pageCount))
+ r += r                                         # turn PDF into a copy of itself
+ doc.select(r)
+ doc.save("out.pdf", garbage=3)
+
+Create a document copy in reverse page order:
+::
+ import fitz
+ doc = fitz.open("any.pdf")
+ r = list(range(doc.pageCount-1, -1, -1))
+ doc.select(r)
+ doc.save("out.pdf", garbage=3)
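The remarks on ``select()`` mention selecting only even or odd pages and duplicating a document, but the examples do not show how to build those lists. A minimal sketch follows; the helper names ``pages_by_parity`` and ``duplicated`` are hypothetical, and "even" and "odd" refer to the zero-based page index, which is an assumption about what the reader wants. The resulting lists would be passed to ``doc.select()`` as in the examples above.

```python
def pages_by_parity(page_count, odd=False):
    """Return the zero-based page numbers with an even (default) or odd index.

    Hypothetical helper for illustration only; not a PyMuPDF function.
    """
    start = 1 if odd else 0
    return list(range(start, page_count, 2))


def duplicated(page_count, times=2):
    """Return a page list that repeats the whole document `times` times."""
    return list(range(page_count)) * times
```

Note that ``pages_by_parity(1, odd=True)`` yields an empty list, which according to the method description would cause a ``ValueError`` in ``select()``.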
@@ -17,7 +17,7 @@ Since all points or pixels live in a two-dimensional space, one column vector of
 It should be noted, that
 
     * the below methods are just convenience functions - everything they do, can also be achieved by directly manipulating ``[a,b,c,d,e,f]``
-    * all manipulations can be combined - you can construct a matrix that does a rotate **and** a shear **and** a scale **and** a shift, etc. in one go. If you want to do this, have a look at the **remarks** further down or at the Adobe manual.
+    * all manipulations can be combined - you can construct a matrix that does a rotate **and** a shear **and** a scale **and** a shift, etc. in one go. If you choose to do this, however, have a look at the **remarks** further down or at the Adobe manual.
 
 
 ========================= ============================
@@ -131,8 +131,8 @@ It should be noted, that
 
       :type: float
 
-**Remarks**
-
+Remarks
+--------
 Obviously, changes of matrix properties and execution of matrix methods can be combined, i.e. executed consecutively. This is done by multiplying the respective matrices.
 
 Matrix multiplications are **not commutative**, i.e. execution sequence determines the result: a **shift - rotate** does in general not equal a **rotate - shift**. So it can become quite unclear which result a transformation will yield. E.g. if you apply ``preRotate(x)`` to an arbitrary matrix ``[a,b,c,d,e,f]`` you will get matrix ``[a*cos(x)+c*sin(x), b*cos(x)+d*sin(x), -a*sin(x)+c*cos(x), -b*sin(x)+d*cos(x), e, f]``.
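The ``preRotate(x)`` formula and the non-commutativity claim above can be checked with a small plain-Python sketch. It does not use PyMuPDF: matrices are plain tuples ``(a, b, c, d, e, f)``, the helper names are hypothetical, the angle is taken in radians for simplicity, and the multiplication follows the PDF row-vector convention. All of these are assumptions of the sketch rather than statements about the library's API.

```python
import math


def concat(m1, m2):
    """Return the matrix that applies m1 first, then m2 (row-vector convention)."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def rotation(x):
    """Rotation matrix for an angle x given in radians (assumption of this sketch)."""
    return (math.cos(x), math.sin(x), -math.sin(x), math.cos(x), 0.0, 0.0)


def shift(tx, ty):
    """Translation matrix moving points by (tx, ty)."""
    return (1.0, 0.0, 0.0, 1.0, tx, ty)


def pre_rotate_formula(m, x):
    """The closed form quoted in the text for preRotate(x) on [a,b,c,d,e,f]."""
    a, b, c, d, e, f = m
    cs, sn = math.cos(x), math.sin(x)
    return (a * cs + c * sn, b * cs + d * sn,
            -a * sn + c * cs, -b * sn + d * cs, e, f)


m = (2.0, 0.5, -1.0, 3.0, 7.0, -4.0)   # an arbitrary matrix
x = math.pi / 6

# "Pre"-rotating means the rotation is applied before m.
via_product = concat(rotation(x), m)
via_formula = pre_rotate_formula(m, x)

# Order matters: shift-then-rotate differs from rotate-then-shift.
shift_rotate = concat(shift(10.0, 0.0), rotation(math.pi / 2))
rotate_shift = concat(rotation(math.pi / 2), shift(10.0, 0.0))
```

Under these assumptions ``via_product`` and ``via_formula`` agree up to rounding, while ``shift_rotate`` and ``rotate_shift`` end up with different translation parts.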
 
@@ -64,7 +64,7 @@ Please have a look at the **example** section to see some pixmap usage "at work"
       This constructor creates a (non-empty) pixmap from file ``filename``, which is assumed to contain a supported image.
 
       :param `filename`: Path / name of the file. The origin of the resulting pixmap is (0,0).
-      :type `data`: string
+      :type `filename`: string
 
    .. method:: __init__(self, data, len)
 
@@ -244,8 +244,8 @@ Please have a look at the **example** section to see some pixmap usage "at work"
 
       :rtype: bool
 
-Supported Pixmap Image Types
------------------------------
+Supported Pixmap Construction Image Types
+-------------------------------------------
 Support includes the following file types: BMP, JPEG, GIF, TIFF, JXR, and PNG.
 
 Details on Saving Images with ``writeImage()``
 
@@ -55,7 +55,7 @@ A rectangle is called "finite" if x0 <= x1 and y0 <= y1 is true, else "infinite"
       Transforms ``Rect`` with a :ref:`Matrix`.
 
       :param `m`: The matrix to be used for the transformation.
-      :param `m`: :ref:`Matrix`
+      :type `m`: :ref:`Matrix`
       :rtype: :ref:`Rect`
 
    .. method:: getRectArea(unit = 'pt')
 
@@ -236,7 +236,7 @@ <h3>Navigation</h3>
     </div>
     <div class="footer" role="contentinfo">
         &copy; Copyright 2016, Ruikai Liu, Jorj McKie.
-      Last updated on 26. Apr 2016.
+      Last updated on 29. Apr 2016.
       Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.4.1.
     </div>
   </body>
 
@@ -85,11 +85,13 @@ <h1>Changes in Version 1.9.0<a class="headerlink" href="#changes-in-version-1-9-
 <li>New methods <code class="docutils literal"><span class="pre">getRectArea()</span></code> for both <code class="docutils literal"><span class="pre">fitz.Rect</span></code> and <code class="docutils literal"><span class="pre">fitz.IRect</span></code></li>
 <li>Pixmaps can now be created directly from files using the new constructor <code class="docutils literal"><span class="pre">fitz.Pixmap(filename)</span></code>. All of the following image formats covered by MuPDF are thus also supported as inputs for pixmaps: BMP, JPEG, JXR, PNG, GIF, TIFF.</li>
 <li>The Pixmap constructor <code class="docutils literal"><span class="pre">fitz.Pixmap(data,</span> <span class="pre">len(data))</span></code> has been extended accordingly to support the above image formats as well (not just PNG as it did in version 1.8.0).</li>
-<li>Various improvements and new members in our demo and examples collections have been applied or added. Perhaps most prominently: <code class="docutils literal"><span class="pre">PDF_display</span></code> now supports scrolling with the mouse wheel, and there is a new example program <code class="docutils literal"><span class="pre">wxTableExtract</span></code> which allows to graphically identify and extract table data in documents.</li>
 <li><code class="docutils literal"><span class="pre">fitz.Rect</span></code> objects can now be created with all possible combinations of points and coordinates.</li>
-<li>PyMuPDF classes and methods now all contain  __doc__ strings, which were mostly automatically created by SWIG. While the PyMuPDF documentation certainly is more detailed, this feature should help a lot when programming in Python-aware IDEs.</li>
+<li>PyMuPDF classes and methods now all contain __doc__ strings, most of them created automatically by SWIG. While the PyMuPDF documentation is certainly more detailed, this feature should help a lot when programming in Python-aware IDEs.</li>
 <li>A new method of <code class="docutils literal"><span class="pre">fitz.Document.getPermits()</span></code> returns the permissions associated with the current access to the document (print, edit, annotate, copy), as a Python dictionary.</li>
 <li>The identity matrix <code class="docutils literal"><span class="pre">fitz.Identity</span></code> is now <strong>immutable</strong>.</li>
+<li>The new method <code class="docutils literal"><span class="pre">fitz.Document.select(list)</span></code> removes all pages from an open document that are not contained in the list.</li>
+<li>Various improvements and new members in our demo and examples collections. Perhaps most prominently: <code class="docutils literal"><span class="pre">PDF_display</span></code> now supports scrolling with the mouse wheel, and there is a new example program <code class="docutils literal"><span class="pre">wxTableExtract</span></code> which lets you graphically identify and extract table data in documents.</li>
+<li><code class="docutils literal"><span class="pre">fitz.open()</span></code> is an alias of <code class="docutils literal"><span class="pre">fitz.Document()</span></code>.</li>
 </ul>
 </div>
 
@@ -116,7 +118,7 @@ <h3>Navigation</h3>
     </div>
     <div class="footer" role="contentinfo">
         &copy; Copyright 2016, Ruikai Liu, Jorj McKie.
-      Last updated on 27. Apr 2016.
+      Last updated on 30. Apr 2016.
       Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.4.1.
     </div>
   </body>
 
@@ -123,7 +123,7 @@ <h3>Navigation</h3>
     </div>
     <div class="footer" role="contentinfo">
         &copy; Copyright 2016, Ruikai Liu, Jorj McKie.
-      Last updated on 28. Apr 2016.
+      Last updated on 30. Apr 2016.
       Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.4.1.
     </div>
   </body>