:arg bool full: whether to also include the invoker's :data:`xref` (which is zero if directly referenced by the page).
444
+
:arg bool full: whether to also include the referencer's :data:`xref`. If *True*, each returned item is one entry longer. Use this option if you need to know whether the page references the font directly: in that case the last entry is 0. If the font is instead referenced by an ``/XObject`` of the page, the last entry is that object's :data:`xref`.
445
445
446
446
:rtype: list
447
447
448
448
:returns: a list of fonts referenced by this page. Each entry looks like
@@ -457,7 +457,7 @@ For details on **embedded files** refer to Appendix 3.
457
457
* **basefont** (*str*) is the base font name,
458
458
* **name** (*str*) is the symbolic name, by which the font is referenced
459
459
* **encoding** (*str*) the font's character encoding if different from its built-in encoding (:ref:`AdobeManual`, p. 414):
460
-
* **invoker** (*int* optional) the :data:`xref` of the invoker. Zero if directly referenced by the page. Only present if *full=True*.
460
+
* **referencer** (*int* optional) the :data:`xref` of the referencer. Zero if directly referenced by the page, otherwise the xref of an XObject. Only present if *full=True*.
461
461
462
462
Example::
463
463
@@ -469,7 +469,7 @@ For details on **embedded files** refer to Appendix 3.
.. note:: This list has no duplicate entries: the combination of :data:`xref` and *name* is unique. But by themselves, each of the two may occur multiple times. Duplicate *name* entries indicate the presence of "Form XObjects" on the page, e.g. generated by :meth:`Page.showPDFpage`.
472
+
.. note:: This list has no duplicate entries: the combination of :data:`xref`, *name* and *referencer* is unique.
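The effect of *full=True* can be illustrated without opening a document. The following is a minimal sketch that works on plain tuples: it relies only on the statement above that the last item of each entry is the *referencer* (0 for a direct page reference, otherwise the xref of an XObject). The helper name ``group_by_referencer`` and the sample entries, including the layout of their leading fields, are hypothetical and serve illustration only.

```python
from collections import defaultdict


def group_by_referencer(font_entries):
    """Split font entries (as returned with full=True) by their last item.

    Returns a list of entries referenced directly by the page and a dict
    mapping each XObject xref to the entries referenced through it.
    """
    direct = []
    via_xobject = defaultdict(list)
    for entry in font_entries:
        referencer = entry[-1]  # 0, or the xref of the referencing XObject
        if referencer == 0:
            direct.append(entry)
        else:
            via_xobject[referencer].append(entry)
    return direct, dict(via_xobject)


# Hypothetical sample data; only the position of the last item is taken
# from the documentation text, the other fields are made up.
sample = [
    (12, "ttf", "TrueType", "ABCDEF+Arial", "F1", "WinAnsiEncoding", 0),
    (12, "ttf", "TrueType", "ABCDEF+Arial", "F1", "WinAnsiEncoding", 57),
    (30, "n/a", "Type1", "Helvetica", "F2", "", 57),
]

direct, via_xobject = group_by_referencer(sample)
print(len(direct), "font(s) referenced directly by the page")
for xobject_xref, entries in via_xobject.items():
    print(len(entries), "font(s) referenced via XObject", xobject_xref)
```

In the sample, the same font appears twice with the same symbolic name: once referenced by the page itself and once by an XObject. Without the *referencer* item these two entries would be indistinguishable.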
docs/faq.rst (+2, -45)
@@ -560,7 +560,7 @@ This script will take a document filename and generate a text file from all of i
560
560
561
561
The document can be any supported type like PDF, XPS, etc.
562
562
563
-
The script works as a command line tool which expects the document filename supplied as a parameter. It generates one text file named "filename.txt" in the script directory. Text of pages is separated by a line "-----"::
563
+
The script works as a command line tool which expects the document filename as a parameter. It generates one text file named "filename.txt" in the script directory. The text of the pages is separated by form feed characters::
564
564
565
565
import sys, fitz
566
566
fname = sys.argv[1] # get document filename
@@ -588,50 +588,7 @@ See the following two section for examples and further explanations.
Please refer to the script `textboxtract.py <https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/examples/textboxtract.py>`_.
592
-
593
-
It demonstrates ways to extract text contained in the following red rectangle,
594
-
595
-
.. image:: images/img-textboxtract.png
596
-
:scale:75
597
-
598
-
.. highlight:: text
599
-
600
-
by using more or less restrictive conditions to find the relevant words::
601
-
602
-
Select the words strictly contained in rectangle
603
-
------------------------------------------------
604
-
Die Altersübereinstimmung deutete darauf hin,
605
-
engen, nur 50 Millionen Jahre großen
606
-
Gesteinshagel auf den Mond traf und dabei
607
-
hinterließ – einige größer als Frankreich.
608
-
es sich um eine letzte, infernalische Welle
609
-
Geburt des Sonnensystems. Daher tauften die
610
-
das Ereignis »lunare Katastrophe«. Später
611
-
die Bezeichnung Großes Bombardement durch.
612
-
613
-
Or, more forgiving, respectively::
614
-
615
-
Select the words intersecting the rectangle
616
-
-------------------------------------------
617
-
Die Altersübereinstimmung deutete darauf hin, dass
618
-
einem engen, nur 50 Millionen Jahre großen Zeitfenster
619
-
ein Gesteinshagel auf den Mond traf und dabei unzählige
620
-
Krater hinterließ – einige größer als Frankreich. Offenbar
621
-
handelte es sich um eine letzte, infernalische Welle nach
622
-
der Geburt des Sonnensystems. Daher tauften die Caltech-
623
-
Forscher das Ereignis »lunare Katastrophe«. Später setzte
624
-
sich die Bezeichnung Großes Bombardement durch.
625
-
626
-
The latter output also includes words *intersecting* the rectangle.
627
-
628
-
.. highlight:: python
629
-
630
-
What if your **rectangle spans across more than one page**? Follow this recipe:
631
-
632
-
* Create a common list of all words of all pages which your rectangle intersects.
633
-
* When adding word items to this common list, increase their **y-coordinates** by the accumulated height of all previous pages.
634
-
591
+
As of v1.18.0, there is more than one way to achieve this. We have therefore created a `folder <https://github.com/pymupdf/PyMuPDF-Utilities/tree/master/textbox-extraction>`_ in the PyMuPDF-Utilities repository dedicated to this topic.
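As a rough illustration of the two selection criteria involved (words strictly contained in a rectangle versus words merely intersecting it), here is a minimal sketch on plain tuples. It is not the code from the linked folder. It assumes that each word item starts with its bounding box coordinates ``(x0, y0, x1, y1)`` followed by the word string. The function name ``words_in_rect`` and the sample coordinates are hypothetical.

```python
def words_in_rect(words, rect, strict=True):
    """Select word items lying in a rectangle.

    words:  iterable of tuples starting with (x0, y0, x1, y1, text)
    rect:   (x0, y0, x1, y1) of the selection rectangle
    strict: True keeps only words fully contained in rect,
            False also keeps words that merely intersect it.
    """
    rx0, ry0, rx1, ry1 = rect
    selected = []
    for word in words:
        x0, y0, x1, y1 = word[:4]
        if strict:
            hit = rx0 <= x0 and ry0 <= y0 and x1 <= rx1 and y1 <= ry1
        else:
            # two rectangles intersect if they overlap on both axes
            hit = x0 < rx1 and rx0 < x1 and y0 < ry1 and ry0 < y1
        if hit:
            selected.append(word)
    return selected


# Hypothetical sample: three words on one line, the second one sticks out
# of the rectangle on the right, the third one lies completely outside.
rect = (50, 100, 200, 120)
words = [
    (60, 102, 90, 115, "Die"),
    (180, 102, 230, 115, "dass"),
    (300, 102, 340, 115, "weit"),
]

contained = [w[4] for w in words_in_rect(words, rect, strict=True)]
intersecting = [w[4] for w in words_in_rect(words, rect, strict=False)]
print("contained:   ", contained)
print("intersecting:", intersecting)
```

The strict variant tends to drop words at the rectangle's border, while the intersecting variant is more forgiving and may pick up words that only touch the area.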