Feature/etree xml parser by lukavdplas · Pull Request #38 · CentreForDigitalHumanities/ianalyzer-readers

lukavdplas · 2025-09-25T15:43:31Z

Defines an alternative XML reader based on lxml (without Beautiful Soup). The class is very simple; extractors are defined using XPath. The unit tests include some examples.
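To illustrate the general idea of XPath-based extraction, here is a minimal sketch. It is not the reader added in this PR: the function `documents_from_bytes`, the `FIELDS` mapping, and the sample XML are all hypothetical. To stay dependency-free it uses the standard library's `xml.etree.ElementTree`, which only supports a subset of XPath; `lxml.etree` offers a largely compatible `fromstring` / `iterfind` / `findtext` interface.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Optional

# Hypothetical sample document, for illustration only.
SAMPLE = b"""<?xml version="1.0"?>
<library>
  <book id="b1"><title>First</title><author>Ann</author></book>
  <book id="b2"><title>Second</title><author>Bob</author></book>
</library>"""

# Hypothetical field definitions: field name -> path relative to each entry.
FIELDS = {
    "title": "title",
    "author": "author",
}


def documents_from_bytes(
    data: bytes, entry_path: str = ".//book"
) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per element matching entry_path (assumed layout)."""
    root = ET.fromstring(data)
    # iterfind lazily walks the matching elements instead of building a list.
    for entry in root.iterfind(entry_path):
        doc: Dict[str, Optional[str]] = {"id": entry.get("id")}
        for name, path in FIELDS.items():
            # findtext returns None when the path matches nothing.
            doc[name] = entry.findtext(path)
        yield doc


docs = list(documents_from_bytes(SAMPLE))
```

Each item in `docs` is a plain dict with one key per field, plus the `id` attribute of the entry element.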

This reader is generally much faster than the XMLReader, but offers fewer features. (The speed gain is mostly in the initial parsing of the XML file.)

This was built as a proof of concept, and we're not planning to merge it at the moment. We don't want to replace the XMLReader with this one (because it lacks features), nor do we want to maintain two readers side by side.

The tests currently fail for Python 3.9, due to a compatibility issue in the test itself; the reader should work fine in Python 3.9.

lukavdplas added 6 commits August 11, 2025 15:26

draft etree-based xml reader

c4f68b6

use lxml

62ae39e

add data_from_bytes / data_from_response implementation

00157ec

simplify etree reader

31850da

use iterfind for etree reader

978d600

add docstring

64a8cba

lukavdplas wants to merge 6 commits into develop from feature/etree-xml-parser