
Akida31 (Contributor) commented Aug 15, 2025

See #119, which this PR supersedes.

I squashed the commits (if you'd like them split further, please say so).
It would be nice if you didn't squash the two commits when merging, since the new benchmarks are rather large and make the other commit harder to review (but this is not super important).

This adds some "real world" benchmarks, which are taken from
https://github.com/AndreasMadsen/htmlparser-benchmark/tree/master/files.
The main abstraction is the new method `Emitter::move_position`.
The `CallbackEmitter` uses it to track positions and spans.
To support this, the tokenizer does slightly more work and emits additional events
so that the `Emitter` can track the position accurately.
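To illustrate the idea, here is a heavily simplified, hypothetical sketch; it is not html5gum's actual API. It assumes `move_position` receives the number of bytes just consumed (the real signature may differ), and the trait methods other than `move_position`, plus `Span`, `SpanEmitter`, and the toy `tokenize` function, are invented for this example. The tokenizer reports every consumed character via `move_position`, and the emitter keeps its own cursor, recording it when a tag starts and ends to build a span.

```rust
/// A byte-offset span `start..end` into the input (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical, minimal emitter trait. The real html5gum `Emitter`
/// has many more methods; only `move_position` mirrors the PR's idea.
pub trait Emitter {
    /// Called by the tokenizer after consuming `offset` bytes.
    /// Default is a no-op, so emitters that ignore spans need not implement it.
    fn move_position(&mut self, _offset: usize) {}
    fn init_start_tag(&mut self);
    fn push_tag_name(&mut self, s: &str);
    fn emit_current_tag(&mut self);
}

/// Span-tracking emitter in the spirit of `CallbackEmitter` (hypothetical).
#[derive(Default)]
pub struct SpanEmitter {
    position: usize,     // current byte offset, advanced only via move_position
    tag_start: usize,    // offset of the '<' that opened the current tag
    current_name: String,
    pub tags: Vec<(String, Span)>,
}

impl Emitter for SpanEmitter {
    fn move_position(&mut self, offset: usize) {
        self.position += offset;
    }
    fn init_start_tag(&mut self) {
        // Called before '<' is consumed, so the span includes it.
        self.tag_start = self.position;
        self.current_name.clear();
    }
    fn push_tag_name(&mut self, s: &str) {
        self.current_name.push_str(s);
    }
    fn emit_current_tag(&mut self) {
        let span = Span { start: self.tag_start, end: self.position };
        self.tags.push((std::mem::take(&mut self.current_name), span));
    }
}

/// Toy tokenizer: understands only `<name>` start tags and plain text,
/// and reports the byte length of every consumed char via `move_position`.
pub fn tokenize<E: Emitter>(input: &str, emitter: &mut E) {
    let mut in_tag = false;
    let mut buf = [0u8; 4];
    for c in input.chars() {
        match (in_tag, c) {
            (false, '<') => {
                emitter.init_start_tag();
                in_tag = true;
            }
            (true, '>') => {
                // Consume '>' before emitting so the span ends after it.
                emitter.move_position(1);
                emitter.emit_current_tag();
                in_tag = false;
                continue;
            }
            (true, _) => emitter.push_tag_name(c.encode_utf8(&mut buf)),
            (false, _) => {}
        }
        emitter.move_position(c.len_utf8());
    }
}
```

With the input `"hi <a>x<bb>"`, `SpanEmitter` records `a` with span `3..6` and `bb` with span `7..11`, so slicing the input by a span returns the tag's source text (`"<a>"`, `"<bb>"`). Because `move_position` has a default no-op body in this sketch, an emitter that does not need spans can simply ignore it.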

An alternative would have been to add a `Reader` parameter to all emitter methods,
which `CallbackEmitter` could then use to get spans from the reader.
We chose the `move_position` approach because it introduces fewer breaking changes and
less clutter for users who don't want to consume spans.
untitaker merged commit f59729e into untitaker:main Aug 15, 2025
8 checks passed
untitaker (Owner)

0.8.0 is out, thanks!

Akida31 (Contributor, Author) commented Aug 15, 2025

Thank you for this cool library and the review!
