Add spans #119

Akida31 · 2025-05-29T10:55:00Z

This adds, among other things, a reader parameter to all emitter methods which can be used by CallbackEmitter to get spans from the reader.
So this is definitely a breaking change :/

You can find some more unsquashed history at https://github.com/Akida31/html5gum/tree/add-spans-wip.

Fixes #10
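
For readers who have not opened the diff, here is a minimal sketch of what "pass the reader to every emitter method" can look like. All names in it (`PositionedReader`, `SliceReader`, `SpanCollector`, `emit_string`, `collect_spans`) are made up for illustration and are not html5gum's actual signatures; the real trait has many more methods.

```rust
use std::ops::Range;

/// Hypothetical: a reader that can report how many bytes it has consumed.
pub trait PositionedReader {
    fn position(&self) -> usize;
}

/// Hypothetical reader over a byte slice.
pub struct SliceReader<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> SliceReader<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        SliceReader { input, pos: 0 }
    }

    /// Consume up to `n` bytes and return them.
    pub fn read(&mut self, n: usize) -> &'a [u8] {
        let input = self.input;
        let end = (self.pos + n).min(input.len());
        let chunk = &input[self.pos..end];
        self.pos = end;
        chunk
    }
}

impl PositionedReader for SliceReader<'_> {
    fn position(&self) -> usize {
        self.pos
    }
}

/// Hypothetical emitter trait: every method also receives the reader.
pub trait Emitter<R> {
    fn emit_string(&mut self, s: &[u8], reader: &R);
}

/// Records the byte range of every string it is handed.
#[derive(Default)]
pub struct SpanCollector {
    pub spans: Vec<Range<usize>>,
}

impl<R: PositionedReader> Emitter<R> for SpanCollector {
    fn emit_string(&mut self, s: &[u8], reader: &R) {
        // The reader has already consumed `s`, so the span ends at its position.
        let end = reader.position();
        self.spans.push(end - s.len()..end);
    }
}

/// Toy driver: feed `input` to the collector in chunks of `chunk` bytes.
pub fn collect_spans(input: &[u8], chunk: usize) -> Vec<Range<usize>> {
    let mut reader = SliceReader::new(input);
    let mut emitter = SpanCollector::default();
    loop {
        let s = reader.read(chunk);
        if s.is_empty() {
            break;
        }
        emitter.emit_string(s, &reader);
    }
    emitter.spans
}
```

The point of the sketch is only that the emitter itself stores no position; it asks the reader when it needs one, which is why every method signature has to change.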

I don't know what's up with the perf, but here are the results on my machine:

Perf before

file_0050_10
  Instructions:            98680574 (+0.000150%)
  L1 Accesses:            143669548 (+0.000120%)
  L2 Accesses:               129594 (+0.000772%)
  RAM Accesses:               20500 (+0.068339%)
  Estimated Cycles:       145035018 (+0.000460%)

file_0050_100
  Instructions:           985844244 (+0.000015%)
  L1 Accesses:           1433894540 (+0.000012%)
  L2 Accesses:              1132699 (+0.000088%)
  RAM Accesses:              327045 (+0.004281%)
  Estimated Cycles:      1451004610 (+0.000046%)

file_6D30_10
  Instructions:            74567938 (+0.000198%)
  L1 Accesses:            106933311 (+0.000161%)
  L2 Accesses:               127795 (+0.000783%)
  RAM Accesses:               15152 (+0.092482%)
  Estimated Cycles:       108102606 (+0.000617%)

file_6D30_100
  Instructions:           744849941 (+0.000020%)
  L1 Accesses:           1067130228 (+0.000016%)
  L2 Accesses:              1272665 (+0.000079%)
  RAM Accesses:              123103 (+0.011374%)
  Estimated Cycles:      1077802158 (+0.000062%)

data_state_10
  Instructions:                2994 (-4.710376%)
  L1 Accesses:                 3980 (-4.142582%)
  L2 Accesses:                   25 (-3.846154%)
  RAM Accesses:                 120 (-10.44776%)
  Estimated Cycles:            8305 (-7.434240%)

data_state_10000
  Instructions:              729819 (-0.020275%)
  L1 Accesses:              1011308 (-0.017005%)
  L2 Accesses:                  157 (-0.632911%)
  RAM Accesses:                 613 (-2.232855%)
  Estimated Cycles:         1033548 (-0.064493%)

tagopen_10
  Instructions:               13326 (-1.098412%)
  L1 Accesses:                19097 (-0.892625%)
  L2 Accesses:                   35 (-2.777778%)
  RAM Accesses:                 170 (-7.608696%)
  Estimated Cycles:           25222 (-2.576384%)

tagopen_10000
  Instructions:            10948522 (-0.001352%)
  L1 Accesses:             15999230 (-0.001075%)
  L2 Accesses:                  383 (-0.260417%)
  RAM Accesses:                 653 (-2.098951%)
  Estimated Cycles:        16024000 (-0.004162%)

tagopenclose_10
  Instructions:               24346 (-0.604230%)
  L1 Accesses:                35038 (-0.488498%)
  L2 Accesses:                   38 (-2.564103%)
  RAM Accesses:                 216 (-6.086957%)
  Estimated Cycles:           42788 (-1.534921%)

tagopenclose_10000
  Instructions:            21512766 (-0.000688%)
  L1 Accesses:             31382068 (-0.000548%)
  L2 Accesses:                 1287 (-0.077640%)
  RAM Accesses:                1325 (-1.045556%)
  Estimated Cycles:        31434878 (-0.002122%)

comment_10
  Instructions:               16027 (-0.914992%)
  L1 Accesses:                22798 (-0.748803%)
  L2 Accesses:                   31 (-3.125000%)
  RAM Accesses:                 192 (-6.796117%)
  Estimated Cycles:           29673 (-2.198418%)

comment_10000
  Instructions:            13509947 (-0.001095%)
  L1 Accesses:             19578814 (-0.000878%)
  L2 Accesses:                 1439 (-0.069444%)
  RAM Accesses:                1456 (-0.952381%)
  Estimated Cycles:        19636969 (-0.003397%)

Perf after

file_0050_10
  Instructions:            99232522 (+0.559479%)
  L1 Accesses:            143559098 (-0.076758%)
  L2 Accesses:               116425 (-10.16104%)
  RAM Accesses:               20523 (+0.180611%)
  Estimated Cycles:       144859528 (-0.120539%)

file_0050_100
  Instructions:           991280373 (+0.551434%)
  L1 Accesses:           1432704377 (-0.082990%)
  L2 Accesses:              1001641 (-11.57034%)
  RAM Accesses:              327075 (+0.013454%)
  Estimated Cycles:      1449160207 (-0.127066%)

file_6D30_10
  Instructions:            75412472 (+1.132771%)
  L1 Accesses:            107836618 (+0.844901%)
  L2 Accesses:                99384 (-22.23109%)
  RAM Accesses:               15175 (+0.244418%)
  Estimated Cycles:       108864663 (+0.705560%)

file_6D30_100
  Instructions:           753281589 (+1.132013%)
  L1 Accesses:           1076146640 (+0.844938%)
  L2 Accesses:               989469 (-22.25214%)
  RAM Accesses:              122650 (-0.356653%)
  Estimated Cycles:      1085386735 (+0.703770%)

data_state_10
  Instructions:                3075 (-11.58712%)
  L1 Accesses:                 4074 (-11.26116%)
  L2 Accesses:                   17 (-37.03704%)
  RAM Accesses:                 118 (-16.90141%)
  Estimated Cycles:            8289 (-14.51114%)

data_state_10000
  Instructions:              760090 (+4.078718%)
  L1 Accesses:              1061575 (+4.907112%)
  L2 Accesses:                  146 (-8.176101%)
  RAM Accesses:                 611 (-3.779528%)
  Estimated Cycles:         1083690 (+4.710519%)

tagopen_10
  Instructions:               13291 (-3.758146%)
  L1 Accesses:                18788 (-4.668155%)
  L2 Accesses:                   28 (-24.32432%)
  RAM Accesses:                 172 (-10.41667%)
  Estimated Cycles:           24948 (-6.256341%)

tagopen_10000
  Instructions:            10818837 (-1.188866%)
  L1 Accesses:             15599539 (-2.501912%)
  L2 Accesses:                  377 (-2.077922%)
  RAM Accesses:                 656 (-2.814815%)
  Estimated Cycles:        15624384 (-2.502323%)

tagopenclose_10
  Instructions:               24149 (-2.742650%)
  L1 Accesses:                34421 (-3.444697%)
  L2 Accesses:                   34 (-15.00000%)
  RAM Accesses:                 211 (-11.34454%)
  Estimated Cycles:           41976 (-4.986532%)

tagopenclose_10000
  Instructions:            21173129 (-1.580984%)
  L1 Accesses:             30622431 (-2.422508%)
  L2 Accesses:                 1282 (-0.543057%)
  RAM Accesses:                1321 (-1.930215%)
  Estimated Cycles:        30675076 (-2.421385%)

comment_10
  Instructions:               15617 (-5.414572%)
  L1 Accesses:                21934 (-6.300995%)
  L2 Accesses:                   29 (-12.12121%)
  RAM Accesses:                 189 (-11.68224%)
  Estimated Cycles:           28694 (-7.629410%)

comment_10000
  Instructions:            12980287 (-3.923961%)
  L1 Accesses:             18599145 (-5.006684%)
  L2 Accesses:                 1441 (No change)
  RAM Accesses:                1454 (-1.623816%)
  Estimated Cycles:        18657240 (-4.995937%)

untitaker · 2025-05-29T11:00:53Z

neat, I'll try to review this on the weekend. do you have a use case for this or did you just do it out of curiosity?

Akida31 · 2025-05-29T11:13:23Z

I want to extract complete subtrees of the given html as-is.
While I could rebuild them from the events emitted by the tokenizer (or from the DOM for other parsers), it is quite nice not to have to do that and to take the input directly (for example, the tl parser wasn't able to rebuild the HTML for large SVGs correctly).

Additionally, being able to give better error messages with annotate-snippets is quite nice :)
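
A minimal sketch of that use case, assuming the tokenizer reports a byte range for each tag. `TagSpan` and `extract_subtree` are hypothetical names for illustration, not part of html5gum:

```rust
use std::ops::Range;

/// Hypothetical: the byte span a tokenizer reported for one tag.
pub struct TagSpan {
    pub name: &'static str,
    pub is_end: bool,
    pub span: Range<usize>,
}

/// Return the untouched source text from the first `<name>` start tag to its
/// matching end tag, counting nested tags of the same name.
/// Assumes the spans lie on UTF-8 character boundaries of `input`.
pub fn extract_subtree<'a>(input: &'a str, tags: &[TagSpan], name: &str) -> Option<&'a str> {
    let mut depth = 0usize;
    let mut start = None;
    for tag in tags.iter().filter(|t| t.name == name) {
        if !tag.is_end {
            if depth == 0 {
                start = Some(tag.span.start);
            }
            depth += 1;
        } else if depth > 0 {
            depth -= 1;
            if depth == 0 {
                // Slice the original input instead of re-serializing tokens.
                return start.map(|s| &input[s..tag.span.end]);
            }
        }
    }
    None
}
```

Because the result is a slice of the original input, attribute quoting, whitespace and anything else a re-serializer might normalize is preserved byte for byte.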

untitaker

one comment, i think i found the perf issue.

untitaker · 2025-06-01T23:05:30Z

src/reader.rs

+        }
+    }
+
+    fn try_read_string(&mut self, s: &[u8], case_sensitive: bool) -> Result<bool, Self::Error> {


since read_until is not implemented, the SpanReader will not perform well for large contiguous blocks of text such as <p>aaaaa.... aaaaa</p>

I wonder if instead of having a span reader that tracks its position, changes to the position could be emitted as "events" similar to start tags, etc:

trait Emitter { fn advance_position(&mut self, offset: usize) {} }

if the method is empty, then the compiler should be able to optimize away all overhead

i think it might be simpler, but i am also fine with this one
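
A minimal sketch of the "position as an event" idea, under the assumption that the tokenizer calls `advance_position` for every chunk it consumes before emitting it. The trait is cut down to two methods, and `ByteCounter`, `SpanEmitter` and `drive` are hypothetical names:

```rust
use std::ops::Range;

/// Cut-down, hypothetical emitter trait.
pub trait Emitter {
    fn emit_string(&mut self, s: &[u8]);

    /// Called whenever the tokenizer has consumed `offset` more bytes.
    /// Empty by default, so emitters that do not care about spans
    /// inherit a no-op that the optimizer can inline away.
    fn advance_position(&mut self, _offset: usize) {}
}

/// An emitter that ignores positions entirely.
#[derive(Default)]
pub struct ByteCounter {
    pub bytes: usize,
}

impl Emitter for ByteCounter {
    fn emit_string(&mut self, s: &[u8]) {
        self.bytes += s.len();
    }
}

/// An emitter that tracks the position itself and records a span per string.
#[derive(Default)]
pub struct SpanEmitter {
    pos: usize,
    pub spans: Vec<Range<usize>>,
}

impl Emitter for SpanEmitter {
    fn emit_string(&mut self, s: &[u8]) {
        // In this sketch the bytes of `s` were advanced over before the emit.
        self.spans.push(self.pos - s.len()..self.pos);
    }

    fn advance_position(&mut self, offset: usize) {
        self.pos += offset;
    }
}

/// Toy stand-in for the tokenizer: advance, then emit, for each chunk.
pub fn drive<E: Emitter>(emitter: &mut E, chunks: &[&str]) {
    for chunk in chunks {
        emitter.advance_position(chunk.len());
        emitter.emit_string(chunk.as_bytes());
    }
}
```

With this shape the reader stays untouched and only emitters that opt in carry position state.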

Akida31 · 2025-06-02

I did not find a way to implement read_until for SpanReader since I made it also return &'b Self (to fix some borrowing issues in machine).

untitaker · 2025-06-01T23:09:36Z

I don't know what's up with the perf, but here are the results on my machine:

is that overhead added even to use without spans? I think this is a real problem (i can't/don't want to pay for spans when i don't need them myself), but not sure why this could happen. after all you mostly just added generics and reader args

Akida31 · 2025-06-02T18:33:25Z

I tried a different implementation, based on your suggestion.
You can find the commits here and the wip commits here.
Note that I had to add a few more methods to Emitter, to get the span starts for CharacterReference to work out.

I'm not sure which approach I like more, what is your opinion on that?

One thing that should maybe be changed is merging Emitter::advance_position and Emitter::previous_position (but I'll do that if you prefer this approach).

The last commit adds a ForwardEmitter. Since I added some default methods, it was quite easy to forget to override them. Therefore ForwardEmitter abstracts over DefaultEmitter and Html5everEmitter and ties the forwarding to the declaration of Emitter.
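
To illustrate the pitfall and one possible way of tying forwarding to the trait declaration, here is a sketch. It uses a macro purely as an assumption for illustration; the actual commit may do this differently, and `PositionTracker`, `LeakyWrapper`, `ForwardingWrapper` and `forward_emitter!` are hypothetical names:

```rust
/// Cut-down, hypothetical emitter trait with one defaulted method.
pub trait Emitter {
    fn emit_string(&mut self, s: &[u8]);
    fn advance_position(&mut self, _offset: usize) {}
}

/// Declared right next to the trait: forwards *every* method to a field,
/// so adding a trait method means updating exactly one place.
macro_rules! forward_emitter {
    ($ty:ty, $field:ident) => {
        impl Emitter for $ty {
            fn emit_string(&mut self, s: &[u8]) {
                self.$field.emit_string(s)
            }
            fn advance_position(&mut self, offset: usize) {
                self.$field.advance_position(offset)
            }
        }
    };
}

/// Inner emitter that actually needs the position events.
#[derive(Default)]
pub struct PositionTracker {
    pub pos: usize,
    pub strings: usize,
}

impl Emitter for PositionTracker {
    fn emit_string(&mut self, _s: &[u8]) {
        self.strings += 1;
    }
    fn advance_position(&mut self, offset: usize) {
        self.pos += offset;
    }
}

/// Hand-written wrapper that forgets to forward `advance_position`.
/// It still compiles, because the default no-op fills the gap.
#[derive(Default)]
pub struct LeakyWrapper {
    pub inner: PositionTracker,
}

impl Emitter for LeakyWrapper {
    fn emit_string(&mut self, s: &[u8]) {
        self.inner.emit_string(s)
    }
}

/// Wrapper whose impl is generated, so nothing can be forgotten.
#[derive(Default)]
pub struct ForwardingWrapper {
    pub inner: PositionTracker,
}

forward_emitter!(ForwardingWrapper, inner);
```

The leaky wrapper silently drops position events, which is exactly the kind of bug that default methods make easy to introduce.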

I don't know what's up with the perf, but here are the results on my machine:

is that overhead added even to use without spans? I think this is a real problem (i can't/don't want to pay for spans when i don't need them myself), but not sure why this could happen. after all you mostly just added generics and reader args

These are the perf results without spans.
But I think this is weird: if I understand the benchmark correctly, the perf improves with both approaches (which should be wrong; but fewer cycles should be good?).
I'd appreciate if you could run the benchmark yourself for both approaches against the baseline.

untitaker · 2025-06-21T13:35:59Z

I re-ran the benchmarks and I am not sure how i concluded there is a massive perf regression. forget about my previous comment.

i like add-spans-2 more than this PR because the breaking changes to emitter trait are gone with the exception of start_open_tag (IMO could be defaulted to noop)

But I think this is weird: if I understand the benchmark correctly, the perf improves with both approaches (which should be wrong; but fewer cycles should be good?).

I see a roughly 100-instruction regression on data_state_10:

data_state_10
  Instructions:                3200 (-7.460960%)
  L1 Accesses:                 4269 (-6.871728%)
  L2 Accesses:                   28 (-9.677419%)
  RAM Accesses:                 125 (-7.407407%)
  Estimated Cycles:            8784 (-7.185123%)

from 3109 instructions on main to this on add-spans-2. here is the baseline i have on main:

data_state_10
  Instructions:                3109 (No change)
  L1 Accesses:                 4198 (No change)
  L2 Accesses:                   19 (No change)
  RAM Accesses:                 120 (No change)
  Estimated Cycles:            8493 (No change)

data_state_10000
  Instructions:              730047 (No change)
  L1 Accesses:              1011766 (No change)
  L2 Accesses:                  126 (No change)
  RAM Accesses:                 610 (No change)
  Estimated Cycles:         1033746 (No change)

tagopen_10
  Instructions:               13487 (No change)
  L1 Accesses:                19554 (No change)
  L2 Accesses:                   27 (No change)
  RAM Accesses:                 165 (No change)
  Estimated Cycles:           25464 (No change)

tagopen_10000
  Instructions:            10991245 (No change)
  L1 Accesses:             16222180 (No change)
  L2 Accesses:                  359 (No change)
  RAM Accesses:                 644 (No change)
  Estimated Cycles:        16246515 (No change)

tagopenclose_10
  Instructions:               24790 (No change)
  L1 Accesses:                36035 (No change)
  L2 Accesses:                   29 (No change)
  RAM Accesses:                 211 (No change)
  Estimated Cycles:           43565 (No change)

tagopenclose_10000
  Instructions:            21817986 (No change)
  L1 Accesses:             32107518 (No change)
  L2 Accesses:                 1261 (No change)
  RAM Accesses:                1315 (No change)
  Estimated Cycles:        32159848 (No change)

comment_10
  Instructions:               16376 (No change)
  L1 Accesses:                23418 (No change)
  L2 Accesses:                   23 (No change)
  RAM Accesses:                 194 (No change)
  Estimated Cycles:           30323 (No change)

comment_10000
  Instructions:            13750138 (No change)
  L1 Accesses:             19989200 (No change)
  L2 Accesses:                 1416 (No change)
  RAM Accesses:                1454 (No change)
  Estimated Cycles:        20047170 (No change)

I'd be ok with a PR like add-spans-2. maybe you want to double-check what the breaking changes are there, but i don't see any.

it would also be cool to try and understand where this really minor perf regression comes from. for this i believe one would have to diff the ASM to really understand what's going on. but it's optional for merging IMO, because i do not see a perf regression in my own usage (https://github.com/untitaker/hyperlink)

Akida31 · 2025-06-25T07:09:58Z

We can go forward with add-spans-2. Should I force push this PR or create a new one?
Even when making start_open_tag a default method, there are still some breaking changes. See the output of cargo-semver-checks:

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/constructible_struct_adds_field.ron

Failed in:
field StartTag.span in ~/html5gum/src/emitters/default.rs:155
field StartTag.span in ~/html5gum/src/emitters/default.rs:155
field EndTag.span in ~/html5gum/src/emitters/default.rs:164
field EndTag.span in ~/html5gum/src/emitters/default.rs:164

--- failure trait_method_added: pub trait method added ---

Description:
A non-sealed public trait added a new method without a default implementation, which breaks downstream implementations of the trait
ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-item-no-default
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/trait_method_added.ron

Failed in:
trait method html5gum::emitters::Emitter::start_open_tag in file ~/html5gum/src/emitters/emitter.rs:78
trait method html5gum::Emitter::start_open_tag in file ~/html5gum/src/emitters/emitter.rs:78

--- failure trait_method_parameter_count_changed: pub trait method parameter count changed ---

Description:
A trait method now takes a different number of parameters.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#major-any-change-to-trait-item-signatures
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/trait_method_parameter_count_changed.ron

Failed in:
Callback::handle_event now takes 3 instead of 2 parameters, in file ~/html5gum/src/emitters/callback.rs:151

--- failure trait_requires_more_generic_type_params: trait now requires more generic type parameters ---

Description:
A trait now requires more generic type parameters than it used to. Uses of this trait that supplied the previously-required number of generic types will be broken. To fix this, consider supplying default values for newly-added generic types.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-parameter-no-default
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/trait_requires_more_generic_type_params.ron

Failed in:
trait Token (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:191
trait Token (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:191
trait StartTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:141
trait StartTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:141
trait EndTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:160
trait EndTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:160

--- failure type_requires_more_generic_type_params: type now requires more generic type parameters ---

Description:
A type now requires more generic type parameters than it used to. Uses of this type that supplied the previously-required number of generic types will be broken. To fix this, consider supplying default values for newly-added generic types.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-parameter-no-default
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/type_requires_more_generic_type_params.ron

Failed in:
Enum Token (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:191
Enum Token (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:191
Struct StartTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:141
Struct StartTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:141
Struct EndTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:160
Struct EndTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:160

 Summary semver requires new major version: 5 major and 0 minor checks failed

Since I introduced `ForwardingEmitter`, I'm fine with giving `start_open_tag` a default body (without `ForwardingEmitter`, accidentally relying on the default implementation would break the spans). I pushed a commit for this to add-spans-2, along with another commit containing a change I also wanted to make.
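
As a rough illustration of which of the reported failures defaults can and cannot soften, here is a sketch with simplified stand-ins. The `start_open_tag` signature and the `StartTag` fields shown are assumptions, not the real html5gum definitions, and `OldEmitter` and `name_len` are hypothetical downstream code:

```rust
/// Simplified stand-in for the emitter trait.
pub trait Emitter {
    fn emit_string(&mut self, s: &[u8]);

    /// Newly added method. The default body keeps implementations
    /// written before the change compiling (avoids `trait_method_added`).
    fn start_open_tag(&mut self) {}
}

/// Simplified stand-in for the token struct. The default for `S` lets
/// existing code keep writing plain `StartTag` in type positions.
#[derive(Debug, Default, PartialEq)]
pub struct StartTag<S = ()> {
    pub name: Vec<u8>,
    /// New public field: struct literals must still be updated, so
    /// `constructible_struct_adds_field` remains a breaking change.
    pub span: S,
}

/// Downstream implementation written before `start_open_tag` existed.
pub struct OldEmitter {
    pub bytes: usize,
}

impl Emitter for OldEmitter {
    fn emit_string(&mut self, s: &[u8]) {
        self.bytes += s.len();
    }
}

/// Downstream function that names the type without generic arguments.
pub fn name_len(tag: &StartTag) -> usize {
    tag.name.len()
}
```

So defaults remove the trait-method and generic-parameter failures in a sketch like this, while the added public field and the changed `Callback::handle_event` parameter count would still require a major version.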

On my machine, data_state_10 regresses by only 63 instructions (2971 -> 3034), while comment_10 improves from 16004 to 14949 instructions. It could be that the compiler is not smart enough to remove the calls that do nothing. But it could also just be some shuffling around by rustc/LLVM, since the input is slightly different.
If you know a good way to get a diff from the assembly, please let me know. The assembly files I get are really large and the diff between them has almost a million lines :/

Akida31 · 2025-08-15T09:35:41Z

Hey, I don't want to annoy you, I just want to briefly ask how/if we can move this PR (or a new one) forward :)

untitaker · 2025-08-15T10:27:57Z

@Akida31 i think we can go forward with add-spans-2, don't care if you make a new PR or reuse this one. I don't mind the regressions

untitaker · 2025-08-15T13:28:03Z

superseded by #120

Akida31 added 3 commits May 29, 2025 11:49

add file benches

1abb802

attach spans to tokens

f7bdc09

fix clippy warnings

d22c46a

fix ci

c136d9c

untitaker reviewed Jun 1, 2025

Akida31 mentioned this pull request Aug 15, 2025

Add spans, take 2 #120

Merged

untitaker closed this Aug 15, 2025
