
@Akida31
Contributor

Akida31 commented May 29, 2025

This adds, among other things, a reader parameter to all emitter methods, which CallbackEmitter can use to get spans from the reader.
So this is definitely a breaking change :/
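To make the reader-parameter design concrete, here is a minimal sketch, assuming a toy reader and two emitter hooks. `Position`, `SliceReader` and `SpanEmitter` are hypothetical names. `init_start_tag` and `emit_current_tag` only loosely mirror html5gum's method names, and their signatures here are assumptions, not the crate's API.

```rust
use std::ops::Range;

/// Anything that can report how many bytes of input have been consumed (hypothetical).
trait Position {
    fn position(&self) -> usize;
}

struct SliceReader<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> SliceReader<'a> {
    fn read_byte(&mut self) -> Option<u8> {
        let b = *self.input.get(self.pos)?;
        self.pos += 1;
        Some(b)
    }
}

impl Position for SliceReader<'_> {
    fn position(&self) -> usize {
        self.pos
    }
}

trait Emitter {
    // The extra `reader` argument on every method is what breaks existing impls.
    fn init_start_tag<R: Position>(&mut self, reader: &R);
    fn emit_current_tag<R: Position>(&mut self, reader: &R);
}

#[derive(Default)]
struct SpanEmitter {
    start: usize,
    spans: Vec<Range<usize>>,
}

impl Emitter for SpanEmitter {
    fn init_start_tag<R: Position>(&mut self, reader: &R) {
        // Called right after `<` was consumed, so the tag began one byte earlier.
        self.start = reader.position() - 1;
    }
    fn emit_current_tag<R: Position>(&mut self, reader: &R) {
        // Called right after `>` was consumed: the span ends here (exclusive).
        self.spans.push(self.start..reader.position());
    }
}

fn main() {
    let mut reader = SliceReader { input: b"ab<p>cd", pos: 0 };
    let mut emitter = SpanEmitter::default();
    // Toy tokenizer: only recognizes `<...>` tags.
    while let Some(b) = reader.read_byte() {
        match b {
            b'<' => emitter.init_start_tag(&reader),
            b'>' => emitter.emit_current_tag(&reader),
            _ => {}
        }
    }
    assert_eq!(emitter.spans, vec![2..5]); // the bytes of `<p>`
}
```

The cost of this design shows up in the trait itself: every method gains a generic reader argument, so every existing Emitter implementation has to change.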

You can find some more unsquashed history at https://github.com/Akida31/html5gum/tree/add-spans-wip.

Fixes #10

I don't know what's up with the perf, but here are the results on my machine:

Perf before
file_0050_10
  Instructions:            98680574 (+0.000150%)
  L1 Accesses:            143669548 (+0.000120%)
  L2 Accesses:               129594 (+0.000772%)
  RAM Accesses:               20500 (+0.068339%)
  Estimated Cycles:       145035018 (+0.000460%)

file_0050_100
  Instructions:           985844244 (+0.000015%)
  L1 Accesses:           1433894540 (+0.000012%)
  L2 Accesses:              1132699 (+0.000088%)
  RAM Accesses:              327045 (+0.004281%)
  Estimated Cycles:      1451004610 (+0.000046%)

file_6D30_10
  Instructions:            74567938 (+0.000198%)
  L1 Accesses:            106933311 (+0.000161%)
  L2 Accesses:               127795 (+0.000783%)
  RAM Accesses:               15152 (+0.092482%)
  Estimated Cycles:       108102606 (+0.000617%)

file_6D30_100
  Instructions:           744849941 (+0.000020%)
  L1 Accesses:           1067130228 (+0.000016%)
  L2 Accesses:              1272665 (+0.000079%)
  RAM Accesses:              123103 (+0.011374%)
  Estimated Cycles:      1077802158 (+0.000062%)

data_state_10
  Instructions:                2994 (-4.710376%)
  L1 Accesses:                 3980 (-4.142582%)
  L2 Accesses:                   25 (-3.846154%)
  RAM Accesses:                 120 (-10.44776%)
  Estimated Cycles:            8305 (-7.434240%)

data_state_10000
  Instructions:              729819 (-0.020275%)
  L1 Accesses:              1011308 (-0.017005%)
  L2 Accesses:                  157 (-0.632911%)
  RAM Accesses:                 613 (-2.232855%)
  Estimated Cycles:         1033548 (-0.064493%)

tagopen_10
  Instructions:               13326 (-1.098412%)
  L1 Accesses:                19097 (-0.892625%)
  L2 Accesses:                   35 (-2.777778%)
  RAM Accesses:                 170 (-7.608696%)
  Estimated Cycles:           25222 (-2.576384%)

tagopen_10000
  Instructions:            10948522 (-0.001352%)
  L1 Accesses:             15999230 (-0.001075%)
  L2 Accesses:                  383 (-0.260417%)
  RAM Accesses:                 653 (-2.098951%)
  Estimated Cycles:        16024000 (-0.004162%)

tagopenclose_10
  Instructions:               24346 (-0.604230%)
  L1 Accesses:                35038 (-0.488498%)
  L2 Accesses:                   38 (-2.564103%)
  RAM Accesses:                 216 (-6.086957%)
  Estimated Cycles:           42788 (-1.534921%)

tagopenclose_10000
  Instructions:            21512766 (-0.000688%)
  L1 Accesses:             31382068 (-0.000548%)
  L2 Accesses:                 1287 (-0.077640%)
  RAM Accesses:                1325 (-1.045556%)
  Estimated Cycles:        31434878 (-0.002122%)

comment_10
  Instructions:               16027 (-0.914992%)
  L1 Accesses:                22798 (-0.748803%)
  L2 Accesses:                   31 (-3.125000%)
  RAM Accesses:                 192 (-6.796117%)
  Estimated Cycles:           29673 (-2.198418%)

comment_10000
  Instructions:            13509947 (-0.001095%)
  L1 Accesses:             19578814 (-0.000878%)
  L2 Accesses:                 1439 (-0.069444%)
  RAM Accesses:                1456 (-0.952381%)
  Estimated Cycles:        19636969 (-0.003397%)

Perf after
file_0050_10
  Instructions:            99232522 (+0.559479%)
  L1 Accesses:            143559098 (-0.076758%)
  L2 Accesses:               116425 (-10.16104%)
  RAM Accesses:               20523 (+0.180611%)
  Estimated Cycles:       144859528 (-0.120539%)

file_0050_100
  Instructions:           991280373 (+0.551434%)
  L1 Accesses:           1432704377 (-0.082990%)
  L2 Accesses:              1001641 (-11.57034%)
  RAM Accesses:              327075 (+0.013454%)
  Estimated Cycles:      1449160207 (-0.127066%)

file_6D30_10
  Instructions:            75412472 (+1.132771%)
  L1 Accesses:            107836618 (+0.844901%)
  L2 Accesses:                99384 (-22.23109%)
  RAM Accesses:               15175 (+0.244418%)
  Estimated Cycles:       108864663 (+0.705560%)

file_6D30_100
  Instructions:           753281589 (+1.132013%)
  L1 Accesses:           1076146640 (+0.844938%)
  L2 Accesses:               989469 (-22.25214%)
  RAM Accesses:              122650 (-0.356653%)
  Estimated Cycles:      1085386735 (+0.703770%)

data_state_10
  Instructions:                3075 (-11.58712%)
  L1 Accesses:                 4074 (-11.26116%)
  L2 Accesses:                   17 (-37.03704%)
  RAM Accesses:                 118 (-16.90141%)
  Estimated Cycles:            8289 (-14.51114%)

data_state_10000
  Instructions:              760090 (+4.078718%)
  L1 Accesses:              1061575 (+4.907112%)
  L2 Accesses:                  146 (-8.176101%)
  RAM Accesses:                 611 (-3.779528%)
  Estimated Cycles:         1083690 (+4.710519%)

tagopen_10
  Instructions:               13291 (-3.758146%)
  L1 Accesses:                18788 (-4.668155%)
  L2 Accesses:                   28 (-24.32432%)
  RAM Accesses:                 172 (-10.41667%)
  Estimated Cycles:           24948 (-6.256341%)

tagopen_10000
  Instructions:            10818837 (-1.188866%)
  L1 Accesses:             15599539 (-2.501912%)
  L2 Accesses:                  377 (-2.077922%)
  RAM Accesses:                 656 (-2.814815%)
  Estimated Cycles:        15624384 (-2.502323%)

tagopenclose_10
  Instructions:               24149 (-2.742650%)
  L1 Accesses:                34421 (-3.444697%)
  L2 Accesses:                   34 (-15.00000%)
  RAM Accesses:                 211 (-11.34454%)
  Estimated Cycles:           41976 (-4.986532%)

tagopenclose_10000
  Instructions:            21173129 (-1.580984%)
  L1 Accesses:             30622431 (-2.422508%)
  L2 Accesses:                 1282 (-0.543057%)
  RAM Accesses:                1321 (-1.930215%)
  Estimated Cycles:        30675076 (-2.421385%)

comment_10
  Instructions:               15617 (-5.414572%)
  L1 Accesses:                21934 (-6.300995%)
  L2 Accesses:                   29 (-12.12121%)
  RAM Accesses:                 189 (-11.68224%)
  Estimated Cycles:           28694 (-7.629410%)

comment_10000
  Instructions:            12980287 (-3.923961%)
  L1 Accesses:             18599145 (-5.006684%)
  L2 Accesses:                 1441 (No change)
  RAM Accesses:                1454 (-1.623816%)
  Estimated Cycles:        18657240 (-4.995937%)

@untitaker
Owner

neat, I'll try to review this on the weekend. do you have a use case for this, or did you just do it out of curiosity?

@Akida31
Contributor Author

Akida31 commented May 29, 2025

I want to extract complete subtrees of the given HTML as-is.
While I could rebuild them from the events emitted by the tokenizer (or from the DOM, for other parsers), it is quite nice not to have to do that and to take the input directly instead (for example, the tl parser wasn't able to correctly rebuild the HTML for large SVGs).
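Once start and end tags carry byte spans, the subtree use case reduces to slicing the original input. This is a minimal sketch, assuming spans are half-open byte ranges. `Span` and `outer_html` are hypothetical names, and the span values are written by hand rather than produced by html5gum.

```rust
use std::ops::Range;

/// Half-open byte range into the original input (assumed representation).
type Span = Range<usize>;

/// Returns the original source text from the start of `open` to the end of `close`.
fn outer_html<'a>(input: &'a str, open: &Span, close: &Span) -> &'a str {
    &input[open.start..close.end]
}

fn main() {
    let input = r#"<p>x</p><svg width="1"><g/></svg><p>y</p>"#;
    // Spans a span-aware tokenizer would report for `<svg width="1">` and `</svg>`.
    let open = 8..23;
    let close = 27..33;
    assert_eq!(&input[open.clone()], r#"<svg width="1">"#);
    // The subtree is copied verbatim, with no re-serialization from tokens.
    assert_eq!(outer_html(input, &open, &close), r#"<svg width="1"><g/></svg>"#);
}
```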

Additionally, being able to give better error messages with annotate-snippets is quite nice :)
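For error messages, a span is enough to point at the offending source. The function below is a hand-rolled stand-in for the kind of output annotate-snippets produces, not that crate's API. `render` is a hypothetical name, and the column alignment assumes ASCII input.

```rust
use std::ops::Range;

/// Renders the line containing `span.start` and underlines the span with carets.
/// Column alignment assumes ASCII input (byte offset == column).
fn render(input: &str, span: Range<usize>, msg: &str) -> String {
    // Byte offsets of the line that contains the start of the span.
    let line_start = input[..span.start].rfind('\n').map_or(0, |i| i + 1);
    let line_end = input[span.start..]
        .find('\n')
        .map_or(input.len(), |i| span.start + i);
    // 1-based line number.
    let line_no = input[..span.start].matches('\n').count() + 1;
    // Only underline the part of the span on this line (at least one caret).
    let end = span.end.min(line_end);
    let carets = "^".repeat(end.saturating_sub(span.start).max(1));
    let pad = " ".repeat(span.start - line_start);
    let gutter = " ".repeat(line_no.to_string().len());
    let text = &input[line_start..line_end];
    format!("{line_no} | {text}\n{gutter} | {pad}{carets} {msg}")
}

fn main() {
    let input = "<p>\n<div a=1 a=2>\n";
    // Span of the duplicate `a=2` attribute, as a span-aware tokenizer might report it.
    println!("{}", render(input, 13..16, "duplicate attribute"));
}
```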

Owner

untitaker left a comment

one comment, i think i found the perf issue.

}
}

fn try_read_string(&mut self, s: &[u8], case_sensitive: bool) -> Result<bool, Self::Error> {
Owner

untitaker commented Jun 1, 2025

since read_until is not implemented, the SpanReader will not perform well for large contiguous blocks of text such as <p>aaaaa.... aaaaa</p>
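The concern can be illustrated with a simplified reader. In this sketch, the `ByteReader` trait, `SliceReader` and this `SpanReader` are assumptions, not html5gum's `Reader` trait. A wrapper that only offers per-byte reads forces the tokenizer through one call per byte of a long text run. Forwarding a bulk `read_until` lets the wrapper account for the whole chunk with a single addition.

```rust
/// Simplified reader interface (an assumption, not html5gum's `Reader`).
trait ByteReader {
    fn read_byte(&mut self) -> Option<u8>;
    /// Returns the run of bytes before the next byte in `needles`; if the next
    /// byte is itself a needle, returns just that byte. `None` at end of input.
    fn read_until(&mut self, needles: &[u8]) -> Option<&[u8]>;
}

struct SliceReader<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> ByteReader for SliceReader<'a> {
    fn read_byte(&mut self) -> Option<u8> {
        let b = *self.input.get(self.pos)?;
        self.pos += 1;
        Some(b)
    }

    fn read_until(&mut self, needles: &[u8]) -> Option<&[u8]> {
        // Copy the `&'a [u8]` out so `rest` does not keep `self` borrowed
        // while `self.pos` is updated below.
        let input: &'a [u8] = self.input;
        let rest = &input[self.pos..];
        if rest.is_empty() {
            return None;
        }
        let len = match rest.iter().position(|b| needles.contains(b)) {
            Some(0) => 1,
            Some(n) => n,
            None => rest.len(),
        };
        self.pos += len;
        Some(&rest[..len])
    }
}

/// Wraps any reader and counts how many bytes have been consumed.
struct SpanReader<R> {
    inner: R,
    offset: usize,
}

impl<R: ByteReader> ByteReader for SpanReader<R> {
    fn read_byte(&mut self) -> Option<u8> {
        let b = self.inner.read_byte()?;
        self.offset += 1; // per-byte bookkeeping, used when no bulk read happens
        Some(b)
    }

    fn read_until(&mut self, needles: &[u8]) -> Option<&[u8]> {
        // Forward the bulk read and account for the whole chunk at once.
        let chunk = self.inner.read_until(needles)?;
        self.offset += chunk.len();
        Some(chunk)
    }
}

fn main() {
    let mut r = SpanReader {
        inner: SliceReader { input: b"<p>aaaa</p>", pos: 0 },
        offset: 0,
    };
    assert_eq!(r.read_byte(), Some(b'<'));
    assert_eq!(r.read_until(b"<"), Some(&b"p>aaaa"[..]));
    assert_eq!(r.offset, 7);
}
```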

I wonder if instead of having a span reader that tracks its position, changes to the position could be emitted as "events" similar to start tags, etc:

trait Emitter {
    fn advance_position(&mut self, offset: usize) {}
}

if the method is empty, then the compiler should be able to optimize away all overhead
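Here is a minimal sketch of the event-based alternative, assuming a toy tokenizer that only emits characters. `advance_position` has an empty default body, so an emitter that ignores positions needs no changes, while one that wants spans overrides it. The names `emit_char`, `TextEmitter`, `PositionEmitter` and `tokenize` are hypothetical. Because `tokenize` is generic, each emitter gets its own monomorphized copy, which is what should let the compiler drop the empty calls. That is the expectation stated above, not something measured here.

```rust
trait Emitter {
    fn emit_char(&mut self, c: u8);
    /// Called after the tokenizer consumed `offset` more bytes. No-op by default.
    fn advance_position(&mut self, offset: usize) {
        let _ = offset;
    }
}

/// Ignores positions entirely and keeps the empty default.
#[derive(Default)]
struct TextEmitter {
    text: Vec<u8>,
}

impl Emitter for TextEmitter {
    fn emit_char(&mut self, c: u8) {
        self.text.push(c);
    }
}

/// Overrides the hook to record where in the input each character came from.
#[derive(Default)]
struct PositionEmitter {
    pos: usize,
    offsets: Vec<usize>,
}

impl Emitter for PositionEmitter {
    fn emit_char(&mut self, _c: u8) {
        self.offsets.push(self.pos);
    }
    fn advance_position(&mut self, offset: usize) {
        self.pos += offset;
    }
}

/// Toy tokenizer: emits every byte that is not a space, then reports progress.
fn tokenize<E: Emitter>(input: &[u8], emitter: &mut E) {
    for &b in input {
        if b != b' ' {
            emitter.emit_char(b);
        }
        emitter.advance_position(1);
    }
}

fn main() {
    let mut t = TextEmitter::default();
    tokenize(b"a b", &mut t);
    assert_eq!(t.text, b"ab");

    let mut p = PositionEmitter::default();
    tokenize(b"a b", &mut p);
    assert_eq!(p.offsets, vec![0, 2]);
}
```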

i think it might be simpler, but i am also fine with this one

Contributor Author

I did not find a way to implement read_until for SpanReader since I made it also return &'b Self (to fix some borrowing issues in machine).

@untitaker
Owner

> I don't know what's up with the perf, but here are the results on my machine:

is that overhead added even when spans are not used? I think that would be a real problem (i don't want to pay for spans when i don't need them myself), but i'm not sure why it could happen. after all, you mostly just added generics and reader args

@Akida31
Contributor Author

Akida31 commented Jun 2, 2025

I tried a different implementation, based on your suggestion.
You can find the commits here and the wip commits here.
Note that I had to add a few more methods to Emitter to get the span starts for CharacterReference right.

I'm not sure which approach I prefer. What is your opinion?

One thing that should maybe be changed is merging Emitter::advance_position and Emitter::previous_position (I'll do that if you prefer this approach).

The last commit adds a ForwardEmitter. Since I added some default methods, it was quite easy to forget to override them. ForwardEmitter therefore abstracts over DefaultEmitter and Html5everEmitter and ties the forwarding to the declaration of Emitter.
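One way to tie the forwarding to the declaration is to generate both the trait and the forwarding impl from a single method list, so a newly added method cannot be forgotten in the wrapper. This is a sketch of the idea only. `declare_emitter!`, `Forward` and `Positions` are hypothetical and are not the PR's actual ForwardEmitter.

```rust
// The method list is written once; the macro emits the trait with no-op
// defaults and a wrapper that forwards every listed method.
macro_rules! declare_emitter {
    ($(fn $name:ident(&mut self $(, $arg:ident : $ty:ty)*);)*) => {
        pub trait Emitter {
            $(
                #[allow(unused_variables)]
                fn $name(&mut self $(, $arg: $ty)*) {}
            )*
        }

        /// Forwards every declared method to the wrapped emitter.
        pub struct Forward<E>(pub E);

        impl<E: Emitter> Emitter for Forward<E> {
            $(
                fn $name(&mut self $(, $arg: $ty)*) {
                    self.0.$name($($arg),*)
                }
            )*
        }
    };
}

declare_emitter! {
    fn emit_string(&mut self, s: &str);
    fn advance_position(&mut self, offset: usize);
}

/// A concrete emitter that records text and position.
#[derive(Default)]
struct Positions {
    pos: usize,
    text: String,
}

impl Emitter for Positions {
    fn emit_string(&mut self, s: &str) {
        self.text.push_str(s);
    }
    fn advance_position(&mut self, offset: usize) {
        self.pos += offset;
    }
}

fn main() {
    // A hand-written wrapper that forgot to override `advance_position` would
    // silently fall back to the no-op; the generated one cannot.
    let mut fwd = Forward(Positions::default());
    fwd.emit_string("hi");
    fwd.advance_position(2);
    assert_eq!(fwd.0.pos, 2);
    assert_eq!(fwd.0.text, "hi");
}
```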

> I don't know what's up with the perf, but here are the results on my machine:

> is that overhead added even to use without spans? I think this is a real problem (i can't/won't want to pay for spans when i don't need them myself), but not sure why this could happen. after all you mostly just added generics and reader args

These are the perf results without spans.
But I think this is weird: if I understand the benchmark correctly, performance improves with both approaches. That seems wrong, although fewer cycles should be better, right?
I'd appreciate it if you could run the benchmark yourself for both approaches against the baseline.

@untitaker

This comment was marked as outdated.

@untitaker
Owner

untitaker commented Jun 21, 2025

I re-ran the benchmarks and i am not sure how i concluded there was a massive perf regression. forget about my previous comment.

i like add-spans-2 more than this PR because the breaking changes to the Emitter trait are gone, with the exception of start_open_tag (which IMO could default to a no-op)

> But I think this is weird, if I understand the benchmark correctly, the perf improves with both approaches (which should be wrong; but less cycles should be good?).

I see a regression of about 90 instructions on data_state_10:

data_state_10
  Instructions:                3200 (-7.460960%)
  L1 Accesses:                 4269 (-6.871728%)
  L2 Accesses:                   28 (-9.677419%)
  RAM Accesses:                 125 (-7.407407%)
  Estimated Cycles:            8784 (-7.185123%)

that's from 3109 instructions on main to 3200 on add-spans-2. here is the baseline i have on main:

data_state_10
  Instructions:                3109 (No change)
  L1 Accesses:                 4198 (No change)
  L2 Accesses:                   19 (No change)
  RAM Accesses:                 120 (No change)
  Estimated Cycles:            8493 (No change)

data_state_10000
  Instructions:              730047 (No change)
  L1 Accesses:              1011766 (No change)
  L2 Accesses:                  126 (No change)
  RAM Accesses:                 610 (No change)
  Estimated Cycles:         1033746 (No change)

tagopen_10
  Instructions:               13487 (No change)
  L1 Accesses:                19554 (No change)
  L2 Accesses:                   27 (No change)
  RAM Accesses:                 165 (No change)
  Estimated Cycles:           25464 (No change)

tagopen_10000
  Instructions:            10991245 (No change)
  L1 Accesses:             16222180 (No change)
  L2 Accesses:                  359 (No change)
  RAM Accesses:                 644 (No change)
  Estimated Cycles:        16246515 (No change)

tagopenclose_10
  Instructions:               24790 (No change)
  L1 Accesses:                36035 (No change)
  L2 Accesses:                   29 (No change)
  RAM Accesses:                 211 (No change)
  Estimated Cycles:           43565 (No change)

tagopenclose_10000
  Instructions:            21817986 (No change)
  L1 Accesses:             32107518 (No change)
  L2 Accesses:                 1261 (No change)
  RAM Accesses:                1315 (No change)
  Estimated Cycles:        32159848 (No change)

comment_10
  Instructions:               16376 (No change)
  L1 Accesses:                23418 (No change)
  L2 Accesses:                   23 (No change)
  RAM Accesses:                 194 (No change)
  Estimated Cycles:           30323 (No change)

comment_10000
  Instructions:            13750138 (No change)
  L1 Accesses:             19989200 (No change)
  L2 Accesses:                 1416 (No change)
  RAM Accesses:                1454 (No change)
  Estimated Cycles:        20047170 (No change)

I'd be ok with a PR like add-spans-2. maybe you want to double-check what the breaking changes are there, but i don't see any.

it would also be cool to try and understand where this really minor perf regression comes from. for this i believe one would have to diff the ASM to really understand what's going on. but it's optional for merging IMO, because i do not see a perf regression in my own usage (https://github.com/untitaker/hyperlink)

@Akida31
Contributor Author

Akida31 commented Jun 25, 2025

We can go forward with add-spans-2. Should I force push this PR or create a new one?
Even when making start_open_tag a default method, there are still some breaking changes. See the output of cargo-semver-checks:

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/constructible_struct_adds_field.ron

Failed in:
field StartTag.span in ~/html5gum/src/emitters/default.rs:155
field StartTag.span in ~/html5gum/src/emitters/default.rs:155
field EndTag.span in ~/html5gum/src/emitters/default.rs:164
field EndTag.span in ~/html5gum/src/emitters/default.rs:164

--- failure trait_method_added: pub trait method added ---

Description:
A non-sealed public trait added a new method without a default implementation, which breaks downstream implementations of the trait
ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-item-no-default
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/trait_method_added.ron

Failed in:
trait method html5gum::emitters::Emitter::start_open_tag in file ~/html5gum/src/emitters/emitter.rs:78
trait method html5gum::Emitter::start_open_tag in file ~/html5gum/src/emitters/emitter.rs:78

--- failure trait_method_parameter_count_changed: pub trait method parameter count changed ---

Description:
A trait method now takes a different number of parameters.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#major-any-change-to-trait-item-signatures
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/trait_method_parameter_count_changed.ron

Failed in:
Callback::handle_event now takes 3 instead of 2 parameters, in file ~/html5gum/src/emitters/callback.rs:151

--- failure trait_requires_more_generic_type_params: trait now requires more generic type parameters ---

Description:
A trait now requires more generic type parameters than it used to. Uses of this trait that supplied the previously-required number of generic types will be broken. To fix this, consider supplying default values for newly-added generic types.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-parameter-no-default
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/trait_requires_more_generic_type_params.ron

Failed in:
trait Token (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:191
trait Token (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:191
trait StartTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:141
trait StartTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:141
trait EndTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:160
trait EndTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:160

--- failure type_requires_more_generic_type_params: type now requires more generic type parameters ---

Description:
A type now requires more generic type parameters than it used to. Uses of this type that supplied the previously-required number of generic types will be broken. To fix this, consider supplying default values for newly-added generic types.
ref: https://doc.rust-lang.org/cargo/reference/semver.html#trait-new-parameter-no-default
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.41.0/src/lints/type_requires_more_generic_type_params.ron

Failed in:
Enum Token (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:191
Enum Token (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:191
Struct StartTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:141
Struct StartTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:141
Struct EndTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:160
Struct EndTag (0 -> 1 required generic types) in ~/html5gum/src/emitters/default.rs:160

 Summary semver requires new major version: 5 major and 0 minor checks failed

Since I introduced `ForwardingEmitter`, I'm fine with giving `start_open_tag` a default body (without it, falling back to the default implementation would break the spans). (I pushed a commit to add-spans-2, and another commit with the change I wanted to make as well.)
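The semver point behind defaulting `start_open_tag` can be shown in isolation: an implementation written against the old trait keeps compiling because it inherits the empty body. This is a minimal sketch. The module names, `Counter`, `emit_eof` and the argument-free signature of `start_open_tag` are assumptions. The flip side, as noted above, is that a wrapping emitter which does not override the method silently gets the no-op, which is what a forwarding emitter guards against.

```rust
mod upstream {
    pub trait Emitter {
        fn emit_eof(&mut self);
        /// Newly added method. With a default body, existing implementors keep
        /// compiling (cargo-semver-checks' `trait_method_added` lint targets
        /// methods added without a default).
        fn start_open_tag(&mut self) {}
    }
}

mod downstream {
    use crate::upstream::Emitter;

    /// Written before `start_open_tag` existed; needs no changes.
    #[derive(Default)]
    pub struct Counter {
        pub eofs: usize,
    }

    impl Emitter for Counter {
        fn emit_eof(&mut self) {
            self.eofs += 1;
        }
    }
}

use crate::upstream::Emitter;

fn main() {
    let mut c = downstream::Counter::default();
    c.start_open_tag(); // inherited no-op
    c.emit_eof();
    assert_eq!(c.eofs, 1);
}
```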

On my machine, data_state_10 regresses by only 63 instructions (2971 -> 3034), but comment_10 improves from 16004 to 14949 instructions. It could be that the compiler is not smart enough to remove the calls that do nothing, but it could also just be some shuffling around by rustc/LLVM, since the input is slightly different.
If you know a good way to get a diff from the assembly, please let me know. The assembly files I get are really large and the diff between them has almost a million lines :/
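Comparing the figures quoted in this thread is easier with a small helper that computes the absolute and relative change between two runs. The inputs are numbers from the comments above, and nothing new is measured. For example, 2971 -> 3034 is a difference of 63, and 3109 -> 3200 a difference of 91.

```rust
/// Absolute and relative (percent) change from `before` to `after`.
fn delta(before: u64, after: u64) -> (i64, f64) {
    let diff = after as i64 - before as i64;
    let pct = diff as f64 / before as f64 * 100.0;
    (diff, pct)
}

fn main() {
    // Instruction counts quoted in this conversation.
    for (name, before, after) in [
        ("data_state_10, main -> add-spans-2 (owner)", 3109, 3200),
        ("data_state_10 (contributor)", 2971, 3034),
        ("comment_10 (contributor)", 16004, 14949),
    ] {
        let (diff, pct) = delta(before, after);
        println!("{name}: {diff:+} instructions ({pct:+.2}%)");
    }
}
```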

@Akida31
Contributor Author

Akida31 commented Aug 15, 2025

Hey, I don't want to annoy you, I just want to briefly ask whether and how we can move this PR (or a new one) forward :)

@untitaker
Owner

@Akida31 i think we can go forward with add-spans-2, don't care if you make a new PR or reuse this one. I don't mind the regressions

Akida31 mentioned this pull request Aug 15, 2025
@untitaker
Owner

superseded by #120

untitaker closed this Aug 15, 2025

Development

Successfully merging this pull request may close these issues.

Attach input location to tokens (add spans feature)

2 participants