Add expect-no-linked-resources Document-Policy to Speculative parsing #10718

Open
wants to merge 3 commits into
base: main

@alexnj alexnj commented Oct 23, 2024

User Agents have implemented speculative parsing of HTML to speculatively fetch resources that are present in the HTML markup, to speed up page loading. For the vast majority of pages on the Web that have resources declared in the HTML markup, the optimization is beneficial and the cost paid in determining such resources is a sound tradeoff. However, the following scenarios might result in a sub-optimal performance tradeoff vs. the explicit time spent parsing HTML for determining sub resources to fetch:

  • Pages that do not have any resources declared in the HTML.
  • Large HTML pages with minimal or no resource loads that could explicitly control preloading resources via other preload mechanisms available.

This proposal introduces a configuration point in Document Policy by the name expect-no-linked-resources to hint to a User Agent that it may choose to optimize out the time spent in such sub resource determination.
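As a rough illustration of how a page might opt in, the sketch below serves a document with the `Document-Policy: expect-no-linked-resources` response header. The handler, page body, helper names, and port are hypothetical and exist only for this example. The optional `Link: …; rel=preload` entries stand in for the "other preload mechanisms" mentioned above, and a user agent remains free to ignore the hint.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page body: a document that declares no subresources in its markup.
PAGE = b"<!doctype html><title>Plain report</title><p>No images, scripts, or stylesheets.</p>"


def build_headers(body, preloads=()):
    """Return (name, value) response headers that opt the document into the hint.

    `preloads` is an optional sequence of (url, as_type) pairs for resources the
    page wants fetched explicitly, since the UA may skip scanning the markup.
    """
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # The hint proposed in this PR; it is advisory only.
        ("Document-Policy", "expect-no-linked-resources"),
    ]
    for url, as_type in preloads:
        # Explicit preload via the HTTP Link header.
        headers.append(("Link", f"<{url}>; rel=preload; as={as_type}"))
    return headers


class HintedPageHandler(BaseHTTPRequestHandler):
    """Serves PAGE with the hint attached to every GET response."""

    def do_GET(self):
        self.send_response(200)
        for name, value in build_headers(PAGE):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(PAGE)


def serve(port=8000):
    """Run the demo server until interrupted (port number is arbitrary)."""
    HTTPServer(("127.0.0.1", port), HintedPageHandler).serve_forever()
```

Calling `serve()` and requesting `http://127.0.0.1:8000/` should show the header in the response; whether it has any effect depends entirely on the user agent.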

Read the complete Explainer and the proposed spec changes that cover the changes in this PR.


/common-dom-interfaces.html ( diff )
/common-microsyntaxes.html ( diff )
/dom.html ( diff )
/index.html ( diff )
/infrastructure.html ( diff )
/parsing.html ( diff )
/references.html ( diff )
/structured-data.html ( diff )
/urls-and-fetching.html ( diff )

@domenic domenic added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: parser labels Oct 23, 2024
Member

@domenic domenic left a comment

I pushed an extra commit with small fixes I noticed during a final review.

Otherwise, this editorially LGTM.

I know this was discussed at the WebPerf WG and there was some general support from multiple implementers. And you're working on standards positions now. But if any implementers want to comment here, that'd be very welcome! I'll tag this as agenda+ to get some attention from the WHATNOT meeting crowd.

It's also noteworthy that this is the first document policy feature in HTML, and Document Policy itself is not yet integrated into HTML: https://wicg.github.io/document-policy/#integration-with-html . If this feature gets multi-implementer interest, then we should work on doing that integration sooner rather than later. /cc @clelland

@domenic domenic added the agenda+ To be discussed at a triage meeting label Oct 23, 2024
@smaug----

Isn't this basically hacking around some implementation limitations in blink (and maybe in webkit)? Gecko doesn't in general need to do separate speculation passes.

@past past removed the agenda+ To be discussed at a triage meeting label Oct 24, 2024
@past past added the agenda+ To be discussed at a triage meeting label Nov 6, 2024
@alexnj
Author

alexnj commented Nov 7, 2024

If the speculative parsing step isn't conducted as a separate step, there might be trivial to no benefit for Gecko, I'd imagine. If the documented implementation behavior still holds true, there might be a non-trivial benefit to Gecko for the speculation failure scenario (but only in cases where the web dev is able to hint to Gecko).

The HTML spec doesn't mandate whether speculation should be conducted as a separate step or inline with document parsing. The language of the spec seems written in a way that treats the parser and the speculative scanner as independent (e.g. "Bytes pushed into the HTML parser's input byte stream must also be pushed into the speculative HTML parser's input byte stream"), thereby making a separate cost for speculation a possibility. I'd imagine that doing it inline with parsing might still be a trade-off, given that the parser is specified to stop on encountering scripts. Or maybe there's a way to continue tokenizing, with the risk of discarding the work if the DOM was indeed modified. Perhaps that's what Gecko does today.

I did a rough benchmark of medium to complex page sets with a fresh Chromium release build, and it seems to spend between ~70-100 ms scanning the HTML spec, which I'd think is an extreme example given the nature of that page. The Web Bluetooth spec seems to take between 15-20 ms. For something rather simple, like the CC 3.0 license, it seems to spend about 5 ms on average. These were measured on a capable box, the equivalent of an M1 Max MacBook, so I'd guess the gains might be much bigger on slower CPUs and hardware. My understanding today is that there's a non-trivial performance advantage to be had, depending on hardware, for pages that don't benefit from speculating resource URLs to fetch. The Origin Trial I ran in Chrome is consistent with this. The directive expect-no-linked-resources would provide a means for a web dev to help the user agent avoid spending such resources.
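To make the cost being discussed concrete, here is a toy sketch of a resource scan, written with Python's standard `html.parser`. It is not how Chromium's scanner works and it will not reproduce the numbers above; the class name, the `URL_ATTRS` table, and the `scan` helper are all assumptions made for illustration. It only shows the shape of the work: walk every start tag, collect URLs, and pay that cost even when nothing is found.

```python
import time
from html.parser import HTMLParser

# Illustrative subset of elements whose attribute names a fetchable URL.
URL_ATTRS = {"img": "src", "script": "src", "link": "href"}


class ToyResourceScanner(HTMLParser):
    """Collects candidate subresource URLs from start tags.

    A stand-in for illustration only, not a browser preload scanner.
    """

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag)  # None for tags we do not care about
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def scan(markup):
    """Scan markup and return (urls, elapsed_ms) for the scan itself."""
    scanner = ToyResourceScanner()
    start = time.perf_counter()
    scanner.feed(markup)
    scanner.close()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return scanner.urls, elapsed_ms
```

Running `scan("<p>only text</p>" * 100_000)` returns an empty URL list after a measurable delay, which is the kind of wasted work the hint is meant to let a user agent skip.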

So the open questions in my mind at the moment are:

  • Should Gecko measure the exact time spent on these pages to speculate, to confirm that there is no overhead (if it's not well established already)?
  • If there is trivial to no benefit to Gecko, would it still make sense to proceed to standardize for engines that implemented speculative parsing as an independent step as hinted in the HTML spec? Gecko could choose to not action the hint.

@annevk
Member

annevk commented Nov 7, 2024

I think generally if the consensus is that engines could do more work to make it faster, we don't ask web developers to put in the work to make it faster. See priority of constituencies.

@domenic
Member

domenic commented Nov 13, 2024

> I think generally if the consensus is that engines could do more work to make it faster

It's not clear to me that this is the case. My understanding is that we have two different types of tokenizer + tree builder + parser + speculative parser architectures:

  • The Gecko one, which does more work as part of a single pass;
  • The WebKit/Blink one, in which the speculative parsing is done more separately.

The WebKit/Blink architecture benefits from the expect-no-linked-resources hint, whereas the Gecko architecture does not. But, we don't have any evidence that in general the Gecko architecture is superior to the WebKit/Blink one.

Stated another way, we have the following four scenarios:

  • GN: Gecko + no-hint
  • WN: WebKit/Blink + no-hint
  • GH: Gecko + hint
  • WH: WebKit/Blink + hint

We know that GN = GH (call this common value G), and WH > WN. But we don't have any information on the relationship between G and WN, or G and WH.

If G >= WN and G >= WH for all possible websites, then I agree that this feature is not very aligned with the priority of constituencies, and WebKit/Blink should move to the Gecko architecture since it is always faster.

But I suspect there are cases where WN > G, and especially that there are cases where WH > G. In that case, this feature adds value to the web, by allowing the combined forces of web developers (via the hint) and browser implementations (via the in-this-scenario-faster WebKit/Blink architecture) to speed up page loads beyond what's possible with just the Gecko architecture.
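The case analysis above can be written down as a small sketch. The `PageScores` type and both predicates are hypothetical names, and any scores fed to them would be made up; nothing here is measured data. Higher scores mean faster page loads, matching the `>` comparisons in the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageScores:
    """Hypothetical load-speed scores for one page; higher is faster."""

    g: float   # Gecko-style architecture (GN = GH, so the hint changes nothing)
    wn: float  # WebKit/Blink-style architecture, no hint
    wh: float  # WebKit/Blink-style architecture, with hint


def single_pass_always_wins(pages):
    """True if G >= WN and G >= WH on every page (the hint would add no value)."""
    return all(p.g >= p.wn and p.g >= p.wh for p in pages)


def hint_adds_value(pages):
    """True if at least one page loads faster with WH than with G."""
    return any(p.wh > p.g for p in pages)
```

For example, with the invented scores `PageScores(g=10, wn=9, wh=11)`, `hint_adds_value` is true and `single_pass_always_wins` is false, which is the situation in which the comment argues the feature is worthwhile.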

@alexnj
Author

alexnj commented Nov 19, 2024

> I think generally if the consensus is that engines could do more work to make it faster, we don't ask web developers to put in the work to make it faster. See priority of constituencies.

I do agree with the first statement. However, I do not think this is against the priority of constituencies.

This hint is very similar to link[preload] which is also an indicative signal like this in nature, from the page (or web developer), and it's all still in the best interest of users. Without the hint, on pages that would benefit from it, the UA would spend compute and resources wastefully, which is not in favor of the user or user experience. Much like the preload hint, the UA has no conclusive way to derive the same signal on its own.

@smaug----

> • The Gecko one, which does more work as part of a single pass;

It doesn't really do more work, or that work is rather minimal. You know anyhow while parsing that you have links to other resources so it is cheap to reuse that information for speculative loads.

> This hint is very similar to link[preload]

Well, it would be opposite of that.

The flag as defined in the PR would be bad for performance in case the page then does want to use explicit preloads. Speculative parsing could have started those loads way before the explicit preload would happen. (That is at least the case in Gecko, which doesn't need the separate pass.)

@past past removed the agenda+ To be discussed at a triage meeting label Nov 21, 2024
alexnj added a commit to alexnj/wpt that referenced this pull request Dec 10, 2024
This PR adds a tentative and optional test case for
whatwg/html#10718
domenic pushed a commit to web-platform-tests/wpt that referenced this pull request Dec 11, 2024
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Dec 13, 2024
…-no-linked-resources, a=testonly

Automatic update from web-platform-tests
Add WPT test for Document-Policy: expect-no-linked-resources

Add a tentative and optional test case for whatwg/html#10718.
--

wpt-commits: e2df4d406f762be154339148fe953629493c28c5
wpt-pr: 49617
i3roly pushed a commit to i3roly/firefox-dynasty that referenced this pull request Dec 14, 2024
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified-and-comments-removed that referenced this pull request Dec 16, 2024

UltraBlame original commit: b1a14a183e2577943fca32880cea619e42f2980f
gecko-dev-updater pushed a commit to marco-c/gecko-dev-comments-removed that referenced this pull request Dec 16, 2024
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Dec 16, 2024
ErichDonGubler pushed a commit to erichdongubler-mozilla/firefox that referenced this pull request Dec 19, 2024
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: parser
5 participants