Speed up shader,execution,expression tests with async pipelines #3125

Merged: toji merged 4 commits into main from async-expression-pipelines on Nov 2, 2023

toji (Member) commented on Nov 1, 2023

Some of these tests are among the longest running that we have, so hopefully this has a dramatic impact on runtimes. Profiling shows that almost all of the time is spent waiting for pipeline creation. By using createComputePipelineAsync, we allow multiple shaders in a batch to compile at once, which drops test run times significantly.
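As a rough illustration of the batching idea described above, here is a minimal sketch. It is not the code from this PR: the helper name `createPipelinesForBatch` and the `*Like` types are hypothetical stand-ins so the snippet compiles without WebGPU type definitions. Only `createShaderModule` and `createComputePipelineAsync` correspond to real WebGPU device methods.

```typescript
// Minimal structural stand-ins for the WebGPU types used here, so the sketch
// compiles on its own. In real code these would be GPUShaderModule,
// GPUComputePipeline, and GPUDevice.
type ShaderModuleLike = object;
type ComputePipelineLike = object;

interface ComputeDeviceLike {
  createShaderModule(desc: { code: string }): ShaderModuleLike;
  createComputePipelineAsync(desc: {
    layout: 'auto';
    compute: { module: ShaderModuleLike; entryPoint: string };
  }): Promise<ComputePipelineLike>;
}

// Hypothetical helper (an assumption, not from the CTS): start every pipeline
// compile before awaiting any of them, so the implementation is free to
// compile the shaders in parallel.
async function createPipelinesForBatch(
  device: ComputeDeviceLike,
  shaderSources: readonly string[],
  entryPoint: string = 'main'
): Promise<ComputePipelineLike[]> {
  const pending = shaderSources.map(code =>
    device.createComputePipelineAsync({
      layout: 'auto',
      compute: { module: device.createShaderModule({ code }), entryPoint },
    })
  );
  // Promise.all preserves input order, so result[i] belongs to shaderSources[i].
  return Promise.all(pending);
}
```

The key design point is that no `await` appears inside the loop over shaders. Whether the compiles actually overlap is up to the WebGPU implementation.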

For example, the webgpu:shader,execution,expression,binary,af_matrix_subtraction:matrix:inputSource="const";cols=4;rows=4 test, one of the longest running in the entire CTS, was taking 5 minutes to run on my Windows machine. With this patch it completes in 80 seconds on the same device (corrected from the 46 seconds I first reported).

In general I'm seeing speedups of about 4x on the tests that I've tried (not the 5x-6x I first reported, sorry; see comments below).

To demonstrate why, we can look at this chunk of a trace from the shorter webgpu:shader,execution,expression,binary,af_matrix_subtraction:matrix:inputSource="const";cols=2;rows=2 test, which was previously taking 8 seconds to run.

[Screenshot 2023-11-01 104727: trace of the test before this patch]

The blue chunks are all pipeline compiles, and you can see they dominate the runtime and are being performed serially. With this patch it completes in about 2 seconds and the tracing produces results like this:

[Screenshot 2023-11-01 105615: trace of the test with this patch]

Same amount of blue, but now more parallel!
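The difference between the two traces can be mimicked with a small simulation. This sketch does not use WebGPU at all: `makeFakeCompiler`, `compileSerially`, and `compileConcurrently` are hypothetical names, and a timer stands in for a pipeline compile. It only shows that the same number of compiles runs either way, while the number in flight at once changes.

```typescript
// Simulated "compile" that takes a fixed delay and records how many compiles
// are in flight at the same time. This is a stand-in for pipeline creation.
function makeFakeCompiler(delayMs: number) {
  let inFlight = 0;
  let peak = 0;
  let total = 0;
  const compile = (): Promise<void> => {
    inFlight++;
    total++;
    peak = Math.max(peak, inFlight);
    return new Promise<void>(resolve =>
      setTimeout(() => {
        inFlight--;
        resolve();
      }, delayMs)
    );
  };
  return { compile, peak: () => peak, total: () => total };
}

// Serial: each compile finishes before the next one starts.
async function compileSerially(compile: () => Promise<void>, count: number): Promise<void> {
  for (let i = 0; i < count; i++) {
    await compile();
  }
}

// Concurrent: every compile is started first, then all are awaited together.
async function compileConcurrently(compile: () => Promise<void>, count: number): Promise<void> {
  await Promise.all(Array.from({ length: count }, () => compile()));
}
```

With four compiles, the serial version never has more than one in flight, while the concurrent version has all four in flight at once. In a real browser the achievable overlap depends on how the implementation schedules compilation.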


Requirements for PR author:

  • All missing test coverage is tracked with "TODO" or .unimplemented().
  • New helpers are /** documented */ and new helper files are found in helper_index.txt.
  • Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)

Requirements for reviewer sign-off:

  • Tests are properly located in the test tree.
  • Test descriptions allow a reader to "read only the test plans and evaluate coverage completeness", and accurately reflect the test code.
  • Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
  • Helpers and types promote readability and maintainability.

When landing this PR, be sure to make any necessary issue status updates.

toji requested review from austinEng and ben-clayton on November 1, 2023 18:00
ben-clayton (Contributor) left a comment

NICE. LGTM, but I think Austin should approve.

austinEng (Collaborator) left a comment

nice!

toji added 4 commits November 2, 2023 14:36
toji force-pushed the async-expression-pipelines branch from 57ae660 to cc35561 on November 2, 2023 21:37
toji merged commit c17c672 into main on Nov 2, 2023
1 check passed
toji deleted the async-expression-pipelines branch on November 2, 2023 21:48