Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wgsl: Implement AbstractFloat matrix multiplcation tests #3446

Merged

Conversation

zoddicus
Copy link
Contributor

Issue #1626


Requirements for PR author:

  • All missing test coverage is tracked with "TODO" or .unimplemented().
  • New helpers are /** documented */ and new helper files are found in helper_index.txt.
  • Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)
  • Test have be tested with compatibility mode validation enabled and behave as expected. (If not passing, explain above.)

Requirements for reviewer sign-off:

  • Tests are properly located in the test tree.
  • Test descriptions allow a reader to "read only the test plans and evaluate coverage completeness", and accurately reflect the test code.
  • Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
  • Helpers and types promote readability and maintainability.

When landing this PR, be sure to make any necessary issue status updates.

@zoddicus zoddicus added the wgsl label Feb 28, 2024
@zoddicus zoddicus self-assigned this Feb 28, 2024
@zoddicus zoddicus force-pushed the addAbstractFloatMatrixMultiplicationTests branch from 63dfbbe to 9e9521e Compare February 28, 2024 21:14
@zoddicus
Copy link
Contributor Author

PTAL, this depends on the implementation of dot, so all of the new code is in the later diff.

@@ -918,6 +918,11 @@
"webgpu:shader,execution,expression,binary,af_multiplication:scalar_vector:*": { "subcaseMS": 2025.534 },
"webgpu:shader,execution,expression,binary,af_multiplication:vector:*": { "subcaseMS": 710.667 },
"webgpu:shader,execution,expression,binary,af_multiplication:vector_scalar:*": { "subcaseMS": 2085.300 },
"webgpu:shader,execution,expression,binary,af_matrix_matrix_multiplication:matrix_matrix:*": { "subcaseMS": 0 },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those binary files are huge. I'd be curious to get a sense of how long these take to run. My spidey-sense is telling me that we have too many permutations here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a bit of time to at the case generation code. There isn't any obvious accidently using dense ranges vs sparse ones that I can see relative to the other floating point types.

Looking at the size of .bins, the AbstractFloat ones are actually slightly smaller than the equivalent f32 ones, propbably because though the size will be inflated due to 32 vs 64 bits, there are fewer total cases due to OOB being trimmed.

Time to generate the case cache doesn't appear to be significantly affected by whether or not this PR is present.

With a pre-built case cache, running locally on lava pipe,
-j 1 'webgpu:shader,execution,expression,binary,af_matrix_scalar_multiplication,*' takes 5m30.410694302s

and

-j 1 'webgpu:shader,execution,expression,binary,f32_matrix_scalar_multiplication,*' takes 14.364973476s

So yeah, AF is much much slower than f32, but I think that is because of the all of the extraction logic that needs to be run in the shader vs cache size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for doing investigations here.
5½ minutes is a huge amount of time for a single test case. I strongly suspect that's going to cause timeouts for our test runner and likely others. I think we're going to have to take a look at reducing the case permutations.

@zoddicus zoddicus force-pushed the addAbstractFloatMatrixMultiplicationTests branch from 9e9521e to c362232 Compare March 5, 2024 18:50
import { makeCaseCache } from '../case_cache.js';

const sf = new StochasticFilter(0);
const filter_percentage = 50;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document.

@ben-clayton
Copy link
Contributor

GitHub just threw my comment on the floor. Re-writing...

'finite',
FP.abstract.multiplicationMatrixMatrixInterval
)
.filter(c => filter_percentage <= crc32(c.input.toString()) % 100);
Copy link
Contributor

@ben-clayton ben-clayton Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be preferable to specify an absolute target value instead of a percentage. The target value could then be used to calculate the relative threshold (0xffffffff*n/count >= crc32(c.input.toString()) or something)

This could also be put in a reusable helper function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@zoddicus zoddicus force-pushed the addAbstractFloatMatrixMultiplicationTests branch from 8f702cf to c8800ee Compare March 6, 2024 18:37
@zoddicus
Copy link
Contributor Author

zoddicus commented Mar 6, 2024

PTAL, this should be ready for review

return cases;
}

return cases.filter(c => n * (0xffff_ffff / count) > crc32(c.input.toString()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: Should we xor-in another crc32 of a string parameter (i.e. cache key name)? I'm thinking that we'll get the same N-cases selected for different tests with the same parameters. XOR-ing in something unique to the test would prevent this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point

Copy link
Contributor

@ben-clayton ben-clayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving - although if you like the xor idea, it might be good to do that as part of this PR, to avoid bloating the git history.

@zoddicus zoddicus force-pushed the addAbstractFloatMatrixMultiplicationTests branch from c8800ee to efa7b4f Compare March 6, 2024 19:10
@zoddicus zoddicus merged commit a93708e into gpuweb:main Mar 6, 2024
1 check passed
@zoddicus zoddicus deleted the addAbstractFloatMatrixMultiplicationTests branch March 6, 2024 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants