wgsl: Implement AbstractFloat matrix multiplcation tests #3446

zoddicus · 2024-02-28T21:14:33Z

Issue #1626

Requirements for PR author:

All missing test coverage is tracked with "TODO" or .unimplemented().
New helpers are /** documented */ and new helper files are found in helper_index.txt.
Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)
Test have be tested with compatibility mode validation enabled and behave as expected. (If not passing, explain above.)

Requirements for reviewer sign-off:

Tests are properly located in the test tree.
Test descriptions allow a reader to "read only the test plans and evaluate coverage completeness", and accurately reflect the test code.
Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
Helpers and types promote readability and maintainability.

When landing this PR, be sure to make any necessary issue status updates.

zoddicus · 2024-02-28T21:15:21Z

PTAL, this depends on the implementation of dot, so all of the new code is in the later diff.

ben-clayton · 2024-02-29T09:59:12Z

src/webgpu/listing_meta.json

@@ -918,6 +918,11 @@
  "webgpu:shader,execution,expression,binary,af_multiplication:scalar_vector:*": { "subcaseMS": 2025.534 },
  "webgpu:shader,execution,expression,binary,af_multiplication:vector:*": { "subcaseMS": 710.667 },
  "webgpu:shader,execution,expression,binary,af_multiplication:vector_scalar:*": { "subcaseMS": 2085.300 },
+  "webgpu:shader,execution,expression,binary,af_matrix_matrix_multiplication:matrix_matrix:*": { "subcaseMS": 0 },


Those binary files are huge. I'd be curious to get a sense of how long these take to run. My spidey-sense is telling me that we have too many permutations here.

Took a bit of time to at the case generation code. There isn't any obvious accidently using dense ranges vs sparse ones that I can see relative to the other floating point types.

Looking at the size of .bins, the AbstractFloat ones are actually slightly smaller than the equivalent f32 ones, propbably because though the size will be inflated due to 32 vs 64 bits, there are fewer total cases due to OOB being trimmed.

Time to generate the case cache doesn't appear to be significantly affected by whether or not this PR is present.

With a pre-built case cache, running locally on lava pipe,
-j 1 'webgpu:shader,execution,expression,binary,af_matrix_scalar_multiplication,*' takes 5m30.410694302s

and

-j 1 'webgpu:shader,execution,expression,binary,f32_matrix_scalar_multiplication,*' takes 14.364973476s

So yeah, AF is much much slower than f32, but I think that is because of the all of the extraction logic that needs to be run in the shader vs cache size.

Thank you for doing investigations here.
5½ minutes is a huge amount of time for a single test case. I strongly suspect that's going to cause timeouts for our test runner and likely others. I think we're going to have to take a look at reducing the case permutations.

ben-clayton · 2024-03-06T14:47:10Z

src/webgpu/shader/execution/expression/binary/af_matrix_matrix_multiplication.cache.ts

 import { makeCaseCache } from '../case_cache.js';

-const sf = new StochasticFilter(0);
+const filter_percentage = 50;


Please document.

ben-clayton · 2024-03-06T15:03:52Z

GitHub just threw my comment on the floor. Re-writing...

ben-clayton · 2024-03-06T15:06:13Z

src/webgpu/shader/execution/expression/binary/af_matrix_matrix_multiplication.cache.ts

+              'finite',
+              FP.abstract.multiplicationMatrixMatrixInterval
+            )
+            .filter(c => filter_percentage <= crc32(c.input.toString()) % 100);


It might be preferable to specify an absolute target value instead of a percentage. The target value could then be used to calculate the relative threshold (0xffffffff*n/count >= crc32(c.input.toString()) or something)

This could also be put in a reusable helper function.

zoddicus · 2024-03-06T18:40:06Z

PTAL, this should be ready for review

ben-clayton · 2024-03-06T18:43:30Z

src/webgpu/shader/execution/expression/case.ts

+    return cases;
+  }
+
+  return cases.filter(c => n * (0xffff_ffff / count) > crc32(c.input.toString()));


Idea: Should we xor-in another crc32 of a string parameter (i.e. cache key name)? I'm thinking that we'll get the same N-cases selected for different tests with the same parameters. XOR-ing in something unique to the test would prevent this.

That is a good point

ben-clayton

Approving - although if you like the xor idea, it might be good to do that as part of this PR, to avoid bloating the git history.

Issue gpuweb#1626

zoddicus added the wgsl label Feb 28, 2024

zoddicus requested review from dj2, amaiorano, ben-clayton and dneto0 February 28, 2024 21:14

zoddicus self-assigned this Feb 28, 2024

zoddicus force-pushed the addAbstractFloatMatrixMultiplicationTests branch from 63dfbbe to 9e9521e Compare February 28, 2024 21:14

ben-clayton reviewed Feb 29, 2024

View reviewed changes

zoddicus force-pushed the addAbstractFloatMatrixMultiplicationTests branch from 9e9521e to c362232 Compare March 5, 2024 18:50

ben-clayton reviewed Mar 6, 2024

View reviewed changes

zoddicus force-pushed the addAbstractFloatMatrixMultiplicationTests branch from 8f702cf to c8800ee Compare March 6, 2024 18:37

ben-clayton reviewed Mar 6, 2024

View reviewed changes

ben-clayton approved these changes Mar 6, 2024

View reviewed changes

zoddicus added 3 commits March 6, 2024 13:48

wgsl: Implement AbstractFloat matrix multiplcation tests

2f24495

Issue gpuweb#1626

Add stochastic filtering to AF matrix * matrix cases

a2b2b9f

Use hash based filtering instead of stochastic

efa7b4f

zoddicus force-pushed the addAbstractFloatMatrixMultiplicationTests branch from c8800ee to efa7b4f Compare March 6, 2024 19:10

zoddicus merged commit a93708e into gpuweb:main Mar 6, 2024
1 check passed

zoddicus deleted the addAbstractFloatMatrixMultiplicationTests branch March 6, 2024 19:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wgsl: Implement AbstractFloat matrix multiplcation tests #3446

wgsl: Implement AbstractFloat matrix multiplcation tests #3446

zoddicus commented Feb 28, 2024

zoddicus commented Feb 28, 2024

ben-clayton Feb 29, 2024

zoddicus Feb 29, 2024

ben-clayton Mar 4, 2024

ben-clayton Mar 6, 2024

ben-clayton commented Mar 6, 2024

ben-clayton Mar 6, 2024 •

edited

Loading

zoddicus Mar 6, 2024

zoddicus commented Mar 6, 2024

ben-clayton Mar 6, 2024

zoddicus Mar 6, 2024

ben-clayton left a comment

wgsl: Implement AbstractFloat matrix multiplcation tests #3446

wgsl: Implement AbstractFloat matrix multiplcation tests #3446

Conversation

zoddicus commented Feb 28, 2024

zoddicus commented Feb 28, 2024

ben-clayton Feb 29, 2024

Choose a reason for hiding this comment

zoddicus Feb 29, 2024

Choose a reason for hiding this comment

ben-clayton Mar 4, 2024

Choose a reason for hiding this comment

ben-clayton Mar 6, 2024

Choose a reason for hiding this comment

ben-clayton commented Mar 6, 2024

ben-clayton Mar 6, 2024 • edited Loading

Choose a reason for hiding this comment

zoddicus Mar 6, 2024

Choose a reason for hiding this comment

zoddicus commented Mar 6, 2024

ben-clayton Mar 6, 2024

Choose a reason for hiding this comment

zoddicus Mar 6, 2024

Choose a reason for hiding this comment

ben-clayton left a comment

Choose a reason for hiding this comment

ben-clayton Mar 6, 2024 •

edited

Loading