Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compat: Refactor fwidth/Fine/Coarse for 0 storage buffers. #4128

Merged
merged 1 commit into from
Jan 9, 2025

Conversation

greggman
Copy link
Contributor

@greggman greggman commented Jan 2, 2025

Modified so this test doesn't use storage buffers by having it return values from a fragment shader as rgba32uint

@greggman greggman requested a review from jrprice January 2, 2025 22:29
pass.setPipeline(pipeline);
pass.setBindGroup(0, group);
for (let quad = 0; quad < cases.length / vectorWidth; quad++) {
for (let quad = 0; quad / vectorWidth; quad++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop condition seems wrong? Looks like the loop never executes because 0 / ... is 0 and that gets coerced to false...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha! that's embarrassing 😅

@greggman greggman force-pushed the fix-fwidth branch 2 times, most recently from 97afdbb to d43a1ce Compare January 8, 2025 21:13
@greggman
Copy link
Contributor Author

greggman commented Jan 8, 2025

Here's a fixed version. Sorry about the previous bug. I originally tried to make it work a quad at a time and I got it working but it was taking 10 seconds per case on my M1 mac allocating 1 buffer per quad to copy the texture and then mapping that 1 buffer etc.. I did some optimizations and got it down to 5 seconds but ran into another issue with that path. Eventually I realized I could do a bunch of quads at once and this is the result. I could optimize more as it's not always using all of the quads now. It uses all 256 quad (2x2 texel areas in a 512x2 texture) for f32 but less for each vectorized version (only 64 for vec4). But, it's fast enough as is, maybe 10% slower than before.

Copy link
Contributor

@jrprice jrprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Modified so this test doesn't use storage buffers by having
it return values from a fragment shader as rgba32uint
@greggman greggman enabled auto-merge (squash) January 9, 2025 02:20
@greggman greggman merged commit 077ffee into gpuweb:main Jan 9, 2025
1 check passed
@greggman greggman deleted the fix-fwidth branch January 10, 2025 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants