Fix device_pool refactor #3469

Merged
merged 1 commit into from
Mar 8, 2024


greggman
Contributor

@greggman greggman commented Mar 8, 2024

The old version made one device with no descriptor. If that failed, it stopped creating any devices, period.

The previous refactor was bad in that if any device failed, it would just stop making devices.

This one is simplified. It just uses the pool as a pool. There is no special handling for failed devices.

Note: I could check whether a request with no descriptor fails and, if so, assume all other requests will fail too. Should I add that in or leave it as is?
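
For readers unfamiliar with the pattern, "using the pool as a pool" can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual device_pool.ts code: `SimplePool`, `acquire`, `release`, and the `create` callback are hypothetical names, and the string key stands in for a canonicalized device descriptor. The point it shows is that a failed creation for one key is reported to that caller only and leaves the pool usable for every other key.

```typescript
// Hypothetical sketch: a keyed pool with no special handling for failures.
// `create` stands in for something like adapter.requestDevice(descriptor).
class SimplePool<T> {
  private readonly free = new Map<string, T[]>();
  private readonly create: (key: string) => Promise<T>;

  constructor(create: (key: string) => Promise<T>) {
    this.create = create;
  }

  // Reuse a free item for this key if one exists, otherwise create one.
  // If creation rejects, the error propagates to this caller only;
  // nothing is remembered, so later requests are attempted normally.
  async acquire(key: string): Promise<T> {
    const items = this.free.get(key);
    if (items !== undefined && items.length > 0) {
      return items.pop()!;
    }
    return this.create(key);
  }

  // Return an item so a later acquire() with the same key can reuse it.
  release(key: string, item: T): void {
    const items = this.free.get(key);
    if (items === undefined) {
      this.free.set(key, [item]);
    } else {
      items.push(item);
    }
  }
}
```

With a pool like this, a rejected request for an f16 device does not affect a later request for a default device, which is the behavior the fix is after.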


Requirements for PR author:

  • All missing test coverage is tracked with "TODO" or .unimplemented().
  • New helpers are /** documented */ and new helper files are found in helper_index.txt.
  • Test behaves as expected in a WebGPU implementation. (If not passing, explain above.)
  • Tests have been tested with compatibility mode validation enabled and behave as expected. (If not passing, explain above.)

Requirements for reviewer sign-off:

  • Tests are properly located in the test tree.
  • Test descriptions allow a reader to "read only the test plans and evaluate coverage completeness", and accurately reflect the test code.
  • Tests provide complete coverage (including validation control cases). Missing coverage MUST be covered by TODOs.
  • Helpers and types promote readability and maintainability.

When landing this PR, be sure to make any necessary issue status updates.

@greggman
Contributor Author

greggman commented Mar 8, 2024

Note: I tested by pasting this at the top of device_pool.ts:

// Test-only override: hide 'shader-f16' from every adapter's feature set.
GPU.prototype.requestAdapter = (function (origFn) {
  return async function (this: GPU, desc) {
    const adapter = await origFn.call(this, desc);
    if (adapter) {
      // Shadow the `features` getter with a filtered copy on this instance.
      Object.defineProperty(adapter, 'features', {
        value: new Set([...adapter.features].filter(v => v !== 'shader-f16')),
      });
    }
    return adapter;
  };
})(GPU.prototype.requestAdapter);

// Test-only override: reject any device request that asks for 'shader-f16'.
GPUAdapter.prototype.requestDevice = (function (origFn) {
  return async function (this: GPUAdapter, desc) {
    if (desc && desc.requiredFeatures) {
      if ([...desc.requiredFeatures].includes('shader-f16')) {
        // Throwing inside an async function rejects the returned promise.
        throw new Error('f16 unsupported');
      }
    }
    return await origFn.call(this, desc);
  };
})(GPUAdapter.prototype.requestDevice);

I then ran http://localhost:8080/standalone/?q=webgpu:shader,execution,expression,binary,f16_addition:*

I ran it before the bad change (works), after the bad change (fails), and with this fix (works).

@ben-clayton
Contributor

If you have dawn node set up, and you have a machine without f16 support, please can you see if the following correctly shows SKIPs?

tools/run run-cts 'webgpu:shader,execution,expression,binary,f16_subtraction:*'

@greggman
Contributor Author

greggman commented Mar 8, 2024

I don't have a machine without f16 support. But the code I pasted in above repros the bug, and after the fix the tests correctly skip.
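
To make the expected SKIP behavior concrete, here is a small sketch of the decision being exercised: when a test requires a feature the adapter does not expose, the case should be skipped rather than failed. The names `decideOutcome` and `Outcome` are hypothetical and not part of the CTS, and feature names are treated as plain strings.

```typescript
type Outcome = 'run' | 'skip';

// Hypothetical helper: decide whether a test case can run on an adapter,
// given the adapter's features and the features the test requires.
function decideOutcome(
  adapterFeatures: ReadonlySet<string>,
  requiredFeatures: readonly string[]
): Outcome {
  const missing = requiredFeatures.filter(f => !adapterFeatures.has(f));
  return missing.length === 0 ? 'run' : 'skip';
}

// An example feature set with 'shader-f16' filtered out, in the same way
// the pasted requestAdapter override filters it.
const withoutF16: ReadonlySet<string> = new Set(
  ['texture-compression-bc', 'shader-f16'].filter(v => v !== 'shader-f16')
);

// An f16 test on this adapter should be skipped; a test with no
// required features should still run.
const f16Outcome = decideOutcome(withoutF16, ['shader-f16']);
const plainOutcome = decideOutcome(withoutF16, []);
```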

@greggman
Contributor Author

greggman commented Mar 8, 2024

oh, maybe I can force swiftshader, assuming it doesn't support f16

@ben-clayton
Contributor

Yeah, SwiftShader should SKIP.

@ben-clayton
Contributor

Confirmed that this fixes the dawn/node issues by checking out your branch 👍🏼

Contributor

@ben-clayton ben-clayton left a comment

LGTM, but it would also be good to get @kainino0x's thumbs up.

@greggman
Contributor Author

greggman commented Mar 8, 2024

I couldn't figure out how to run SwiftShader on dawn.node on Mac. OTOH, I can run the web version with the fallback adapter, which uses SwiftShader (needs --enable-unsafe-webgpu).

[Screenshot 2024-03-08 at 11 57 05]

@ben-clayton
Contributor

Build swiftshader, and set VK_ICD_FILENAMES to the vk_swiftshader_icd.json file in the output directory. Example:

VK_ICD_FILENAMES=path_to_vk_swiftshader_icd.json tools/run run-cts 'webgpu:shader,execution,expression,binary,f16_subtraction:*'

Collaborator

@kainino0x kainino0x left a comment

This seems fine. Honestly not sure why I added all that complex logic to begin with. I think all it does is make tests fail faster when WebGPU isn't working. Hopefully I'm not forgetting something.

@kainino0x kainino0x merged commit f71a834 into gpuweb:main Mar 8, 2024
1 check passed
@greggman greggman deleted the fix-device-pool branch March 8, 2024 22:15
ben-clayton added a commit to ben-clayton/cts that referenced this pull request Mar 18, 2024
This reverts commits:
* f71a834.
* 0a68bf7.

Revert "Add AdapterLimitsGPUTest that sets all limits to the adapter's (gpuweb#3466)"

This reverts commit 0a68bf7.
ben-clayton added a commit to ben-clayton/cts that referenced this pull request Mar 18, 2024
This reverts commits:
* f71a834.
* 0a68bf7.

These changes have been identified as causing a large collection of CTS failures with 'webgpu:web_platform,copyToTexture,ImageBitmap' (and possibly others).
ben-clayton added a commit that referenced this pull request Mar 18, 2024
This reverts commits:
* f71a834.
* 0a68bf7.

These changes have been identified as causing a large collection of CTS failures with 'webgpu:web_platform,copyToTexture,ImageBitmap' (and possibly others).