Fix Intel iGPU Rendering #21475

tychedelia · 2025-10-09T03:02:31Z

A Debugging Story

In the above issue, the defect can be reproduced by running the 3d_scene example with an Intel adapter on Vulkan and observing the following output:

1. Indirect Draw Args

The first thing to check is what's being passed to vkCmdDrawIndexedIndirectCount in main_opaque_pass_3d. What we see is a little bizarre:

We should see two separate meshes here for the circular base and the cube, but instead we see 3 items, two of which are identical with exception of their first_instance id.

2. Reversed work item ordering

Trying to debug what could possibly be wrong with the shader input lead to observing the PreprocessWorkItem buffer that gets passed to mesh preprocessing. Here, there was a very conspicuous difference between when things would work and when they would break:

// Working
work_items[0] = {input_index: 0, output_index: 0}
work_items[1] = {input_index: 1, output_index: 1}

// Broken
work_items[0] = {input_index: 1, output_index: 0}
work_items[1] = {input_index: 0, output_index: 1}

This was very conspicuous and likely due to ECS query instability. However, the code looked like it should be robust enough to handle work items in any order. Further, this works just fine on Nvidia, so the ordering itself is clearly not the issue.

3. Memory ordering?

My first assumption was that this must be some kind of weirdness with memory ordering or some other race only observable on Intel. This led me to the following write:

    // If we're the first mesh instance in this batch, write the index of our
    // `MeshInput` into the appropriate slot so that the indirect parameters
    // building shader can access it.
    if (instance_index == 0u || work_items[instance_index - 1].output_or_indirect_parameters_index != indirect_parameters_index) {
        indirect_parameters_gpu_metadata[indirect_parameters_index].mesh_index = input_index;
    }

Even though the logic looks totally fine and shouldn't require any synchronization, I tried making the write atomic, which didn't seem to help at all. My next step was to try to remove the condition and just unconditionally write. This could lead to some weirdness in the event of batch size N > 1, but I just wanted to see what happened. And,... it solved the bug?

4. SPIR-V de-compilation

This made no sense to me why this would fix things, so I decided to decompile the shader and see what was going on, and found the following:

  bool _247 = _236 == 0;
  uint _248 = _236 - 1;
  uint* _249 = &_221[_248].output_or_indirect_parameters_index;
  uint _250 = *_249;
  bool _251 = _250 != _246;
  bool _252 = _247 || _251;
  
  if(_252) {
    uint* _256 = &_227[_246].mesh_index;
    *_256 = _244;
  }

This looks wrong. The final condition _247 || _251 doesn't seem right. Checking and confirming, || in WGSL is supposed to short circuit. Instead, here, we unconditionally read from &_221[_248].output_or_indirect_parameters_index; AKA work_items[instance_index - 1]. In the event instance_index is 0 that means we are OOB. Uh oh!!

5. Vendor differences

I'm not sure why this UB have any effect on Nvidia. But we can walk through the entire bug:

On the first thread in the workgroup where instance_index is 0, we will always OOB read, which on Intel seems to cause the thread to terminate or otherwise return garbage that makes the condition fail or something else weird. However, in the event that the first work item's input/output index is supposed to be 0, everything happens to just work, since the zero-initialized memory of the GPU metadata is by chance correct. Thus, the bug only appears when things are sorted funny and a batch set with a non-zero input index appears at the front of the work items. Yikes!

The fix is to make sure that we only read from the prior input when we are not the first item.

jf908

Tested it on 2 machines (Intel i5-6200U and Intel i7-12700H) and can confirm that this PR fixes the issue!

Excellent debugging, explanation makes sense, the shader logic is indeed the same which confirms this isn't a bug with the logic but something further down the line.

DGriffin91

Tested on i7-9700 CPU with UHD Graphics 630.

Fixes bevyengine#18904. See also: - gfx-rs/wgpu#4394 - gfx-rs/wgpu#7339 # A Debugging Story In the above issue, the defect can be reproduced by running the `3d_scene` example with an Intel adapter on Vulkan and observing the following output: <img width="1924" height="1126" alt="image" src="https://github.com/user-attachments/assets/c6a82873-7afc-4901-a812-dca46951b082" /> ## 1. Indirect Draw Args The first thing to check is what's being passed to `vkCmdDrawIndexedIndirectCount` in `main_opaque_pass_3d`. What we see is a little bizarre: <img width="895" height="130" alt="image" src="https://github.com/user-attachments/assets/675cc55d-e377-4e77-81f2-9b2720ccb188" /> We should see two separate meshes here for the circular base and the cube, but instead we see 3 items, two of which are identical with exception of their `first_instance` id. ## 2. Reversed work item ordering Trying to debug what could possibly be wrong with the shader input lead to observing the `PreprocessWorkItem` buffer that gets passed to mesh preprocessing. Here, there was a very conspicuous difference between when things would work and when they would break: ```sh // Working work_items[0] = {input_index: 0, output_index: 0} work_items[1] = {input_index: 1, output_index: 1} // Broken work_items[0] = {input_index: 1, output_index: 0} work_items[1] = {input_index: 0, output_index: 1} ``` This was very conspicuous and likely due to ECS query instability. However, the code looked like it should be robust enough to handle work items in any order. Further, this works just fine on Nvidia, so the ordering itself is clearly not the issue. ## 3. Memory ordering? My first assumption was that this must be some kind of weirdness with memory ordering or some other race only observable on Intel. This led me to the following write: ```wgsl // If we're the first mesh instance in this batch, write the index of our // `MeshInput` into the appropriate slot so that the indirect parameters // building shader can access it. if (instance_index == 0u || work_items[instance_index - 1].output_or_indirect_parameters_index != indirect_parameters_index) { indirect_parameters_gpu_metadata[indirect_parameters_index].mesh_index = input_index; } ``` Even though the logic looks totally fine and shouldn't require any synchronization, I tried making the write atomic, which didn't seem to help at all. My next step was to try to remove the condition and just unconditionally write. This could lead to some weirdness in the event of batch size N > 1, but I just wanted to see what happened. And,... it solved the bug? ## 4. SPIR-V de-compilation This made no sense to me why this would fix things, so I decided to decompile the shader and see what was going on, and found the following: ```c bool _247 = _236 == 0; uint _248 = _236 - 1; uint* _249 = &_221[_248].output_or_indirect_parameters_index; uint _250 = *_249; bool _251 = _250 != _246; bool _252 = _247 || _251; if(_252) { uint* _256 = &_227[_246].mesh_index; *_256 = _244; } ``` This looks wrong. The final condition `_247 || _251` doesn't seem right. Checking and confirming, [`||` in WGSL is supposed to short circuit](https://www.w3.org/TR/WGSL/#logical-expr). Instead, here, we unconditionally read from `&_221[_248].output_or_indirect_parameters_index;` AKA `work_items[instance_index - 1]`. In the event `instance_index` is 0 that means we are OOB. Uh oh!! ## 5. Vendor differences I'm not sure why this UB have any effect on Nvidia. But we can walk through the entire bug: On the first thread in the workgroup where `instance_index` is 0, we will *always* OOB read, which on Intel seems to cause the thread to terminate or otherwise return garbage that makes the condition fail or something else weird. *However*, in the event that the first work item's input/output index is *supposed* to be 0, everything happens to just work, since the zero-initialized memory of the GPU metadata is by chance correct. Thus, the bug only appears when things are sorted funny and a batch set with a non-zero input index appears at the front of the work items. Yikes! The fix is to make sure that we only read from the prior input when we are not the first item.

andyleiserson · 2025-10-29T20:59:00Z

Welp, I am really sorry that you had to debug this 😞.

Properly short circuit to prevent OOB read.

a371cdf

tychedelia added this to the 0.17.3 milestone Oct 9, 2025

tychedelia added C-Bug An unexpected or incorrect behavior A-Rendering Drawing game state to the screen S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Oct 9, 2025

github-project-automation bot added this to Rendering Oct 9, 2025

tychedelia added 2 commits October 8, 2025 20:03

Merge branch 'main' into 18904-intel-gpu-preprocess

a52eeeb

Clean-up.

061332e

jf908 approved these changes Oct 9, 2025

View reviewed changes

DGriffin91 approved these changes Oct 10, 2025

View reviewed changes

DGriffin91 mentioned this pull request Oct 10, 2025

Environment map missing on Intel IGP #21488

Open

tychedelia added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Oct 10, 2025

alice-i-cecile added this pull request to the merge queue Oct 13, 2025

Merged via the queue into bevyengine:main with commit b94662a Oct 13, 2025
38 checks passed

github-project-automation bot moved this to Done in Rendering Oct 13, 2025

beicause mentioned this pull request Nov 4, 2025

Indirect drawing causes AlphaMode::Blend meshes to flicker #21739

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix Intel iGPU Rendering #21475

Fix Intel iGPU Rendering #21475

Uh oh!

tychedelia commented Oct 9, 2025

Uh oh!

jf908 left a comment

Uh oh!

DGriffin91 left a comment

Uh oh!

Uh oh!

andyleiserson commented Oct 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Fix Intel iGPU Rendering #21475

Fix Intel iGPU Rendering #21475

Uh oh!

Conversation

tychedelia commented Oct 9, 2025

A Debugging Story

1. Indirect Draw Args

2. Reversed work item ordering

3. Memory ordering?

4. SPIR-V de-compilation

5. Vendor differences

Uh oh!

jf908 left a comment

Choose a reason for hiding this comment

Uh oh!

DGriffin91 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

andyleiserson commented Oct 29, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants