Flesh out compute example #360

Closed
wants to merge 3 commits into from

@tombh tombh commented Dec 31, 2020

Just a copy of wgpu-rs's 'hello-compute' example: https://github.com/gfx-rs/wgpu-rs/tree/v0.6/examples/hello-compute

I'm not sure if this is even possible at the moment? Not to mention that this is my first time using rust-gpu (which is such a great project). I get a compiler error:

RUST_BACKTRACE=full cargo run --bin example-runner-wgpu -- --shader Compute
   Compiling example-runner-wgpu v0.1.0 (/home/tombh/Software/rust-gpu/examples/runners/wgpu)
thread 'rustc' panicked at 'Failed to recover key for generics_of(b445f41739b0d39c-316db2fba229b0a9) with hash b445f41739b0d39c-316db2fba229b0a9', compiler/rustc_middle/src/ty/query/mod.rs:235:5
stack backtrace:
   0:     0x7f763a45a6b7 - std::backtrace_rs::backtrace::libunwind::trace::h746c3e9529d524bc
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
   1:     0x7f763a45a6b7 - std::backtrace_rs::backtrace::trace_unsynchronized::h86340908ff889faa
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7f763a45a6b7 - std::sys_common::backtrace::_print_fmt::h43f85f9b18230404
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x7f763a45a6b7 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc132ae1a5b5aa7cd
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x7f763a4ce5ac - core::fmt::write::hdf023a0036d2a25f
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/core/src/fmt/mod.rs:1078:17
   5:     0x7f763a44c6a2 - std::io::Write::write_fmt::h8580846154bcb66a
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/io/mod.rs:1519:15
   6:     0x7f763a45e3b5 - std::sys_common::backtrace::_print::h7ee55fed88d107a3
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x7f763a45e3b5 - std::sys_common::backtrace::print::h54a7d3e52a524177
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x7f763a45e3b5 - std::panicking::default_hook::{{closure}}::h60921e857bf55a40
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/panicking.rs:208:50
   9:     0x7f763a45df0a - std::panicking::default_hook::hf0f9afb1017317fc
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/panicking.rs:225:9
  10:     0x7f763ad16648 - rustc_driver::report_ice::hff78d76a39ffbb86
  11:     0x7f7628d8e9f6 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h85ca59ae85d698fe
                               at /home/tombh/.rustup/toolchains/nightly-2020-12-11-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1342:9
  12:     0x7f7628d7c41b - proc_macro::bridge::client::<impl proc_macro::bridge::Bridge>::enter::{{closure}}::{{closure}}::h54ab91f59daab13a
                               at /home/tombh/.rustup/toolchains/nightly-2020-12-11-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/proc_macro/src/bridge/client.rs:320:21
  13:     0x7f763a45ecb6 - std::panicking::rust_panic_with_hook::h8d66bf42b407aaea
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/panicking.rs:595:17
  14:     0x7f763a45e7d7 - std::panicking::begin_panic_handler::{{closure}}::hde71edcd925d0c5e
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/panicking.rs:497:13
  15:     0x7f763a45ab7c - std::sys_common::backtrace::__rust_end_short_backtrace::h8a3c7d6cea578919
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/sys_common/backtrace.rs:141:18
  16:     0x7f763a45e739 - rust_begin_unwind
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/panicking.rs:493:5
  17:     0x7f763a45e6eb - std::panicking::begin_panic_fmt::hee67ce14b77d0396
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/panicking.rs:435:5
  18:     0x7f763d6fb880 - rustc_middle::ty::query::try_load_from_on_disk_cache::{{closure}}::h0702bbc1f260c0e9
  19:     0x7f763d6fb7fb - rustc_middle::ty::query::try_load_from_on_disk_cache::hfa4775df5c5e0180
  20:     0x7f763c9d740e - rustc_query_system::dep_graph::graph::DepGraph<K>::exec_cache_promotions::h8caa69177622351e
  21:     0x7f763c9efce1 - rustc_middle::dep_graph::<impl rustc_query_system::dep_graph::DepKind for rustc_middle::dep_graph::dep_node::DepKind>::with_deps::hab5675af7737b3e6
  22:     0x7f763c992a8b - rustc_incremental::persist::save::save_in::h7f4f43356280dc0a
  23:     0x7f763c98de61 - rustc_data_structures::sync::join::h0120008f852881ce
  24:     0x7f763c9eed82 - rustc_middle::dep_graph::<impl rustc_query_system::dep_graph::DepKind for rustc_middle::dep_graph::dep_node::DepKind>::with_deps::h2893ce60eec57bb6
  25:     0x7f763c991fbf - rustc_incremental::persist::save::save_dep_graph::hffd3fb2ecc639a78
  26:     0x7f763c86e77a - rustc_codegen_ssa::base::finalize_tcx::h68106e2729498b3e
  27:     0x7f763b185087 - <rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::CodegenBackend>::codegen_crate::h067fffb3870bc5b0
  28:     0x7f763af342ee - rustc_session::utils::<impl rustc_session::session::Session>::time::had158f21ec5bf4d1
  29:     0x7f763af767dc - rustc_interface::passes::QueryContext::enter::h40067ad7feabcbd0
  30:     0x7f763afcea63 - rustc_interface::queries::Queries::ongoing_codegen::h4fc36fc05972247d
  31:     0x7f763acbec59 - rustc_interface::queries::<impl rustc_interface::interface::Compiler>::enter::hd899306a06575d0c
  32:     0x7f763ad3fa07 - rustc_span::with_source_map::ha4e07ff263d0dc1d
  33:     0x7f763acbfe0b - rustc_interface::interface::create_compiler_and_run::h1d6d732867d1f489
  34:     0x7f763ad6ce50 - scoped_tls::ScopedKey<T>::set::h39c0aa543118d3f3
  35:     0x7f763ad73456 - std::sys_common::backtrace::__rust_begin_short_backtrace::h1e5aa72fb9cd6d86
  36:     0x7f763acc7cca - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc793837e985b77ce
  37:     0x7f763a46e6ea - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hea1090dbdcecbf5a
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/alloc/src/boxed.rs:1328:9
  38:     0x7f763a46e6ea - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h8d5723d3912bd325
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/alloc/src/boxed.rs:1328:9
  39:     0x7f763a46e6ea - std::sys::unix::thread::Thread::new::thread_start::hc17a425ca2995724
                               at /rustc/d32c320d7eee56706486fef6be778495303afe9e/library/std/src/sys/unix/thread.rs:71:17
  40:     0x7f763a3823e9 - start_thread
  41:     0x7f763a29d293 - __GI___clone
  42:                0x0 - <unknown>

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.50.0-nightly (d32c320d7 2020-12-10) running on x86_64-unknown-linux-gnu

note: compiler flags: -C embed-bitcode=no -C debuginfo=2 -C incremental --crate-type lib --crate-type cdylib

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
end of query stack
error: could not compile `example-runner-wgpu`

@tombh tombh marked this pull request as draft January 3, 2021 15:36
@XAMPPRocky
Member

I'm not sure if this is even possible at the moment? Not to mention that this is my first time using rust-gpu (which is such a great project). I get a compiler error:

Thank you for your PR! You should try building this on-top of #358, I was having a similar issue until I updated the nightly version.

@tombh tombh force-pushed the compute-full-example branch from 0431f10 to 912d35a Compare January 4, 2021 20:11
@tombh tombh force-pushed the compute-full-example branch from 912d35a to 2aaf73e Compare January 5, 2021 04:49
@tombh
Author

tombh commented Jan 5, 2021

Yes, that did it, thanks. I also fixed the tests. But... I'm now getting a segfault in the compute example (BTW the compute example isn't being tested, shall I add it to the tests?). The segfault seems to come from Vulkan, which I assume is the backend picked automatically on my machine. I should note that the compute example works fine when run from the wgpu repo.

Here's what minimal info can be gleaned from lldb:

* thread #1, name = 'example-runner-', stop reason = signal SIGSEGV: invalid address (fault address: 0xd0)
    frame #0: 0x00007ffff78305e5 libvulkan_intel.so`___lldb_unnamed_symbol4034$$libvulkan_intel.so + 21
libvulkan_intel.so`___lldb_unnamed_symbol4034$$libvulkan_intel.so:
->  0x7ffff78305e5 <+21>: movq   0xd0(%rdi), %rbx
    0x7ffff78305ec <+28>: movq   %fs:0x28, %rax
    0x7ffff78305f5 <+37>: movq   %rax, 0x38(%rsp)
    0x7ffff78305fa <+42>: xorl   %eax, %eax

And I tracked the crash down to this block in examples/runners/wgpu/src/compute.rs:

    let compute_pipeline = device.create_compute_pipeline(&wgpu::ComputePipelineDescriptor {
        label: None,
        layout: Some(&pipeline_layout),
        compute_stage: wgpu::ProgrammableStageDescriptor {
            module: &cs_module,
            entry_point: "main_cs",
        },
    });

@XAMPPRocky
Member

(BTW the compute example isn't being tested, shall I add it to the tests?)

Yes please, thank you.

cc @Jasper-Bekkers Are you able to tell if this is an upstream bug?

@Jasper-Bekkers
Contributor

I'll check it out - I suspect that we're probably just running into a validation error instead, so it would be good to run with validation enabled to double check.

@Jasper-Bekkers
Contributor

I'm getting some validation errors here; at least #362 seems related.

VALIDATION [UNASSIGNED-CoreValidation-Shader-InconsistentSpirv (7060244)] : Validation Error: [ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle = 0x1c758258eb8, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 | SPIR-V module not valid: StorageBuffer OpVariable <id> '4[%storage]' has illegal type.
From Vulkan spec, section 14.5.2:
Variables identified with the StorageBuffer storage class are used to access transparent buffer backed resources. Such variables must be typed as OpTypeStruct, or an array of this type
  %storage = OpVariable %_ptr_StorageBuffer_uint StorageBuffer
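
For readers unfamiliar with the rule the validator is quoting, here is a minimal Rust sketch of the check. It is not rust-gpu or spirv-val code: `SpirvType` and `is_valid_storage_buffer_type` are hypothetical names invented for illustration, and treating a runtime array of structs the same as a fixed-size one is an assumption of this sketch.

```rust
/// A deliberately tiny, hypothetical model of SPIR-V types,
/// just large enough to express the quoted rule.
#[derive(Debug, Clone, PartialEq)]
pub enum SpirvType {
    UInt { width: u32 },
    Struct(Vec<SpirvType>),
    Array(Box<SpirvType>),
    RuntimeArray(Box<SpirvType>),
}

/// Returns true if a StorageBuffer variable may point at `ty`:
/// the type must be a struct, or an array whose element is a struct.
/// (Accepting RuntimeArray here as well is an assumption of the sketch.)
pub fn is_valid_storage_buffer_type(ty: &SpirvType) -> bool {
    match ty {
        SpirvType::Struct(_) => true,
        SpirvType::Array(elem) | SpirvType::RuntimeArray(elem) => {
            matches!(**elem, SpirvType::Struct(_))
        }
        // Scalars such as a bare uint are rejected.
        _ => false,
    }
}
```

Under this model, the bare `uint` behind `%_ptr_StorageBuffer_uint` fails the check, while the same data wrapped in a struct (for example a struct holding a runtime array of `uint`) passes.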

@tombh
Author

tombh commented Jan 5, 2021

That makes sense, thanks. How can I run the validator myself to check if I've fixed it?

Also, I added the compute test, and now I suppose we have to install Vulkan in CI? https://github.com/EmbarkStudios/rust-gpu/pull/360/checks?check_run_id=1652063534#step:8:422

@Jasper-Bekkers
Contributor

That makes sense, thanks. How can I run the validator myself to check if I've fixed it?

In the wgpu samples it should be enabled by default, but it would require you to have the VulkanSDK installed for them to run.

Also, I added the compute test, and now I suppose we have to install Vulkan in CI? https://github.com/EmbarkStudios/rust-gpu/pull/360/checks?check_run_id=1652063534#step:8:422

We probably just shouldn't run that sample on CI since there are no GPUs in the CI machines for now.
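
One possible way to keep a GPU-dependent test in the tree without running it on GPU-less machines is an opt-in switch. The sketch below is only an illustration: `RUN_GPU_TESTS` is a hypothetical variable name, not something rust-gpu or its CI defines, and both function names are made up.

```rust
/// Decides whether GPU-dependent tests should run, given the raw value
/// of an opt-in environment variable (None means the variable is unset).
pub fn gpu_tests_enabled(raw: Option<&str>) -> bool {
    match raw {
        Some(value) => {
            let value = value.trim();
            value == "1" || value.eq_ignore_ascii_case("true")
        }
        // Default to skipping, so machines without a GPU stay green.
        None => false,
    }
}

/// Reads the hypothetical RUN_GPU_TESTS variable from the environment.
pub fn gpu_tests_enabled_from_env() -> bool {
    gpu_tests_enabled(std::env::var("RUN_GPU_TESTS").ok().as_deref())
}
```

A test could return early when `gpu_tests_enabled_from_env()` is false. Rust's built-in `#[ignore]` attribute is another option, since ignored tests still compile but only run when explicitly requested.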

@tombh
Author

tombh commented Jan 6, 2021

Thanks. I'm on Arch, and just running yay -S vulkan-validation-layers, then recompiling, was enough to see the validation errors. However, fixing those seems beyond me at the moment; I've tried a couple of things but can't make any sense of it. As I said, I copied all the code pretty much exactly from upstream: https://github.com/gfx-rs/wgpu-rs/tree/v0.6/examples/hello-compute. I suspect the problem is more likely in how I wrote the shader, as there's no precedent for a compute shader in this project yet.

You're right about the tests, it's not a simple matter of just installing Vulkan in CI. I assumed that because they have tests in gfx-rs/wgpu-rs they use Vulkan in CI, but they don't. Though I wonder what you think about https://github.com/google/swiftshader? It seems that in theory shaders can be run without a GPU. Compute shaders are suitable for testing because they output simpler structures, not graphics, so it could be worth it to get coverage on this otherwise untested area of the code?
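
To illustrate why compute output is easy to check: the upstream hello-compute shader counts Collatz steps for each input number, so a test could compare the GPU buffer against a CPU reference. The following is a rough sketch of such a reference, assuming the shader here mirrors upstream; the function names and the `None` handling for zero and overflow are additions of this sketch, not part of the PR.

```rust
/// Number of Collatz steps needed for `n` to reach 1.
/// Returns None for 0 (which never reaches 1) or if an intermediate
/// value would overflow u32.
pub fn collatz_iterations(mut n: u32) -> Option<u32> {
    if n == 0 {
        return None;
    }
    let mut steps = 0u32;
    while n != 1 {
        n = if n % 2 == 0 {
            n / 2
        } else {
            // 3n + 1, with explicit overflow checks.
            n.checked_mul(3)?.checked_add(1)?
        };
        steps += 1;
    }
    Some(steps)
}

/// CPU-side expected output for a whole input buffer.
pub fn expected_output(input: &[u32]) -> Vec<Option<u32>> {
    input.iter().map(|&n| collatz_iterations(n)).collect()
}
```

For the input `[1, 2, 3, 4]` this reference yields 0, 1, 7 and 2 steps respectively, which is what a passing GPU run would be compared against.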

@XAMPPRocky
Member

XAMPPRocky commented Jan 7, 2021

However, fixing those seems beyond me at the moment; I've tried a couple of things but can't make any sense of it. As I said, I copied all the code pretty much exactly from upstream: https://github.com/gfx-rs/wgpu-rs/tree/v0.6/examples/hello-compute. I suspect the problem is more likely in how I wrote the shader, as there's no precedent for a compute shader in this project yet.

I believe that at least the validation error @Jasper-Bekkers included is a bug in how we're generating the code, so it requires fixes on our end. If there are other validation errors, please do post them here.

You're right about the tests, it's not a simple matter of just installing Vulkan in CI. I assumed that because they have tests in gfx-rs/wgpu-rs they use Vulkan in CI, but they don't. Though I wonder what you think about https://github.com/google/swiftshader? It seems that in theory shaders can be run without a GPU. Compute shaders are suitable for testing because they output simpler structures, not graphics, so it could be worth it to get coverage on this otherwise untested area of the code?

We have an open issue about it, #136, and currently some support is lacking on the SwiftShader side. I think as long as we're ensuring the code compiles, that's fine for now.

@tombh
Author

tombh commented Jan 10, 2021

Ok, I'll leave this PR open until the code generation issue is fixed. Is there an issue we can track for that? I'll append the full validation report at the end of this comment.

I'll disable the new test on this PR as well. It could be trivially re-enabled once #136 is done.

cargo run --bin example-runner-wgpu -- --shader Compute
   Compiling example-runner-wgpu v0.1.0 (/home/tombh/Software/rust-gpu/examples/runners/wgpu)
    Finished dev [unoptimized + debuginfo] target(s) in 13.56s
     Running `target/debug/example-runner-wgpu --shader Compute`
MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0

[0.025068 ERROR]()(no module):
VALIDATION [UNASSIGNED-CoreValidation-Shader-InconsistentSpirv (7060244)] : Validation Error: [ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle = 0x56306caf1040, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 | SPIR-V module not valid: StorageBuffer OpVariable <id> '4[%storage]' has illegal type.
From Vulkan spec, section 14.5.2:
Variables identified with the StorageBuffer storage class are used to access transparent buffer backed resources. Such variables must be typed as OpTypeStruct, or an array of this type
  %storage = OpVariable %_ptr_StorageBuffer_ushort StorageBuffer

object info: (type: DEVICE, hndl: 94765981831232)

[0.031541 ERROR]()(no module):
VALIDATION [VUID-VkComputePipelineCreateInfo-layout-00703 (-432263797)] : Validation Error: [ VUID-VkComputePipelineCreateInfo-layout-00703 ] Object 0: handle = 0x56306caf1040, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0xe63c2d8b | Type mismatch on descriptor slot 0.0 (expected ``) but descriptor of type VK_DESCRIPTOR_TYPE_STORAGE_BUFFER The Vulkan spec states: layout must be consistent with the layout of the compute shader specified in stage (https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#VUID-VkComputePipelineCreateInfo-layout-00703)
object info: (type: DEVICE, hndl: 94765981831232)

@XAMPPRocky
Member

Ok, I'll leave this PR open until the code generation issue is fixed. Is there an issue we can track for that?

Yes, it's #362.

@tombh
Author

tombh commented Feb 7, 2021

I'm still interested in getting this working. I'm just quite out of my depth with the internals that generate the SPIR-V code. Any tips on where to get started with that?

@XAMPPRocky XAMPPRocky added the s: blocked PRs blocked on external factors. label Mar 12, 2021
@khyperia khyperia removed their request for review April 1, 2021 13:56
@khyperia
Contributor

khyperia commented Jun 9, 2021

We've merged an updated version of this PR in #623, so closing this.

@khyperia khyperia closed this Jun 9, 2021