Hi,
I've implemented BVH traversal routines with Dr.Jit (very similar to what FCPW does here). The issue I'm seeing is that with the CUDA backend my code runs correctly (I have tests against a pure-Python implementation), but with the LLVM backend it fails with out-of-bounds memory accesses that I can't pinpoint. In debug mode I get errors like: `drjit.gather(): out-of-bounds read from position 1009400057 in an array of size 7. (typing.py:2279)`
Some traces (like the one above) even point into code I didn’t write (e.g., typing.py), which makes it hard to diagnose.
Concretely, I'm using a fixed-size stack implemented as a `dr.Local` over a custom dataclass, following the documentation's instructions for nested objects. When reading and writing, I'm careful to do:
```python
stack_entry = subtree.read(read_stack(stack_ptr), active=active)
node_index = stack_entry.node
current_dist = stack_entry.distance
subtree.write(stack_entry, read_stack(stack_ptr), active=active)
stack_ptr -= mi.Int32(1)
```

I also keep the `active` mask updated conservatively, since my initial suspicion was that incorrect reads/writes might come from there.
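To illustrate my working hypothesis, here is a minimal pure-Python sketch (hypothetical helper names, not the Dr.Jit API): a lane that becomes inactive can keep a stale stack pointer, and a backend that still evaluates the gather for that lane would read out of bounds unless the index is clamped into range first.

```python
def masked_stack_read(stack, stack_ptr, active):
    """Per-lane stack read: clamp the index of inactive lanes so a
    speculative read never goes out of bounds (hypothetical helper,
    not part of Dr.Jit)."""
    out = []
    for ptr, act in zip(stack_ptr, active):
        safe = min(max(ptr, 0), len(stack) - 1)  # clamp into [0, len - 1]
        out.append(stack[safe] if act else None)  # inactive lanes: result discarded
    return out

# An inactive lane with a stale pointer (e.g. 1009400057) stays safe:
print(masked_stack_read([10, 20, 30], [0, 2, 1009400057],
                        [True, True, False]))  # → [10, 30, None]
```

If this mental model is right, it would explain why CUDA (per-thread execution) happens to work while the LLVM backend (vectorized, possibly evaluating both sides of a mask) trips the bounds check.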
Do you have any idea where this backend-dependent behavior (LLVM vs. CUDA) could be coming from?
If it helps, I’m happy to share the full snippet privately.
Thank you in advance!