Segmentation fault with symbolic loops on LLVM backend (only) #446

@clementjambon

Hi,

I’ve implemented BVH traversal routines with drjit (very similar to what FCPW does here). The issue I’m seeing is that with the CUDA backend my code runs correctly (I have tests against a pure Python implementation), but with the LLVM backend it fails with out-of-bounds memory accesses that I can’t pinpoint. In debug mode I get errors like: drjit.gather(): out-of-bounds read from position 1009400057 in an array of size 7. (typing.py:2279)

Some traces (like the one above) even point into code I didn’t write (e.g., typing.py), which makes it hard to diagnose.

Concretely, I’m using a fixed-size stack implemented as a dr.Local over a custom dataclass, following the instructions for nested objects. When reading/writing I’m careful to do:

stack_entry = subtree.read(read_stack(stack_ptr), active=active)
node_index = stack_entry.node
current_dist = stack_entry.distance
subtree.write(stack_entry, read_stack(stack_ptr), active=active)
stack_ptr -= mi.Int32(1)

I also keep the active mask updated conservatively, since my initial suspicion was that incorrect reads/writes might come from there.
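To make the mask suspicion concrete, here is a minimal sketch in plain Python, not Dr.Jit. It models a per-lane masked gather and a helper that lists the lanes whose index is out of range while the mask says they are active. The names `masked_gather` and `find_bad_lanes` are hypothetical and exist only for this illustration; the index value is taken from the error message above, and placing it in an inactive lane is an assumption about how such a value could arise, not a diagnosis.

```python
def masked_gather(source, index, active, default=0):
    """Per-lane read: source[index[i]] where active[i] is True, else default."""
    out = []
    for lane, (i, is_active) in enumerate(zip(index, active)):
        if not is_active:
            # Inactive lanes never touch memory, so a stale index is harmless.
            out.append(default)
            continue
        if not 0 <= i < len(source):
            raise IndexError(
                f"lane {lane}: out-of-bounds read from position {i} "
                f"in an array of size {len(source)}"
            )
        out.append(source[i])
    return out


def find_bad_lanes(index, active, size):
    """Return the lanes that are active but hold an out-of-range index."""
    return [
        lane
        for lane, (i, is_active) in enumerate(zip(index, active))
        if is_active and not 0 <= i < size
    ]


nodes = [10, 11, 12, 13, 14, 15, 16]       # array of size 7
index = [0, 3, 1009400057, 6]              # lane 2 holds a stale index
active = [True, True, False, True]         # lane 2 is correctly masked out

values = masked_gather(nodes, index, active)

# If the mask were too permissive, lane 2 would be reported as offending.
bad_if_all_active = find_bad_lanes(index, [True] * 4, len(nodes))
bad_with_mask = find_bad_lanes(index, active, len(nodes))
```

With the correct mask the read succeeds and no lane is reported; with an all-true mask the helper points at lane 2. Running a check of this kind on the stack pointer and node indices just before each read or write is one way to find out whether the failing access comes from a lane that should have been inactive.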

Do you have any ideas about where these non-deterministic behaviors (LLVM vs CUDA) could be coming from?

If it helps, I’m happy to share the full snippet privately.

Thank you in advance!
