// Vector addition kernel demonstrating Descend's safe GPU programming model
// This function showcases extended borrow checking, memory safety, and execution context tracking

// Generic function with type parameters:
// - n: nat - Natural number parameter (intended for the array size, but unused here since the arrays are fixed at 16 elements)
// - r: prv - Provenance parameter tracking the memory region/lifetime of the reference parameters below
fn add<n: nat, r: prv>(
    // Shared reference to first input vector - multiple threads can read simultaneously
    // Memory space: gpu.global (GPU global memory)
    // Ownership: shrd (shared) - shared borrows allow concurrent reads but forbid writes, preventing read-write data races
    // Type: 16-element array of 16-bit signed integers
    a: &r shrd gpu.global [i16; 16],

    // Shared reference to second input vector - multiple threads can read simultaneously
    // Same memory space and ownership constraints as 'a'
    b: &r shrd gpu.global [i16; 16],

    // Unique reference to output vector - grants exclusive write access
    // Ownership: uniq (unique) - a unique borrow excludes all other borrows, preventing write-write and read-write data races
    // The compiler statically ensures no conflicting borrows exist
    c: &r uniq gpu.global [i16; 16]

// Execution context specification - defines how this function runs on GPU hardware
// - grid: gpu.grid<X<1>, X<16>> - GPU execution grid with 1 block containing 16 threads
// - The type system ensures GPU memory is only accessed in GPU execution contexts
// - Prevents invalid cross-device memory accesses (CPU accessing GPU memory)
) -[grid: gpu.grid<X<1>, X<16>>]-> () {

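    // For intuition only: in plain CUDA terms (an analogy, not this compiler's
    // output), the signature above plays roughly the role of a kernel declaration
    // plus a launch configuration:
    //
    //     __global__ void add(const short* a, const short* b, short* c);
    //     add<<<1, 16>>>(a, b, c);   // 1 block of 16 threads ~ gpu.grid<X<1>, X<16>>
    //
    // The read-only &shrd borrows correspond to the 'const short*' parameters,
    // and the exclusive &uniq borrow to the writable 'short*'.
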
    // Vector addition operation - element-wise addition of the two arrays
    // The compiler generates safe parallel code that:
    // 1. Loads data from global memory to local memory for each thread
    // 2. Performs vectorized addition using HIVM dialect operations
    // 3. Stores results back to global memory safely
    // The ownership system ensures this operation is race-free
    //
    // LAZY LOADING: the Descend compiler defers memory loads until the data is
    // actually needed by the computation:
    // - The HIVM dialect generates 'hivm.hir.load' operations that copy from
    //   global memory (gm) to local memory (ub) only when the data is accessed
    // - This minimizes memory bandwidth usage and improves cache efficiency
    // - The type system ensures loads happen in the correct execution context
    // - Shared references enable read-only access without unnecessary copies
    c = a + b;
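
    // A minimal per-thread sketch of the same element-wise addition, written in
    // CUDA-style C for intuition (index and temporaries are hypothetical; this is
    // not the code the Descend/HIVM backend emits):
    //
    //     int   i = threadIdx.x;   // each of the 16 threads handles one element
    //     short x = a[i];          // load from global memory (cf. gm -> ub above)
    //     short y = b[i];
    //     c[i] = x + y;            // add, then store the result back to global memory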

    // Unit value - the function's return type is (), so the body ends with the unit value
    // In MLIR, this lowers to a 'return' operation
    ()
}
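
// End-to-end usage, for comparison only: a self-contained CUDA program with the
// same shape as this kernel (1 block x 16 threads, 16-element i16 vectors). All
// names are hypothetical; this is a hand-written analogy, not output of the
// Descend compiler:
//
//     #include <cuda_runtime.h>
//     #include <cstdio>
//
//     __global__ void add(const short* a, const short* b, short* c) {
//         int i = threadIdx.x;          // one thread per element
//         c[i] = a[i] + b[i];
//     }
//
//     int main() {
//         short ha[16], hb[16], hc[16];
//         for (int i = 0; i < 16; ++i) { ha[i] = (short)i; hb[i] = (short)(2 * i); }
//
//         short *da, *db, *dc;
//         cudaMalloc(&da, sizeof(ha));
//         cudaMalloc(&db, sizeof(hb));
//         cudaMalloc(&dc, sizeof(hc));
//         cudaMemcpy(da, ha, sizeof(ha), cudaMemcpyHostToDevice);
//         cudaMemcpy(db, hb, sizeof(hb), cudaMemcpyHostToDevice);
//
//         add<<<1, 16>>>(da, db, dc);   // 1 block of 16 threads
//         cudaMemcpy(hc, dc, sizeof(hc), cudaMemcpyDeviceToHost);
//
//         for (int i = 0; i < 16; ++i) printf("%d ", (int)hc[i]);
//         printf("\n");
//         cudaFree(da); cudaFree(db); cudaFree(dc);
//         return 0;
//     }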