@c2akula c2akula commented Oct 19, 2025

Summary

Fixes gradient computation when broadcasting occurs in element-wise ADD and MUL operations and in batched MATMUL operations. This resolves the TODO at src/tofu_graph.c:965 and is the first feature for v1.1.0.

Problem

When broadcasting occurs in the forward pass, the backward pass did not correctly reduce gradients back to the original input shape:

  1. Element-wise operations (ADD, MUL): [3,1] * [3,4] -> [3,4] - gradient had shape [3,4] but needed to be reduced to [3,1]
  2. Batch MATMUL: [1,3,4] @ [2,4,5] -> [2,3,5] - gradient had shape [2,3,4] but needed to be reduced to [1,3,4]
  3. Matrix transpose: For batch matmul backward pass, transpose was reversing ALL dimensions instead of only the last 2 (matrix dimensions)

Solution

New helper function: reduce_grad_for_broadcast()

  • Identifies broadcast dimensions by comparing input and gradient shapes
  • Sums gradients over broadcast dimensions using tofu_tensor_sumreduce()
  • Handles dimension mismatches (when output has more dims than input)
  • Reshapes result to match original input shape
  • Manages memory properly for intermediate tensors

Updated functions:

  • add_backward() - Now reduces gradients for both inputs when broadcasting occurs
  • mul_backward() - Now reduces gradients for both inputs when broadcasting occurs
  • matmul_backward() - Fixed to:
    • Only transpose matrix dimensions (last 2) not batch dimensions
    • Reduce gradients when batch broadcasting occurs

Testing

New Tests

Created comprehensive test suite (test/standalone/test_broadcast_gradient.c):

  • ✅ MUL with simple broadcasting: [3,1] * [3,4]
  • ✅ MUL with multi-dim broadcasting: [2,1,3] * [1,4,3]
  • ✅ ADD with scalar broadcasting: [1] + [3,4]
  • ✅ ADD with no broadcasting (baseline)
  • ✅ MATMUL with batch broadcasting: [1,3,4] @ [2,4,5] -> [2,3,5]

Results: 5/5 tests pass

Existing Tests

  • ✅ All unit tests pass (66/66)
  • ✅ Phase 1 validation passes (gradient checking for all ops including MATMUL)
  • ✅ Phase 2 validation passes (architecture tests)
  • ✅ Phase 3 validation passes (multi-class classification, regression)

Impact

  • Breaking: No - backward compatible
  • Performance: Minimal overhead (only when broadcasting occurs)
  • Correctness: Fixes incorrect gradient computation for broadcast cases across ADD, MUL, and MATMUL operations

Resolves

Part of v1.1.0 roadmap - Priority 1 (P1) bug fix

**Problem:**
When broadcasting occurs in element-wise operations (e.g., [3,1] * [3,4] -> [3,4]),
the backward pass was not correctly reducing gradients back to the original input shape.
This resulted in shape mismatches and incorrect gradient accumulation.

**Solution:**
- Added `reduce_grad_for_broadcast()` helper function that:
  1. Identifies broadcast dimensions by comparing input and gradient shapes
  2. Sums gradients over broadcast dimensions using `tofu_tensor_sumreduce()`
  3. Handles dimension mismatches (when output has more dims than input)
  4. Reshapes result to match original input shape

- Updated `add_backward()` and `mul_backward()` to use gradient reduction
- Properly handles memory management for intermediate tensors

**Testing:**
- Added comprehensive test suite (test_broadcast_gradient.c) with 4 tests:
  * MUL with simple broadcasting ([3,1] * [3,4])
  * MUL with multi-dimensional broadcasting ([2,1,3] * [1,4,3])
  * ADD with scalar broadcasting ([1] + [3,4])
  * ADD with no broadcasting (baseline test)
- All new tests pass (4/4)
- All existing tests pass (66/66)
- Phase 1 validation passes (gradient checking)
- Phase 3 validation passes (multi-class classification, regression)

**Impact:**
- Fixes TODO at src/tofu_graph.c:965
- Non-breaking change (backward compatible)
- Enables correct gradient computation for all broadcasting scenarios

Resolves #v1.1.0-P1

- Add gradient reduction for batch broadcasting in matmul backward pass
- Fix transpose to only swap matrix dimensions (last 2) not batch dims
- Add comprehensive test for MATMUL [1,3,4] @ [2,4,5] -> [2,3,5] case
- Verify gradients correctly reduced to input shapes
@c2akula c2akula merged commit 26f3736 into develop Oct 19, 2025
2 checks passed
@c2akula c2akula deleted the feature/fix-mul-broadcast-gradient branch October 19, 2025 21:32
c2akula pushed a commit that referenced this pull request Oct 22, 2025
This commit fixes all 5 critical bugs discovered during comprehensive
code review and verified through reproduction tests.

Bug #1: Incorrect stride calculation in tofu_tensor_internal.h
- Location: src/tofu_tensor_internal.h:61
- Issue: Loop used `t->dims[t->ndim - 2]` (dimension VALUE) instead of
  `t->ndim - 2` (array INDEX)
- Impact: Out-of-bounds array access, incorrect stride calculations
- Fix: Changed to use correct array index `t->ndim - 2`

Bug #2: Type cast errors in fprintf functions
- Location: src/tofu_type.c:256, 264
- Issue: fprintf_uint64 and fprintf_uint32 incorrectly cast to uint16_t
- Impact: Data loss (75% for uint64, 50% for uint32), incorrect output
- Fix: Cast to correct types (uint64_t and uint32_t respectively)

Bug #3: Float comparison functions lose precision
- Location: src/tofu_type.c:309-324
- Issue: cmp_double and cmp_float cast subtraction result to int
- Impact: Fractional differences (<1.0) truncate to 0, comparisons fail
- Fix: Use three-way comparison (if/else) instead of arithmetic

Bug #4: Integer comparison overflow/underflow
- Location: src/tofu_type.c:327-381
- Issue: All integer comparison functions use subtraction which can
  overflow (signed) or underflow (unsigned)
- Impact: Wrong comparison results, sorting fails, undefined behavior
- Fix: Use three-way comparison for all integer types (int64, int32,
  int16, int8, uint64, uint32, uint16, uint8)

Bug #5: Backwards NDEBUG conditional logic
- Location: src/tofu_tensor.c:29
- Issue: Used `#ifdef NDEBUG` with assert(), but assert() is disabled
  when NDEBUG is defined
- Impact: Bounds checking never runs in debug OR release mode
- Fix: Changed to `#ifndef NDEBUG` so bounds checking runs in debug mode

All fixes have been:
- Verified to compile without errors
- Tested with bug reproduction test suite
- Documented in CODEBASE_REVIEW_REPORT.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
