@c2akula c2akula commented Oct 19, 2025

Summary

Fixes gradient computation when broadcasting occurs in element-wise ADD and MUL operations and in batched MATMUL operations. This resolves the TODO at src/tofu_graph.c:965 and is the first feature for v1.1.0.

Problem

When broadcasting occurs in the forward pass, the backward pass did not correctly reduce gradients back to the original input shape:

  1. Element-wise operations (ADD, MUL): [3,1] * [3,4] -> [3,4] - gradient had shape [3,4] but needed to be reduced to [3,1]
  2. Batch MATMUL: [1,3,4] @ [2,4,5] -> [2,3,5] - gradient had shape [2,3,4] but needed to be reduced to [1,3,4]
  3. Matrix transpose: For batch matmul backward pass, transpose was reversing ALL dimensions instead of only the last 2 (matrix dimensions)

Solution

New helper function: reduce_grad_for_broadcast()

  • Identifies broadcast dimensions by comparing input and gradient shapes
  • Sums gradients over broadcast dimensions using tofu_tensor_sumreduce()
  • Handles dimension mismatches (when output has more dims than input)
  • Reshapes result to match original input shape
  • Manages memory properly for intermediate tensors

Updated functions:

  • add_backward() - Now reduces gradients for both inputs when broadcasting occurs
  • mul_backward() - Now reduces gradients for both inputs when broadcasting occurs
  • matmul_backward() - Fixed to:
    • Only transpose matrix dimensions (last 2) not batch dimensions
    • Reduce gradients when batch broadcasting occurs

Testing

New Tests

Created comprehensive test suite (test/standalone/test_broadcast_gradient.c):

  • ✅ MUL with simple broadcasting: [3,1] * [3,4]
  • ✅ MUL with multi-dim broadcasting: [2,1,3] * [1,4,3]
  • ✅ ADD with scalar broadcasting: [1] + [3,4]
  • ✅ ADD with no broadcasting (baseline)
  • ✅ MATMUL with batch broadcasting: [1,3,4] @ [2,4,5] -> [2,3,5]

Results: 5/5 tests pass

Existing Tests

  • ✅ All unit tests pass (66/66)
  • ✅ Phase 1 validation passes (gradient checking for all ops including MATMUL)
  • ✅ Phase 2 validation passes (architecture tests)
  • ✅ Phase 3 validation passes (multi-class classification, regression)

Impact

  • Breaking: No - backward compatible
  • Performance: Minimal overhead (only when broadcasting occurs)
  • Correctness: Fixes incorrect gradient computation for broadcast cases across ADD, MUL, and MATMUL operations

Resolves

Part of v1.1.0 roadmap - Priority 1 (P1) bug fix

**Problem:**
When broadcasting occurs in element-wise operations (e.g., [3,1] * [3,4] -> [3,4]),
the backward pass was not correctly reducing gradients back to the original input shape.
This resulted in shape mismatches and incorrect gradient accumulation.

**Solution:**
- Added `reduce_grad_for_broadcast()` helper function that:
  1. Identifies broadcast dimensions by comparing input and gradient shapes
  2. Sums gradients over broadcast dimensions using `tofu_tensor_sumreduce()`
  3. Handles dimension mismatches (when output has more dims than input)
  4. Reshapes result to match original input shape

- Updated `add_backward()` and `mul_backward()` to use gradient reduction
- Properly handles memory management for intermediate tensors

**Testing:**
- Added comprehensive test suite (test_broadcast_gradient.c) with 4 tests:
  * MUL with simple broadcasting ([3,1] * [3,4])
  * MUL with multi-dimensional broadcasting ([2,1,3] * [1,4,3])
  * ADD with scalar broadcasting ([1] + [3,4])
  * ADD with no broadcasting (baseline test)
- All new tests pass (4/4)
- All existing tests pass (66/66)
- Phase 1 validation passes (gradient checking)
- Phase 3 validation passes (multi-class classification, regression)

**Impact:**
- Fixes TODO at src/tofu_graph.c:965
- Non-breaking change (backward compatible)
- Enables correct gradient computation for all broadcasting scenarios

Resolves #v1.1.0-P1

- Add gradient reduction for batch broadcasting in matmul backward pass
- Fix transpose to only swap matrix dimensions (last 2) not batch dims
- Add comprehensive test for MATMUL [1,3,4] @ [2,4,5] -> [2,3,5] case
- Verify gradients correctly reduced to input shapes
@c2akula c2akula merged commit 26f3736 into develop Oct 19, 2025
2 checks passed
@c2akula c2akula deleted the feature/fix-mul-broadcast-gradient branch October 19, 2025 21:32
c2akula pushed a commit that referenced this pull request Oct 22, 2025
This commit fixes all 5 critical bugs discovered during comprehensive
code review and verified through reproduction tests.

Bug #1: Incorrect stride calculation in tofu_tensor_internal.h
- Location: src/tofu_tensor_internal.h:61
- Issue: Loop used `t->dims[t->ndim - 2]` (dimension VALUE) instead of
  `t->ndim - 2` (array INDEX)
- Impact: Out-of-bounds array access, incorrect stride calculations
- Fix: Changed to use correct array index `t->ndim - 2`

Bug #2: Type cast errors in fprintf functions
- Location: src/tofu_type.c:256, 264
- Issue: fprintf_uint64 and fprintf_uint32 incorrectly cast to uint16_t
- Impact: Data loss (75% for uint64, 50% for uint32), incorrect output
- Fix: Cast to correct types (uint64_t and uint32_t respectively)

Bug #3: Float comparison functions lose precision
- Location: src/tofu_type.c:309-324
- Issue: cmp_double and cmp_float cast subtraction result to int
- Impact: Fractional differences (<1.0) truncate to 0, comparisons fail
- Fix: Use three-way comparison (if/else) instead of arithmetic

Bug #4: Integer comparison overflow/underflow
- Location: src/tofu_type.c:327-381
- Issue: All integer comparison functions use subtraction which can
  overflow (signed) or underflow (unsigned)
- Impact: Wrong comparison results, sorting fails, undefined behavior
- Fix: Use three-way comparison for all integer types (int64, int32,
  int16, int8, uint64, uint32, uint16, uint8)

Bug #5: Backwards NDEBUG conditional logic
- Location: src/tofu_tensor.c:29
- Issue: Used `#ifdef NDEBUG` with assert(), but assert() is disabled
  when NDEBUG is defined
- Impact: Bounds checking never runs in debug OR release mode
- Fix: Changed to `#ifndef NDEBUG` so bounds checking runs in debug mode

All fixes have been:
- Verified to compile without errors
- Tested with bug reproduction test suite
- Documented in CODEBASE_REVIEW_REPORT.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
