Tofu is a lightweight deep learning framework for C, designed for embedded systems and environments where Python frameworks aren't available. It features automatic differentiation, dynamic computation graphs, and comprehensive validation across diverse neural network architectures.
✨ v1.1.0 Released! Feature release with improved broadcasting gradient reduction and full NumPy-compatible slicing. All tofu_* functions follow strict semantic versioning guarantees. See API_STABILITY.md and CHANGELOG.md for details.
Complete documentation is available at: https://c2akula.github.io/tofu/
The documentation includes:
- Getting Started Guide - Installation, quick start, and core concepts
- User Guide - Tensors, graphs, training loops, optimizers, and loss functions
- Tutorials - Step-by-step examples for linear regression, classification, CNNs, and ResNets
- Best Practices - Memory management, error handling, debugging, and performance tips
- API Reference - Complete API documentation for all public functions
Key Features:
- 🧠 Automatic Differentiation: Dynamic computation graphs with backward pass
- ✅ Validated Operations: All gradients numerically verified (13/13 tests passing)
- 🎯 Production Ready: Multi-class classification (100% accuracy), regression (MSE < 0.001)
- 📊 NumPy-compatible API: Familiar broadcasting semantics and tensor operations
- 🔧 Zero Dependencies: Pure C with no external libraries
- 📱 ESP32 Support: Cross-compilation for embedded deployment
- 🏗️ Modern Architectures: Residual networks, deep networks (10+ layers) validated
#include "tofu_graph.h"
#include "tofu_tensor.h"
#include "tofu_optimizer.h"
// Create computation graph
tofu_graph* g = tofu_graph_create();
// Build simple neural network: [2] → [4] → [1]
float W1_data[8] = {0}, W2_data[4] = {0}; // in practice, initialize weights (e.g., randomly)
float input_data[2] = {1.0f, 2.0f};
tofu_tensor* t_input = tofu_tensor_create(input_data, 1, (int[]){2}, TOFU_FLOAT);
tofu_tensor* t_W1 = tofu_tensor_create(W1_data, 2, (int[]){2, 4}, TOFU_FLOAT);
tofu_tensor* t_W2 = tofu_tensor_create(W2_data, 2, (int[]){4, 1}, TOFU_FLOAT);
// Forward pass
tofu_graph_node* x = tofu_graph_input(g, t_input);
tofu_graph_node* W1 = tofu_graph_param(g, t_W1);
tofu_graph_node* h = tofu_graph_matmul(g, x, W1);
tofu_graph_node* h_act = tofu_graph_relu(g, h);
tofu_graph_node* W2 = tofu_graph_param(g, t_W2);
tofu_graph_node* output = tofu_graph_matmul(g, h_act, W2);
// Backward pass (automatic differentiation)
tofu_graph_backward(g, output);
// Gradients computed in W1->grad, W2->grad
tofu_graph_free(g);

See the online tutorials and examples/ for complete training examples.
Current Behavior (v1.1.0):
- Tofu uses `assert()` for parameter validation and error detection
- Invalid inputs (NULL pointers, mismatched dimensions, NaN/Inf) will trigger assertions and crash
- This is intentional for development/debugging - crashes provide immediate feedback
Known Limitations:
- No graceful error recovery - asserts will terminate the program
- Limited input validation - assumes well-formed data
- Float32 precision: ~7 decimal digits
Best Practices:
- Validate dimensions before calling Tofu operations
- Check for NaN/Inf in your data if using untrusted inputs
- Use debug builds during development (`-g` flag)
- Test with sanitizers (AddressSanitizer, UndefinedBehaviorSanitizer) to catch issues early
Roadmap:
- v1.0.0: Document all edge cases with regression tests
- v1.1.0+: Graceful error handling with return codes (breaking change)
| Platform | Build | Tests | CI/CD | Status |
|---|---|---|---|---|
| Linux (Ubuntu 20.04+) | ✅ | ✅ | ✅ | Fully supported |
| macOS (13+) | ✅ | ✅ | ✅ | Fully supported |
| ESP32 | ✅ | ❌ | ✅ | Build only (no test suite) |
| Windows | ❌ | ❌ | ❌ | Not supported (planned v1.1.0+) |
Build Requirements:
- Compiler: GCC 7+ or Clang 10+
- Build System: GNU Make + configure script
- Dependencies: pkg-config, check (for tests)
- Optional: ESP32 toolchain for cross-compilation
ESP32 Cross-Compilation:
./configure --esp32=yes --esp32-toolchain-dir=/path/to/toolchain
make lib

The following steps have been tested on Ubuntu 16.04 but should work with other distributions as well. Required packages can be installed using the following command:
sudo apt-get install build-essential perl git pkg-config check
To cross-compile for ESP32, you need the ESP32 toolchain installed. You can install it by:
- Installing ESP-IDF: Follow the instructions at ESP-IDF Programming Guide
- Ensuring the ESP32 toolchain is in your PATH, or specifying its directory with the `--esp32-toolchain-dir` option
For consistent testing across different environments, you can use the provided Docker setup:
- Make sure you have Docker installed on your system
- Run the provided script to build and test in a Docker container:
./run-tests-docker.sh
This will create an Ubuntu 22.04 container with all required dependencies and run the tests inside it. See the Docker documentation for more information.
- Clone this repository to your local directory.

  cd <my_working_directory>
  git clone https://github.com/c2akula/tofu.git
  cd tofu

- Build and install

  First, configure your installation using:

  chmod +x configure
  ./configure

  There are options to customize the build and installation process; you can append them after `./configure`. For cross-compiling for ESP32:

  ./configure --esp32=yes

  Or, if your ESP32 toolchain is not in your PATH:

  ./configure --esp32=yes --esp32-toolchain-dir=/path/to/esp-idf/tools/xtensa-esp32-elf

  Detailed `./configure` options can be printed using `./configure -h`.

  After that, use `make` to compile the library and run the tests. Then use `make install` to copy the library files and headers into the installation directory, or `sudo make install` if you don't have permissions for that directory.

  make
  sudo make install

- Other `make` options

  Use `make info` to see other `make` options. In particular, you can use `make clean` to clean up the build directory and all object files, and `make uninstall` to remove library files and headers from the installation directory.
Include tofu_tensor.h in your project to use Tofu functions.
You can use the following command to get the compilation and linking flags when building your project.
pkg-config --cflags --libs tofu
Tofu now supports broadcasting operations similar to NumPy. Broadcasting allows you to perform operations on arrays of different shapes. The rules for broadcasting are:
- Arrays with fewer dimensions are prepended with dimensions of size 1.
- Size-1 dimensions are stretched to match the corresponding dimension of the other array.
- Arrays must be broadcastable in each dimension (they must have the same size, or one of them must have size 1).
// Check if two tensors can be broadcast together
int tofu_tensor_isbroadcastable(const tofu_tensor *t1, const tofu_tensor *t2);
// Broadcast a tensor to a new shape
tofu_tensor *tofu_tensor_broadcast_to(const tofu_tensor *src, tofu_tensor *dst, int ndim, const int *dims);
// Element-wise operation with broadcasting
tofu_tensor *tofu_tensor_elew_broadcast(const tofu_tensor *src1, const tofu_tensor *src2, tofu_tensor *dst, tofu_elew_op elew_op);

// Example 1: Broadcasting a scalar to a matrix
float scalar_val = 5.0f;
tofu_tensor *scalar = tofu_tensor_create(&scalar_val, 1, (int[]){1}, TOFU_FLOAT);
tofu_tensor *result = tofu_tensor_broadcast_to(scalar, NULL, 2, (int[]){3, 4});
// result will be a 3×4 matrix filled with 5.0
// Example 2: Element-wise multiplication with broadcasting
float arr1[] = {1, 2, 3}; // Shape: [3]
float arr2[] = {10, 20}; // Shape: [2, 1]
tofu_tensor *t1 = tofu_tensor_create(arr1, 1, (int[]){3}, TOFU_FLOAT);
tofu_tensor *t2 = tofu_tensor_create(arr2, 2, (int[]){2, 1}, TOFU_FLOAT);
tofu_tensor *result = tofu_tensor_elew_broadcast(t1, t2, NULL, TOFU_MUL);
// result will have shape [2, 3] and values [[10, 20, 30], [20, 40, 60]]

Tofu provides NumPy-compatible tensor operations for linear algebra and tensor manipulation.
Computes the inner product (sum-product over last axes) with cartesian product semantics.
tofu_tensor *tofu_tensor_inner(const tofu_tensor *src1, const tofu_tensor *src2, tofu_tensor *dst);

Behavior:
- 1-D × 1-D: Dot product → scalar
- 2-D × 2-D: result[i,j] = sum(a[i,:] * b[j,:])
- N-D × N-D: Cartesian product of non-last dimensions
Output shape: (*a.shape[:-1], *b.shape[:-1])
Example:
// Vector dot product
float a[] = {1, 2, 3};
float b[] = {4, 5, 6};
tofu_tensor *v1 = tofu_tensor_create(a, 1, (int[]){3}, TOFU_FLOAT);
tofu_tensor *v2 = tofu_tensor_create(b, 1, (int[]){3}, TOFU_FLOAT);
tofu_tensor *result = tofu_tensor_inner(v1, v2, NULL);
// result: scalar 32.0 (1*4 + 2*5 + 3*6)
// Matrix inner product [2,3] × [2,3] → [2,2]
float mat1[] = {1, 2, 3, 4, 5, 6};
float mat2[] = {1, 1, 1, 2, 2, 2};
tofu_tensor *m1 = tofu_tensor_create(mat1, 2, (int[]){2, 3}, TOFU_FLOAT);
tofu_tensor *m2 = tofu_tensor_create(mat2, 2, (int[]){2, 3}, TOFU_FLOAT);
result = tofu_tensor_inner(m1, m2, NULL);
// result[i,j] = sum(m1[i,:] * m2[j,:])

Computes matrix multiplication with broadcasting on batch dimensions.

tofu_tensor *tofu_tensor_matmul(const tofu_tensor *src1, const tofu_tensor *src2, tofu_tensor *dst);

Behavior:
- 1-D @ 1-D: Dot product → scalar
- 2-D @ 2-D: Standard matrix multiplication
- N-D @ 1-D: Matrix-vector, drops last dimension
- 1-D @ N-D: Vector-matrix, drops first dimension
- N-D @ N-D: Batch matrix multiplication with broadcasting
Output shape: Broadcasts batch dimensions, contracts last of a with second-to-last of b
Example:
// Standard matrix multiplication [2,3] @ [3,2] → [2,2]
float a[] = {1, 2, 3, 4, 5, 6};
float b[] = {1, 1, 2, 2, 3, 3};
tofu_tensor *m1 = tofu_tensor_create(a, 2, (int[]){2, 3}, TOFU_FLOAT);
tofu_tensor *m2 = tofu_tensor_create(b, 2, (int[]){3, 2}, TOFU_FLOAT);
tofu_tensor *result = tofu_tensor_matmul(m1, m2, NULL);
// Standard matrix multiply: result[i,j] = sum(m1[i,:] * m2[:,j])
// Batch matrix multiplication with broadcasting [3,4] @ [2,4,5] → [2,3,5]
// The [3,4] matrix is broadcast across 2 batches

Computes the outer product (cartesian product without summation).

tofu_tensor *tofu_tensor_outer(const tofu_tensor *src1, const tofu_tensor *src2, tofu_tensor *dst);

Behavior:
- Flattens both input tensors
- Computes: result[i,j] = a[i] * b[j]
- Always produces 2-D output
Output shape: [a.size, b.size] where size is total element count
Example:
// Vector outer product [3] outer [4] → [3,4]
float a[] = {1, 2, 3};
float b[] = {4, 5, 6, 7};
tofu_tensor *v1 = tofu_tensor_create(a, 1, (int[]){3}, TOFU_FLOAT);
tofu_tensor *v2 = tofu_tensor_create(b, 1, (int[]){4}, TOFU_FLOAT);
tofu_tensor *result = tofu_tensor_outer(v1, v2, NULL);
// result[i,j] = a[i] * b[j]
// [[4, 5, 6, 7],
// [8, 10, 12, 14],
// [12, 15, 18, 21]]
// Multi-dimensional inputs are flattened first
// [2,2] outer [2] → flatten to [4] outer [2] → [4,2]

| Operation | 1-D × 1-D | 2-D × 2-D | N-D Behavior | Use Case |
|---|---|---|---|---|
| inner | Dot product (scalar) | Sum over last of both | Cartesian product | Generalized inner products |
| matmul | Dot product (scalar) | Matrix multiplication | Broadcasting batch dims | Deep learning, batch ops |
| outer | Outer product [m,n] | Flattens first → [a.size, b.size] | Always flattens | Tensor products |
Key Differences:
- inner: Cartesian product of non-last dimensions (independent indices)
- matmul: Broadcasting of batch dimensions (shared indices) - most efficient for batches
- outer: No summation, always produces 2-D output
All tensor operations accept an optional pre-allocated destination tensor:
// Create destination tensor first
tofu_tensor *dst = tofu_tensor_zeros(2, (int[]){2, 2}, TOFU_FLOAT);
// Use pre-allocated destination
tofu_tensor *result = tofu_tensor_matmul(m1, m2, dst);
// result points to dst (reused)

This avoids memory allocation overhead in performance-critical loops.
Additional documentation is coming soon, but the API should be familiar if you have experience with NumPy in Python.
Tofu is a derivative work based on TensorLight by Zhao Zhixu (2018-2020). We are grateful for the original project, which provided the foundational tensor computation library.
Major Enhancements (2025):
- Dynamic computation graph infrastructure with automatic differentiation
- Complete backward pass implementation (all operation gradients)
- Comprehensive validation suite (13+ tests with numerical verification)
- CI/CD pipeline with multi-platform testing
- Memory safety verification (AddressSanitizer)
- Production-ready examples (CNN, ResNet)
- ESP32 cross-compilation support
- Edge case testing and documentation
See CONTRIBUTORS.md for detailed attribution.