Thanks for contributing to solvr. This guide covers the architecture conventions and quality gates the project expects.
- A recent stable Rust toolchain.
- A clean working tree before opening a pull request.
The most valuable contributions are usually missing algorithms — coverage that SciPy/scikit-learn/scikit-image have but solvr does not yet, or new methods within an existing module. Bug fixes, numerical-accuracy improvements, and additional backend coverage are equally welcome.
Before writing a non-trivial algorithm, open an issue first describing what you want to add, the method/reference, and which crate it belongs in (see below). This avoids duplicated effort and lets us agree on placement and API up front. Small, self-contained fixes can go straight to a pull request.
solvr is one layer of a stack, and a contribution only belongs here if it fits this layer. Place new work by what it is, not where it's convenient:
- numr: foundational primitives that everything else builds on. This covers tensor ops, dtypes, the Runtime/backend abstraction (and new backends themselves), FFT, core linear algebra (matmul, LU/QR/SVD/eigen, solve), special functions, and basic descriptive statistics. If it's a building block reused across domains, or it adds/touches a hardware backend, it goes in numr.
- solvr (this crate): complete scientific/solving algorithms composed from numr primitives. This covers optimization, ODE/DAE/BVP/PDE, interpolation, advanced statistics (distributions, tests, regression), signal processing, spatial, clustering, graphs, morphology, and matrix-equation solvers.
- boostr: AI/ML-specific building blocks. This covers attention, positional encodings, mixture-of-experts, quantization, neural-network layers, and training/inference machinery.
Quick test:
- Is it a low-level primitive (a tensor op, an FFT, a linear-algebra factorization, a special function) or a new backend? → numr.
- Is it a domain solver a scientist/engineer would reach for? → solvr.
- Does it only make sense for neural networks / LLMs? → boostr.
When in doubt, propose it in an issue and we'll help place it. A primitive that several higher layers would reuse should live in numr so the whole stack benefits, rather than being duplicated in solvr.
solvr is backend-agnostic: the same algorithm runs on CPU, CUDA, and WebGPU
through numr's Runtime abstraction. Every
algorithm is written once, generically, and each backend is a thin
delegation. Please follow this structure when adding or changing algorithms.
- Be generic over R: Runtime; operate on Tensor<R>, never on &[f64]/Vec<f64> parameters or struct fields.
- Build computation out of numr operations rather than scalar for loops: numr uses SIMD on CPU and kernels on GPU, so scalar loops are both slower and not portable across backends.
- Support multiple dtypes (F32/F64). Respect backend dtype limits (for example, the WebGPU backend is F32-only) and surface a clear error rather than silently degrading.
- If a primitive you need does not exist in numr, add it to numr instead of working around it with a host-side loop.
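The rules above can be sketched roughly as follows. This is a minimal illustration, not numr's API: `Runtime`, `Tensor`, `DType`, `CpuLike`, `F32OnlyLike`, `SketchError`, and every method name are hypothetical stand-ins defined inline so the block compiles on its own. It shows an algorithm that is generic over `R`, composed only of whole-tensor operations, and a constructor that returns a clear error when a backend cannot run the requested dtype.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins so the sketch compiles on its own.
// numr's real Runtime and Tensor types have different, richer APIs.
pub trait Runtime {
    const NAME: &'static str;
    const SUPPORTS_F64: bool;
}

pub struct CpuLike;
impl Runtime for CpuLike {
    const NAME: &'static str = "cpu-like";
    const SUPPORTS_F64: bool = true;
}

pub struct F32OnlyLike;
impl Runtime for F32OnlyLike {
    const NAME: &'static str = "f32-only";
    const SUPPORTS_F64: bool = false;
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DType {
    F32,
    F64,
}

#[derive(Debug, PartialEq)]
pub enum SketchError {
    UnsupportedDType { dtype: DType, backend: &'static str },
    LengthMismatch { left: usize, right: usize },
}

pub struct Tensor<R: Runtime> {
    data: Vec<f64>, // stands in for device memory
    dtype: DType,
    _rt: PhantomData<R>,
}

impl<R: Runtime> Tensor<R> {
    /// API-boundary constructor: rejects dtypes the backend cannot run.
    pub fn from_host(data: Vec<f64>, dtype: DType) -> Result<Self, SketchError> {
        if dtype == DType::F64 && !R::SUPPORTS_F64 {
            return Err(SketchError::UnsupportedDType { dtype, backend: R::NAME });
        }
        Ok(Self { data, dtype, _rt: PhantomData })
    }

    /// API-boundary read-back, used here only to inspect the result.
    pub fn to_host(&self) -> Vec<f64> {
        self.data.clone()
    }

    // Whole-tensor ops. On a real backend each would be one SIMD or GPU kernel.
    pub fn sub(&self, other: &Self) -> Result<Self, SketchError> {
        self.zip_with(other, |x, y| x - y)
    }

    pub fn mul(&self, other: &Self) -> Result<Self, SketchError> {
        self.zip_with(other, |x, y| x * y)
    }

    pub fn sum(&self) -> Self {
        Self { data: vec![self.data.iter().sum::<f64>()], dtype: self.dtype, _rt: PhantomData }
    }

    fn zip_with(&self, other: &Self, f: impl Fn(f64, f64) -> f64) -> Result<Self, SketchError> {
        if self.data.len() != other.data.len() {
            return Err(SketchError::LengthMismatch { left: self.data.len(), right: other.data.len() });
        }
        let data = self.data.iter().zip(&other.data).map(|(x, y)| f(*x, *y)).collect();
        Ok(Self { data, dtype: self.dtype, _rt: PhantomData })
    }
}

/// The algorithm: generic over `R`, tensors in and out, no scalar `for` loop.
pub fn sum_squared_error<R: Runtime>(a: &Tensor<R>, b: &Tensor<R>) -> Result<Tensor<R>, SketchError> {
    let diff = a.sub(b)?;
    Ok(diff.mul(&diff)?.sum())
}
```

The iteration inside `zip_with` and `sum` belongs to the stand-in backend, which is where a real kernel would live; the algorithm function itself only chains tensor operations and propagates errors with `?`.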
Each module is laid out so the algorithm exists in exactly one place:
src/<module>/
├── mod.rs # ONLY `pub mod` + `pub use`
├── traits/ # trait definitions + option/result types
├── impl_generic/ # the algorithm: `fn <algo>_impl<R, C>(...)`
├── cpu/ # `impl Trait for CpuClient` — delegates to *_impl
├── cuda/ # `impl Trait for CudaClient` — delegates to *_impl
└── wgpu/ # `impl Trait for WgpuClient` — delegates to *_impl
- One algorithm = one file, with the same file name under traits/, impl_generic/, cpu/, cuda/, and wgpu/.
- mod.rs contains only pub mod/pub use: no logic, traits, or types.
- Backend files (cpu/, cuda/, wgpu/) are thin: they implement the trait by calling the generic *_impl function and nothing else.
- Adding an algorithm means adding new files, not expanding existing ones.
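As a rough illustration of this layout, the sketch below collapses the directories into inline modules of a single file. Everything in it is an assumption made for the example: `numr_stub`, `TensorOps`, `EnergyAlgorithms`, and `energy_impl` are hypothetical names, and the toy "energy" (sum of squares) algorithm is chosen only to keep the block short. In the real tree each inline module would be its own file named after the algorithm.

```rust
// Stand-in for numr (hypothetical; numr's real API differs).
pub mod numr_stub {
    use std::marker::PhantomData;

    pub trait Runtime {}
    pub struct CpuRuntime;
    impl Runtime for CpuRuntime {}

    pub struct Tensor<R: Runtime> {
        pub data: Vec<f64>,
        _rt: PhantomData<R>,
    }
    impl<R: Runtime> Tensor<R> {
        pub fn from_host(data: Vec<f64>) -> Self {
            Self { data, _rt: PhantomData }
        }
    }

    /// Primitive ops that a backend client provides.
    pub trait TensorOps<R: Runtime> {
        fn mul(&self, a: &Tensor<R>, b: &Tensor<R>) -> Tensor<R>;
        fn sum(&self, a: &Tensor<R>) -> Tensor<R>;
    }

    pub struct CpuClient;
    impl TensorOps<CpuRuntime> for CpuClient {
        fn mul(&self, a: &Tensor<CpuRuntime>, b: &Tensor<CpuRuntime>) -> Tensor<CpuRuntime> {
            Tensor::from_host(a.data.iter().zip(&b.data).map(|(x, y)| x * y).collect())
        }
        fn sum(&self, a: &Tensor<CpuRuntime>) -> Tensor<CpuRuntime> {
            Tensor::from_host(vec![a.data.iter().sum::<f64>()])
        }
    }
}

// Would be traits/energy.rs: the trait definition only.
pub mod traits {
    use super::numr_stub::{Runtime, Tensor};

    pub trait EnergyAlgorithms<R: Runtime> {
        fn energy(&self, x: &Tensor<R>) -> Tensor<R>;
    }
}

// Would be impl_generic/energy.rs: the one place the algorithm lives.
pub mod impl_generic {
    use super::numr_stub::{Runtime, Tensor, TensorOps};

    pub fn energy_impl<R: Runtime, C: TensorOps<R>>(client: &C, x: &Tensor<R>) -> Tensor<R> {
        client.sum(&client.mul(x, x))
    }
}

// Would be cpu/energy.rs: a thin delegation and nothing else.
pub mod cpu {
    use super::impl_generic::energy_impl;
    use super::numr_stub::{CpuClient, CpuRuntime, Tensor};
    use super::traits::EnergyAlgorithms;

    impl EnergyAlgorithms<CpuRuntime> for CpuClient {
        fn energy(&self, x: &Tensor<CpuRuntime>) -> Tensor<CpuRuntime> {
            energy_impl(self, x)
        }
    }
}

// Would be mod.rs: re-exports only, no logic.
pub use numr_stub::{CpuClient, CpuRuntime, Tensor};
pub use traits::EnergyAlgorithms;
```

A CUDA or WebGPU backend file would have the same three-line body as the `cpu` module, with a different client type. That is why adding a backend under this pattern does not require touching the algorithm.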
Host/device transfers cost far more than the computation itself. Inside
algorithms, do not call tensor.to_vec() or Tensor::from_slice(...).
The only acceptable transfers are:
- at the public API boundary (user-provided input / returned output), and
- a single scalar pulled to the host for a convergence/control-flow check.
Keep state in Tensor<R>, and keep loops on-device using numr ops.
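The transfer rule might look like the sketch below. `DeviceTensor` and all of its methods are hypothetical stand-ins, not numr's API, and the `Runtime` parameter is omitted to keep the example short. The iteration is an element-wise Newton square root that assumes strictly positive inputs. State stays in tensors throughout, and the only value crossing to the host inside the loop is one scalar used for the convergence check.

```rust
/// Hypothetical stand-in for a device-resident tensor; not numr's API.
pub struct DeviceTensor {
    data: Vec<f64>, // pretend this lives in device memory
}

impl DeviceTensor {
    /// Host-to-device transfer: public API boundary only.
    pub fn from_slice(host: &[f64]) -> Self {
        Self { data: host.to_vec() }
    }

    /// Device-to-host transfer: public API boundary only.
    pub fn to_vec(&self) -> Vec<f64> {
        self.data.clone()
    }

    // "On-device" ops: each stands in for a backend kernel.
    pub fn add(&self, other: &Self) -> Self {
        Self { data: self.data.iter().zip(&other.data).map(|(x, y)| x + y).collect() }
    }

    pub fn div(&self, other: &Self) -> Self {
        Self { data: self.data.iter().zip(&other.data).map(|(x, y)| x / y).collect() }
    }

    pub fn scale(&self, k: f64) -> Self {
        Self { data: self.data.iter().map(|x| x * k).collect() }
    }

    /// Reduction whose one-element result stays "on device".
    pub fn max_abs_diff(&self, other: &Self) -> Self {
        let m = self
            .data
            .iter()
            .zip(&other.data)
            .map(|(x, y)| (x - y).abs())
            .fold(0.0, f64::max);
        Self { data: vec![m] }
    }

    /// Pulls a single scalar to the host.
    pub fn item(&self) -> f64 {
        self.data[0]
    }
}

/// Element-wise Newton iteration for sqrt(a), assuming every entry of `a` is positive.
/// Returns the result tensor and the number of iterations performed.
pub fn sqrt_newton(a: &DeviceTensor, tol: f64, max_iter: usize) -> (DeviceTensor, usize) {
    let mut x = a.scale(1.0); // initial guess x0 = a, copied on device
    for iter in 1..=max_iter {
        let next = x.add(&a.div(&x)).scale(0.5);
        // The only host transfer in the loop: one scalar for control flow.
        let delta = next.max_abs_diff(&x).item();
        x = next;
        if delta < tol {
            return (x, iter);
        }
    }
    (x, max_iter)
}
```

The caller converts at the boundary, for example `sqrt_newton(&DeviceTensor::from_slice(&[4.0, 9.0]), 1e-12, 50).0.to_vec()`, and nothing inside the loop materialises the full state on the host.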
cargo build # CPU (default)
cargo build --features cuda # CUDA (requires a CUDA 12.x toolchain)
cargo build --features wgpu # WebGPU
cargo build --features sparse # sparse-tensor-backed modules
The graph and pde modules require sparse (enabled by default).
- Put unit tests in the same file as the code under test (#[cfg(test)] mod tests).
- Test numerical correctness against an analytic or reference result, not just that the call returns Ok.
- A backend-specific test should skip gracefully when no device is available rather than fail.
- Run the suite on each backend you can; CUDA/WebGPU paths exercise the same generic code but catch backend-specific issues.
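A test module following these points could look something like the sketch below. The names are hypothetical: `try_gpu_client` and `GpuClientStub` stand in for whatever device probe a backend actually offers, and `trapezoid` is a deliberately trivial host-side function that exists only so the tests have something to call (a real solvr algorithm would take tensors). The first test checks against an analytic value; the second returns early, and therefore passes, when no device is found.

```rust
/// Hypothetical device handle; a real test would hold the backend's client type.
pub struct GpuClientStub;

/// Hypothetical probe. It always reports "no device" in this sketch;
/// a real helper would try to initialise the backend and return `None` on failure.
pub fn try_gpu_client() -> Option<GpuClientStub> {
    None
}

/// Host-side stand-in for an algorithm under test (composite trapezoid rule),
/// present only so the tests below have something to call.
pub fn trapezoid(y: &[f64], dx: f64) -> f64 {
    y.windows(2).map(|w| 0.5 * (w[0] + w[1]) * dx).sum()
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::f64::consts::PI;

    /// Samples sin(x) at n + 1 evenly spaced points on [0, pi].
    fn sine_samples(n: usize) -> (Vec<f64>, f64) {
        let dx = PI / n as f64;
        ((0..=n).map(|i| (i as f64 * dx).sin()).collect(), dx)
    }

    #[test]
    fn integrates_sine_to_analytic_value() {
        // Analytic reference: the integral of sin(x) over [0, pi] is exactly 2.
        let (y, dx) = sine_samples(1000);
        assert!((trapezoid(&y, dx) - 2.0).abs() < 1e-5);
    }

    #[test]
    fn gpu_path_matches_reference_or_skips() {
        // Skip (pass) instead of failing when no device is present.
        let Some(_client) = try_gpu_client() else {
            eprintln!("skipping: no GPU device available");
            return;
        };
        let (y, dx) = sine_samples(1000);
        assert!((trapezoid(&y, dx) - 2.0).abs() < 1e-5);
    }
}
```

The tolerance is tied to the method's known discretisation error rather than picked arbitrarily: with 1000 intervals the trapezoid rule is accurate to a few parts in a million on this integrand.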
cargo test --release # CPU
cargo test --release --features cuda,sparse
cargo test --release --features wgpu
Run the checks below before submitting. Clippy is run with -D warnings to match CI, so a warning is a failure; treat it as one locally too.
cargo fmt --all -- --check
cargo clippy --all-targets --features f16,sparse,graph,pde -- -D warnings
cargo test --release
If you touch GPU backends, also run clippy with --features cuda and --features wgpu.
- Keep PRs focused and scoped.
- Preserve the module structure and impl_generic pattern described above.
- Include tests for behavioral changes; verify numerical parity across backends where applicable.
- Update docs when public APIs or features change.
- Avoid .unwrap() in library code; return a typed error with context.
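For the last point, one possible shape is sketched below using only the standard library. `BracketError` and `validate_bracket` are hypothetical names invented for the example and are not part of solvr. The sketch shows a place where `.unwrap()` is tempting (`partial_cmp` returns `None` for NaN) replaced by a typed error that carries the offending values.

```rust
use std::cmp::Ordering;
use std::fmt;

/// Hypothetical error type for a root-bracketing precondition check.
#[derive(Debug, PartialEq)]
pub enum BracketError {
    NotANumber { lo: f64, hi: f64 },
    EmptyInterval { lo: f64, hi: f64 },
    NoSignChange { f_lo: f64, f_hi: f64 },
}

impl fmt::Display for BracketError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NotANumber { lo, hi } => {
                write!(f, "bracket contains NaN: lo = {lo}, hi = {hi}")
            }
            Self::EmptyInterval { lo, hi } => {
                write!(f, "bracket must satisfy lo < hi: lo = {lo}, hi = {hi}")
            }
            Self::NoSignChange { f_lo, f_hi } => {
                write!(f, "no sign change: f(lo) = {f_lo}, f(hi) = {f_hi}")
            }
        }
    }
}

impl std::error::Error for BracketError {}

/// Checks that [lo, hi] is a valid bracket for a sign-change root finder.
pub fn validate_bracket(lo: f64, hi: f64, f_lo: f64, f_hi: f64) -> Result<(), BracketError> {
    // `partial_cmp` yields None when either bound is NaN; calling `.unwrap()`
    // here would panic inside library code, so map it to a typed error instead.
    let ordering = lo
        .partial_cmp(&hi)
        .ok_or(BracketError::NotANumber { lo, hi })?;
    if ordering != Ordering::Less {
        return Err(BracketError::EmptyInterval { lo, hi });
    }
    if f_lo * f_hi > 0.0 {
        return Err(BracketError::NoSignChange { f_lo, f_hi });
    }
    Ok(())
}
```

Callers can then match on the variant or print the message, and a bad input produces a recoverable error instead of aborting the caller's program.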
Use Conventional Commits with a clear, imperative summary, for example:
feat(integrate): add automatic Jacobian sparsity detection for BDF and Radau
fix(spatial): correct single-vector rotation to apply R instead of R^T