Algebra Library for Tensors with Highly-Extensible APIs
This project is written in C# (>= 11, .NET >= 7.0) and CUDA (>= 10.0, C++ >= 17), with Intel TBB planned for the future. It mainly targets general-purpose scientific computation and is designed to deliver high performance, user-friendliness and high extensibility at the same time.
Currently, this project is not yet tested and the `UnitTest` sub-project is obsolete.
This library follows the GNU GPL v3 license (TBD).
- Cross platform and cross device
  - It can run on Windows, Linux or macOS with CPUs and GPUs
- Thread and memory safe
- Common Language Specification (CLS) compliant (see `CLSCompliantAttribute`)
  - Therefore, users of VB.NET and F# can import this project directly
- Fully aspect- and interface-oriented
- High performance (with high-performance implementations such as the default CUDA and MKL ones)
- High extensibility
  - All modules and aspects are designed to accommodate future extensions, and all default implementations are written under the same rules
  - All public extendable classes are implementations of interfaces, and all public operations depend on these interfaces
  - Each module and aspect can be replaced by a custom one individually at runtime
  - Fully functional source code generators (e.g. `ApiSelectorGenerator`) to help users generate repetitive code
    - Level 1: Array classes like `Althea.Array.DenseVector<T, TS>` can be inherited or fully rewritten with similar interfaces if necessary
    - Level 2: API selectors that are auto-generated by `ApiSelectorGenerator`
    - Level 3: APIs that inherit `IAbstractRuntimeApi`. One can add other APIs simply by inheriting the same `IAbstractRuntimeApi`
    - Level 4: Actual classes that implement APIs (e.g. `Althea.Backend.Cuda.LinearAlgebra.Dense.Api`). These classes can be inherited to modify the default behavior or to upgrade to a new version of the actual backend
    - Level 5: Native methods like CUDA C ABIs. The `NativeMethodsGenerator` provides an easy way to auto-generate repetitive P/Invoke code
- `Althea` -- basic definitions and interfaces
  - The abstract runtime API interfaces to be inherited by all APIs in all modules, and the helper interface and class that assist code generation
  - The settings that provide easy access to, and modification of, the usage of all backends
- `Althea.Numerics` -- interfaces and structures for real and complex number types
  - Since .NET 7.0 does not provide complex-compatible number interfaces, most structures in this namespace are auto-generated ones that implement custom complex-compatible number interfaces
- `Althea.Helpers` -- helper classes (mostly static ones) and structures that can be used to simplify code
  - `ExtensionHelper.cs` -- all sorts of utility extension methods
  - fixed buffers -- template files that generate fixed-size buffer structures for unmanaged data types and class types
  - `LimitSizedCacher.cs` -- thread-safe caching structures of limited size and/or limited candidate size
  - `Log.cs` -- a simple asynchronous logging system
  - `SpanHelper.cs` -- extension methods for `Span<T>` and `ReadOnlySpan<T>` that provide LINQ-like operations for spans to reduce GC pressure
  - `SpanVariants.cs` -- `SpanList<T>` and `SpanMatrix<T>`, which are based on `Span<T>` but behave like `List<T>` and a column-major matrix, respectively. Can be used to reduce GC pressure
  - `SwapSort.tt` -- a template used to generate a one-key-multiple-values sort based on swapping
- `Althea.Storage` -- native storage related structures, interfaces and classes that provide a unified and easy-to-use interface for accessing and manipulating memory blocks and local/remote files on different devices
  - `Interface.cs` -- various interfaces for accessing storages; the interfaces of `Althea.Array` are based on these interfaces rather than on abstract classes
  - `API.cs` -- the storage API that regulates allocation, freeing, initialization and copy operations within and between memory locations. An IL generator is also used to generate complicated operation methods for concrete storage classes if necessary
  - `CacheStorage.cs`, `MixedStorage.cs` and `PureStorages.cs` -- simple implementations of the storage interfaces
- `Althea.Array` -- interfaces and implementations of arrays, vectors, matrices and tensors
  - `BaseInterfaces.cs` -- the basic interfaces for arrays, vectors, matrices and tensors
  - `ValueArray.cs` -- the basic interfaces for arrays with at least one value storage, and a value array manager class that manages the storage cross references among them
  - `Wrapper.cs` -- the wrappers for dense and sparse arrays and for sparse formats
  - operations -- interfaces for array operations
  - vectors -- vector interface, dense vector, abstract sparse vector and the most common implementation, i.e., coordinate (COO) sparse vector
  - matrices -- matrix interface, dense matrix, triangular matrix, symmetric/Hermitian matrix, abstract sparse matrix and the most common implementations, i.e., COO, CSR/CSC and BSR/BSC sparse matrices
  - tensors -- tensor interface, dense tensor, abstract sparse tensor and the most common implementation, i.e., coordinate (COO) sparse tensor
- `Althea.LinearAlgebra` -- APIs and structures for dense and sparse vector and matrix operations
  - `Dense` -- stride and 2D copy APIs, BLAS- and LAPACK-like APIs, vector and matrix math APIs, symmetric and triangular matrix APIs and extended math APIs
  - `Sparse` -- conversion and computation APIs for sparse vectors and matrices, as well as extended index-related APIs (e.g. sort with multiple values)
- `Althea.TensorAlgebra` -- interfaces for dense and sparse tensor operations
  - `TensorOrder.cs` -- the structure used to label the order of tensors, indicated by `int`, `char` (the native label type of tensors), `System.Index` and/or `System.Range`
  - `Dense` -- dense tensor transposition, reduction, contraction and element-wise math APIs
  - `Sparse` -- sparse tensor slicing, transposition, reduction, contraction and element-wise math APIs
- `*Distributions.cs` -- interfaces and implementations of random distributions of integral and floating point number types of different ranks
  - `API.tt` -- template file that generates APIs for filling storages according to distributions of different ranks
- `Althea.Transformer` -- discrete fast Fourier transform (DFFT) APIs for dense arrays of any rank
The C#, CUDA and MKL backends that implement the APIs above. Currently, not all of the APIs are implemented: the C# backend only supports operations with complexity not larger than
- `Althea.Backend.CSharp` -- default implementations of storage, linear algebra operations and random number generators using only C#, to make sure the basic functionality of this library works when none of CUDA, MKL or other custom backends is available
  - The implementations utilize SIMD where possible
  - The eigenproblem and Schur problem of a single symmetric/Hermitian or non-symmetric/non-Hermitian matrix are implemented with SIMD, although the complex types may not be well accelerated
  - The DFFT for one-dimensional arrays with integral $\log_2(n)$ is supported via SIMD
  - The above two implementations are publicly available for `Span<T>`
- `Althea.Backend.Cuda` -- implementations of storage, linear and tensor algebra operations, random number generators and DFFT using CUDA, cuTENSOR and custom functions written in `thrust::par::cuda`
- `Althea.Backend.Mkl` -- implementations of storage, linear algebra operations, random number generators and DFFT using MKL and custom functions written in `thrust::par::tbb`
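The C# backend item above mentions a DFFT restricted to lengths with integral $\log_2(n)$. To illustrate why that restriction is natural, here is a minimal sketch of a scalar, in-place radix-2 Cooley-Tukey transform over `Span<Complex>`. It is not Althea's implementation: the names `Radix2Fft` and `Forward` are hypothetical, no SIMD is used, and only `System.Numerics.Complex` from the base class library is assumed.

```csharp
using System;
using System.Numerics;

// Usage: a constant signal should put all of its energy into bin 0.
Complex[] signal = { 1, 1, 1, 1, 1, 1, 1, 1 };
Radix2Fft.Forward(signal);
foreach (var bin in signal)
{
    Console.WriteLine(bin.Magnitude);
}

// Hypothetical helper for illustration only; not part of Althea.
static class Radix2Fft
{
    // In-place forward DFT; the length must be a power of two.
    public static void Forward(Span<Complex> data)
    {
        int n = data.Length;
        if (n == 0 || (n & (n - 1)) != 0)
        {
            throw new ArgumentException("Length must be a power of two.", nameof(data));
        }

        // Step 1: reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++)
        {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1)
            {
                j ^= bit;
            }
            j ^= bit;
            if (i < j)
            {
                (data[i], data[j]) = (data[j], data[i]);
            }
        }

        // Step 2: log2(n) butterfly passes over blocks of doubling size.
        for (int len = 2; len <= n; len <<= 1)
        {
            double angle = -2.0 * Math.PI / len;
            var wLen = new Complex(Math.Cos(angle), Math.Sin(angle));
            for (int start = 0; start < n; start += len)
            {
                Complex w = Complex.One;
                for (int k = 0; k < len / 2; k++)
                {
                    Complex u = data[start + k];
                    Complex v = data[start + k + len / 2] * w;
                    data[start + k] = u + v;
                    data[start + k + len / 2] = u - v;
                    w *= wLen;
                }
            }
        }
    }
}
```

Each pass halves the remaining problem, which is only possible when the length is a power of two; other lengths would need a mixed-radix or Bluestein-style algorithm, which this sketch does not attempt.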
The general solver extensions.
- `Kronecker` -- provides a memory- and time-efficient way of multiplying a vector by a Kronecker product or Kronecker sum, i.e. $(A\otimes B)\mathrm{vec}(X)$ and $(A\oplus B)\mathrm{vec}(X)$. Includes the interfaces that vectors/matrices are required to implement, the API, its implementation and zero-overhead extensions to the dense vector and matrix in `Althea.Array` that implement the required interfaces
- `Krylov` -- provides a memory-efficient way of computing the first (several) eigenvector(s) of large matrices that cannot be stored explicitly, via Krylov subspace methods. Includes the interface (`IKrylovVector<T, TVec>`) that vectors are required to implement, the API, its implementation and zero-overhead extensions to `Althea.Array.DenseVector<T, TS>` that implement the required interface
  - Naive Lanczos solver
  - Restart Lanczos solver
  - Restart Krylov-Schur solver
  - Generalized Minimal Residual solver
  - Conjugate Gradient solver
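For the `Kronecker` item above, the savings can be understood from the standard vec identities, which avoid forming the Kronecker product explicitly. As a sketch under stated assumptions (column-major $\mathrm{vec}$, $A$ of size $n\times n$, $B$ of size $m\times m$, $X$ of size $m\times n$, and the usual definition $A\oplus B = A\otimes I_m + I_n\otimes B$; the library's exact conventions may differ):

$$
(A\otimes B)\,\mathrm{vec}(X) = \mathrm{vec}\left(B X A^{T}\right),
\qquad
(A\oplus B)\,\mathrm{vec}(X) = \mathrm{vec}\left(X A^{T} + B X\right).
$$

The explicit operator would have $(mn)^2$ entries and cost $O(m^2 n^2)$ per product, whereas the right-hand sides only need $A$, $B$ and $X$ and cost $O(mn(m+n))$.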
The custom functions written in C++ that implement extended math operations on both CPU and GPU.
If you are using Linux and CUDA or MKL is correctly installed (PATH etc. are correctly configured), simply import this project and you are good to go.
using System;
using Althea;
using Althea.Array;
using Althea.Numerics;
using Althea.Random;
using MP = Althea.Backend.Cuda.CudaMemoryPointer<Althea.Backend.Cuda.GpuId0>;
// or
// using MP = Althea.Backend.CpuMemoryPointer;
// to use CPU instead
using Storage = Althea.Storage.PureStorage<Float64, MP>;
// top level codes
Settings.Initialize(); // initialize settings
var s1 = Storage.Create(1024); // create a storage that occupies 1024 * sizeof(Float64) bytes on GPU0, 1024 can be a runtime variable
var vec = new DenseVector<Float64, Storage>(s1, s1.Length);
s1.FillWith(1.0); // fill vector with ones, 1.0 can be a runtime variable, and if it is composed of same bytes, fast initialization will be used
var s2 = Storage.Create(1024 * 1024);
var mat = new DenseMatrix<Float64, Storage>(s2, 1024, 1024);
var dist = new NormalDistribution<Float64>(); // a standard normal distribution
ApiSelector.FillWithRandom<Float64, Storage>(s2, dist); // fill matrix with random values generated from `dist`
var sumRows = mat * vec; // multiply `mat` and `vec` and create a new storage and vector to store the result
Console.WriteLine(sumRows.Print()); // print the contents of the resulting vector to console
Operator wrappers are not exclusive to this vector-matrix case: sparse and dense vectors and matrices, and the operations between them, all have operator wrappers. There are also families of operations that act on them in place, solve certain problems involving them, and so on.
Written this way, all unmanaged memory (such as the GPU memory allocated on GPU0 in the example) will eventually be reclaimed by the GC. This is usually fine when memory is not tight; otherwise, the following approach is recommended.
var s1 = Storage.Create(1024);
var s2 = Storage.Create(1024 * 1024);
{
using var vec = new DenseVector<Float64, Storage>(s1, s1.Length);
vec.Storage.FillWith(1.0);
using var mat = new DenseMatrix<Float64, Storage>(s2, 1024, 1024);
var dist = new NormalDistribution<Float64>();
ApiSelector.FillWithRandom<Float64, Storage>(mat.Storage, dist);
using var sumRows = mat * vec;
Console.WriteLine(sumRows.Print());
}
// the `using` variables are disposed automatically once they go out of scope,
// i.e., the GPU memory represented by `s1` and `s2` is freed at this point
Furthermore, if you do not need the wrappers provided by array classes, you can simply use API selectors directly.
{
using var vec = Storage.Create(1024);
using var mat = Storage.Create(1024 * 1024);
vec.FillWith(1.0);
var dist = new NormalDistribution<Float64>();
ApiSelector.FillWithRandom<Float64, Storage>(mat, dist);
using var sumRows = vec.CreateAlike();
Althea.LinearAlgebra.Dense.BlasApiSelector.GeneralMatrixMultiplyVector(Althea.LinearAlgebra.MatrixOperation.None, 1024, 1024, 1.0, mat, 1024, vec, 1, 0.0, sumRows, 1);
foreach (var item in sumRows)
{
Console.WriteLine(item);
}
}
Although this eliminates the overhead of the array wrappers, it is not recommended unless you know exactly what you are doing, since that overhead is small relative to the actual computation when the arrays are large.
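For reference, the `GeneralMatrixMultiplyVector` call in the example above has the argument shape of the BLAS GEMV routine. Assuming it follows the standard BLAS convention (an assumption, since the API is shown here but not documented), it computes

$$
y \leftarrow \alpha\,\mathrm{op}(A)\,x + \beta\,y .
$$

With $\mathrm{op}(A) = A$ (`MatrixOperation.None`), $\alpha = 1$, $\beta = 0$ and $x$ filled with ones, this reduces to $y_i = \sum_j A_{ij}$, i.e. the row sums of the matrix, which matches the variable name `sumRows`.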
Also, there are so many im
using Althea;
......
// if the API type of `impl` is known
LinearAlgebra.Dense.IBlasApi impl = ...;
Settings.SetImplementation(impl);
// if the API type of `impl` is unknown or `impl` implements multiple APIs
object impl = ...;
Settings.SetImplementation(impl);
// if multiple APIs of same kind will be set at the same time
IBackend backend = ...;
Settings.TrySetBackend(backend);
By following the implementation classes in `Althea.Backend`, I believe it is relatively easy to write your own implementation for any API.
As for upgrading existing implementation(s), there are many options, of which the easiest is to inherit from the existing non-sealed implementation class(es). For example,
using Althea.Backend.Cuda.LinearAlgebra.Dense;
namespace MyNamespace;
public class ApiUpgrade : Api
{
/// <inheritdoc/>
public override bool GeneralMatrixMultiplyVector<T, TSM, TSV1, TSV2>(MatrixOperation op, long m, long n, T α, TSM A, long lda, TSV1 x, long strideX, T β, TSV2 y, long strideY) where T : unmanaged, IBaseNumber<T> where TSM : class, IStorage<T, TSM> where TSV1 : class, IStorage<T, TSV1> where TSV2 : class, IStorage<T, TSV2>
{
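// Your own implementation goes here. For example, you could call cuBLASXt rather than the default cuBLAS to utilize multiple GPUs.
// Placeholder so that the sketch compiles: fall back to the default implementation.
return base.GeneralMatrixMultiplyVector(op, m, n, α, A, lda, x, strideX, β, y, strideY);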
}
// other implementations ...
}
Then set the implementation(s) to an instance of your class.
By referring to the sub-project `Althea.GeneralSolvers`, I believe that it is not difficult to write your own API and its implementation at the same time.
- Directly use Intel TBB rather than `thrust::par::tbb` in `Althea.ExtendBlas` when CPU routines are to be compiled
- Add a partial Schur vector solver for `Althea.GeneralSolvers.Krylov`
- Add ODE and PDE solvers for `Althea.GeneralSolvers`
- Add dynamic extended math support
- Add a sub-project for (random) particle simulation extension
- Add a more user-friendly frontend
- Finish the unit tests for the whole project