RTen (the Rust Tensor engine) † is a machine learning runtime. It supports models in ONNX format. RTen enables you to take machine learning models which have been trained in Python using frameworks such as PyTorch and run them in Rust.
In addition to ML inference, the project also provides supporting libraries for common pre-processing and post-processing tasks in various domains. This makes RTen a more complete toolkit for running models in Rust applications.
† The name is also a reference to PyTorch's ATen library.
- Provide a (relatively) small and efficient neural network runtime that makes it easy to take models created in frameworks such as PyTorch and run them in Rust applications.
- Be easy to compile and run on a variety of platforms, including WebAssembly
- End-to-end Rust. This project and all of its required dependencies are written in Rust. This simplifies the build and deployment process.
RTen currently supports CPU inference only. It uses SIMD via AVX2, AVX-512, Arm Neon and WebAssembly SIMD. Inference is multi-threaded by default, using one thread per physical core (or performance core). The thread count can be customized.
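To make the SIMD and threading notes above concrete, here is a minimal sketch of how a Rust program can check which of these instruction sets the current CPU offers and how many threads the OS makes available. It uses only the standard library and is not RTen's own detection code; `detected_simd` and `logical_cpu_count` are hypothetical names chosen for illustration.

```rust
use std::thread;

/// Name of the widest SIMD family (from the list above) that the current
/// CPU reports, or "scalar" if none is detected.
/// Illustrative only: this is not how RTen selects its kernels.
pub fn detected_simd() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return "avx512";
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return "neon";
        }
    }
    // WebAssembly has no runtime feature detection: SIMD support is fixed
    // when the module is compiled.
    if cfg!(all(target_arch = "wasm32", target_feature = "simd128")) {
        return "wasm-simd128";
    }
    "scalar"
}

/// Number of threads the OS reports as available to this process.
/// This counts logical CPUs, so on machines with SMT it can be higher
/// than the physical core count described above.
pub fn logical_cpu_count() -> usize {
    thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}
```

Calling `detected_simd()` on a typical x86-64 desktop would be expected to return `"avx2"` or `"avx512"`, but the result depends entirely on the machine.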
RTen supports most standard ONNX operators. See this tracking issue for details. Please open an issue if you find that you cannot run a model because an operator is not supported.
RTen supports models with float32 weights as well as quantized models with int8 or uint8 weights. Quantized models can take advantage of CPU features such as VNNI (x86) and UDOT / i8mm (Arm) for better performance.
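For readers unfamiliar with quantized models, the following sketch shows the arithmetic behind int8 weights: affine quantization with a scale and zero point, and an 8-bit dot product accumulated in 32 bits, which is the kind of operation that VNNI and dot-product instructions accelerate. This is a scalar illustration under simplifying assumptions, not RTen's implementation; `QuantParams`, `quantize`, `dequantize` and `dot_i8` are hypothetical names.

```rust
/// Scale and zero point for affine quantization (hypothetical helper type).
#[derive(Clone, Copy, Debug)]
pub struct QuantParams {
    pub scale: f32,
    pub zero_point: i8,
}

/// Map a float to int8: q = round(x / scale) + zero_point, saturated to
/// the int8 range. Note: `f32::round` rounds ties away from zero, whereas
/// ONNX's QuantizeLinear rounds ties to even, so exact ties can differ.
pub fn quantize(x: f32, p: QuantParams) -> i8 {
    let q = (x / p.scale).round() + p.zero_point as f32;
    q.clamp(i8::MIN as f32, i8::MAX as f32) as i8
}

/// Map an int8 value back to a float: x = (q - zero_point) * scale.
pub fn dequantize(q: i8, p: QuantParams) -> f32 {
    (q as i32 - p.zero_point as i32) as f32 * p.scale
}

/// Dot product of two int8 vectors with a 32-bit accumulator, so that the
/// products cannot overflow the 8-bit element type.
pub fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    assert_eq!(a.len(), b.len(), "vectors must have equal length");
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}
```

With `scale = 0.5` and `zero_point = 3`, the value `1.0` quantizes to `5` and dequantizes back to `1.0`; values outside the representable range saturate at `-128` or `127`.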
RTen can load models in ONNX format directly. It also supports a custom .rten
format which can offer faster load times and supports arbitrarily large models
in a single file. See the rten file format
documentation for more details on the format and
information on how to convert models.
The best way to get started is to clone this repository and try running some of the examples locally. Many of the examples use Hugging Face's Optimum or other Python-based tools to export the ONNX model, so you will need a recent Python version installed.
The examples are located in the rten-examples/ directory. See the README for descriptions of all the examples and steps to run them. As a quick-start, here are the steps to run the image classification example:
git clone https://github.com/robertknight/rten.git
cd rten
# Install dependencies for Python scripts
pip install -r tools/requirements.txt
# Export an ONNX model. We're using resnet-50, a classic image classification model.
python -m tools.export-timm-model timm/resnet50.a1_in1k
# Run image classification example. Replace `image.png` with your own image.
cargo run -p rten-examples --release --bin imagenet resnet50.a1_in1k.onnx image.png
Model format note: Support for running .onnx models directly is new in
RTen v0.23. To run models with earlier versions you need to convert them to the
.rten format first using rten-convert.
To use this library in a JavaScript application, there are two approaches:
- Prepare model inputs in JavaScript and use the rten library's built-in WebAssembly API to run the model and return a tensor which will then need to be post-processed in JS. This approach may be easiest for tasks where the pre-processing is simple.
  The image classification example uses this approach.
- Create a Rust library that uses rten and does pre-processing of inputs and post-processing of outputs on the Rust side, exposing a domain-specific WebAssembly API. This approach is more suitable if you have complex and/or computationally intensive pre/post-processing to do.
Before running the examples, you will need to follow the steps under "Building the WebAssembly library" below.
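As a rough illustration of the Rust-side pre- and post-processing mentioned in the second approach, the sketch below converts interleaved RGB pixels into a normalized channel-first float buffer, and turns raw classifier scores into probabilities and a top-k list. It deliberately omits the model call itself and does not use any rten API; `rgb_to_nchw`, `softmax` and `top_k` are hypothetical helpers, and the input layout and normalization are assumptions that must match whatever your model expects.

```rust
/// Convert interleaved 8-bit RGB pixels (height x width x 3) into a
/// channel-first (3 x height x width) float buffer, scaling to [0, 1] and
/// then normalizing each channel with the given mean and standard deviation.
pub fn rgb_to_nchw(
    pixels: &[u8],
    width: usize,
    height: usize,
    mean: [f32; 3],
    std_dev: [f32; 3],
) -> Vec<f32> {
    assert_eq!(pixels.len(), width * height * 3, "unexpected pixel count");
    let mut out = vec![0.0; 3 * width * height];
    for y in 0..height {
        for x in 0..width {
            for c in 0..3 {
                let value = pixels[(y * width + x) * 3 + c] as f32 / 255.0;
                out[c * height * width + y * width + x] = (value - mean[c]) / std_dev[c];
            }
        }
    }
    out
}

/// Convert raw scores (logits) into probabilities. The maximum is
/// subtracted first for numerical stability.
pub fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Return (class index, score) pairs for the `k` highest scores, best first.
pub fn top_k(scores: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut indexed: Vec<(usize, f32)> = scores.iter().copied().enumerate().collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed.truncate(k);
    indexed
}
```

A domain-specific WebAssembly API built this way would accept something like raw image bytes and return class labels, keeping tensors and their layouts out of the JavaScript code.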
The general steps for using RTen's built-in WebAssembly API to run models in a JavaScript project are:
- Develop a model or find a pre-trained one that you want to run. Pre-trained models in ONNX format can be obtained from the ONNX Model Zoo or Hugging Face.
- If the model is not already in ONNX format, convert it to ONNX. PyTorch users can use torch.onnx for this.
- Use the rten-convert package in this repository to convert the model to the optimized format RTen uses. See the section above on converting models.
- In your JavaScript code, fetch the WebAssembly binary and initialize RTen using the init function.
- Fetch the prepared .rten model and use it to instantiate the Model class from this library.
- Each time you want to run the model, prepare one or more Float32Arrays containing input data in the format expected by the model, and call Model.run. This will return a TensorList that provides access to the shapes and data of the outputs.
After building the library, API documentation for the Model and TensorList
classes is available in dist/rten.d.ts.
To build RTen for WebAssembly you will need:
- A recent stable version of Rust
- make
- (Optional) The wasm-opt tool from Binaryen can be used to optimize .wasm binaries for improved performance
- (Optional) A recent version of Node for running demos
git clone https://github.com/robertknight/rten.git
cd rten
make wasm
The build created by make wasm requires support for WebAssembly SIMD,
available since Chrome 91, Firefox 89 and Safari 16.4. It is possible to
build the library without WebAssembly SIMD support using make wasm-nosimd,
or both using make wasm-all. The non-SIMD builds are significantly slower.
At runtime, you can find out which build is supported by calling the
binaryName() function exported by this package.