A compiler that runs ONNX machine learning models using Vulkan compute.
Below are the transformations the ONNX model goes through until it reaches its final form, which can be (indirectly) fed to Vulkan for execution. Details are a bit foggy at this stage as I'm still planning and experimenting. The l0 through ln names are inspired by Nanopass, whose design patterns I am trying to follow. This is by no means the final architecture; more IRs might be introduced at any point in the transformation pipeline.
The ONNX file, which is a protobuf, is decoded from the binary source into Rust data structures using the protobuf crate. Said data structures are automatically generated by the same crate from the protobuf schema in the ONNX repository.
The decoded protobuf tree contains a lot of unnecessary information and is quite wasteful as a representation of an ML model, so we parse it into an IR we call l0, which keeps only the information from the source that later stages actually need.
Even at this stage, the representation is still not exact enough.
Converts l0 into l1, an even stricter and more verbose IR. During this translation, shape inference for tensors is performed.
- Shape variables
Converts l1 into l2, an IR that represents low-level operations on tensor-buffers (not plain buffers, because they still carry their shape).
- Implement optimizations such as reusing an operand buffer as the result buffer, etc.
Now represented by a session, this is the stage where all the Vulkan setup happens, all derived from a context. At this stage the operations are compiled to SPIR-V binaries through kernel, which in turn generates the shaders.
The session can be given inputs and then invoked. Invocation puts all the Vulkan machinery to work according to the execution graph.
- Kernel reuse
- Planned allocation of buffers
- Kernel IR for optimizations such as fusion, etc
- Use cranelift-entity instead of handmade vectors + indices as references.