vk_onnx

A compiler that runs ONNX machine learning models using Vulkan compute.

Operation and high level architecture

Diagram of high level system overview

Below are the transformations an ONNX model goes through until it reaches its final form, which can be (indirectly) fed to Vulkan for execution. Details are a bit foggy at this stage, as I'm still planning and experimenting. The l0 through ln names are inspired by Nanopass, whose design patterns I am trying to follow. This is by no means the final architecture; more IRs might be introduced anywhere in the transformation pipeline.

Decoding

The ONNX file, which is a protobuf, is decoded from its binary form into Rust data structures using the protobuf crate. These data structures are automatically generated by the same crate from the protobuf schema in the ONNX repository.

Parsing

The decoded protobuf tree contains a lot of unnecessary information and is a wasteful representation of a machine learning model, so we parse it into an IR that we call l0, which stores only the information from the source that is useful to later stages.

Even at this stage, the representation is still not exact enough.
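As a rough illustration of the parsing step, the sketch below strips a decoded node down to the fields later stages need. `DecodedNode` is a hand-written, heavily simplified stand-in for the generated protobuf type, and `L0Node`, `parse_node` and `parse_graph` are hypothetical names, not the project's actual API.

```rust
/// Stand-in for a generated protobuf node (hypothetical and simplified).
/// The real generated type carries more fields, such as attributes.
#[derive(Debug, Clone, Default)]
pub struct DecodedNode {
    pub name: Option<String>,
    pub op_type: Option<String>,
    pub doc_string: Option<String>,
    pub domain: Option<String>,
    pub input: Vec<String>,
    pub output: Vec<String>,
}

/// Hypothetical l0 node: only what later stages need.
#[derive(Debug, Clone, PartialEq)]
pub struct L0Node {
    pub op: String,
    pub inputs: Vec<String>,
    pub outputs: Vec<String>,
}

/// Parse one decoded node into l0, rejecting nodes that lack an
/// operator type or produce no outputs.
pub fn parse_node(node: &DecodedNode) -> Result<L0Node, String> {
    let op = node
        .op_type
        .clone()
        .filter(|s| !s.is_empty())
        .ok_or_else(|| format!("node {:?} has no op_type", node.name))?;
    if node.output.is_empty() {
        return Err(format!("node {:?} ({op}) has no outputs", node.name));
    }
    Ok(L0Node {
        op,
        inputs: node.input.clone(),
        outputs: node.output.clone(),
    })
}

/// Parse every node, stopping at the first error.
pub fn parse_graph(nodes: &[DecodedNode]) -> Result<Vec<L0Node>, String> {
    nodes.iter().map(parse_node).collect()
}
```

Documentation strings and names are dropped here because nothing downstream needs them; a real l0 would likely keep attributes and initializers as well.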

TODO

Computational graph refinement and shape inference

Converts l0 into l1, an even stricter and more verbose IR. Shape inference for tensors is performed during this translation.
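To make the shape inference step more concrete, here is a minimal sketch for three operators (Relu, Add with NumPy-style broadcasting, and 2-D MatMul). It assumes nodes arrive in topological order and that all dimensions are known integers; `Node`, `broadcast` and `infer_shapes` are hypothetical names, and shape variables are not handled.

```rust
use std::collections::HashMap;

pub type Shape = Vec<usize>;

/// Hypothetical l0-style node with a single output.
pub struct Node {
    pub op: String,
    pub inputs: Vec<String>,
    pub output: String,
}

/// NumPy-style broadcasting: align trailing dimensions, treat missing
/// leading dimensions as 1, and let a 1 stretch to match the other side.
pub fn broadcast(a: &[usize], b: &[usize]) -> Result<Shape, String> {
    let rank = a.len().max(b.len());
    let mut out = vec![0; rank];
    for i in 0..rank {
        let da = if i < rank - a.len() { 1 } else { a[i - (rank - a.len())] };
        let db = if i < rank - b.len() { 1 } else { b[i - (rank - b.len())] };
        out[i] = match (da, db) {
            (x, y) if x == y => x,
            (1, y) => y,
            (x, 1) => x,
            _ => return Err(format!("cannot broadcast {a:?} with {b:?}")),
        };
    }
    Ok(out)
}

/// 2-D matrix multiplication only: [m, k] x [k, n] -> [m, n].
fn matmul(a: &[usize], b: &[usize]) -> Result<Shape, String> {
    match (a, b) {
        ([m, k1], [k2, n]) if k1 == k2 => Ok(vec![*m, *n]),
        _ => Err(format!("MatMul shape mismatch: {a:?} x {b:?}")),
    }
}

/// Look up the shape of the i-th input of a node.
fn lookup<'a>(
    shapes: &'a HashMap<String, Shape>,
    node: &Node,
    i: usize,
) -> Result<&'a Shape, String> {
    let name = node
        .inputs
        .get(i)
        .ok_or_else(|| format!("{} is missing input {i}", node.op))?;
    shapes
        .get(name)
        .ok_or_else(|| format!("unknown shape for tensor {name}"))
}

/// Walk the nodes in order, recording the shape of every produced tensor.
/// `shapes` starts out holding the shapes of graph inputs and weights.
pub fn infer_shapes(
    nodes: &[Node],
    mut shapes: HashMap<String, Shape>,
) -> Result<HashMap<String, Shape>, String> {
    for node in nodes {
        let out = match node.op.as_str() {
            "Relu" => lookup(&shapes, node, 0)?.clone(),
            "Add" => broadcast(lookup(&shapes, node, 0)?, lookup(&shapes, node, 1)?)?,
            "MatMul" => matmul(lookup(&shapes, node, 0)?, lookup(&shapes, node, 1)?)?,
            other => return Err(format!("unsupported op {other}")),
        };
        shapes.insert(node.output.clone(), out);
    }
    Ok(shapes)
}
```

For example, a `MatMul` of `[2, 3]` by `[3, 4]` followed by an `Add` with a `[4]` bias yields a `[2, 4]` result, while broadcasting `[2, 3]` with `[4]` is rejected.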

TODO

  • Shape variables

Lowering to abstract execution graph

Converts l1 into l2, an IR that represents low-level operations on tensor-buffers (not just buffers, because they still have their shape).

TODO

  • Implement optimizations such as reusing an operand buffer as the result buffer
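One possible shape for the operand-buffer reuse optimization is sketched below. It is an assumption about how such a pass could work, not the project's design: an elementwise operation writes its result into an operand's buffer when that operand is never read again, has the same size as the result, and is not pinned (for instance a graph input owned by the caller). `Op`, `assign_buffers` and `pinned` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical l2-style operation on tensor-buffers.
pub struct Op {
    pub inputs: Vec<usize>, // ids of tensors read
    pub output: usize,      // id of the tensor written
    pub elementwise: bool,  // only elementwise ops run in place in this sketch
}

/// Map every tensor id to a buffer id, reusing an operand's buffer for
/// the result whenever the operand is dead after the operation.
pub fn assign_buffers(
    ops: &[Op],
    pinned: &[usize],
    sizes: &HashMap<usize, usize>,
) -> HashMap<usize, usize> {
    // Index of the last operation that reads each tensor.
    let mut last_use = HashMap::new();
    for (i, op) in ops.iter().enumerate() {
        for &t in &op.inputs {
            last_use.insert(t, i);
        }
    }

    let mut buffer_of = HashMap::new();
    let mut next = 0;
    for (i, op) in ops.iter().enumerate() {
        // Tensors not produced by an earlier op (graph inputs) get fresh buffers.
        for &t in &op.inputs {
            if !buffer_of.contains_key(&t) {
                buffer_of.insert(t, next);
                next += 1;
            }
        }
        // First operand that dies here, is not pinned, and matches in size.
        let reusable = op.inputs.iter().copied().find(|t| {
            op.elementwise
                && last_use.get(t) == Some(&i)
                && !pinned.contains(t)
                && matches!(
                    (sizes.get(t), sizes.get(&op.output)),
                    (Some(a), Some(b)) if a == b
                )
        });
        let buf = match reusable {
            Some(t) => buffer_of[&t],
            None => {
                let fresh = next;
                next += 1;
                fresh
            }
        };
        buffer_of.insert(op.output, buf);
    }
    buffer_of
}
```

Because a buffer is only handed over when its previous tensor is dead, at most one live tensor occupies a buffer at any time.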

Lowering to concrete execution graph

Now represented by session, this is the stage where all the Vulkan setup happens, all of it derived from a context. At this stage the operations are converted to SPIR-V binaries through kernel, which in turn generates the shaders.

The session can be given inputs and then invoked. Invocation in turn puts all the Vulkan machinery to work according to the execution graph.

TODO

  • Kernel reuse
  • Planned allocation of buffers
  • Kernel IR for optimizations such as fusion
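Kernel reuse could, for instance, take the form of a cache keyed by operator and shape, so that identical operations share one compiled SPIR-V module. The sketch below is only an assumption about such a design: `KernelKey`, `KernelCache` and `get_or_compile` are hypothetical names, and the compile step is passed in as a closure rather than calling any real shader generator.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: two operations with the same operator and
/// shape are assumed to be able to share a kernel.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct KernelKey {
    pub op: String,
    pub shape: Vec<usize>,
}

/// Caches compiled kernels as SPIR-V words (SPIR-V is a stream of 32-bit words).
pub struct KernelCache {
    kernels: HashMap<KernelKey, Vec<u32>>,
    /// Number of times the compile closure actually ran.
    pub compilations: usize,
}

impl KernelCache {
    pub fn new() -> Self {
        KernelCache { kernels: HashMap::new(), compilations: 0 }
    }

    /// Return the cached kernel for `key`, compiling it first if needed.
    pub fn get_or_compile(
        &mut self,
        key: KernelKey,
        compile: impl FnOnce(&KernelKey) -> Vec<u32>,
    ) -> &[u32] {
        if !self.kernels.contains_key(&key) {
            let spirv = compile(&key);
            self.compilations += 1;
            self.kernels.insert(key.clone(), spirv);
        }
        &self.kernels[&key]
    }
}
```

Requesting the same key twice runs the compile closure only once; the second call returns the cached words.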

TODO
