Description
Here's an actual real world problem that Fable.Python can solve...
PyTorch has a JIT mode that can make neural network models run much faster: Speeding up model training with PyTorch JIT
The way JIT works is:
Instead of interpreting Python neural network code and pushing tensor ops to the GPU line by line, you produce TorchScript code containing the neural network model's AST.
TorchScript is a restricted subset of Python, simple enough that it can be parsed by some C++ code inside Torch and then compiled into a CUDA kernel. That CUDA kernel then gets executed in a single call to the GPU, making everything much faster (hence "JIT").
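The "emit restricted source, compile once, call once" flow can be mimicked in plain Python as a rough analogy (purely illustrative, no PyTorch or CUDA involved; `make_fused` and the op names are invented for this sketch):

```python
# Purely illustrative sketch of the JIT idea: instead of interpreting a chain
# of ops one by one, emit source for the whole chain, compile it once, and
# invoke the compiled function in a single call. No PyTorch here; `make_fused`
# and the ops `double`/`inc` are made-up stand-ins for fused tensor kernels.

def make_fused(op_names):
    # Emit source that applies the ops in sequence inside one function,
    # loosely analogous to fusing tensor ops into a single CUDA kernel.
    body = "x"
    for name in op_names:
        body = f"{name}({body})"
    src = f"def fused(x):\n    return {body}\n"
    namespace = {"double": lambda x: x * 2, "inc": lambda x: x + 1}
    exec(compile(src, "<fused>", "exec"), namespace)
    return namespace["fused"]

fused = make_fused(["double", "inc"])  # builds inc(double(x))
print(fused(10))  # -> 21
```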
The problem is that TorchScript is a pain to produce in the Python world. There are two ways:
- write the restricted subset of Python by hand (error prone, because you need to figure out which subset of Python is allowed), or
- write normal Python code and run it through a runtime tracer to reconstruct the model's AST from the trace. The problem is that tracing won't explore all if/then/else branches unless the inputs are carefully designed; it also needs to be told which tensor dimensions are fixed and which are variable, etc.
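The branch-coverage problem with tracing can be shown with a toy tracer in pure Python (no torch; `TracingValue` and `model` are invented for this demo). Comparisons have to return a concrete bool, so whichever branch the example input takes is the only one that ends up in the trace:

```python
# Minimal illustration of why runtime tracing misses branches (no torch).
# A "trace" here is just the list of ops actually executed for one example
# input; control flow is decided at trace time and baked in, as with
# torch.jit.trace.

class TracingValue:
    """Records arithmetic ops; comparisons use the concrete example value,
    so only one branch of any `if` is ever recorded."""
    def __init__(self, value, ops):
        self.value = value
        self.ops = ops
    def __mul__(self, c):
        self.ops.append(lambda x: x * c)
        return TracingValue(self.value * c, self.ops)
    def __sub__(self, c):
        self.ops.append(lambda x: x - c)
        return TracingValue(self.value - c, self.ops)
    def __gt__(self, c):
        return self.value > c  # concrete bool: the branch is chosen NOW

def model(x):
    if x > 0:
        return x * 2
    return x - 1

def trace(fn, example):
    ops = []
    fn(TracingValue(example, ops))
    def replay(x):
        for op in ops:
            x = op(x)
        return x
    return replay

traced = trace(model, example=3)  # positive example: only the `if` branch is recorded
print(model(-5), traced(-5))      # -6 -10  -- the trace replays the wrong branch
```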
F# solves this: unlike Python, F# makes it easy to produce ASTs without tracing.
So the idea is to use F# code quotations / computation expressions / reflected definitions to get hold of the F# AST, convert that AST into the TorchScript subset of Python, load it into Torch to compile a fast CUDA kernel, and then call that kernel directly.
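On the emitting side, Python's stdlib `ast` module can already do the final AST-to-source step. A minimal sketch (stdlib only, `ast.unparse` needs Python 3.9+; the template, `FillScale`, and `emit_forward` are made-up names): transform a Python AST programmatically, the way an F#-quotation-to-TorchScript translator would, and unparse it to source:

```python
import ast

# Hypothetical sketch: a quotation translator would build a Python AST like
# this one; here we just transform a parsed template (stdlib only,
# ast.unparse requires Python 3.9+).

TEMPLATE = """
def forward(x):
    return x * SCALE
"""

class FillScale(ast.NodeTransformer):
    """Substitute the SCALE placeholder with a concrete constant."""
    def __init__(self, scale):
        self.scale = scale
    def visit_Name(self, node):
        if node.id == "SCALE":
            return ast.copy_location(ast.Constant(self.scale), node)
        return node

def emit_forward(scale):
    tree = FillScale(scale).visit(ast.parse(TEMPLATE))
    return ast.unparse(ast.fix_missing_locations(tree))

src = emit_forward(2)
print(src)
# def forward(x):
#     return x * 2
```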
Most of the infrastructure is already present: the TorchSharp project contains the .NET bindings to the C++ Torch libraries (the same stuff that PyTorch and Lua Torch are built upon).
C++ Torch can load .pt files containing TorchScript models.
TorchSharp could easily load .pt files as well (the API is there), but unfortunately the C# code lacks the Python parser and unpickler needed to extract Python method signatures from TorchScript, so that part is not supported yet. But if we generated the Python code ourselves from F#, we wouldn't need to parse that Python anyway, because we'd know the method signatures all along from the F# AST.
So the question is, which parts of Fable.Python can be reused for this?
At the minimum, I assume the Python AST parts can come in handy, but I hope one can start somewhere closer to the F# AST and reuse more of the Fable infrastructure.
I'm just assuming that code quotations would be preferred here, because the tensor-model hot-spot code that needs to get compiled into CUDA is usually just a few lines, compared to the whole project, which includes data loading, the training loop, setting up the optimizers, etc.: stuff that runs just fine in F# TorchSharp and can't be put on the GPU anyway. We could potentially parse the whole source file using FCS and throw away everything but the model code.
Any thoughts?