[RFC] Use TVM-FFI bindings instead of the Python API to remove the Python-version dependency of the bindings #297
To: DeepGEMM Maintainers / SGLang Development Team
From: @rainj-me @Fridge003
Date: March 25, 2026
Subject: RFC: Transitioning DeepGEMM to TVM-FFI for Python-Agnostic Bindings
1. Executive Summary
This RFC proposes replacing the current Python-native bindings in DeepGEMM with TVM-FFI. The goal is to decouple the compiled library from specific Python minor versions, significantly reducing the maintenance overhead of releasing multiple wheel packages and simplifying integration within the SGLang runtime.
2. Problem Statement
Currently, DeepGEMM’s Python bindings create a strict dependency on the Python version used during compilation, because the extension module is built against that version's CPython ABI (e.g., a wheel built for Python 3.10 is incompatible with Python 3.12). This necessitates:
- Redundant CI/CD pipelines to build wheels for every supported Python version.
- Increased package distribution size and complexity.
- Friction for SGLang users operating in diverse environments.
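The 3.10-versus-3.12 incompatibility shows up in a wheel's filename tags (PEP 425, PEP 427): a CPython extension built against the version-specific ABI is tagged like `cp310-cp310`, and installers accept it only on that exact minor version. The stdlib-only sketch below illustrates that rule in simplified form; it ignores build tags, compressed tag sets, `abi3`, and platform matching, all of which real installers handle. The `deep_gemm-1.0.0-...` filenames are hypothetical, and tagging a Python-agnostic build as `py3-none-<platform>` is an assumption for illustration, not something this RFC specifies.

```python
import sys


def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Split a wheel filename into its (python, abi, platform) tags.

    Simplified: assumes the PEP 427 form name-version-python-abi-platform.whl
    with no optional build tag.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unsupported wheel filename: {filename}")
    return parts[2], parts[3], parts[4]


def installable_on(filename: str, major: int, minor: int) -> bool:
    """Roughly decide whether a wheel's Python/ABI tags accept CPython major.minor.

    Ignores the platform tag, compressed tag sets (e.g. py2.py3), abi3,
    and the full tag-priority rules that real installers apply.
    """
    python_tag, abi_tag, _platform = wheel_tags(filename)
    if python_tag == f"py{major}" and abi_tag == "none":
        # No ABI dependency: any interpreter with this major version qualifies.
        return True
    # Version-specific CPython extension: both tags must name this exact version.
    exact = f"cp{major}{minor}"
    return python_tag == exact and abi_tag == exact


def installable_here(filename: str) -> bool:
    """Apply installable_on to the running interpreter."""
    return installable_on(filename, sys.version_info.major, sys.version_info.minor)
```

For example, `installable_on("deep_gemm-1.0.0-cp310-cp310-linux_x86_64.whl", 3, 12)` returns `False`, while the hypothetical `py3-none` variant returns `True` for both 3.10 and 3.12.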
3. Proposed Solution: TVM-FFI Migration
By migrating to TVM-FFI, we can move toward a "compile once, run anywhere" model with respect to Python versions. The FFI (Foreign Function Interface) approach exposes the core C++/CUDA logic through a stable C ABI rather than through the version-specific CPython API.
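To make "exposed through a stable C ABI" more concrete, the sketch below models the general idea with Python's standard `ctypes` module: every exported function shares one C signature that takes an array of type-tagged values, so the boundary carries only C types and never CPython objects. The struct layout, type codes, and the names `AnyValue`, `PackedCall`, `add_fn`, and `call_packed` are hypothetical and only loosely inspired by the tagged-value ("Any") idea in TVM-FFI; they are not TVM-FFI's actual API. Both sides are Python here purely so the example runs; in the proposed setup the callee would be compiled C++/CUDA code and the caller would go through TVM-FFI.

```python
import ctypes

# Hypothetical type codes for this sketch (not TVM-FFI's real type indices).
TYPE_INT = 1
TYPE_FLOAT = 2


class _Payload(ctypes.Union):
    _fields_ = [("v_int64", ctypes.c_int64), ("v_float64", ctypes.c_double)]


class AnyValue(ctypes.Structure):
    """A tagged value: a type code plus a payload union, passed by pointer across a C ABI."""
    _fields_ = [("type_index", ctypes.c_int32), ("value", _Payload)]


# One C signature shared by every exported function:
# int call(void* handle, const AnyValue* args, int32_t num_args, AnyValue* result)
PackedCall = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.c_void_p,
    ctypes.POINTER(AnyValue),
    ctypes.c_int32,
    ctypes.POINTER(AnyValue),
)


def _add_impl(handle, args, num_args, result):
    # C-style error reporting: 0 on success, nonzero on failure.
    if num_args != 2:
        return -1
    a, b = args[0], args[1]
    if a.type_index != TYPE_INT or b.type_index != TYPE_INT:
        return -2
    # Write the result through the pointer the caller provided.
    result[0].type_index = TYPE_INT
    result[0].value.v_int64 = a.value.v_int64 + b.value.v_int64
    return 0


# Wrapping the Python function yields a real C function pointer with the packed signature.
add_fn = PackedCall(_add_impl)


def call_packed(fn, *py_args):
    """Pack Python ints into AnyValue structs, call fn through the C ABI, unpack the result."""
    args = (AnyValue * len(py_args))()
    for i, x in enumerate(py_args):
        args[i].type_index = TYPE_INT
        args[i].value.v_int64 = x
    result = AnyValue()
    rc = fn(None, args, len(py_args), ctypes.byref(result))
    if rc != 0:
        raise RuntimeError(f"packed call failed with code {rc}")
    return result.value.v_int64
```

Because the callee only ever sees `int32`/`int64`/`double` fields and raw pointers, nothing on its side of the boundary depends on the interpreter's version; the version-specific work is confined to whatever code does the packing.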
3.1 Key Advantages
- Binary Portability: A single binary can serve multiple Python versions (e.g., 3.10 through 3.13), as long as the PyTorch and CUDA versions match.
- Reduced Package Footprint: Eliminates the need for version-specific wheels, streamlining the release process.
- Proven Compatibility: Preliminary testing within the SGLang fork confirms that DeepGEMM compiled with Python 3.10 runs successfully in a Python 3.12 environment after the TVM-FFI migration.
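The portability rule and the release savings described above can be sketched as a small build-matrix calculation. The Python versions come from the 3.10 through 3.13 range named in the text; the PyTorch/CUDA pairs are placeholders, not DeepGEMM's actual support matrix, and `build_targets` and `is_compatible` are hypothetical helpers written for this illustration.

```python
from itertools import product

# Python range taken from the text; the PyTorch/CUDA pairs are placeholders only.
PYTHONS = ["3.10", "3.11", "3.12", "3.13"]
TORCH_CUDA = [("2.8", "12.8"), ("2.8", "12.9")]


def build_targets(pythons, torch_cuda, python_agnostic):
    """List the (python, torch, cuda) artifacts a release must build.

    With per-Python bindings every triple needs its own binary; with a
    Python-agnostic binding the Python axis collapses to a single "any".
    """
    py_axis = ["any"] if python_agnostic else list(pythons)
    return [(py, torch, cuda) for py, (torch, cuda) in product(py_axis, torch_cuda)]


def is_compatible(build, runtime):
    """A build runs in a runtime if PyTorch and CUDA match; the Python version
    must also match unless the build is Python-agnostic."""
    b_py, b_torch, b_cuda = build
    r_py, r_torch, r_cuda = runtime
    python_ok = b_py == "any" or b_py == r_py
    return python_ok and b_torch == r_torch and b_cuda == r_cuda
```

With these placeholder lists, per-Python bindings need 4 × 2 = 8 builds, while a Python-agnostic binding needs 2; each remaining build still has to match the runtime's PyTorch and CUDA versions.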
4. Implementation Status
We have already started a proof of concept within the SGLang ecosystem. The reference implementation and initial tests can be found here:
- Reference PR: sgl-project/DeepGEMM/pull/22
5. Request for Feedback
We would like to coordinate with the DeepGEMM maintainers on the following:
- Is the DeepGEMM project open to adopting TVM-FFI as its primary binding mechanism?
- Are there specific architectural constraints or upstream requirements we should consider before submitting a formal PR?