[RFC] Using tvm-ffi binding instead of python api to remove the dependency for python binding

**To**: DeepGEMM Maintainers / SGLang Development Team
**From**: @rainj-me @Fridge003
**Date**: March 25, 2026
**Subject**: RFC: Transitioning DeepGEMM to TVM-FFI for Python-Agnostic Bindings

## 1. Executive Summary
This RFC proposes replacing the current Python-native bindings in DeepGEMM with TVM-FFI. The goal is to decouple the compiled library from specific Python minor versions, significantly reducing the maintenance overhead of releasing multiple wheel packages and simplifying integration within the SGLang runtime.

## 2. Problem Statement
Currently, DeepGEMM's Python bindings tie each compiled wheel to the Python minor version used at build time (e.g., a wheel built for Python 3.10 cannot be loaded under Python 3.12). This results in:
- Redundant CI/CD pipelines to build wheels for every supported Python version.
- Increased package distribution size and complexity.
- Friction for SGLang users operating in diverse environments.
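To make the version coupling concrete, here is a minimal sketch of how the compatibility tags in a wheel filename decide which interpreters can install it. The wheel filenames in the usage note are hypothetical, and the check is deliberately simplified: it ignores the `abi3` stable-ABI case and the platform tag.

```python
def parse_wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # The last three dash-separated fields are the compatibility tags.
    _, python_tag, abi_tag, platform_tag = filename[: -len(".whl")].rsplit("-", 3)
    return python_tag, abi_tag, platform_tag


def installs_on(filename: str, interpreter_tag: str) -> bool:
    """Simplified check: does this wheel target the given interpreter tag?"""
    python_tag, abi_tag, _ = parse_wheel_tags(filename)
    tags = python_tag.split(".")  # tags may be compressed, e.g. "py2.py3"
    if abi_tag == "none" and "py3" in tags:
        # No CPython ABI dependency: any Python 3 interpreter qualifies.
        return True
    return interpreter_tag in tags
```

With these helpers, a hypothetical `deep_gemm-0.0.0-cp310-cp310-linux_x86_64.whl` is rejected for `cp312`, while a hypothetical `deep_gemm-0.0.0-py3-none-linux_x86_64.whl` is accepted. Whether a TVM-FFI build of DeepGEMM would actually ship with a `py3-none` tag is an assumption here, not something this RFC has confirmed.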

## 3. Proposed Solution: TVM-FFI Migration
By migrating to TVM-FFI, we can move toward a "compile once, run anywhere" model across Python versions. The FFI (Foreign Function Interface) approach exposes the core C++/CUDA logic through a stable C API, so the compiled library no longer depends on the CPython extension ABI, which changes between minor versions.
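The general principle can be illustrated with Python's standard `ctypes` module. This is an analogy, not TVM-FFI's own API: it calls a plain C symbol (`strlen` from the C library) that knows nothing about the interpreter, so the same shared object works under any Python version. The sketch assumes a POSIX system, where `ctypes.CDLL(None)` exposes the symbols already loaded into the process.

```python
import ctypes

# Assumption: POSIX platform. CDLL(None) opens the running process itself,
# which makes the C library's symbols available without naming a file.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C library's strlen through its C ABI."""
    return libc.strlen(data)
```

Nothing in the C library was compiled against a particular Python release; the interpreter only needs to know the function's C signature. An FFI-based binding for DeepGEMM would rely on the same kind of boundary, with TVM-FFI handling the argument marshalling instead of `ctypes`.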

### 3.1 Key Advantages
- **Binary Portability**: A single binary can serve multiple Python versions (e.g., 3.10 through 3.13), as long as the Torch and CUDA versions match.
- **Reduced Package Footprint**: Eliminates the need for version-specific wheels, streamlining the release process.
- **Proven Compatibility**: Preliminary testing within the SGLang fork confirms that DeepGEMM compiled with Python 3.10 successfully operates in a Python 3.12 environment after the TVM-FFI migration.
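The portability condition above can be written down as a small predicate. This is a sketch under stated assumptions: the `Env` type, both function names, and the version strings in the usage note are hypothetical, and comparing versions by exact string equality is a simplification of whatever matching rule the project would adopt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    """Versions that a compiled binary was built with or is run under."""
    python: str
    torch: str
    cuda: str


def loadable_with_native_bindings(build: Env, runtime: Env) -> bool:
    # Python-native bindings: Python, Torch, and CUDA must all match.
    return build == runtime


def loadable_with_ffi_bindings(build: Env, runtime: Env) -> bool:
    # FFI bindings: the Python version drops out of the condition.
    return (build.torch, build.cuda) == (runtime.torch, runtime.cuda)
```

For example, a build made under `Env("3.10", "2.9", "12.8")` and run under `Env("3.12", "2.9", "12.8")` fails the first check and passes the second, which mirrors the Python 3.10 to 3.12 result reported above.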

## 4. Implementation Status
We have already initiated a proof-of-concept within the SGLang ecosystem. Reference implementation and initial testing can be found here:
- Reference PR: [sgl-project/DeepGEMM/pull/22](https://github.com/sgl-project/DeepGEMM/pull/22)

## 5. Request for Feedback
We would like to coordinate with the DeepGEMM maintainers on the following:
1. Would the DeepGEMM project accept a transition to TVM-FFI as the primary binding method?
2. Are there specific architectural constraints or upstream requirements we should consider before submitting a formal PR?
