[RFC] Using tvm-ffi binding instead of python api to remove the dependency for python binding #297

@rainj-me

To: DeepGEMM Maintainers / SGLang Development Team
From: @rainj-me @Fridge003
Date: March 25, 2026
Subject: RFC: Transitioning DeepGEMM to TVM-FFI for Python-Agnostic Bindings

1. Executive Summary

This RFC proposes replacing the current Python-native bindings in DeepGEMM with TVM-FFI. The goal is to decouple the compiled library from specific Python minor versions, significantly reducing the maintenance overhead of releasing multiple wheel packages and simplifying integration within the SGLang runtime.

2. Problem Statement

Currently, DeepGEMM’s Python bindings create a strict dependency on the Python minor version used during compilation (e.g., a wheel built for Python 3.10 is incompatible with Python 3.12). This results in:

  • Redundant CI/CD pipelines to build wheels for every supported Python version.
  • Increased package distribution size and complexity.
  • Friction for SGLang users operating in diverse environments.
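To make the version lock concrete, here is a minimal sketch of how a wheel's filename tags restrict the interpreters it can be installed on. The wheel filenames are hypothetical, and the check is deliberately simplified (it ignores `abi3` and compressed tag sets); it is not how pip itself resolves compatibility.

```python
def parse_wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    stem = filename.removesuffix(".whl")
    # Wheel names end with {python tag}-{abi tag}-{platform tag}.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def runs_on(filename: str, interpreter_tag: str) -> bool:
    """Simplified check: do the wheel's tags admit this CPython interpreter?"""
    python_tag, abi_tag, _ = parse_wheel_tags(filename)
    if abi_tag == "none":
        # No CPython ABI dependency: any Python 3 interpreter qualifies.
        return python_tag.startswith("py3")
    # Version-specific ABI: interpreter and wheel must match exactly.
    return python_tag == interpreter_tag and abi_tag == interpreter_tag


# Hypothetical filenames, for illustration only.
versioned = "deep_gemm-0.0.0-cp310-cp310-linux_x86_64.whl"
agnostic = "deep_gemm-0.0.0-py3-none-linux_x86_64.whl"

for tag in ("cp310", "cp312"):
    print(tag, runs_on(versioned, tag), runs_on(agnostic, tag))
```

A wheel tagged `cp310-cp310` only matches a CPython 3.10 interpreter, which is why each supported Python version currently needs its own build.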

3. Proposed Solution: TVM-FFI Migration

Migrating to TVM-FFI moves DeepGEMM toward a "compile once, run anywhere" model with respect to Python versions. The FFI (Foreign Function Interface) approach exposes the core C++/CUDA logic through a stable C API, so the compiled library no longer depends on the ABI of a particular CPython version.
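As an analogy for why a C-level interface is independent of the Python version, the sketch below calls a plain C function through the standard-library `ctypes` module. This is not TVM-FFI and not the proposed DeepGEMM binding; it only illustrates that a shared library exposing a C ABI can be called from any Python version without being rebuilt. It assumes a POSIX system, where `ctypes.CDLL(None)` exposes the symbols already loaded into the process.

```python
import ctypes
import sys

# dlopen(NULL): symbols already loaded into this process, including libc.
# Assumption: POSIX platform (Linux or macOS).
libc = ctypes.CDLL(None)

# Declare the C signature by hand: int abs(int).
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

result = libc.abs(-7)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: abs(-7) = {result}")
```

The same library binary serves every interpreter that runs this script, because nothing in it was compiled against CPython's own headers.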

3.1 Key Advantages

Binary Portability: A single binary can serve multiple Python versions (e.g., 3.10 through 3.13), as long as the Torch and CUDA versions match.

Reduced Package Footprint: Eliminates the need for version-specific wheels, streamlining the release process.
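The effect on the release matrix can be sketched with simple counting. The Python range is the one named above; the Torch and CUDA entries are placeholders rather than actual supported versions, and the sketch assumes every Torch/CUDA combination is built.

```python
from itertools import product

python_versions = ["3.10", "3.11", "3.12", "3.13"]  # range mentioned in this RFC
torch_versions = ["torch-A", "torch-B"]  # placeholder names, not real versions
cuda_versions = ["cuda-X", "cuda-Y"]  # placeholder names, not real versions

# Today: one wheel per (Python, Torch, CUDA) combination.
per_python_builds = list(product(python_versions, torch_versions, cuda_versions))

# With Python-agnostic bindings: one binary per (Torch, CUDA) combination.
ffi_builds = list(product(torch_versions, cuda_versions))

print(f"version-specific wheels: {len(per_python_builds)}")
print(f"Python-agnostic binaries: {len(ffi_builds)}")
```

Under these assumptions the number of artifacts drops by a factor equal to the number of supported Python versions.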

Proven Compatibility: Preliminary testing within the SGLang fork confirms that DeepGEMM compiled with Python 3.10 successfully operates in a Python 3.12 environment after the TVM-FFI migration.

4. Implementation Status

We have already initiated a proof-of-concept within the SGLang ecosystem. Reference implementation and initial testing can be found here:

5. Request for Feedback

We would like to coordinate with the DeepGEMM maintainers on the following:

  1. Would the DeepGEMM project accept a transition to TVM-FFI as the primary binding method?
  2. Are there specific architectural constraints or upstream requirements we should consider before submitting a formal PR?
