
Conversation

@bkryu (Collaborator) commented Oct 25, 2025

📌 Description

DRAFT. Please do not merge.

Current PR:

  • Introduces an auto backend to mm_fp4 that can be autotuned; it replaces cudnn as the default backend (a usage sketch follows this list).
  • Allows the cudnn backend to be autotuned.
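
A minimal usage sketch (it assumes the existing mm_fp4 call signature and FlashInfer's autotune context manager; the operand construction and exact argument order are illustrative, not taken from this PR):

```python
import torch
import flashinfer
from flashinfer.autotuner import autotune  # autotune entry point; name assumed

# a_fp4, b_fp4 and their block scales a_scale, b_scale are assumed to be
# FP4-quantized operands produced elsewhere (construction omitted here).

# backend="auto" (the new default) picks cutlass or cudnn from the CUDA/cuDNN versions.
out = flashinfer.mm_fp4(a_fp4, b_fp4, a_scale, b_scale, alpha, torch.bfloat16,
                        backend="auto")

# Inside the autotune context, backend="auto" tunes within and across the
# cudnn and cutlass backends and caches the best (backend, config) pair.
with autotune():
    out = flashinfer.mm_fp4(a_fp4, b_fp4, a_scale, b_scale, alpha, torch.bfloat16,
                            backend="auto")
```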

Behavior of auto backend:

  • Examines the CUDA and cuDNN versions and dispatches to either the cutlass or the cudnn kernel backend.
    • The trtllm kernel is not considered because its interface is not interchangeable with the cutlass and cudnn backends.
    • **The auto backend therefore only supports inputs runnable by cutlass and/or cudnn.**
  • Non-autotuned behavior:
    • Constructs an ordered list of backends, (cudnn, cutlass) or (cutlass, cudnn), where the ordering is based on earlier microbenchmark results (see the sketch after this list):
      • If CUDA 12 --> cutlass comes first.
      • If CUDA 13 and cuDNN version < 9.14 --> cutlass comes first.
      • If CUDA 13 and cuDNN version >= 9.14 --> cudnn comes first.
    • If a backend fails its support check, it is removed from the list.
      • For example, if use_nvfp4=False, cutlass is removed from the backend list because it fails the support check.
  • Autotune behavior:
    • If backend='trtllm', backend='cutlass', or backend='cudnn' --> autotunes within that backend. Same as the previous behavior, except that autotuning is now also supported for cudnn.
    • If backend='auto' --> autotunes within and across backends (cudnn & cutlass) and chooses the best config of the best backend.
      • As above, the trtllm kernel is not considered because its interface is not interchangeable with the cutlass and cudnn backends.
  • Note: Many mm_fp4 helper functions were refactored to enable cross-backend autotuning, using the cross-backend, autotune-enabled bmm_fp8 as a reference.
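
A rough sketch of the ordering heuristic described above (the function and variable names below are hypothetical illustrations, not the PR's actual helpers):

```python
def order_auto_backends(cuda_major: int, cudnn_version: tuple,
                        supported: dict) -> list:
    """Hypothetical illustration of the backend ordering used by backend='auto'."""
    # Ordering from the microbenchmark results described above:
    #   CUDA 12, or CUDA 13 with cuDNN < 9.14  --> cutlass first
    #   CUDA 13 with cuDNN >= 9.14             --> cudnn first
    if cuda_major >= 13 and cudnn_version >= (9, 14):
        candidates = ["cudnn", "cutlass"]
    else:
        candidates = ["cutlass", "cudnn"]

    # Drop any backend that fails its support check for the given inputs,
    # e.g. cutlass is dropped when use_nvfp4=False.
    return [b for b in candidates if supported.get(b, False)]


# CUDA 13 + cuDNN 9.14 with both backends supported --> cudnn is tried first.
print(order_auto_backends(13, (9, 14), {"cudnn": True, "cutlass": True}))
# ['cudnn', 'cutlass']
```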

🔍 Related Issues

#1722

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

@coderabbitai bot (Contributor) commented Oct 25, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@bkryu bkryu changed the title feat: Add backend='auto' to mm_fp4 feat: Add backend='auto' to mm_fp4 and enable autotune for backend='cudnn' Oct 25, 2025
@bkryu bkryu changed the title feat: Add backend='auto' to mm_fp4 and enable autotune for backend='cudnn' feat: [DRAFT] Add backend='auto' to mm_fp4 and enable autotune for backend='cudnn' Oct 25, 2025
@bkryu bkryu changed the title feat: [DRAFT] Add backend='auto' to mm_fp4 and enable autotune for backend='cudnn' [wip] feat: Add backend='auto' to mm_fp4 and enable autotune for backend='cudnn' Oct 25, 2025
@bkryu bkryu self-assigned this Oct 27, 2025
@bkryu bkryu force-pushed the mm_fp4_auto_backend branch from d69eb48 to 8d55564 on October 28, 2025 at 01:02