[wip] feat: Add backend='auto' to mm_fp4 and enable autotune for backend='cudnn' #1979
+456
−284
📌 Description
DRAFT. Please do not merge.
Current PR:
- Adds an `auto` backend to `mm_fp4` that can be autotuned. It replaces `cudnn` as the default.
- Allows the `cudnn` backend to be autotuned.

Behavior of the `auto` backend:
- Chooses between the `cutlass` or `cudnn` kernel backends. The `trtllm` kernel is not considered due to a non-interchangeable interface between trtllm and the (cutlass, cudnn) backends.
- The `auto` backend therefore only supports inputs runnable by `cutlass` and/or `cudnn`.
- With `use_nvfp4=False`, `cutlass` will be removed from the backend list as it fails the support check.
- `backend='trtllm'` or `backend='cutlass'` or `backend='cudnn'` --> Autotunes within the backend. Same as previous behavior, but now autotuning is supported for cudnn.
- `backend='auto'` --> Autotunes within and across backends (cudnn & cutlass) and chooses the best config of the best backend. The `trtllm` kernel is not considered due to a non-interchangeable interface between trtllm and the (cutlass, cudnn) backends.
- Parts of `mm_fp4` were refactored to enable cross-backend autotuning. Refactoring was done to match cross-backend autotune-enabled `bmm_fp8` as a reference.

🔍 Related Issues
#1722
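
For reviewers, the backend selection described in the Description can be pictured with the minimal sketch below. It is not the code in this PR: `select_backends`, `autotune`, the `tactics` mapping, and the `measure` callback are hypothetical names used only to illustrate the rule that an explicit backend is tuned on its own, while `backend='auto'` tunes across `cudnn` and `cutlass` (never `trtllm`) and drops `cutlass` when `use_nvfp4=False`.

```python
from typing import Callable, Dict, List, Optional, Tuple


def select_backends(backend: str, use_nvfp4: bool) -> List[str]:
    """Return the candidate backends for a requested `backend` value (hypothetical helper)."""
    if backend != "auto":
        # Explicit backend: autotune only within that backend.
        return [backend]
    # "auto" considers cudnn and cutlass only; trtllm is excluded.
    candidates = ["cudnn", "cutlass"]
    if not use_nvfp4:
        # Per the description, cutlass fails the support check in this case.
        candidates.remove("cutlass")
    return candidates


def autotune(
    backend: str,
    use_nvfp4: bool,
    tactics: Dict[str, List[int]],
    measure: Callable[[str, int], float],
) -> Tuple[str, int]:
    """Pick the (backend, tactic) pair with the lowest measured cost.

    `tactics` maps a backend name to its tunable configs and `measure`
    returns a cost (e.g. a latency) for one pair. Both are assumptions
    made for this sketch.
    """
    best: Optional[Tuple[str, int, float]] = None
    for name in select_backends(backend, use_nvfp4):
        for tactic in tactics.get(name, []):
            cost = measure(name, tactic)
            if best is None or cost < best[2]:
                best = (name, tactic, cost)
    if best is None:
        raise ValueError("no runnable backend/tactic for these inputs")
    return best[0], best[1]
```

In this sketch the cross-backend search is just a flat minimum over every (backend, tactic) pair that survives the support filter, which is why an input not runnable by either `cudnn` or `cutlass` raises instead of falling back to `trtllm`.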
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
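- I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- I have installed the hooks with `pre-commit install`.
- I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests
- All tests are passing (`unittest`, etc.).

Reviewer Notes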