docs: update README for amd-flashinfer library consumers #239
Merged
Reframe the top-level README for developers embedding FlashInfer+ROCm. Add a minimal usage example, feature matrix with prefill backends (fa2, aiter, fa3_cdna3), consolidated GPU/ROCm/PyTorch support, AITER page-size constraints from prefill_rocm, notebook link, and a dedicated prefill backends section. Remove verbose docker details blocks in favor of inline context. Made-with: Cursor
Restore the practical sections that the prior rewrite dropped (Docker tag table, source-build instructions, CPX-mode pytest guidance, AITER install recipes) and refresh the Feature Support Matrix to reflect what has actually landed on amd-integration: Cascade, MLA (AITER), RoPE, paged KV-cache append, RMSNorm/AITER, sliding-window decode, torch.compile. Drop the stale fa3_cdna3 backend mention — it has no Python dispatch entry. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Refreshes the FlashInfer+ROCm consumer-facing README to better reflect current ROCm feature availability on amd-integration, provide a quick-start usage snippet, and consolidate support/install guidance.
Changes:
- Reworked README introduction + added a “Basic Usage” snippet for library consumers.
- Updated the Feature Support Matrix and clarified GPU/ROCm/PyTorch support + PyTorch install instructions (`--index-url`).
- Expanded/updated AITER section (install options, limitations, and updated capability notes) and added a License/Acknowledgements section.
Split the single "AITER backend" column into HIP and AITER columns plus a new `backend="auto"` column that spells out the exact conditions under which auto routes to AITER vs. HIP for each kernel. MLA is flagged as AITER-only (no HIP fallback); RMSNorm auto stays on HIP even though AITER is available (opt-in only). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous matrix listed dtypes inconsistently — single-decode named fp16/bf16/fp8 explicitly while sibling rows didn't. Drop the implicit fp16/bf16 enumeration (already covered by the ✅ HIP marker) and call out fp8 only where it's actually supported: batch decode KV-cache (E4M3FNUZ), RoPE fused quant+append (E4M3FNUZ + E5M2FNUZ), paged KV-cache append HIP path. Prefill rows mark fp8 as WIP. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…helpers The matrix referenced a nonexistent FLASHINFER_ENABLE_TORCH_COMPILE; the actual gate is FLASHINFER_USE_TORCH_CUSTOM_OPS=1 (must be set before importing flashinfer, requires PyTorch >= 2.4). While here, add an Environment Variables section covering the runtime knobs that aren't already in CLAUDE.md (FLASHINFER_HIP_FUSED_CASCADE, FLASHINFER_LOGGING_LEVEL, FLASHINFER_DISABLE_JIT, ROCM_PATH/ROCM_HOME) and a Runtime Helpers section pointing at is_aiter_supported and check_torch_rocm_compatibility. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
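The ordering constraint in this commit message (the flag must be set before `flashinfer` is imported) can be sketched as below. This is a minimal illustration, not code from the repository: `enable_torch_custom_ops` is a hypothetical helper name, and the sketch does not check the PyTorch >= 2.4 requirement.

```python
import os
import sys
import warnings


def enable_torch_custom_ops() -> bool:
    """Set FLASHINFER_USE_TORCH_CUSTOM_OPS=1 (hypothetical helper).

    Returns True if the flag was set before flashinfer was imported,
    False if flashinfer was already loaded (the flag may then be ignored).
    """
    already_imported = "flashinfer" in sys.modules
    if already_imported:
        warnings.warn(
            "flashinfer is already imported; "
            "FLASHINFER_USE_TORCH_CUSTOM_OPS may have no effect"
        )
    os.environ["FLASHINFER_USE_TORCH_CUSTOM_OPS"] = "1"
    return not already_imported


set_in_time = enable_torch_custom_ops()
# import flashinfer  # import only after the flag has been set
```

Setting the variable in the shell (`FLASHINFER_USE_TORCH_CUSTOM_OPS=1 python script.py`) avoids the ordering question entirely; the helper is only useful when the flag has to be set from Python.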
Keep only the current validated rocm/flashinfer image and point readers at hub.docker.com/r/rocm/flashinfer/tags for older ROCm/PyTorch combos. The full table goes stale on every release. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The base environment is activated on shell start inside the rocm/flashinfer images, so the explicit `micromamba activate base` call was misleading. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace <container-name>/<docker-image-tag> placeholders with the flashinfer-rocm container name and the actual latest image tag so the snippet is copy-pasteable. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the wget-based download steps with a brief pointer to the examples/ directory and a single run command. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Basic Usage snippet at the top of the README already shows the same call pattern; the AITER-section duplicate added no extra information. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Hyperlink the first mention of CDNA3 to the ROCm MI300 microarchitecture docs and CDNA4 to AMD's MI350 product page. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the GPU product / ROCm doc links with the AMD CDNA3 and CDNA4 architecture whitepapers — the right reference for the architectures themselves rather than the cards. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Drop the "AMD ROCm port" redundancy with the title and lead with what ships in-tree (the HIP kernel set) versus what dispatches to AITER.
- Cross-link the Feature Support Matrix and AITER from the first paragraph so readers landing on the README see the structure immediately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Install / Feature Matrix / Build / AITER are what a new reader needs first; the code snippet reads better as a closing example. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Tighten Feature Matrix preamble and Legend; drop the duplicate AITER link (already in the intro).
- Collapse the AITER Support intro that overlapped with the matrix; cross-link Known Limitations instead of re-listing the conditions.
- Rewrite Known Limitations preamble to call out the two-group split (hard errors vs. silently-ignored kwargs) more directly.
- Split the dense CPX-mode pytest notes into labelled bullets.
- Drop the unused `validate_flashinfer_rocm_arch` import from the runtime helpers snippet and note (separately) that it's a build-time validator, not a runtime helper.
- Move the pre-commit / pytest contributing reminder out of the License paragraph into its own line.
- Fix "Python tests suite" → "Python test suite".

Verified against the codebase: env var names + defaults (`FLASHINFER_USE_TORCH_CUSTOM_OPS`, `FLASHINFER_HIP_FUSED_CASCADE`, `FLASHINFER_LOGGING_LEVEL`, `FLASHINFER_DISABLE_JIT`, `ROCM_PATH`/`ROCM_HOME`), hip_utils / aiter_utils helper signatures, attention_reference.py path, and the MI308X CPX-mode reference (decode.cuh:707).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MLA on ROCm previously forced the user to pass `backend="aiter"` explicitly: the wrapper's `__init__` raised `ValueError` on anything other than `"aiter"`, including the `"auto"` value used by every other ROCm kernel. That left MLA as the odd one out in the public API even though it has exactly one implementation to choose from on ROCm.

Accept both `"auto"` and `"aiter"` (default is now `"auto"` to match the rest of the ROCm wrappers); any other value still raises with an updated message. The behaviour is unchanged for callers who already pass `"aiter"`.

### Test plan
- New parametrized test covering `backend="auto"` / `"aiter"` construction.
- New test that `backend="fa2"` still raises `ValueError` (runs anywhere, no GPU required since the check fires before the AITER probe).
- Full `tests/rocm_tests/test_mla_aiter_hip.py`: 11 passed.
- `pre-commit run -a`: passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
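The backend check described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code in `flashinfer/mla_rocm.py`: `resolve_mla_backend` and `_SUPPORTED_MLA_BACKENDS` are hypothetical names, and the error message wording is invented.

```python
# Hypothetical names; the real wrapper performs this check in __init__.
_SUPPORTED_MLA_BACKENDS = ("auto", "aiter")


def resolve_mla_backend(backend: str = "auto") -> str:
    """Map the user-facing backend string to the implementation to run.

    ROCm MLA has a single implementation (AITER), so "auto" is an alias
    for "aiter"; any other value is rejected.
    """
    if backend not in _SUPPORTED_MLA_BACKENDS:
        raise ValueError(
            f"Unsupported MLA backend {backend!r} on ROCm; "
            f"expected one of {_SUPPORTED_MLA_BACKENDS}"
        )
    return "aiter"
```

Because the validation needs only the string, it can run before any GPU or AITER probing, which matches the note that the rejection test runs without a GPU.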
The original "Environment name varies …" wording was ambiguous in context (the surrounding section is about the Docker image tag, not a shell or micromamba env). Rewrite to spell out that it's the Docker image tag that encodes the versions, and that the -t tag and the tag passed to docker run must match. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The build/run blocks already show the matching -t tag and the docker run image tag side-by-side; the extra explanatory note added noise without new information. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
README.md:352
- This guidance suggests “pass `backend="fa2"` explicitly” when AITER ignores kwargs, but several AITER-enabled non-attention APIs don’t accept `"fa2"` (e.g., `append_paged_kv_cache`/`rmsnorm` use `"native"`). Please adjust the wording to reference the correct non-AITER backend(s) depending on the API being discussed.
**Conditions that fall back to `fa2` under `backend="auto"`:**
* GPU is not gfx942 or gfx950
* `kv_layout` is not `NHD`
- Feature matrix: add `pos_encoding_mode="NONE"` to batch decode AITER auto-routing criteria; add gfx942/gfx950 arch gate to the `append_paged_kv_cache` row.
- AITER Support: clarify the in-tree backend strings per-op (`fa2` for attention wrappers vs `native` for `append_paged_kv_cache` / `rmsnorm`) and call out the two backend-specific quirks (`rmsnorm` auto stays on HIP, batch decode auto avoids CUDA-graph / tensor cores).
- Known Limitations: promote `pos_encoding_mode != "NONE"` and batch decode's `use_cuda_graph` / `use_tensor_cores` from the silently-ignored group to the hard-error / fallback group; the AITER attention paths reject them outright.
- Runtime Helpers: add the missing `import torch` to the snippet and correct the `is_aiter_supported` comment: the function only checks ROCm build + GPU arch, not whether the `aiter` Python package can actually be imported.
- CLAUDE.md: update the README anchor link to follow the renamed "GPU, ROCm, and PyTorch Support" section so cross-references stay live.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
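The auto-routing conditions named in the review comment and in this commit can be summarised in a small sketch. It is an approximation for illustration only: `resolve_attention_backend` and `AITER_ARCHS` are hypothetical names, the architecture is assumed to be passed as a bare string such as `"gfx942"`, and the real dispatch may check further conditions (for example the AITER page-size constraints) that are not modelled here. The `use_cuda_graph` / `use_tensor_cores` checks apply to batch decode.

```python
# Hypothetical sketch of backend="auto" resolution for attention wrappers.
AITER_ARCHS = frozenset({"gfx942", "gfx950"})


def resolve_attention_backend(
    backend: str,
    gpu_arch: str,
    kv_layout: str = "NHD",
    pos_encoding_mode: str = "NONE",
    use_cuda_graph: bool = False,
    use_tensor_cores: bool = False,
) -> str:
    """Return "aiter" or "fa2" for backend="auto"; pass explicit choices through."""
    if backend != "auto":
        return backend
    aiter_ok = (
        gpu_arch in AITER_ARCHS
        and kv_layout == "NHD"
        and pos_encoding_mode == "NONE"
        and not use_cuda_graph    # batch decode only
        and not use_tensor_cores  # batch decode only
    )
    # Any unmet condition falls back to the in-tree HIP path ("fa2").
    return "aiter" if aiter_ok else "fa2"
```

For non-attention ops such as `append_paged_kv_cache` and `rmsnorm`, the in-tree backend string is `"native"` rather than `"fa2"`, so a separate resolver would be needed for those.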
Summary

Refresh the FlashInfer+ROCm README aimed at library consumers, refresh the Feature Support Matrix to match what has actually landed on `amd-integration`, and align the ROCm MLA wrapper with the rest of the ROCm backends so `backend="auto"` is accepted everywhere.

What changed

`README.md`
- … `backend="auto"` resolves to / Notes). New ✅ rows: Cascade (feat(hip): cascade attention support on ROCm using HIP #221), MLA via AITER (feat(hip): AITER backend for batch-ragged prefill, batch-paged decode, KV-cache append, MLA, and RMSNorm #232), RoPE (feat(hip): port RoPE to ROCm #223), paged KV-cache append, RMSNorm via AITER (feat(hip): AITER backend for batch-ragged prefill, batch-paged decode, KV-cache append, MLA, and RMSNorm #232), sliding-window decode on the AITER path (fix(rocm): correct AITER decode backend gaps — sliding window, CUDA graph, return_lse #234), activation, quantization, and opt-in `torch.compile` (Enable torch.compile under a flag #210). Every ✅ is backed by a `tests/rocm_tests/test_*_hip.py`. FP8 status is folded into per-row notes rather than a dedicated column.
- `pip install torch` uses `--index-url` instead of `-f` so pip cannot silently fall back to a CPU-only PyPI wheel (matches `CLAUDE.md`).
- … `micromamba activate base` step (the env is auto-activated). Used the concrete image tag plus a `--name=flashinfer-rocm` in the `docker run` snippet.
- … `examples/` plus one run command, with no wget-based downloads.
- `FLASHINFER_USE_TORCH_CUSTOM_OPS`, `FLASHINFER_HIP_FUSED_CASCADE`, `FLASHINFER_LOGGING_LEVEL`, `FLASHINFER_DISABLE_JIT`, `ROCM_PATH`/`ROCM_HOME`. Build-time vars stay in `CLAUDE.md` and are linked from here.
- … `is_aiter_supported` and `check_torch_rocm_compatibility`; calls out `validate_flashinfer_rocm_arch` as a build-time validator, not a runtime helper.
- … `slow` marker / HIPBLAS retry).

`flashinfer/mla_rocm.py` + `tests/rocm_tests/test_mla_aiter_hip.py`
- … `backend="auto"` as an alias for `"aiter"` on the ROCm MLA wrapper (default is now `"auto"` to match every other ROCm wrapper). Previously the wrapper raised `ValueError` on anything other than `"aiter"`, leaving MLA as the odd one out in the public API even though there is exactly one implementation to pick from on ROCm.
- `test_mla_backend_accepts_auto_and_aiter` (parametrized over both values) and `test_mla_backend_rejects_unsupported` (confirms `backend="fa2"` still raises; runs without a GPU since the check fires before the AITER probe).

Test plan
- `pre-commit run -a` passes.
- `pre-commit run markdownlint --files README.md` passes after every change.
- … `##` heading in the body.
- … `tests/rocm_tests/test_*_hip.py`.
- `pytest tests/rocm_tests/test_mla_aiter_hip.py`: 11 passed.
- … `<details>` sections look right.

🤖 Generated with Claude Code