docs: update README for amd-flashinfer library consumers #239

Merged

demandal25 merged 20 commits into ROCm:amd-integration from demandal25:update-readme on May 21, 2026.

@demandal25 (Collaborator) commented May 21, 2026

Summary

Refresh the FlashInfer+ROCm README for library consumers, update the Feature Support Matrix to match what has actually landed on amd-integration, and align the ROCm MLA wrapper with the other ROCm wrappers so backend="auto" is accepted everywhere.

What changed

README.md

  • Intro and structure. Tightened the intro to distinguish in-tree HIP kernels from AITER dispatch up front, and linked the Feature Support Matrix and AITER sections from the first paragraph. Cross-linked CDNA3 / CDNA4 to AMD's official architecture whitepapers on first mention.
  • Feature Support Matrix. Replaced with a five-column table (Kernel / HIP / AITER / backend="auto" resolves to / Notes). New ✅ rows: Cascade (feat(hip): cascade attention support on ROCm using HIP #221), MLA via AITER (feat(hip): AITER backend for batch-ragged prefill, batch-paged decode, KV-cache append, MLA, and RMSNorm #232), RoPE (feat(hip): port RoPE to ROCm #223), paged KV-cache append, RMSNorm via AITER (feat(hip): AITER backend for batch-ragged prefill, batch-paged decode, KV-cache append, MLA, and RMSNorm #232), sliding-window decode on the AITER path (fix(rocm): correct AITER decode backend gaps — sliding window, CUDA graph, return_lse #234), activation, quantization, and opt-in torch.compile (Enable torch.compile under a flag #210). Every ✅ is backed by a tests/rocm_tests/test_*_hip.py. FP8 status is folded into per-row notes rather than a dedicated column.
  • GPU / ROCm / PyTorch. Consolidated into one section with arch codenames inline (gfx942 → MI300X/MI325X = CDNA3, gfx950 → MI355X = CDNA4). pip install torch uses --index-url instead of -f so pip cannot silently fall back to a CPU-only PyPI wheel (matches CLAUDE.md).
  • Getting Started. Collapsed the Docker image table to the latest validated tag and pointed at Docker Hub for older releases. Dropped the manual micromamba activate base step (the env is auto-activated). Used the concrete image tag plus a --name=flashinfer-rocm in the docker run snippet.
  • Trying the Examples. Simplified to point at examples/ plus one run command — no wget-based downloads.
  • Install from Source. Renamed from "Build from Source"; rewrote the ambiguous "Environment name varies …" note (and later removed it once the build / run blocks made the matching tag self-evident).
  • AITER Support. Collapsed the section intro to avoid re-listing conditions already in the matrix and cross-linked Known Limitations instead. Rewrote the Known Limitations preamble to state the two-group split (hard errors vs. silently-ignored kwargs). Dropped the redundant Single Prefill Example (Basic Usage already shows the call pattern).
  • Environment Variables. New section documenting runtime env vars — FLASHINFER_USE_TORCH_CUSTOM_OPS, FLASHINFER_HIP_FUSED_CASCADE, FLASHINFER_LOGGING_LEVEL, FLASHINFER_DISABLE_JIT, ROCM_PATH / ROCM_HOME. Build-time vars stay in CLAUDE.md and are linked from here.
  • Runtime Helpers. Short snippet showing is_aiter_supported and check_torch_rocm_compatibility; calls out validate_flashinfer_rocm_arch as a build-time validator, not a runtime helper.
  • CPX-mode pytest notes. Split the dense paragraph into labelled bullets (Worker count / Reruns / slow marker / HIPBLAS retry).
  • Basic Usage. Moved to the end of the README as a closing example.
  • License and Acknowledgements. Added; the contributing reminder lives on its own line.
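The GPU / ROCm / PyTorch bullet maps gfx arch names to products and CDNA generations. A minimal sketch of that mapping as a lookup table follows; `SUPPORTED_ARCHS` and `describe_arch` are hypothetical names, not part of the flashinfer API. It assumes the arch string arrives in the form a ROCm build of PyTorch typically reports (for example via `torch.cuda.get_device_properties(0).gcnArchName`), which may carry feature suffixes such as `:sramecc+:xnack-`.

```python
# Hypothetical lookup table mirroring the README's arch mapping
# (gfx942 -> MI300X/MI325X = CDNA3, gfx950 -> MI355X = CDNA4).
# Not a flashinfer API.
SUPPORTED_ARCHS = {
    "gfx942": ("CDNA3", ("MI300X", "MI325X")),
    "gfx950": ("CDNA4", ("MI355X",)),
}


def describe_arch(gcn_arch_name: str) -> str:
    """Return a human-readable description of a supported gfx arch string.

    Feature suffixes such as ':sramecc+:xnack-' are stripped before lookup.
    Raises ValueError for architectures outside the supported set.
    """
    base = gcn_arch_name.split(":", 1)[0]
    if base not in SUPPORTED_ARCHS:
        raise ValueError(f"unsupported GPU architecture: {base}")
    family, products = SUPPORTED_ARCHS[base]
    return f"{base} ({family}: {', '.join(products)})"
```

For example, `describe_arch("gfx942:sramecc+:xnack-")` yields `"gfx942 (CDNA3: MI300X, MI325X)"`. Keying on the bare gfx name keeps the table small, because suffix combinations vary by system configuration.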

flashinfer/mla_rocm.py + tests/rocm_tests/test_mla_aiter_hip.py

  • Accept backend="auto" as an alias for "aiter" on the ROCm MLA wrapper (default is now "auto" to match every other ROCm wrapper). Previously the wrapper raised ValueError on anything other than "aiter", leaving MLA as the odd one out in the public API even though there is exactly one implementation to pick from on ROCm.
  • New tests: test_mla_backend_accepts_auto_and_aiter (parametrized over both values) and test_mla_backend_rejects_unsupported (confirms backend="fa2" still raises; runs without a GPU since the check fires before the AITER probe).
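The MLA change amounts to plain string validation that runs before any AITER probe. The sketch below shows that behaviour under stated assumptions: `resolve_mla_backend` is a hypothetical helper, and the real check in flashinfer/mla_rocm.py may be structured differently or use a different error message. The trailing asserts show the same behaviour the two new tests check.

```python
# Hypothetical sketch of the ROCm MLA backend check; not the actual
# flashinfer/mla_rocm.py code.
_MLA_ACCEPTED_BACKENDS = ("auto", "aiter")


def resolve_mla_backend(backend: str = "auto") -> str:
    """Map a user-facing backend string to the single ROCm MLA implementation.

    Pure string validation, so it can run before any GPU or AITER probe.
    """
    if backend not in _MLA_ACCEPTED_BACKENDS:
        raise ValueError(
            f"Unsupported MLA backend {backend!r} on ROCm; "
            f"expected one of {_MLA_ACCEPTED_BACKENDS}"
        )
    # ROCm has exactly one MLA implementation, so "auto" is an alias for it.
    return "aiter"


# Both accepted values resolve to AITER...
for value in ("auto", "aiter"):
    assert resolve_mla_backend(value) == "aiter"

# ...and anything else, e.g. "fa2", is still rejected.
try:
    resolve_mla_backend("fa2")
except ValueError:
    pass
else:
    raise AssertionError("backend='fa2' should be rejected")
```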

Test plan

  • pre-commit run -a passes.
  • pre-commit run markdownlint --files README.md passes after every change.
  • Every TOC entry resolves to an ## heading in the body.
  • Every ✅ in the Feature Support Matrix has a backing tests/rocm_tests/test_*_hip.py.
  • pytest tests/rocm_tests/test_mla_aiter_hip.py — 11 passed.
  • Render the README on the PR page and visually confirm tables, code blocks, and <details> sections look right.

🤖 Generated with Claude Code

demandal25 and others added 2 commits May 21, 2026 04:14
Reframe the top-level README for developers embedding FlashInfer+ROCm.
Add a minimal usage example, feature matrix with prefill backends (fa2,
aiter, fa3_cdna3), consolidated GPU/ROCm/PyTorch support, AITER page-size
constraints from prefill_rocm, notebook link, and a dedicated prefill
backends section. Remove verbose docker details blocks in favor of inline
context.

Made-with: Cursor
Restore the practical sections that the prior rewrite dropped (Docker
tag table, source-build instructions, CPX-mode pytest guidance, AITER
install recipes) and refresh the Feature Support Matrix to reflect
what has actually landed on amd-integration: Cascade, MLA (AITER),
RoPE, paged KV-cache append, RMSNorm/AITER, sliding-window decode,
torch.compile. Drop the stale fa3_cdna3 backend mention — it has no
Python dispatch entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 21, 2026 04:34

Copilot AI left a comment

Pull request overview

Refreshes the FlashInfer+ROCm consumer-facing README to better reflect current ROCm feature availability on amd-integration, provide a quick-start usage snippet, and consolidate support/install guidance.

Changes:

  • Reworked README introduction + added a “Basic Usage” snippet for library consumers.
  • Updated the Feature Support Matrix and clarified GPU/ROCm/PyTorch support + PyTorch install instructions (--index-url).
  • Expanded/updated AITER section (install options, limitations, and updated capability notes) and added a License/Acknowledgements section.

demandal25 and others added 2 commits May 21, 2026 12:52
Split the single "AITER backend" column into HIP and AITER columns
plus a new `backend="auto"` column that spells out the exact
conditions that auto-route to AITER vs. HIP per kernel. MLA is
flagged as AITER-only (no HIP fallback); RMSNorm auto stays on HIP
even though AITER is available (opt-in only).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
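The `backend="auto"` column encodes a per-kernel routing decision. The sketch below shows that decision for batch decode. It uses only the conditions named in this PR: gfx942/gfx950, `kv_layout="NHD"` and `pos_encoding_mode="NONE"`, and, per the final commit, no `use_cuda_graph` / `use_tensor_cores`. It also records the fixed outcomes for MLA (AITER-only) and RMSNorm (auto stays on the in-tree `native` path). `DecodeConfig`, `resolve_batch_decode_backend`, and `FIXED_AUTO_BACKENDS` are hypothetical. The README matrix is authoritative, and the real dispatch may check more than this.

```python
from dataclasses import dataclass

# Architectures on which AITER routing is considered (per the PR).
AITER_ARCHS = frozenset({"gfx942", "gfx950"})

# Kernels whose "auto" resolution does not depend on call-site options.
FIXED_AUTO_BACKENDS = {
    "mla": "aiter",       # AITER-only on ROCm; no HIP fallback
    "rmsnorm": "native",  # AITER is opt-in; auto stays on the in-tree path
}


@dataclass(frozen=True)
class DecodeConfig:
    """Hypothetical bundle of batch-decode options that affect routing."""

    arch: str
    kv_layout: str = "NHD"
    pos_encoding_mode: str = "NONE"
    use_cuda_graph: bool = False
    use_tensor_cores: bool = False


def resolve_batch_decode_backend(cfg: DecodeConfig) -> str:
    """Pick AITER only when every listed condition holds, else in-tree "fa2"."""
    aiter_ok = (
        cfg.arch in AITER_ARCHS
        and cfg.kv_layout == "NHD"
        and cfg.pos_encoding_mode == "NONE"
        and not cfg.use_cuda_graph
        and not cfg.use_tensor_cores
    )
    return "aiter" if aiter_ok else "fa2"
```

Keeping the conditions in one boolean expression makes the fallback rule easy to compare against the matrix row. Any single unmet condition sends the call to the HIP path.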
The previous matrix listed dtypes inconsistently — single-decode named
fp16/bf16/fp8 explicitly while sibling rows didn't. Drop the implicit
fp16/bf16 enumeration (already covered by the ✅ HIP marker) and call
out fp8 only where it's actually supported: batch decode KV-cache
(E4M3FNUZ), RoPE fused quant+append (E4M3FNUZ + E5M2FNUZ), paged
KV-cache append HIP path. Prefill rows mark fp8 as WIP.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 21, 2026 12:54

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

demandal25 and others added 2 commits May 21, 2026 13:47
…helpers

The matrix referenced a nonexistent FLASHINFER_ENABLE_TORCH_COMPILE; the
actual gate is FLASHINFER_USE_TORCH_CUSTOM_OPS=1 (must be set before
importing flashinfer, requires PyTorch >= 2.4). While here, add an
Environment Variables section covering the runtime knobs that aren't
already in CLAUDE.md (FLASHINFER_HIP_FUSED_CASCADE, FLASHINFER_LOGGING_LEVEL,
FLASHINFER_DISABLE_JIT, ROCM_PATH/ROCM_HOME) and a Runtime Helpers section
pointing at is_aiter_supported and check_torch_rocm_compatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
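The torch.compile gate has two requirements: FLASHINFER_USE_TORCH_CUSTOM_OPS=1 must be set before flashinfer is imported, and PyTorch must be at least 2.4. The following is a minimal launcher-side sketch that sets the flag only when the version check passes. `enable_torch_custom_ops` and `torch_version_at_least` are hypothetical helpers. The sketch assumes the version string looks like PyTorch's `torch.__version__` (for example `2.4.1+rocm6.1`).

```python
import os
import re
from collections.abc import MutableMapping

FLAG = "FLASHINFER_USE_TORCH_CUSTOM_OPS"


def torch_version_at_least(version: str, major: int, minor: int) -> bool:
    """Compare the leading 'X.Y' of a version string such as '2.4.1+rocm6.1'."""
    m = re.match(r"(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)


def enable_torch_custom_ops(
    torch_version: str, env: MutableMapping[str, str] = os.environ
) -> bool:
    """Set the opt-in flag only on PyTorch >= 2.4.

    Must run before flashinfer is imported, because the flag is read at import.
    Returns True if the flag was set.
    """
    if not torch_version_at_least(torch_version, 2, 4):
        return False
    env[FLAG] = "1"
    return True
```

Typical use is `import torch`, then `enable_torch_custom_ops(torch.__version__)`, and only then `import flashinfer`. Comparing integer tuples instead of raw strings keeps "2.10" ordered after "2.4".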
Keep only the current validated rocm/flashinfer image and point readers
at hub.docker.com/r/rocm/flashinfer/tags for older ROCm/PyTorch combos.
The full table goes stale on every release.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 21, 2026 13:50
demandal25 and others added 2 commits May 21, 2026 13:51
The base environment is activated on shell start inside the rocm/flashinfer
images, so the explicit `micromamba activate base` call was misleading.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace <container-name>/<docker-image-tag> placeholders with the
flashinfer-rocm container name and the actual latest image tag so the
snippet is copy-pasteable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

demandal25 and others added 2 commits May 21, 2026 13:59
Replace the wget-based download steps with a brief pointer to the
examples/ directory and a single run command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The Basic Usage snippet at the top of the README already shows the same
call pattern; the AITER-section duplicate added no extra information.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 21, 2026 14:00
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

demandal25 and others added 2 commits May 21, 2026 15:25
Hyperlink the first mention of CDNA3 to the ROCm MI300 microarchitecture
docs and CDNA4 to AMD's MI350 product page.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the GPU product / ROCm doc links with the AMD CDNA3 and CDNA4
architecture whitepapers — the right reference for the architectures
themselves rather than the cards.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 21, 2026 15:27

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

demandal25 and others added 2 commits May 21, 2026 15:37
- Drop the "AMD ROCm port" redundancy with the title and lead with what
  ships in-tree (the HIP kernel set) versus what dispatches to AITER.
- Cross-link the Feature Support Matrix and AITER from the first
  paragraph so readers landing on the README see the structure
  immediately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Install / Feature Matrix / Build / AITER are what a new reader needs
first; the code snippet reads better as a closing example.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 21, 2026 15:40
- Tighten Feature Matrix preamble and Legend; drop the duplicate AITER
  link (already in the intro).
- Collapse the AITER Support intro that overlapped with the matrix;
  cross-link Known Limitations instead of re-listing the conditions.
- Rewrite Known Limitations preamble to call out the two-group split
  (hard errors vs. silently-ignored kwargs) more directly.
- Split the dense CPX-mode pytest notes into labelled bullets.
- Drop the unused validate_flashinfer_rocm_arch import from the runtime
  helpers snippet and note (separately) that it's a build-time
  validator, not a runtime helper.
- Move the pre-commit / pytest contributing reminder out of the License
  paragraph into its own line.
- Fix "Python tests suite" → "Python test suite".

Verified against the codebase: env var names + defaults
(FLASHINFER_USE_TORCH_CUSTOM_OPS, FLASHINFER_HIP_FUSED_CASCADE,
FLASHINFER_LOGGING_LEVEL, FLASHINFER_DISABLE_JIT, ROCM_PATH/ROCM_HOME),
hip_utils / aiter_utils helper signatures, attention_reference.py
path, and the MI308X CPX-mode reference (decode.cuh:707).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

demandal25 and others added 2 commits May 21, 2026 16:06
MLA on ROCm previously forced the user to pass backend="aiter"
explicitly: the wrapper's __init__ raised ValueError on anything other
than "aiter", including the auto value used by every other ROCm kernel.
That left MLA as the odd one out in the public API even though it has
exactly one implementation to choose from on ROCm.

Accept both "auto" and "aiter" (default is now "auto" to match the rest
of the ROCm wrappers); any other value still raises with an updated
message. The behaviour is unchanged for callers who already pass
"aiter".

### Test plan

- New parametrized test covering backend="auto" / "aiter" construction.
- New test that backend="fa2" still raises ValueError (runs anywhere,
  no GPU required since the check fires before the AITER probe).
- Full tests/rocm_tests/test_mla_aiter_hip.py — 11 passed.
- pre-commit run -a — passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The original "Environment name varies …" wording was ambiguous in
context (the surrounding section is about the Docker image tag, not a
shell or micromamba env). Rewrite to spell out that it's the Docker
image tag that encodes the versions, and that the -t tag and the tag
passed to docker run must match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 21, 2026 16:08
The build/run blocks already show the matching -t tag and the docker
run image tag side-by-side; the extra explanatory note added noise
without new information.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

README.md:352

  • This guidance suggests "pass backend="fa2" explicitly" when AITER ignores kwargs, but several AITER-enabled non-attention APIs don't accept "fa2" (e.g., append_paged_kv_cache / rmsnorm use "native"). Please adjust the wording to reference the correct non-AITER backend(s) depending on the API being discussed.

**Conditions that fall back to `fa2` under `backend="auto"`:**

* GPU is not gfx942 or gfx950
* `kv_layout` is not `NHD`

- Feature matrix: add `pos_encoding_mode="NONE"` to batch decode AITER
  auto-routing criteria; add gfx942/gfx950 arch gate to the
  `append_paged_kv_cache` row.
- AITER Support: clarify the in-tree backend strings per-op
  (`fa2` for attention wrappers vs `native` for `append_paged_kv_cache`
  / `rmsnorm`) and call out the two backend-specific quirks (`rmsnorm`
  auto stays on HIP, batch decode auto avoids CUDA-graph / tensor cores).
- Known Limitations: promote `pos_encoding_mode != "NONE"` and batch
  decode's `use_cuda_graph` / `use_tensor_cores` from the silently-ignored
  group to the hard-error / fallback group; the AITER attention paths
  reject them outright.
- Runtime Helpers: add the missing `import torch` to the snippet and
  correct the `is_aiter_supported` comment — the function only checks
  ROCm build + GPU arch, not whether the `aiter` Python package can
  actually be imported.
- CLAUDE.md: update the README anchor link to follow the renamed
  "GPU, ROCm, and PyTorch Support" section so cross-references stay
  live.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
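Per the corrected comment, is_aiter_supported only checks the ROCm build and GPU arch. It does not check whether the `aiter` Python package can be imported. A caller that wants both checks could add a lightweight importability probe, as sketched below. `aiter_usable` and `package_importable` are hypothetical helpers. The `arch_supported` argument stands in for whatever is_aiter_supported returns, since its exact signature is not shown in this PR.

```python
import importlib.util


def package_importable(name: str = "aiter") -> bool:
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


def aiter_usable(arch_supported: bool) -> bool:
    """Combine the build/arch check with a check that `aiter` is installed.

    `arch_supported` is assumed to be the result of flashinfer's
    is_aiter_supported(); this helper is illustrative only.
    """
    return arch_supported and package_importable("aiter")
```

`find_spec` locates the package without executing its import-time code. This keeps the probe cheap, but it does not prove that the package imports cleanly on the current GPU.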
demandal25 changed the title from "docs: refresh README for amd-flashinfer library consumers" to "docs: update README for amd-flashinfer library consumers" on May 21, 2026
demandal25 merged commit 31ea6a9 into ROCm:amd-integration on May 21, 2026
1 check passed
demandal25 deleted the update-readme branch on May 21, 2026