v0.6.4.post1+rocm
1405 commits to main since this release
What's Changed
- [BUGFIX] Corrected types for strides in triton FA (#274) by @maleksan85 in #276
- [OPT] improve rms_norm kernel by @kkHuang-amd in #258
- Cuda compile fix2 by @hliuca in #284
- use CK FA for glm-4v on navi3 by @jfactory07 in #281
- Disable custom all-reduce on two Navi GPUs by @hyoon1 in #287
- Base docker image by @gshtras in #290
- Added --output-json parameter in the P3l script. Using arg_utils to support all vllm args by @gshtras in #289
- devdocker README from https://github.com/powderluv/vllm-docs by @gshtras in #292
- Run clang-format on develop by @gshtras in #296
- Fix correctness regression (from PR#258) in Llama-3.2-90B-Vision-Instruct-FP8-KV test by @kkHuang-amd in #294
- Upstream merge 24/11/25 and 24/12/2 by @gshtras in #297
- Fix type hints for cython by @gshtras in #299
- fused_moe configs for MI325X by @JArnoldAMD in #300
- enable softcap and gemma2 by @hliuca in #288
- [vllm] Add support for FP8 in Triton FA kernel by @ilia-cher in #301
- Update test-template.j2 by @dhonnappa-amd in #283
- re-tune fp8 mixtral8x22B by @divakar-amd in #304
- rm old moe tune file. Add bash script for tuning reference by @divakar-amd in #305
- (temp workaround for Triton bug) by @ilia-cher in #306
- Always use 64 as the block size of moe_align kernel to avoid lds out of limit by @charlifu in #303
- Fix vllm_test_utils install. by @saienduri in #307
- Using ROCm6.3 release image as a base by @gshtras in #308
- Fix kernel cache miss and add RDNA configs by @hyoon1 in #246
- Update README.md by @t-parry in #309
- Fix max_seqlens_q/k initialization for Navi GPUs by @hyoon1 in #310
- Setting the value for the speculative decoding worker class on the ROCm platform by @gshtras in #313
- Upstream merge 24 12 09 by @gshtras in #314
- Triton version in the base docker by @gshtras in #315
- Navi docker by @gshtras in #316
- fix GemmTuner import in gradlib by @Rohan138 in #319
- Storing the installed commit hashes and customizations in a file by @gshtras in #320
- Option to override PYTORCH_ROCM_ARCH inherited from the base image by @gshtras in #321
- Update README.md by @t-parry in #322
- Disable auto enabling chunked prefill by @gshtras in #324
- Fix logging of the vLLM Config (vllm-project#11143) by @gshtras in #325
- Upstream merge 24 12 16 by @gshtras in #330
- Fix regression from #246 by @gshtras in #332
- Dynamic Scale Factor Calculations for Key/Value Scales With FP8 KV Caching by @micah-wil in #317
- Fixed the new condition for fp8 type by @gshtras in #333
- Mllama kv scale fix by @gshtras in #335
- Using the generic base image created by the vllm-ci pipeline by @gshtras in #336
- Properly initializing the new field in the attn metadata by @gshtras in #337
- Ingest FP8 attn scales and use them in ROCm FlashAttention by @mawong-amd in #338
- Library versions bump by @gshtras in #343
- Updated fused_moe configs for MI325X with Triton 3.2 by @JArnoldAMD in #345
- deepseek overflow fix by @Concurrensee in #349
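
The dynamic scale factor work in #317 refers to computing FP8 key/value scales at runtime from the observed tensor values rather than from a pre-calibrated checkpoint. As a rough illustration of the general technique (this is a generic per-tensor sketch, not the vLLM/ROCm kernel code; `dynamic_fp8_scale` and `quantize_fp8` are hypothetical helper names, and 448.0 is the max magnitude of the float8 E4M3 format):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest representable magnitude in float8 E4M3

def dynamic_fp8_scale(tensor: np.ndarray) -> float:
    """Per-tensor dynamic scale: map the observed max magnitude onto the FP8 range."""
    amax = float(np.max(np.abs(tensor)))
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0

def quantize_fp8(tensor: np.ndarray, scale: float) -> np.ndarray:
    # Simulate FP8 storage by scaling and clamping to the representable range
    # (real FP8 storage would also round to the E4M3 grid).
    return np.clip(tensor / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)

# Example: a toy "key" tensor from the KV cache
key = np.array([-3.0, 0.5, 2.0])
scale = dynamic_fp8_scale(key)          # derived from this tensor's own values
dequant = quantize_fp8(key, scale) * scale
```

Because the scale is derived from each tensor's own max, no offline calibration pass is needed, at the cost of computing an `amax` reduction at runtime.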
New Contributors
- @hliuca made their first contribution in #284
- @jfactory07 made their first contribution in #281
- @hyoon1 made their first contribution in #287
- @ilia-cher made their first contribution in #301
- @t-parry made their first contribution in #309
- @micah-wil made their first contribution in #317
- @Concurrensee made their first contribution in #349
Full Changelog: v0.6.4+rocm...v0.6.4.post1+rocm