v0.6.4.post1+rocm
1405 commits to main since this release
What's Changed
- [BUGFIX] Corrected types for strides in triton FA (#274) by @maleksan85 in #276
- [OPT] improve rms_norm kernel by @kkHuang-amd in #258
- Cuda compile fix2 by @hliuca in #284
- use CK FA for glm-4v on navi3 by @jfactory07 in #281
- Disable custom all-reduce on two Navi GPUs by @hyoon1 in #287
- Base docker image by @gshtras in #290
- Added --output-json parameter in the P3l script. Using arg_utils to support all vllm args by @gshtras in #289
- devdocker README from https://github.com/powderluv/vllm-docs by @gshtras in #292
- Run clang-format on develop by @gshtras in #296
- Fix correctness regression (from PR#258) in Llama-3.2-90B-Vision-Instruct-FP8-KV test by @kkHuang-amd in #294
- Upstream merge 24/11/25 and 24/12/2 by @gshtras in #297
- Fix type hints for cython by @gshtras in #299
- fused_moe configs for MI325X by @JArnoldAMD in #300
- enable softcap and gemma2 by @hliuca in #288
- [vllm] Add support for FP8 in Triton FA kernel by @ilia-cher in #301
- Update test-template.j2 by @dhonnappa-amd in #283
- re-tune fp8 mixtral8x22B by @divakar-amd in #304
- rm old moe tune file. Add bash script for tuning reference by @divakar-amd in #305
- (temp workaround for Triton bug) by @ilia-cher in #306
- Always use 64 as the block size of moe_align kernel to avoid lds out of limit by @charlifu in #303
- Fix vllm_test_utils install. by @saienduri in #307
- Using ROCm6.3 release image as a base by @gshtras in #308
- Fix kernel cache miss and add RDNA configs by @hyoon1 in #246
- Update README.md by @t-parry in #309
- Fix max_seqlens_q/k initialization for Navi GPUs by @hyoon1 in #310
- Setting the value for the speculative decoding worker class on the ROCm platform by @gshtras in #313
- Upstream merge 24 12 09 by @gshtras in #314
- Triton version in the base docker by @gshtras in #315
- Navi docker by @gshtras in #316
- fix GemmTuner import in gradlib by @Rohan138 in #319
- Storing the installed commit hashes and customizations in a file by @gshtras in #320
- Option to override PYTORCH_ROCM_ARCH inherited from the base image by @gshtras in #321
- Update README.md by @t-parry in #322
- Disable auto enabling chunked prefill by @gshtras in #324
- Fix logging of the vLLM Config (vllm-project#11143) by @gshtras in #325
- Upstream merge 24 12 16 by @gshtras in #330
- Fix regression from #246 by @gshtras in #332
- Dynamic Scale Factor Calculations for Key/Value Scales With FP8 KV Caching by @micah-wil in #317
- Fixed the new condition for fp8 type by @gshtras in #333
- Mllama kv scale fix by @gshtras in #335
- Using the generic base image created by the vllm-ci pipeline by @gshtras in #336
- Properly initializing the new field in the attn metadata by @gshtras in #337
- Ingest FP8 attn scales and use them in ROCm FlashAttention by @mawong-amd in #338
- Library versions bump by @gshtras in #343
- Updated fused_moe configs for MI325X with Triton 3.2 by @JArnoldAMD in #345
- deepseek overflow fix by @Concurrensee in #349
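
The dynamic scale factor work in #317 refers to computing FP8 key/value scales at runtime from the observed tensor values rather than from a pre-calibrated checkpoint. As a rough illustration of the general technique (this is a generic per-tensor sketch, not the vLLM/ROCm kernel code; `dynamic_fp8_scale` and `quantize_fp8` are hypothetical helper names, and 448.0 is the max magnitude of the float8 E4M3 format):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest representable magnitude in float8 E4M3

def dynamic_fp8_scale(tensor: np.ndarray) -> float:
    """Per-tensor dynamic scale: map the observed max magnitude onto the FP8 range."""
    amax = float(np.max(np.abs(tensor)))
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0

def quantize_fp8(tensor: np.ndarray, scale: float) -> np.ndarray:
    # Simulate FP8 storage by scaling and clamping to the representable range
    # (real FP8 storage would also round to the E4M3 grid).
    return np.clip(tensor / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)

# Example: a toy "key" tensor from the KV cache
key = np.array([-3.0, 0.5, 2.0])
scale = dynamic_fp8_scale(key)          # derived from this tensor's own values
dequant = quantize_fp8(key, scale) * scale
```

Because the scale is derived from each tensor's own max, no offline calibration pass is needed, at the cost of computing an `amax` reduction at runtime.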
New Contributors
- @hliuca made their first contribution in #284
- @jfactory07 made their first contribution in #281
- @hyoon1 made their first contribution in #287
- @ilia-cher made their first contribution in #301
- @t-parry made their first contribution in #309
- @micah-wil made their first contribution in #317
- @Concurrensee made their first contribution in #349
Full Changelog: v0.6.4+rocm...v0.6.4.post1+rocm