
[Performance]: V1 engine runs slower than V0 on the MI300X #19692

Open
@mobicham

Proposal to improve performance

I ran a Llama3 8B inference benchmark on the MI300X with both the V0 and V1 engines and noticed that V1 is noticeably slower at decoding than V0. On Nvidia GPUs, by contrast, V1 is normally much faster than V0.
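The following is a minimal sketch of this kind of decode-throughput comparison. It assumes vLLM's offline `LLM` / `SamplingParams` API and uses the `VLLM_USE_V1` environment variable to select the engine. The model ID, prompt, batch size, token count and the script name `bench.py` are illustrative assumptions, not the settings behind the numbers reported here. Each engine runs in its own process (`python bench.py v0` vs. `python bench.py v1`) so the two engines never share GPU memory or cached state in one interpreter.

```python
import os
import sys
import time


def tokens_per_second(num_tokens: int, elapsed_s: float) -> float:
    """Generated tokens divided by wall-clock seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return num_tokens / elapsed_s


def run_benchmark(model: str, use_v1: bool, batch_size: int = 1, max_tokens: int = 256) -> float:
    # Select the engine before vLLM is imported: "1" -> V1, "0" -> V0.
    os.environ["VLLM_USE_V1"] = "1" if use_v1 else "0"
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    # Greedy decoding; ignore_eos forces exactly max_tokens per request,
    # so both engines generate the same number of tokens.
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens, ignore_eos=True)
    # Assumption: a short prompt, so the timed region is dominated by decoding.
    prompts = ["Write a short story about a GPU."] * batch_size

    llm.generate(prompts[:1], params)  # extra warm-up request, not timed
    start = time.perf_counter()
    outputs = llm.generate(prompts, params)
    elapsed = time.perf_counter() - start

    generated = sum(len(out.outputs[0].token_ids) for out in outputs)
    return tokens_per_second(generated, elapsed)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python bench.py v0   or   python bench.py v1
    engine = sys.argv[1]
    # Assumed Hugging Face model ID for "Llama3 8B".
    tps = run_benchmark("meta-llama/Meta-Llama-3-8B-Instruct", use_v1=(engine == "v1"))
    print(f"{engine}: {tps:.1f} tok/s")
```

The timed region still includes a (short) prefill, so this measures end-to-end generation throughput rather than pure per-token decode latency.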

One other difference I noticed: with V1, the Triton autotuning output for the flash attention kernel is not printed. This could mean the slowdown is related to the attention implementation used by V1.
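One way to check the autotuning observation is to launch the same workload under each engine with Triton's `TRITON_PRINT_AUTOTUNING=1` environment variable set, which makes Triton log the selected config when an `@triton.autotune` kernel is tuned. The sketch below only builds the child-process environment and runs a given script with it; the helper name `engine_env` and the command-line layout are hypothetical. Missing autotuning lines under V1 would be consistent with V1 taking a different attention path, but on their own they do not prove which kernel runs.

```python
import os
import subprocess
import sys


def engine_env(use_v1: bool, print_autotuning: bool = True) -> dict:
    """Copy of the current environment with the vLLM engine and Triton logging set."""
    env = dict(os.environ)  # copy, so the parent environment is not modified
    # VLLM_USE_V1=0 falls back to the V0 engine; 1 selects V1.
    env["VLLM_USE_V1"] = "1" if use_v1 else "0"
    if print_autotuning:
        # Ask Triton to print the chosen config whenever it autotunes a kernel.
        env["TRITON_PRINT_AUTOTUNING"] = "1"
    else:
        env.pop("TRITON_PRINT_AUTOTUNING", None)
    return env


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python run_engine.py v1 my_benchmark.py
    engine, script = sys.argv[1], sys.argv[2]
    result = subprocess.run([sys.executable, script], env=engine_env(engine == "v1"))
    sys.exit(result.returncode)
```

Running it once with `v0` and once with `v1` and diffing the logs shows whether the autotuning messages appear only under V0.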

Report of performance regression

[Image: performance regression report]

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

==============================
       PyTorch Info  
==============================
PyTorch version              : 2.8.0.dev20250615+rocm6.4
Is debug build               : False
CUDA used to build PyTorch   : N/A
ROCM used to build PyTorch   : 6.4.43482-0f2d60242

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : Could not collect
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : AMD Instinct MI300X (gfx942:sramecc+:xnack-)
Nvidia driver version        : Could not collect
cuDNN version                : Could not collect
HIP runtime version          : 6.4.43482
MIOpen runtime version       : 3.4.0
Is XNNPACK available         : True

==============================
         vLLM Info   
==============================
ROCM Version                 : 6.4.43483-a187df25c
Neuron SDK Version           : N/A
vLLM Version                 : 0.9.2.dev95+g26bc46ef8.d20250616 (git sha: 26bc46ef8, date: 20250616)

Labels: performance (Performance-related issues), rocm (Related to AMD ROCm)
