This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

[ CI ] Upstream sync to v0.4.3 branch #377

Closed
wants to merge 77 commits


robertgshaw2-redhat
Collaborator

SUMMARY:

  • upstream sync to v0.4.3 of vllm
  • git cherry-pick f68470e803df575f294e67167b4b83adfe004cfa..1197e02141df1a7442f21ff6922c98ec0bba153e
  • vllm-project@f68470e
  • vllm-project@1197e02 (corresponds to upstream v0.4.3)
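For readers less familiar with range cherry-picks: in `git cherry-pick A..B`, the start commit `A` is excluded and `B` is included, so the range above replays every commit after f68470e up to and including 1197e02. The sketch below shows that behavior in a throwaway repository. The branch names (`upstream`, `sync`), file names, and commit messages are hypothetical and are not taken from this PR.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# Shared base commit.
echo base > base.txt
git add base.txt
git commit -q -m "base"
base_branch=$(git symbolic-ref --short HEAD)

# Hypothetical "upstream" branch with three commits: A, B, C.
git checkout -q -b upstream
for name in A B C; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done
first=$(git rev-parse HEAD~2)   # commit A (start of range, excluded)
last=$(git rev-parse HEAD)      # commit C (end of range, included)

# Hypothetical "sync" branch that starts from the shared base.
git checkout -q -b sync "$base_branch"

# Replays B and C only; A is the exclusive lower bound of the range.
git cherry-pick "$first..$last" >/dev/null

# List the resulting history on the sync branch, newest first.
git log --format=%s
```

After the script runs, the `sync` branch contains `B.txt` and `C.txt` but not `A.txt`. To include the start commit as well, the range would be written `A^..B`.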

robertgshaw2-redhat and others added 30 commits July 14, 2024 21:22
Allow dummy load format for FP8,
because torch.uniform_ doesn't support FP8 at the moment

Co-authored-by: Mor Zusman <[email protected]>
Pass the CUDA stream into the CUTLASS GEMMs, to avoid future issues with CUDA graphs
…ct#4893)

The 2nd PR for vllm-project#4532.

This PR supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with a .kv_scale parameter).
Co-authored-by: Varun Sundar Rabindranath <[email protected]>
youkaichao and others added 29 commits July 14, 2024 21:40
…#5112)

Co-authored-by: Alexey Kondratiev <[email protected]>
Co-authored-by: Alexei-V-Ivanov-AMD <[email protected]>
Co-authored-by: Alexei V. Ivanov <[email protected]>
Co-authored-by: omkarkakarparthi <okakarpa>
…e ::ordered_metadata modifier (introduced with PTX 8.5)" (vllm-project#5149)