llm-scaler-vllm PV release 1.3

@liu-shaojun released this 30 Jan 06:59
6a86b24

Highlights

What’s new

  • vLLM:

    • Upgrades: vLLM to 0.11.1, PyTorch to 2.9, oneAPI to 2025.2.2 (hotfix), and oneCCL to 2021.15.7.6.
    • 8 new models supported: Qwen3-Next-80B-A3B-Instruct, Qwen3-Next-80B-A3B-Thinking, InternVL3.5-30B-A3B, DeepSeek-OCR, PaddleOCR-VL, Seed-OSS-36B-Instruct, Qwen3-30B-A3B-Instruct-2507, and openai/whisper-large-v3.
    • Key bug fixes for timeout and accuracy issues found during long-duration stress runs.
    • Key bug fixes for a communication accuracy issue in long-run scenarios and a sub-communicator hang on the oneCCL side.
    • New features from vLLM 0.11.1: CPU KV cache offload, speculative decoding with two additional methods (Medusa and suffix decoding), experimental FP8 KV cache, and expert parallelism supported in TP+EP and DP+EP configurations.
    • Bug fixes.
    • Added sym_int4 support for Qwen3-30B-A3B on TP 4/8.
    • Added sym_int4 support for Qwen3-235B-A22B on TP 16.
    • Added support for the PaddleOCR model.
    • Added support for GLM-4.6v-Flash.
    • Fixed crash errors with the 2DP + 4TP configuration.
    • Fixed abnormal output observed during JMeter stress testing.
    • Fixed UR_ERROR_DEVICE_LOST errors triggered by frequent preemption under high load.
    • Fixed output errors for InternVL-38B.
    • Refined the profile_run logic to provide more GPU blocks by default.
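
As a sketch of how some of the features above might be combined at launch, the command below serves one of the newly supported MoE models with expert parallelism (TP+EP) and the experimental FP8 KV cache. The flag spellings follow upstream vLLM conventions and are an assumption here; verify them against `vllm serve --help` on your installed build.

```shell
# Hypothetical launch sketch: TP+EP expert parallelism plus FP8 KV cache
# for a newly supported model. Flag names assume upstream vLLM 0.11.1
# conventions; check `vllm serve --help` before use.
vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --kv-cache-dtype fp8
```

Tensor-parallel size and KV cache dtype are deployment-specific choices, not values prescribed by this release.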