Bump vllm from 0.6.1.post2 to 0.6.2 #44

Open

dependabot[bot] wants to merge 1 commit into main

Conversation

dependabot[bot] commented on behalf of GitHub on Sep 30, 2024

Bumps vllm from 0.6.1.post2 to 0.6.2.

Release notes

Sourced from vllm's releases.

v0.6.2

Highlights

Model Support

  • Support Llama 3.2 models (#8811, #8822)

    vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16
    
  • Beam search has been soft-deprecated. We are moving towards a version of beam search that is more performant and also simplifies vLLM's core. (#8684, #8763, #8713)

    • ⚠️ You will now see the following error; this is a breaking change! (A migration sketch follows after this list.)

      Using beam search as a sampling parameter is deprecated, and will be removed in the future release. Please use the vllm.LLM.use_beam_search method for dedicated beam search instead, or set the environment variable VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 to suppress this error. For more details, see vllm-project/vllm#8306

  • Support for the Solar model (#8386), MiniCPM3 (#8297), and LLaVA-OneVision (#8486)

  • Enhancements: pipeline parallelism for Qwen2-VL (#8696), multiple-image input for Qwen-VL (#8247), Mistral function calling (#8515), bitsandbytes support for Gemma2 (#8338), and tensor parallelism with bitsandbytes quantization (#8434)
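
The breaking change above ships with an escape hatch named in the error message. Below is a minimal migration sketch, assuming code that still passes the pre-0.6.2 use_beam_search sampling parameter; the model, prompt, and beam width are illustrative placeholders, and the new dedicated beam-search API is not shown here.

    # Hedged sketch: suppress the 0.6.2 beam-search error while migrating.
    # Assumption: existing code still passes the deprecated use_beam_search
    # sampling parameter; facebook/opt-125m is only an example model.
    import os

    # Must be set before the deprecated parameter is validated, per the error above.
    os.environ["VLLM_ALLOW_DEPRECATED_BEAM_SEARCH"] = "1"

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(
        use_beam_search=True,  # deprecated style that now errors by default
        best_of=4,             # beam width in the old parameterization
        temperature=0.0,       # beam search requires greedy decoding
    )
    print(llm.generate(["The capital of France is"], params)[0].outputs[0].text)

The environment variable is only a stopgap: the release note says the deprecated parameter will be removed in a future release, so callers should move to the dedicated beam-search entry point referenced in the error message.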

Hardware Support

  • TPU: implement multi-step scheduling (#8489), use Ray for default distributed backend (#8389)
  • CPU: Enable mrope and support Qwen2-VL on CPU backend (#8770)
  • AMD: custom paged attention kernel for ROCm (#8310) and FP8 KV cache support (#8577)

Production Engine

  • Initial support for priority scheduling (#5958)
  • Support LoRA lineage and base model metadata management (#6315)
  • Batch inference for the llm.chat() API (#8648); a usage sketch follows after this list
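
With batched llm.chat(), several conversations can be submitted in a single call. The following is a minimal sketch of the assumed usage shape; the model name and messages are placeholders, not taken from the release notes.

    # Hedged sketch of batch inference with llm.chat() (#8648).
    # Assumption: a chat-capable model is available; the name below is illustrative.
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=64)

    # A batch is a list of conversations, each a list of chat messages.
    conversations = [
        [{"role": "user", "content": "Summarize the v0.6.2 release in one sentence."}],
        [{"role": "user", "content": "What does --enforce-eager do?"}],
    ]
    for out in llm.chat(conversations, params):
        print(out.outputs[0].text)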

Performance

  • Introduce MQLLMEngine for the API server, boosting throughput by 30% in single-step and 7% in multi-step scheduling (#8157, #8761, #8584)
  • Multi-step scheduling enhancements
    • Prompt logprobs support in Multi-step (#8199)
    • Add output streaming support to multi-step + async (#8335)
    • Add flashinfer backend (#7928)
  • Add cuda graph support during decoding for encoder-decoder models (#7631)

Others

  • Support sampling from HF datasets and image input for benchmark_serving (#8495)
  • Progress in torch.compile integration (#8488, #8480, #8384, #8526, #8445)

What's Changed

... (truncated)

Commits
  • 7193774 [Misc] Support quantization of MllamaForCausalLM (#8822)
  • e2c6e0a [Doc] Update doc for Transformers 4.45 (#8817)
  • 770ec60 [Model] Add support for the multi-modal Llama 3.2 model (#8811)
  • 4f1ba08 Revert "rename PromptInputs and inputs with backward compatibility (#8760) (#...
  • 873edda [Misc] Support FP8 MoE for compressed-tensors (#8588)
  • 64840df [Frontend] MQLLMEngine supports profiling. (#8761)
  • 28e1299 rename PromptInputs and inputs with backward compatibility (#8760)
  • 0c4d2ad [VLM][Bugfix] internvl with num_scheduler_steps > 1 (#8614)
  • c6f2485 [[Misc]] Add extra deps for openai server image (#8792)
  • 300da09 [Kernel] Fullgraph and opcheck tests (#8479)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [vllm](https://github.com/vllm-project/vllm) from 0.6.1.post2 to 0.6.2.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Commits](vllm-project/vllm@v0.6.1.post2...v0.6.2)

---
updated-dependencies:
- dependency-name: vllm
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
dependabot[bot] added the dependencies label (Pull requests that update a dependency file) on Sep 30, 2024