Bump vllm from 0.6.1.post2 to 0.6.2 #44

Open

dependabot[bot] wants to merge 1 commit into main

Conversation

dependabot[bot] commented on behalf of GitHub on Sep 30, 2024

Bumps vllm from 0.6.1.post2 to 0.6.2.

Release notes

Sourced from vllm's releases.

v0.6.2

Highlights

Model Support

  • Support Llama 3.2 models (#8811, #8822)

    vllm serve meta-llama/Llama-3.2-11B-Vision-Instruct --enforce-eager --max-num-seqs 16
    
  • Beam search has been soft-deprecated. We are moving towards a version of beam search that is more performant and also simplifies vLLM's core. (#8684, #8763, #8713)

    • ⚠️ You will now see the following error; this is a breaking change! (A migration sketch follows after this list.)

      Using beam search as a sampling parameter is deprecated, and will be removed in the future release. Please use the vllm.LLM.use_beam_search method for dedicated beam search instead, or set the environment variable VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 to suppress this error. For more details, see vllm-project/vllm#8306

  • Support for the Solar model (#8386), MiniCPM3 (#8297), and LLaVA-OneVision (#8486)

  • Enhancements: pipeline parallelism for Qwen2-VL (#8696), multiple-image input for Qwen-VL (#8247), Mistral function calling (#8515), bitsandbytes support for Gemma2 (#8338), and tensor parallelism with bitsandbytes quantization (#8434)
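
The breaking change above ships with an escape hatch named in the error message. Below is a minimal migration sketch, assuming code that still passes the pre-0.6.2 use_beam_search sampling parameter; the model, prompt, and beam width are illustrative placeholders, and the new dedicated beam-search API is not shown here.

    # Hedged sketch: suppress the 0.6.2 beam-search error while migrating.
    # Assumption: existing code still passes the deprecated use_beam_search
    # sampling parameter; facebook/opt-125m is only an example model.
    import os

    # Must be set before the deprecated parameter is validated, per the error above.
    os.environ["VLLM_ALLOW_DEPRECATED_BEAM_SEARCH"] = "1"

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(
        use_beam_search=True,  # deprecated style that now errors by default
        best_of=4,             # beam width in the old parameterization
        temperature=0.0,       # beam search requires greedy decoding
    )
    print(llm.generate(["The capital of France is"], params)[0].outputs[0].text)

The environment variable is only a stopgap: the release note says the deprecated parameter will be removed in a future release, so callers should move to the dedicated beam-search entry point referenced in the error message.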

Hardware Support

  • TPU: implement multi-step scheduling (#8489), use Ray for default distributed backend (#8389)
  • CPU: Enable mrope and support Qwen2-VL on CPU backend (#8770)
  • AMD: custom paged attention kernel for ROCm (#8310) and FP8 KV cache support (#8577)

Production Engine

  • Initial support for priority scheduling (#5958)
  • Support LoRA lineage and base model metadata management (#6315)
  • Batch inference for the llm.chat() API (#8648); a usage sketch follows after this list
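
With batched llm.chat(), several conversations can be submitted in a single call. The following is a minimal sketch of the assumed usage shape; the model name and messages are placeholders, not taken from the release notes.

    # Hedged sketch of batch inference with llm.chat() (#8648).
    # Assumption: a chat-capable model is available; the name below is illustrative.
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=64)

    # A batch is a list of conversations, each a list of chat messages.
    conversations = [
        [{"role": "user", "content": "Summarize the v0.6.2 release in one sentence."}],
        [{"role": "user", "content": "What does --enforce-eager do?"}],
    ]
    for out in llm.chat(conversations, params):
        print(out.outputs[0].text)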

Performance

  • Introduce MQLLMEngine for the API server, boosting throughput by 30% in single-step and 7% in multi-step scheduling (#8157, #8761, #8584)
  • Multi-step scheduling enhancements
    • Prompt logprobs support in Multi-step (#8199)
    • Add output streaming support to multi-step + async (#8335)
    • Add flashinfer backend (#7928)
  • Add cuda graph support during decoding for encoder-decoder models (#7631)

Others

  • Support sampling from HF datasets and image input for benchmark_serving (#8495)
  • Progress in torch.compile integration (#8488, #8480, #8384, #8526, #8445)

What's Changed

... (truncated)

Commits
  • 7193774 [Misc] Support quantization of MllamaForCausalLM (#8822)
  • e2c6e0a [Doc] Update doc for Transformers 4.45 (#8817)
  • 770ec60 [Model] Add support for the multi-modal Llama 3.2 model (#8811)
  • 4f1ba08 Revert "rename PromptInputs and inputs with backward compatibility (#8760) (#...
  • 873edda [Misc] Support FP8 MoE for compressed-tensors (#8588)
  • 64840df [Frontend] MQLLMEngine supports profiling. (#8761)
  • 28e1299 rename PromptInputs and inputs with backward compatibility (#8760)
  • 0c4d2ad [VLM][Bugfix] internvl with num_scheduler_steps > 1 (#8614)
  • c6f2485 [[Misc]] Add extra deps for openai server image (#8792)
  • 300da09 [Kernel] Fullgraph and opcheck tests (#8479)
  • Additional commits viewable in compare view

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [vllm](https://github.com/vllm-project/vllm) from 0.6.1.post2 to 0.6.2.
- [Release notes](https://github.com/vllm-project/vllm/releases)
- [Commits](vllm-project/vllm@v0.6.1.post2...v0.6.2)

---
updated-dependencies:
- dependency-name: vllm
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
dependabot[bot] added the dependencies label (Pull requests that update a dependency file) on Sep 30, 2024