What's Changed
🚀 Features
- [dlinfer] feat: add DlinferFlashAttention to support qwen vl. by @Reinerzhou in #2952
💥 Improvements
- refactor PyTorchEngine check env by @grimoire in #2870
- refine multi-backend setup.py by @jinminxi104 in #2880
- Refactor VLM modules by @lvhan028 in #2810
- [dlinfer] only compile the language model in vl models by @tangzhiyi11 in #2893
- Optimize tp broadcast by @grimoire in #2889
- unfreeze torch version in dockerfile by @RunningLeon in #2906
- support tp > n_kv_heads for pt engine by @RunningLeon in #2872
- replicate kv for some models when tp is divisible by kv_head_num by @irexyc in #2874 (a conceptual sketch follows this list)
- Fallback to pytorch engine when the model is quantized by smooth quant by @lvhan028 in #2953
- Torchrun launching multiple api_server by @AllentDan in #2402
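The two KV-head items above (#2872, #2874) share one idea: when the tensor-parallel world size exceeds the number of KV heads, each KV head is duplicated so that every rank still owns a copy. The sketch below illustrates that idea in PyTorch with hypothetical names and layout assumptions; it is not LMDeploy's actual implementation.

```python
import torch


def replicate_kv_heads(kv: torch.Tensor, num_kv_heads: int, tp_size: int) -> torch.Tensor:
    """Duplicate KV heads so each of `tp_size` ranks can own one copy.

    Hypothetical helper: `kv` is assumed to be laid out as
    [num_kv_heads, head_dim]. Not LMDeploy's real code.
    """
    if tp_size <= num_kv_heads:
        return kv  # enough heads already; shard them as usual
    assert tp_size % num_kv_heads == 0, "tp must be divisible by kv_head_num"
    replicas = tp_size // num_kv_heads
    # Each head appears `replicas` times, so rank i simply takes row i after sharding.
    return kv.repeat_interleave(replicas, dim=0)


# Example: 2 KV heads replicated for tp=8 -> 8 rows, 4 copies of each head.
kv = torch.randn(2, 64)
print(replicate_kv_heads(kv, num_kv_heads=2, tp_size=8).shape)  # torch.Size([8, 64])
```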
🐞 Bug fixes
- [Feature] Support for loading lora adapter weights in safetensors format by @Galaxy-Husky in #2860
- fix cpu cache by @grimoire in #2881
- Fix args type in docstring by @Galaxy-Husky in #2888
- Fix llama3.1 chat template by @fzyzcjy in #2862
- Fix typo by @ghntd in #2916
- fix: Incorrect stats size during inference of throughput benchmark when concurrency > num_prompts by @pancak3 in #2928
- fix lora name and rearrange wqkv for internlm2 by @RunningLeon in #2912
- [dlinfer] fix moe op for dlinfer. by @Reinerzhou in #2917
- [side effect] fix vlm quantization failure by @lvhan028 in #2914
- fix torch_dtype by @RunningLeon in #2933
- support unaligned qkv heads by @grimoire in #2930
- fix mllama inference without image by @RunningLeon in #2947
- Support torch_dtype modification and update FAQs for AWQ quantization by @AllentDan in #2898
- Fix exception handler for proxy server by @AllentDan in #2901
- Fix torch_dtype in lite by @AllentDan in #2956
- [side-effect] bring back quantization of qwen2-vl, glm4v, etc. by @lvhan028 in #2954
- add a thread pool executor to control the vl engine traffic by @lvhan028 in #2970 (a minimal sketch follows this list)
- [side-effect] fix gradio demo error by @lvhan028 in #2976
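For the VL engine traffic item above (#2970), the gist is a bounded thread pool in front of vision preprocessing, so a burst of image requests queues up instead of all running at once. Below is a minimal sketch using Python's standard `concurrent.futures`; the function names and worker count are illustrative assumptions, not the actual LMDeploy code.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# Hypothetical throttle: at most `max_workers` preprocessing jobs run concurrently;
# additional submissions wait in the executor's internal queue.
vl_executor = ThreadPoolExecutor(max_workers=4)


def preprocess_image(image_bytes: bytes) -> bytes:
    # Placeholder for the real vision preprocessing step.
    return image_bytes


def submit_image(image_bytes: bytes) -> Future:
    # Callers receive a Future and only block when they call .result(),
    # while the pool size caps the concurrent load on the VL engine.
    return vl_executor.submit(preprocess_image, image_bytes)
```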
🌐 Other
- [dlinfer] fix engine checker by @tangzhiyi11 in #2891
- Bump version to v0.6.5 by @lvhan028 in #2955
New Contributors
- @Galaxy-Husky made their first contribution in #2860
- @fzyzcjy made their first contribution in #2862
- @ghntd made their first contribution in #2916
- @pancak3 made their first contribution in #2928
Full Changelog: v0.6.4...v0.6.5