v0.9.0
This release:
- Adds support for reranker models
- Adds support for vLLM v0.10.1
- Adds extra debug options for tensor-parallel operation
- Fixes a bug where VLLM_SPYRE_MAX_LOAD_PROCESSES did not work properly because it was parsed as a bool instead of an int
What's Changed
- [GHA] 🐛 fix: Save HF models cache for all jobs by @yannicks1 in #400
- [GHA] 🎨 refactor test yaml by @yannicks1 in #401
- 🔥 remove FLEX_OVERWRITE_NMB_FRAME by @prashantgupta24 in #408
- [test] 🎨 fix test description string by @yannicks1 in #416
- [cb][test] fix scheduler constraint and add tests for batch x tkv limit by @yannicks1 in #417
- ⚡ Cache LLMs during tests by @joerunde in #396
- [CB][Tests] Reduce number of steps in scheduler steps tests by @sducouedic in #409
- 🎨 reword logs for loading model weights by @prashantgupta24 in #397
- 🔥 trim local envs not required anymore by @prashantgupta24 in #399
- 🎨 make hf_cache.json prettier by @joerunde in #422
- ⬆️ bump base image by @joerunde in #427
- ♻️ [tests] Full model testing by @prashantgupta24 in #428
- 🎨 add info about DT_DEEPRT_VERBOSE by @prashantgupta24 in #430
- 🐛 fixup compilation wrapper by @joerunde in #431
- 🔨 Add debug log redirection option by @joerunde in #429
- [doc] 👨‍🎨 Adding drawings explaining optimizations by @yannicks1 in #426
- [cb][test] add tests for volumetric constraint with prefill optimization by @yannicks1 in #425
- 🐛 solve undetected merge conflict with main by @yannicks1 in #432
- Add reranker support by @maxdebayser in #403
- 🎨 print relative tolerance diff in tests by @prashantgupta24 in #438
- Bump vllm to v0.10.1 and add compatibility code by @maxdebayser in #443
- fix VLLM_SPYRE_MAX_LOAD_PROCESSES to int instead of bool by @jberkhahn in #444
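Per #444, VLLM_SPYRE_MAX_LOAD_PROCESSES is now read as an int instead of a bool. The sketch below shows why that matters. It is a hypothetical illustration, not the vllm-spyre source: the helper names `read_max_load_processes` and `read_as_bool` and the default value `"0"` are assumptions. Coercing a process count to a bool keeps only "nonzero or not", so a value such as `4` collapses to `True`, which behaves as `1` in arithmetic.

```python
import os
from typing import Mapping

# Hypothetical default; the real default in vllm-spyre may differ.
DEFAULT_MAX_LOAD_PROCESSES = "0"


def read_max_load_processes(env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: read VLLM_SPYRE_MAX_LOAD_PROCESSES as an int count."""
    raw = env.get("VLLM_SPYRE_MAX_LOAD_PROCESSES", DEFAULT_MAX_LOAD_PROCESSES)
    return int(raw)  # raises ValueError for non-numeric input


def read_as_bool(raw: str) -> bool:
    """Hypothetical helper showing the failure mode: the count is lost."""
    return bool(int(raw))


if __name__ == "__main__":
    env = {"VLLM_SPYRE_MAX_LOAD_PROCESSES": "4"}
    print(read_max_load_processes(env))  # 4
    print(int(read_as_bool(env["VLLM_SPYRE_MAX_LOAD_PROCESSES"])))  # 1
```

With the int parse, setting the variable to `4` yields a count of 4. With the bool parse, every nonzero setting is indistinguishable from `1`.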
Full Changelog: v0.8.0...v0.9.0