Release v0.9.0 · vllm-project/vllm-spyre

This release

What's Changed

[GHA] 🐛 fix: Save HF models cache for all jobs by @yannicks1 in #400
[GHA] 🎨 refactor test yaml by @yannicks1 in #401
🔥 remove FLEX_OVERWRITE_NMB_FRAME by @prashantgupta24 in #408
[test] 🎨 fix test description string by @yannicks1 in #416
[cb][test] fix scheduler constraint and add tests for batch x tkv limit by @yannicks1 in #417
⚡ Cache LLMs during tests by @joerunde in #396
[CB][Tests] Reduce number of steps in scheduler steps tests by @sducouedic in #409
🎨 reword logs for loading model weights by @prashantgupta24 in #397
🔥 trim local envs not required anymore by @prashantgupta24 in #399
🎨 make hf_cache.json prettier by @joerunde in #422
⬆️ bump base image by @joerunde in #427
♻️ [tests] Full model testing by @prashantgupta24 in #428
🎨 add info about DT_DEEPRT_VERBOSE by @prashantgupta24 in #430
🐛 fixup compilation wrapper by @joerunde in #431
🔨 Add debug log redirection option by @joerunde in #429
[doc] 👨‍🎨 Adding drawings explaining optimizations by @yannicks1 in #426
[cb][test] add tests for volumetric constraint with prefill optimization by @yannicks1 in #425
🐛 solve undetected merge conflict with main by @yannicks1 in #432
Add reranker support by @maxdebayser in #403
🎨 print relative tolerance diff in tests by @prashantgupta24 in #438
Bump vllm to v0.10.1 and add compatibility code by @maxdebayser in #443
fix VLLM_SPYRE_MAX_LOAD_PROCESSES to int instead of bool by @jberkhahn in #444
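
Note on #444: the sketch below illustrates why reading a count-style variable as a bool loses information. It is only an assumption about the general shape of the bug, not the actual vllm-spyre code; the helper names and the default value of 0 are hypothetical.

```python
import os

ENV_NAME = "VLLM_SPYRE_MAX_LOAD_PROCESSES"


def parse_as_bool(raw: str) -> bool:
    # Hypothetical pre-fix style: any non-zero count collapses to True,
    # so the requested number of processes is lost.
    return bool(int(raw))


def parse_as_int(raw: str) -> int:
    # Post-fix style: the numeric value is preserved.
    return int(raw)


def read_max_load_processes(default: int = 0) -> int:
    # The default of 0 is an assumption made for this sketch only.
    return parse_as_int(os.environ.get(ENV_NAME, str(default)))
```

With the variable set to 4, the bool-style parse only records that a value was set, while the int-style parse keeps the 4.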

Full Changelog: v0.8.0...v0.9.0