Commit a54e3aa

Acknowledgements update
Signed-off-by: PinSiang <[email protected]>
1 parent f75d39d commit a54e3aa

File tree

1 file changed: +2 −2 lines changed


_posts/2025-10-26-zero_reload_model_switching_with_vllm_sleep_mode.md

Lines changed: 2 additions & 2 deletions
@@ -37,7 +37,7 @@ Even with instant weight loading, every cold start pays hidden costs that Sleep
 | 4. GPU kernel JIT compilation | DeepGEMM, FlashInfer, TorchInductor | ❌ Every time | ✅ Preserved (after initial warmup) |
 | 5. Cache warm-up | First-request overhead | ❌ Every time | ⚡ Quick re-warm |
 
-By keeping the process alive, Sleep Mode preserves infrastructure (#2-3) and avoids expensive reinitialization. This is why benchmarks show **Sleep Mode inference is 61-88% faster** than cold starts.
+By keeping the process alive, Sleep Mode preserves infrastructure (#2-4) and avoids expensive reinitialization. This is why benchmarks show **Sleep Mode inference is 61-88% faster** than cold starts.
 
 **This post covers:**
 - Comprehensive benchmarks across model sizes (0.6B to 235B) and GPUs (A4000 to A100)
@@ -465,4 +465,4 @@ The future of LLM serving is multi-model. Sleep Mode makes it practical today.
 
 ## Acknowledgements
 
-Special thanks to **Vensen Mu**, **Jeff Aw**, **Jun Kang Chow**, **Tun Jian Tan**, **Pin Siang Tan**, **Amir Balwel**, **Ye Hur Cheong** and **Zhiyao Cen**, **Kaichao You** for developing the Sleep Mode feature and inspiring this blog post.
+Special thanks to **Vensen Mu**, **Jeff Aw**, **Jun Kang Chow**, **Tun Jian Tan**, **Pin Siang Tan**, **Amir Balwel**, and **Ye Hur Cheong** for writing this blog post, and to **Zhiyao Cen** and **Kaichao You** for developing the Sleep Mode feature and inspiring it.
