
Commit 6d31a59

Rename blog post and improve documentation clarity
- Rename blog post file to shorter name (sleep-mode.md)
- Clarify security warning about dev mode requirement
- Improve plot description to explain A→B→A→B switching pattern
- Update sleepmode.png image

Signed-off-by: PinSiang <[email protected]>
1 parent 12bdd34 commit 6d31a59

2 files changed, 5 additions, 2 deletions
@@ -96,15 +96,18 @@ curl -X POST 'localhost:8002/wake_up'
 > For Level 2 sleep, you must call `reload_weights` and `reset_prefix_cache` after waking. Level 1 sleep doesn't require these extra steps.
 
 > [!WARNING]
-> **Security:** The `/sleep`, `/wake_up`, `/collective_rpc`, and `/reset_prefix_cache` endpoints should only be exposed on trusted networks or behind your proxy with authentication. These are administrative endpoints that can disrupt service if accessed by unauthorized users.
+> **Security:** The `/sleep`, `/wake_up`, `/collective_rpc`, and `/reset_prefix_cache` endpoints require `VLLM_SERVER_DEV_MODE=1` and should only be exposed in trusted networks. These administrative endpoints can disrupt service and are intended for closed environments like training clusters or backend applications.
 
 ## Performance Overview
 
 Let's see how Sleep Mode performs compared to traditional model reloading.
 
 ### Sleep Mode L1 vs No Sleep Mode Performance
 
-The interactive chart below compares the performance of vLLM with and without Sleep Mode enabled. With Level 1 Sleep Mode, models can be put to sleep and woken up without the costly reload overhead, enabling efficient GPU sharing between models.
+The interactive chart below shows the **total time to perform 5 model switches**: running inference on Model A, switching to Model B, running inference on Model B, then repeating this pattern (A→B→A→B→A→B).
+
+**With Sleep Mode:** Models sleep/wake between switches, preserving infrastructure.
+**Without Sleep Mode:** Each switch requires a full vLLM restart and reload.
 
 <div style="margin: 2rem 0;">
 <script src="https://cdn.plot.ly/plotly-2.32.0.min.js"></script>
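The Level 2 note and the endpoint list in the security warning imply a fixed order of administrative calls. The sketch below models that order as plain data and can optionally send it using only the standard library. It assumes the server was started with `VLLM_SERVER_DEV_MODE=1`, that every endpoint accepts a `POST`, that `/sleep` takes a `level` query parameter, and that `reload_weights` is triggered through `/collective_rpc` with a JSON body `{"method": "reload_weights"}`. Verify these details against the vLLM server documentation for your version. `AdminCall`, `build_sleep_plan`, `build_wake_plan`, and `send_plan` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdminCall:
    """One administrative HTTP request against a vLLM server (illustrative)."""
    path: str
    body: Optional[dict] = None


def build_sleep_plan(level: int) -> list[AdminCall]:
    # Assumption: the sleep level is passed as a query parameter.
    if level not in (1, 2):
        raise ValueError("sleep level must be 1 or 2")
    return [AdminCall(f"/sleep?level={level}")]


def build_wake_plan(level: int) -> list[AdminCall]:
    """Return the calls needed to resume serving after sleeping at `level`."""
    if level not in (1, 2):
        raise ValueError("sleep level must be 1 or 2")
    plan = [AdminCall("/wake_up")]
    if level == 2:
        # Per the post, Level 2 additionally needs reload_weights and
        # reset_prefix_cache after waking; Level 1 does not.
        # Assumption: reload_weights is invoked via /collective_rpc.
        plan.append(AdminCall("/collective_rpc", {"method": "reload_weights"}))
        plan.append(AdminCall("/reset_prefix_cache"))
    return plan


def send_plan(base_url: str, plan: list[AdminCall], timeout: float = 60.0) -> None:
    """POST each call in order; urllib raises HTTPError on a non-2xx reply."""
    for call in plan:
        data = json.dumps(call.body).encode() if call.body is not None else b""
        req = urllib.request.Request(
            base_url.rstrip("/") + call.path,
            data=data,
            method="POST",
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
```

For example, `send_plan("http://localhost:8002", build_wake_plan(2))` would issue the three Level 2 calls in sequence. Keeping the plan separate from the network code makes it easy to log or unit-test without a running server.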
sleepmode.png: binary image diff not shown (999 KB)
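The revised chart description counts five switches for the pattern A→B→A→B→A→B. The short sketch below makes that count explicit: six inference phases alternate between two models, and each boundary where the active model changes is one switch. The `total_time` function is a hypothetical cost model (a fixed duration per inference phase and per switch). It only illustrates why the per-switch cost dominates when it is large and does not reproduce the measurements in the chart.

```python
from itertools import cycle, islice


def phase_sequence(models: tuple[str, ...], phases: int) -> list[str]:
    """Models served in order, cycling through `models`."""
    return list(islice(cycle(models), phases))


def switches(sequence: list[str]) -> list[tuple[str, str]]:
    """Each adjacent pair where the active model changes counts as one switch."""
    return [(a, b) for a, b in zip(sequence, sequence[1:]) if a != b]


def total_time(sequence: list[str], inference_s: float, switch_s: float) -> float:
    """Hypothetical cost model: fixed time per inference phase and per switch."""
    return len(sequence) * inference_s + len(switches(sequence)) * switch_s


seq = phase_sequence(("A", "B"), 6)
print("→".join(seq))       # A→B→A→B→A→B
print(len(switches(seq)))  # 5
```

Under this simplification, both configurations pay for the same six inference phases. What differs is `switch_s`: a sleep/wake cycle with Sleep Mode, versus a full vLLM restart and reload without it.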

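The updated security warning limits the administrative endpoints to trusted networks. One way to enforce this in front of the server is a proxy-side check that rejects those paths from clients outside an allowlist. The sketch below uses Python's standard `ipaddress` module. `ADMIN_PATHS` mirrors the endpoints named in the warning. `TRUSTED_NETWORKS` and `is_request_allowed` are hypothetical placeholders, not part of vLLM. This is an illustration rather than a hardened filter, and a real deployment would normally pair it with authentication.

```python
import ipaddress
from urllib.parse import urlsplit

# Endpoints named in the security warning (available only with VLLM_SERVER_DEV_MODE=1).
ADMIN_PATHS = frozenset({"/sleep", "/wake_up", "/collective_rpc", "/reset_prefix_cache"})

# Hypothetical allowlist: replace with the networks you actually trust.
TRUSTED_NETWORKS = (
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.0.0.0/8"),
)


def is_request_allowed(path: str, client_ip: str) -> bool:
    """Allow ordinary routes from anywhere; admin routes only from trusted networks."""
    # Drop any query string (e.g. "?level=1") and trailing slash before matching.
    route = urlsplit(path).path.rstrip("/") or "/"
    if route not in ADMIN_PATHS:
        return True
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in net for net in TRUSTED_NETWORKS)
```

Inference routes pass through unchanged, while a request such as `/sleep?level=1` from an address outside `TRUSTED_NETWORKS` is rejected before it reaches the server.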