**Critical Insights**:
- **Imbalance window**: The most severe imbalances occur within the first 12,000 iterations; the average token distribution shows that all context-processing phases fall within this critical interval
- **Performance gap**: SOL TPS of 39,552 vs. actual TPS of 25,664 reveals a **54% relative performance gap** (SOL TPS is about 1.54× the actual TPS; equivalently, actual throughput reaches only about 65% of SOL, a 35% efficiency loss)
- **System behavior**: After iteration 12,000, all requests have transitioned to the generation phase, which naturally reduces imbalances

Figure 4 zooms into the critical imbalance period (iterations 100 to 12,000), revealing the dramatic instability in load distribution:
- **Near-theoretical efficiency**: Actual TPS (34,140) approaches SOL TPS (37,912), reaching about 90% of SOL
- **System stability**: Dramatically reduced load variance across iterations

**Production Configuration**:

Users can enable the full ADP Balance strategy by adding the following configuration:
```yaml
attention_dp_config:
  enable_balance: true
  batching_wait_iters: 10
  timeout_iters: 50
```
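The snippet below is a rough mental model of what the two wait knobs trade off; it is not the production scheduler. It assumes (our reading, not something this section spells out) that `timeout_iters` caps how long context synchronization holds back context work while some rank has none, and that `batching_wait_iters` caps how long batch equilibration waits for every rank to accumulate a fuller context batch. `AttentionDPBalance`, `schedule_context_now`, and `batch_target` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AttentionDPBalance:
    """Hypothetical mirror of the attention_dp_config keys (semantics assumed)."""
    enable_balance: bool = True
    batching_wait_iters: int = 10
    timeout_iters: int = 50


def schedule_context_now(pending_ctx_per_rank, batch_target, waited_iters, cfg):
    """Decide whether this iteration runs context (prefill) work on all ranks.

    waited_iters counts iterations since context work was first held back.
    Assumed policy, for illustration only:
      * context synchronization: hold context work while any rank has none,
        until waited_iters reaches cfg.timeout_iters;
      * batch equilibration: once every rank has context work, keep waiting
        for each rank to hold batch_target requests, but only until
        waited_iters reaches cfg.batching_wait_iters.
    """
    has_work = any(n > 0 for n in pending_ctx_per_rank)
    if not cfg.enable_balance:
        return has_work  # no balancing: run context as soon as any exists
    if waited_iters >= cfg.timeout_iters:
        return has_work  # timeout reached: stop waiting for idle ranks
    if min(pending_ctx_per_rank) == 0:
        return False  # some rank has no context requests; hold to avoid imbalance
    if min(pending_ctx_per_rank) >= batch_target:
        return True  # every rank already holds a full, balanced context batch
    return waited_iters >= cfg.batching_wait_iters  # bounded wait for fuller batches
```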
The effectiveness of our complete ADP Balance implementation is clearly demonstrated in Figure 6. The visualization reveals how the combination of context synchronization and batch equilibration mechanisms achieves near-optimal load balancing throughout the critical execution window.
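The efficiency figures quoted in this section are simple ratios against SOL. A minimal sketch that reproduces them from the reported TPS values (the helper name `sol_efficiency` is hypothetical):

```python
def sol_efficiency(actual_tps: float, sol_tps: float) -> tuple[float, float]:
    """Return (fraction of SOL achieved, relative amount by which SOL exceeds actual)."""
    fraction_of_sol = actual_tps / sol_tps
    relative_gap = sol_tps / actual_tps - 1.0
    return fraction_of_sol, relative_gap


# TPS values reported in this section.
baseline = sol_efficiency(actual_tps=25_664, sol_tps=39_552)
balanced = sol_efficiency(actual_tps=34_140, sol_tps=37_912)

for name, (frac, gap) in (("baseline", baseline), ("ADP Balance", balanced)):
    print(f"{name}: {frac:.1%} of SOL, SOL {gap:.1%} above actual")
```

For the baseline this gives roughly 64.9% of SOL (SOL about 54.1% above actual); with full ADP Balance, roughly 90.1% of SOL (SOL about 11.0% above actual). The two runs report different SOL values, so each run is compared against its own SOL.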
<div align="center">

- ⚠️ **Iteration overhead**: Waiting mechanisms increase total iteration count

Understanding the performance trade-offs inherent in our ADP Balance strategy is crucial for production deployment decisions. Figure 7 presents a comprehensive Pareto frontier analysis that maps the relationship between system throughput (TPS per GPU) and Time-To-First-Token (TTFT) across varying workload intensities and parameter configurations.
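A Pareto frontier like the one in Figure 7 keeps only the configurations that no other configuration beats on both axes at once. A minimal sketch, assuming each measured run has been reduced to a (TPS per GPU, TTFT) pair; `RunPoint` and `pareto_frontier` are hypothetical names, not part of any existing benchmarking tool:

```python
from typing import NamedTuple


class RunPoint(NamedTuple):
    """One measured configuration (hypothetical record type)."""
    label: str
    tps_per_gpu: float  # throughput per GPU: higher is better
    ttft_ms: float      # time to first token: lower is better


def pareto_frontier(points):
    """Return the non-dominated points, ordered by increasing TTFT.

    A point is dominated if another point has TTFT <= its TTFT and
    TPS/GPU >= its TPS/GPU, with at least one inequality strict.
    Exact duplicates collapse to a single frontier point.
    """
    frontier = []
    best_tps = float("-inf")
    # Ascending TTFT; on equal TTFT, visit the higher-throughput point first
    # so the lower-throughput tie is rejected as dominated.
    for p in sorted(points, key=lambda p: (p.ttft_ms, -p.tps_per_gpu)):
        if p.tps_per_gpu > best_tps:
            frontier.append(p)
            best_tps = p.tps_per_gpu
    return frontier
```

Sorting by TTFT first turns the dominance check into a single pass: a run joins the frontier only if it delivers strictly more TPS per GPU than every run with equal or lower TTFT, so the whole computation costs O(n log n).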