Commit 3f474ce

yunruis authored and yufeiwu-nv committed
[None][doc] Paragraph adjustment and fix statistic (NVIDIA#8568)
Signed-off-by: yunruis <[email protected]>
Signed-off-by: yufeiwu-nv <[email protected]>
1 parent 3295860 commit 3f474ce

1 file changed: +11 -11 lines changed


docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md

Lines changed: 11 additions & 11 deletions
@@ -232,7 +232,7 @@ Figure 3 provides comprehensive insight into baseline system behavior, displayin
 
 **Critical Insights**:
 - **Imbalance window**: Most severe imbalances occur within the first 12,000 iterations, as evidenced by the average token distribution showing that all context processing phases occur within this critical interval
-- **Performance gap**: SOL TPS of 39,552 vs. actual TPS of 25,664 reveals a **35% efficiency loss**
+- **Performance gap**: SOL TPS of 39,552 vs. actual TPS of 25,664 reveals a **54% relative performance gap**
 - **System behavior**: After iteration 12,000, all requests transition to generation phase, naturally reducing imbalances
 
 Figure 4 zooms into the critical imbalance period [100-12,000], revealing the dramatic instability in load distribution:
@@ -286,16 +286,6 @@ The complete ADP Balance strategy combines both context synchronization and batc
 - **Near-theoretical efficiency**: Actual TPS (34,140) approaches SOL TPS (37,912)
 - **System stability**: Dramatically reduced load variance across iterations
 
-**Production Configuration**:
-Users can enable the full ADP Balance strategy by adding the following configuration:
-
-```yaml
-attention_dp_config:
-    enable_balance: true
-    batching_wait_iters: 10
-    timeout_iters: 50
-```
-
 The effectiveness of our complete ADP Balance implementation is clearly demonstrated in Figure 6. The visualization reveals how the combination of context synchronization and batch equilibration mechanisms achieves near-optimal load balancing throughout the critical execution window.
 
 <div align="center">
@@ -316,6 +306,16 @@ The effectiveness of our complete ADP Balance implementation is clearly demonstr
 - ⚠️ **Iteration overhead**: Waiting mechanisms increase total iteration count
 - ⚠️ **TTFT impact**: Strategic delays affect time-to-first-token metrics
 
+**Production Configuration**:
+Users can enable the full ADP Balance strategy by adding the following configuration:
+
+```yaml
+attention_dp_config:
+    enable_balance: true
+    batching_wait_iters: 10
+    timeout_iters: 50
+```
+
 ### Pareto Analysis: Throughput-Latency Trade-off Optimization
 
 Understanding the performance trade-offs inherent in our ADP Balance strategy is crucial for production deployment decisions. Figure 7 presents a comprehensive Pareto frontier analysis that maps the relationship between system throughput (TPS per GPU) and Time-To-First-Token (TTFT) across varying workload intensities and parameter configurations.
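
The statistic change in the first hunk (35% to 54%) comes down to which baseline the TPS difference is divided by. The following is a minimal sketch of that arithmetic; the helper names are illustrative and are not part of the commit or of any library. It uses only the SOL and actual TPS figures quoted in the diff.

```python
def gap_relative_to_sol(sol_tps: float, actual_tps: float) -> float:
    """Fraction of the speed-of-light (SOL) throughput that is not achieved."""
    return (sol_tps - actual_tps) / sol_tps


def gap_relative_to_actual(sol_tps: float, actual_tps: float) -> float:
    """Headroom above the measured throughput, as a fraction of the measured value."""
    return (sol_tps - actual_tps) / actual_tps


if __name__ == "__main__":
    # Baseline figures quoted in the first hunk.
    baseline_sol, baseline_actual = 39_552, 25_664
    print(f"{gap_relative_to_sol(baseline_sol, baseline_actual):.1%}")     # 35.1%
    print(f"{gap_relative_to_actual(baseline_sol, baseline_actual):.1%}")  # 54.1%

    # ADP Balance figures quoted in the second hunk.
    balanced_sol, balanced_actual = 37_912, 34_140
    print(f"{gap_relative_to_sol(balanced_sol, balanced_actual):.1%}")     # 9.9%
    print(f"{gap_relative_to_actual(balanced_sol, balanced_actual):.1%}")  # 11.0%
```

Under this reading, the old wording ("35% efficiency loss") divides by SOL TPS, while the new wording ("54% relative performance gap") divides by actual TPS. Both follow from the same two numbers.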
