Merged
22 changes: 11 additions & 11 deletions docs/source/blogs/tech_blog/blog10_ADP_Balance_Strategy.md
@@ -232,7 +232,7 @@ Figure 3 provides comprehensive insight into baseline system behavior, displayin

**Critical Insights**:
- **Imbalance window**: Most severe imbalances occur within the first 12,000 iterations, as evidenced by the average token distribution showing that all context processing phases occur within this critical interval
- **Performance gap**: SOL TPS of 39,552 vs. actual TPS of 25,664 reveals a **35% efficiency loss**
- **Performance gap**: SOL TPS of 39,552 vs. actual TPS of 25,664 reveals a **54% relative performance gap**
- **System behavior**: After iteration 12,000, all requests transition to generation phase, naturally reducing imbalances
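
The two percentages quoted for the performance gap (35% and 54%) both follow from the same pair of TPS figures; they differ only in the baseline used as the denominator. A minimal sketch of that arithmetic, with hypothetical helper names chosen for illustration:

```python
# TPS figures quoted in the "Performance gap" bullet above.
SOL_TPS = 39_552      # speed-of-light (theoretical) throughput
ACTUAL_TPS = 25_664   # measured baseline throughput


def shortfall_vs_sol(sol: float, actual: float) -> float:
    """Share of SOL throughput that is not achieved: (sol - actual) / sol."""
    return (sol - actual) / sol


def gap_vs_actual(sol: float, actual: float) -> float:
    """How far SOL sits above the measured value: (sol - actual) / actual."""
    return (sol - actual) / actual


if __name__ == "__main__":
    print(f"shortfall relative to SOL:    {shortfall_vs_sol(SOL_TPS, ACTUAL_TPS):.1%}")
    print(f"gap relative to measured TPS: {gap_vs_actual(SOL_TPS, ACTUAL_TPS):.1%}")
```

Dividing by SOL gives roughly 35%, which reads as an efficiency loss. Dividing by the measured TPS gives roughly 54%, which reads as the headroom above current performance.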

Figure 4 zooms into the critical imbalance period [100-12,000], revealing the dramatic instability in load distribution:
@@ -286,16 +286,6 @@ The complete ADP Balance strategy combines both context synchronization and batc
- **Near-theoretical efficiency**: Actual TPS (34,140) approaches SOL TPS (37,912)
- **System stability**: Dramatically reduced load variance across iterations
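
To put "near-theoretical" into numbers, the sketch below compares the achieved fraction of SOL before and after balancing, using only the TPS figures quoted in this document. The dictionary layout and function names are illustrative assumptions, not part of the blog.

```python
# TPS figures quoted in this document (baseline run vs. full ADP Balance run).
BASELINE = {"sol_tps": 39_552, "actual_tps": 25_664}
BALANCED = {"sol_tps": 37_912, "actual_tps": 34_140}


def sol_fraction(run: dict) -> float:
    """Fraction of the theoretical (SOL) throughput that the run achieves."""
    return run["actual_tps"] / run["sol_tps"]


def speedup(before: dict, after: dict) -> float:
    """Ratio of measured throughput after vs. before the change."""
    return after["actual_tps"] / before["actual_tps"]


if __name__ == "__main__":
    print(f"baseline achieves {sol_fraction(BASELINE):.1%} of SOL")
    print(f"balanced achieves {sol_fraction(BALANCED):.1%} of SOL")
    print(f"measured throughput ratio: {speedup(BASELINE, BALANCED):.2f}x")
```

With these inputs the balanced run reaches roughly 90% of its SOL figure, against roughly 65% for the baseline, and the measured throughput ratio is about 1.33x. Note that the SOL value itself differs between the two runs, so the fractions are each relative to their own theoretical bound.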

**Production Configuration**:
Users can enable the full ADP Balance strategy by adding the following configuration:

```yaml
attention_dp_config:
  enable_balance: true
  batching_wait_iters: 10
  timeout_iters: 50
```

The effectiveness of our complete ADP Balance implementation is clearly demonstrated in Figure 6. The visualization reveals how the combination of context synchronization and batch equilibration mechanisms achieves near-optimal load balancing throughout the critical execution window.

<div align="center">
@@ -316,6 +306,16 @@ The effectiveness of our complete ADP Balance implementation is clearly demonstr
- ⚠️ **Iteration overhead**: Waiting mechanisms increase total iteration count
- ⚠️ **TTFT impact**: Strategic delays affect time-to-first-token metrics

**Production Configuration**:
Users can enable the full ADP Balance strategy by adding the following configuration:

```yaml
attention_dp_config:
  enable_balance: true
  batching_wait_iters: 10
  timeout_iters: 50
```
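
When this block is generated from a script rather than written by hand, it can help to check the values before emitting them. The following is a minimal sketch, assuming the three keys shown above are the only ones needed. `AttentionDpConfigSketch` and `to_config_dict` are hypothetical names for illustration and are not the library's own API; the non-negative integer check is likewise an assumption.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AttentionDpConfigSketch:
    """Hypothetical mirror of the three keys in the YAML block above."""

    enable_balance: bool
    batching_wait_iters: int
    timeout_iters: int

    def __post_init__(self) -> None:
        # Assumption: both iteration counts must be non-negative integers.
        for name in ("batching_wait_iters", "timeout_iters"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"{name} must be a non-negative integer, got {value!r}")


def to_config_dict(cfg: AttentionDpConfigSketch) -> dict:
    """Nest the fields under the top-level key used in the YAML block."""
    return {"attention_dp_config": asdict(cfg)}


if __name__ == "__main__":
    cfg = AttentionDpConfigSketch(
        enable_balance=True, batching_wait_iters=10, timeout_iters=50
    )
    print(to_config_dict(cfg))
```

The resulting dictionary can be passed to whichever YAML serializer the deployment already uses.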

### Pareto Analysis: Throughput-Latency Trade-off Optimization

Understanding the performance trade-offs inherent in our ADP Balance strategy is crucial for production deployment decisions. Figure 7 presents a comprehensive Pareto frontier analysis that maps the relationship between system throughput (TPS per GPU) and Time-To-First-Token (TTFT) across varying workload intensities and parameter configurations.