E2E Tests and Documentation Update for Capacity Model #295
Open
ev-shindin wants to merge 9 commits into llm-d-incubation:main from ev-shindin:capacity-tests
+1,600
−15
78e28f2 to
0a7a1d1
Compare
Implements comprehensive e2e test suite for capacity-based autoscaling with backwards compatibility for controllers without MetricsAvailable condition.

Changes:
- Fix BeforeSuite to not skip tests in CAPACITY-ONLY mode
- Add capacity configuration validation test (5 test cases)
- Add capacity scale-up detection test (5 test cases)
- Add capacity scale-down safety test (5 test cases)
- Add comprehensive test documentation

Test Results (pokprod cluster):
- 10 capacity model tests passed ✅
- 7 proactive mode tests skipped (expected)
- 0 failures ✅

Backwards Compatibility:
- Tests validate OptimizationReady condition (present in all versions)
- Removed MetricsAvailable dependency (not in deployed controller)
- Tests work with both old and new controller versions

Configuration:
- Request rate: 15 req/s (conservative)
- Number of prompts: 3000
- Total duration: ~28 minutes for all 3 test suites
- Cluster: pokprod001 (OpenShift)

Files:
- Modified: test/e2e-openshift/e2e_suite_test.go
- Added: test/e2e-openshift/capacity_*_test.go (3 files)
- Added: test/e2e-openshift/CAPACITY_TESTS_README.md
Implements multi-phase load test validating complete autoscaling lifecycle with progressive scale-up, scale-down, and return to baseline.

Test Phases (~33 minutes total):
- Phase 1: Low load (10 req/s) - baseline behavior
- Phase 2: Medium load (20 req/s) - scale-up trigger
- Phase 3: High load (30 req/s) - further scale-up
- Phase 4: Return to medium (20 req/s) - gradual scale-down
- Phase 5: Cooldown (no load) - return to baseline

Features:
- Continuous monitoring every 20 seconds
- Validates replica count ranges for each phase
- Verifies OptimizationReady stays True throughout
- Provides detailed lifecycle summary

Total prompts: 12,000
Total duration: ~33 minutes
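The phase plan in this commit message can be pictured as a small table of load steps. The following is a minimal sketch only: the `loadPhase` type, the function names, and the replica bounds used in `main` are hypothetical and are not taken from capacity_lifecycle_test.go; only the phase names and request rates come from the commit message.

```go
package main

import "fmt"

// loadPhase is a hypothetical type describing one stage of the lifecycle test.
type loadPhase struct {
	name        string
	requestRate int // requests per second; 0 means no load is generated
	purpose     string
}

// lifecyclePhases lists the five phases and rates stated in the commit message.
func lifecyclePhases() []loadPhase {
	return []loadPhase{
		{"Phase 1: low load", 10, "baseline behavior"},
		{"Phase 2: medium load", 20, "scale-up trigger"},
		{"Phase 3: high load", 30, "further scale-up"},
		{"Phase 4: return to medium", 20, "gradual scale-down"},
		{"Phase 5: cooldown", 0, "return to baseline"},
	}
}

// replicasInRange reports whether an observed replica count lies in [lo, hi].
// The bounds passed in are up to the caller; the real per-phase ranges are
// not given in the commit message.
func replicasInRange(observed, lo, hi int32) bool {
	return observed >= lo && observed <= hi
}

func main() {
	for _, p := range lifecyclePhases() {
		fmt.Printf("%s: %d req/s (%s)\n", p.name, p.requestRate, p.purpose)
	}
	// Illustrative bounds only (assumption): 1 to 3 replicas.
	fmt.Println(replicasInRange(2, 1, 3))
}
```

Keeping the phases in a slice makes it straightforward to change rates in one place if the load needs to be raised to reach the scaling threshold.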
WHAT:
- Simplified capacity-scaling-config.md to focus on EPP saturationDetector configuration, aligned with gateway-api-inference-extension reference
- Added threshold alignment best practices section

WHY:
- Provide clear guidance on coordinating WVA and EPP threshold configuration
- Focus on the three key saturation detector thresholds that matter for capacity-based scaling
- Explain benefits of aligned thresholds for reduced request drops and consistent capacity assessment

HOW:

Best Practices section:
- Added "Coordinating with InferenceScheduler (EPP)" section
- Explained why threshold alignment matters (reduced drops, consistent capacity assessment, improved GPU utilization, faster response)
- Provided side-by-side comparison of WVA and EPP configurations
- Highlighted key threshold mappings:
  * kvCacheThreshold (WVA) ↔ kvCacheUtilThreshold (EPP) = 0.8
  * queueLengthThreshold (WVA) ↔ queueDepthThreshold (EPP) = 5

EPP Configuration section:
- Removed verbose EPP plugin and profile configuration sections
- Added concise EPP Configuration Overview with three main sections:
  1. Saturation Detector - Monitors cluster overload (relevant for WVA alignment)
  2. Scheduling Plugins - Request routing logic
  3. Scheduling Profiles - Weighted combinations of scoring plugins
- Focused saturationDetector section on three key thresholds:
  * queueDepthThreshold (5) - Backend waiting queue size threshold
  * kvCacheUtilThreshold (0.8) - KV cache utilization threshold
  * metricsStalenessThreshold (200ms) - Maximum age for pod metrics
- Added configuration notes:
  * All parameters are optional with documented defaults
  * EPP configuration is read only on startup (requires pod restart)
  * Unlike WVA, EPP does not currently support live ConfigMap updates

VERIFICATION:
- Documentation now accurately reflects upstream EPP saturation detector configuration from gateway-api-inference-extension project
- Clear guidance on threshold alignment for coordinated WVA/EPP behavior

Refs: https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/site-src/guides/epp-configuration/config-text.md
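The threshold mappings described in this commit can be checked mechanically. The sketch below is illustrative only: the struct types and the `misalignments` helper are hypothetical and do not exist in WVA or EPP; the field names simply mirror the configuration keys named above, and the values are the aligned defaults (0.8 and 5).

```go
package main

import (
	"fmt"
	"math"
)

// wvaThresholds is a hypothetical holder for the WVA-side keys.
type wvaThresholds struct {
	KvCacheThreshold     float64
	QueueLengthThreshold int
}

// eppThresholds is a hypothetical holder for the EPP saturationDetector keys.
type eppThresholds struct {
	KvCacheUtilThreshold float64
	QueueDepthThreshold  int
}

// misalignments returns one message per threshold pair that differs.
func misalignments(w wvaThresholds, e eppThresholds) []string {
	var out []string
	// Compare floats with a small tolerance rather than exact equality.
	if math.Abs(w.KvCacheThreshold-e.KvCacheUtilThreshold) > 1e-9 {
		out = append(out, fmt.Sprintf("kvCacheThreshold=%.2f but kvCacheUtilThreshold=%.2f",
			w.KvCacheThreshold, e.KvCacheUtilThreshold))
	}
	if w.QueueLengthThreshold != e.QueueDepthThreshold {
		out = append(out, fmt.Sprintf("queueLengthThreshold=%d but queueDepthThreshold=%d",
			w.QueueLengthThreshold, e.QueueDepthThreshold))
	}
	return out
}

func main() {
	wva := wvaThresholds{KvCacheThreshold: 0.8, QueueLengthThreshold: 5}
	epp := eppThresholds{KvCacheUtilThreshold: 0.8, QueueDepthThreshold: 5}
	if problems := misalignments(wva, epp); len(problems) == 0 {
		fmt.Println("thresholds aligned")
	} else {
		for _, p := range problems {
			fmt.Println("misaligned:", p)
		}
	}
}
```

Because EPP reads its configuration only at startup, a check like this would be most useful before restarting the EPP pod after a WVA ConfigMap change.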
Update test parameters to generate sufficient load to exceed the 80% KV cache utilization threshold and trigger capacity-based scale-up:
- e2e_suite_test.go: Increase default REQUEST_RATE from 20 to 50 req/s
- capacity_lifecycle_test.go: Update load phases:
  - Phase 2 (medium): 20 → 50 req/s
  - Phase 3 (high): 30 → 70 req/s
  - Phase 4 (return): 20 → 50 req/s

Previous tests with 20 req/s only reached 63.8% KV cache utilization, below the 80% threshold (kvCacheThreshold: 0.8 in capacity-scaling-config). The capacity analyzer correctly determined no scaling was needed.

With these increased rates (50-70 req/s), tests should now exceed the threshold and trigger actual scale-up decisions, enabling validation of:
- Scale-up behavior under load
- LastRunTime updates when decisions are applied
- Full lifecycle scaling (up and down)
WHAT:
- Updated lifecycle test to track peak replicas during monitoring period instead of checking final replicas after system scales down

WHY:
- Test was failing because it checked final replicas (1) instead of peak (2), even though capacity model correctly scaled up during load (1→2) then back down after load completed (2→1)
- Controller logs confirmed capacity model working correctly:
  * 70.3% KV cache triggered scale-up (1→2 replicas)
  * System properly scaled down after load (2→1 replicas)

HOW:
- Added peakReplicas tracking variable initialized to startReplicas
- Updated monitoring loop to track peak: if currentReplicas > peakReplicas
- Changed validation to check peakReplicas instead of finalReplicas
- Updated success message format: "start → peak → final replicas"

VERIFICATION:
- Test code compiles successfully
- Peak tracking will now correctly validate scale-up behavior
- System behavior is correct; test now validates the right metric
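The peak-tracking change can be sketched as follows. This is not the code from capacity_lifecycle_test.go: the `trackPeak` helper and the slice of sampled replica counts are assumptions made for illustration, standing in for the test's monitoring loop. Only the variable names startReplicas, peakReplicas, and currentReplicas and the message format come from the commit message.

```go
package main

import "fmt"

// trackPeak walks the replica counts sampled during monitoring and returns
// the highest count seen and the last count seen. It is a hypothetical
// helper; the real test polls the cluster instead of reading a slice.
func trackPeak(startReplicas int32, samples []int32) (peakReplicas, finalReplicas int32) {
	peakReplicas = startReplicas
	finalReplicas = startReplicas
	for _, currentReplicas := range samples {
		if currentReplicas > peakReplicas {
			peakReplicas = currentReplicas
		}
		finalReplicas = currentReplicas
	}
	return peakReplicas, finalReplicas
}

func main() {
	startReplicas := int32(1)
	// Scale-up under load (1→2), then scale-down after load ends (2→1).
	samples := []int32{1, 2, 2, 1}
	peak, final := trackPeak(startReplicas, samples)

	// Validating on the peak passes; validating on final would have failed.
	if peak > startReplicas {
		fmt.Printf("%d → %d → %d replicas\n", startReplicas, peak, final)
	}
}
```

With these samples the final count equals the starting count, which is why an assertion on the final value alone could not detect that a scale-up had occurred.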
- Remove unnecessary fmt.Sprintf wrapper
- Remove ineffectual assignment to finalReplicas in monitoring loop
11e5d9b to
667fd6a
Compare
- Remove unused kubernetesLabelPattern variable and regexp import
- Remove redundant nested if statement in hybrid mode logic
- Remove empty else branch in mode selection logic

All golangci-lint issues resolved.
667fd6a to
a7e1929
Compare
# Conflicts:
#	docs/capacity-scaling-config.md
- Remove explicit int32 type from peakReplicas declaration (type is inferred from startReplicas)
- Fix indentation to match surrounding code

Resolves staticcheck ST1023 warning.
Summary
This PR adds comprehensive e2e test coverage for the capacity-based autoscaling model and enhances documentation to guide threshold coordination between WVA and InferenceScheduler components.
Changes
1. E2E Test Suite for Capacity Model
Implements a complete test suite validating capacity-based autoscaling functionality:
Test Files Added (4)
- test/e2e-openshift/capacity_config_test.go (283 lines): Configuration validation smoke test - verifies ConfigMap, thresholds, and controller initialization
- test/e2e-openshift/capacity_scaleup_test.go (287 lines): Scale-up detection test - validates capacity analyzer detects load and triggers scale-up
- test/e2e-openshift/capacity_scaledown_test.go (234 lines): Scale-down safety test - ensures safe gradual scale-down with minimum replica guarantees
- test/e2e-openshift/capacity_lifecycle_test.go (377 lines): Full lifecycle test - validates complete scaling cycle through 5 load phases:
Test Documentation
- test/e2e-openshift/CAPACITY_TESTS_README.md (398 lines): Comprehensive test documentation with usage instructions, troubleshooting, and configuration examples
Infrastructure Fix
- test/e2e-openshift/e2e_suite_test.go (modified): Fixed BeforeSuite logic that was incorrectly skipping all tests in CAPACITY-ONLY mode
2. Documentation Enhancement
- docs/capacity-scaling-config.md (+65 lines): Added "Best Practices: Coordinating with InferenceScheduler" section with:
Why Threshold Alignment Matters (4 key benefits)
Configuration Details
capacity-scaling-config)
Default Values (Aligned):
- kvCacheThreshold: 0.80 (WVA)
- queueLengthThreshold: 5 (WVA)
- kvCacheUtilThreshold: 0.80 (EPP)
- queueDepthThreshold: 5 (EPP)
Test Results
Executed successfully on pokprod001 OpenShift cluster:
Test Configuration:
Validated Functionality
- OptimizationReady condition accuracy
Breaking Changes
None. All changes are additive (new tests and documentation).
Files Changed
Added (5 files)
- test/e2e-openshift/capacity_config_test.go
- test/e2e-openshift/capacity_scaleup_test.go
- test/e2e-openshift/capacity_scaledown_test.go
- test/e2e-openshift/capacity_lifecycle_test.go
- test/e2e-openshift/CAPACITY_TESTS_README.md
Modified (2 files)
- test/e2e-openshift/e2e_suite_test.go (7 lines changed)
- docs/capacity-scaling-config.md (65 lines added)
Statistics
How to Run Tests
All capacity model tests
Individual test suites
Dependencies
capacity-scaling-config