|
| 1 | +# Justified Ensure* Usage in E2E Tests |
| 2 | + |
| 3 | +This document catalogs the remaining `Ensure*` function usage in the e2e test suite and provides justification for each case. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +As part of the test isolation refactoring effort, we've established that `Create*` functions with unique resource names should be preferred over `Ensure*` functions. However, certain scenarios justify the use of `Ensure*` semantics. |
| 8 | + |
| 9 | +## Justification Criteria |
| 10 | + |
| 11 | +`Ensure*` usage is justified when: |
| 12 | + |
| 13 | +1. **Testing Recovery Behavior**: The test explicitly validates controller recovery when resources are deleted/recreated |
| 14 | +2. **Shared Test Infrastructure**: Resources are truly shared across all tests (e.g., namespace-level configuration) |
| 15 | +3. **Development/Debugging**: Temporary usage during development with clear TODO comments |
| 16 | +4. **Idempotent Retries**: Test framework retries require idempotent setup |
| 17 | + |
| 18 | +## Current Justified Usage |
| 19 | + |
| 20 | +### 1. Error Recovery Tests (smoke_test.go) |
| 21 | + |
| 22 | +**Location**: `test/e2e/smoke_test.go:1068-1070` |
| 23 | + |
| 24 | +```go |
| 25 | +// JUSTIFIED: Testing controller recovery behavior when deployment is recreated |
| 26 | +By("Recreating the deployment") |
| 27 | +err = fixtures.EnsureModelService(ctx, k8sClient, cfg.LLMDNamespace, |
| 28 | + errorTestModelServiceName, errorTestPoolName, cfg.ModelID, |
| 29 | + cfg.UseSimulator, cfg.MaxNumSeqs) |
| 30 | +``` |
| 31 | + |
| 32 | +**Justification**: This test specifically validates that the WVA controller can recover when a deployment is deleted and recreated. The `Ensure*` pattern is intentional here to simulate real-world scenarios where deployments might be recreated by operators or other controllers. |
| 33 | + |
| 34 | +**Alternative Considered**: Using `Delete` + `Create` explicitly, but `Ensure*` better represents the real-world scenario being tested. |
| 35 | + |
| 36 | +**Status**: ✅ Justified - Keep with documentation |
| 37 | + |
| 38 | +--- |
| 39 | + |
| 40 | +### 2. Remaining Unjustified Usage |
| 41 | + |
| 42 | +The following `Ensure*` calls should be migrated to `Create*` with unique names: |
| 43 | + |
| 44 | +#### smoke_test.go |
| 45 | + |
| 46 | +| Line | Function | Context | Migration Priority | |
| 47 | +|------|----------|---------|-------------------| |
| 48 | +| 135 | `EnsureModelService` | Basic VA lifecycle setup | High | |
| 49 | +| 151 | `EnsureService` | Basic VA lifecycle setup | High | |
| 50 | +| 168 | `EnsureServiceMonitor` | Basic VA lifecycle setup | High | |
| 51 | +| 197 | `EnsureVariantAutoscalingWithDefaults` | Basic VA lifecycle setup | High | |
| 52 | +| 210 | `EnsureScaledObject` | Basic VA lifecycle setup | High | |
| 53 | +| 213 | `EnsureHPA` | Basic VA lifecycle setup | High | |
| 54 | +| 590 | `EnsureBurstLoadJob` | Scale-up test | High | |
| 55 | +| 974 | `EnsureModelService` | Error test setup | Medium | |
| 56 | +| 997 | `EnsureVariantAutoscalingWithDefaults` | Error test setup | Medium | |
| 57 | + |
| 58 | +**Migration Plan**: |
| 59 | +- Use `utils.NewTestResourceNames("smoke", "basic")` for basic lifecycle tests |
| 60 | +- Use `utils.NewTestResourceNames("smoke", "scaleup")` for scale-up tests |
| 61 | +- Use `utils.NewTestResourceNames("smoke", "error")` for error handling tests |
| 62 | + |
| 63 | +#### saturation_test.go |
| 64 | + |
| 65 | +| Line | Function | Context | Migration Priority | |
| 66 | +|------|----------|---------|-------------------| |
| 67 | +| 42 | `EnsureModelService` | Single variant test | High | |
| 68 | +| 58 | `EnsureService` | Single variant test | High | |
| 69 | +| 74 | `EnsureServiceMonitor` | Single variant test | High | |
| 70 | +| 129 | `EnsureVariantAutoscaling` | Single variant test | High | |
| 71 | +| 145 | `EnsureScaledObject` | Single variant test | High | |
| 72 | +| 148 | `EnsureHPA` | Single variant test | High | |
| 73 | +| 341-349 | Multiple `Ensure*` | Multi-variant test A | High | |
| 74 | +| 352-360 | Multiple `Ensure*` | Multi-variant test B | High | |
| 75 | +| 375-380 | `EnsureVariantAutoscaling` | Multi-variant VAs | High | |
| 76 | +| 387-395 | `EnsureScaledObject`/`EnsureHPA` | Multi-variant scalers | High | |
| 77 | +| 451-456 | `EnsureBurstLoadJob` | Load generation | Medium | |
| 78 | + |
| 79 | +**Migration Plan**: |
| 80 | +- Use `utils.NewTestResourceNames("saturation", "single")` for single variant |
| 81 | +- Use `utils.NewTestResourceNamesWithVariant("saturation", "multi", "a")` and `"b"` for multi-variant |
| 82 | + |
| 83 | +#### scale_to_zero_test.go |
| 84 | + |
| 85 | +| Line | Function | Context | Migration Priority | |
| 86 | +|------|----------|---------|-------------------| |
| 87 | +| 47 | `EnsureModelService` | Scale-to-zero test | High | |
| 88 | +| 63 | `EnsureService` | Scale-to-zero test | High | |
| 89 | +| 79 | `EnsureServiceMonitor` | Scale-to-zero test | High | |
| 90 | +| 108 | `EnsureVariantAutoscaling` | Scale-to-zero test | High | |
| 91 | +| 119 | `EnsureScaledObject` | Scale-to-zero test | High | |
| 92 | +| 122 | `EnsureHPA` | Scale-to-zero test | High | |
| 93 | +| 352 | `EnsureModelService` | Disabled test | Medium | |
| 94 | +| 357 | `EnsureService` | Disabled test | Medium | |
| 95 | +| 362 | `EnsureServiceMonitor` | Disabled test | Medium | |
| 96 | +| 375 | `EnsureVariantAutoscaling` | Disabled test | Medium | |
| 97 | +| 385 | `EnsureScaledObject` | Disabled test | Medium | |
| 98 | +| 388 | `EnsureHPA` | Disabled test | Medium | |
| 99 | + |
| 100 | +**Migration Plan**: |
| 101 | +- Use `utils.NewTestResourceNames("scale-to-zero", "idle")` for main test |
| 102 | +- Use `utils.NewTestResourceNames("scale-to-zero", "disabled")` for disabled test |
| 103 | + |
| 104 | +#### scale_from_zero_test.go |
| 105 | + |
| 106 | +| Line | Function | Context | Migration Priority | |
| 107 | +|------|----------|---------|-------------------| |
| 108 | +| 94 | `EnsureModelService` | Scale-from-zero test | High | |
| 109 | +| 108 | `EnsureService` | Scale-from-zero test | High | |
| 110 | +| 112 | `EnsureServiceMonitor` | Scale-from-zero test | High | |
| 111 | +| 141 | `EnsureVariantAutoscaling` | Scale-from-zero test | High | |
| 112 | +| 152 | `EnsureScaledObject` | Scale-from-zero test | High | |
| 113 | +| 155 | `EnsureHPA` | Scale-from-zero test | High | |
| 114 | + |
| 115 | +**Migration Plan**: |
| 116 | +- Use `utils.NewTestResourceNames("scale-from-zero", "startup")` |
| 117 | + |
| 118 | +#### parallel_load_scaleup_test.go |
| 119 | + |
| 120 | +| Line | Function | Context | Migration Priority | |
| 121 | +|------|----------|---------|-------------------| |
| 122 | +| 92 | `EnsureModelService` | Parallel load test | High | |
| 123 | +| 108 | `EnsureService` | Parallel load test | High | |
| 124 | +| 124 | `EnsureServiceMonitor` | Parallel load test | High | |
| 125 | +| 153 | `EnsureVariantAutoscalingWithDefaults` | Parallel load test | High | |
| 126 | +| 186 | `EnsureScaledObject` | Parallel load test | High | |
| 127 | +| 192 | `EnsureHPA` | Parallel load test | High | |
| 128 | +| 321 | `EnsureParallelLoadJobs` | Load generation | Medium | |
| 129 | + |
| 130 | +**Migration Plan**: |
| 131 | +- Use `utils.NewTestResourceNames("parallel", "scaleup")` |
| 132 | + |
| 133 | +#### limiter_test.go |
| 134 | + |
| 135 | +| Line | Function | Context | Migration Priority | |
| 136 | +|------|----------|---------|-------------------| |
| 137 | +| 43-51 | Multiple `Ensure*` | Limiter test variant A | High | |
| 138 | +| 72-80 | Multiple `Ensure*` | Limiter test variant B | High | |
| 139 | +| 114-123 | `EnsureVariantAutoscaling` | Limiter VAs | High | |
| 140 | +| 133-141 | `EnsureScaledObject`/`EnsureHPA` | Limiter scalers | High | |
| 141 | + |
| 142 | +**Migration Plan**: |
| 143 | +- Use `utils.NewTestResourceNamesWithVariant("limiter", "accelerator", "nvidia")` |
| 144 | +- Use `utils.NewTestResourceNamesWithVariant("limiter", "accelerator", "amd")` |
| 145 | + |
| 146 | +#### pod_scraping_test.go |
| 147 | + |
| 148 | +| Line | Function | Context | Migration Priority | |
| 149 | +|------|----------|---------|-------------------| |
| 150 | +| 34 | `EnsureModelService` | Pod scraping test | Low | |
| 151 | +| 39 | `EnsureService` | Pod scraping test | Low | |
| 152 | + |
| 153 | +**Migration Plan**: |
| 154 | +- Use `utils.NewTestResourceNames("pod-scraping", "metrics")` |
| 155 | + |
| 156 | +## Migration Priority |
| 157 | + |
| 158 | +### High Priority (Core Test Functionality) |
| 159 | +- smoke_test.go: Basic VA lifecycle (lines 135-213) |
| 160 | +- saturation_test.go: All tests |
| 161 | +- scale_to_zero_test.go: Main test |
| 162 | +- scale_from_zero_test.go: All tests |
| 163 | +- parallel_load_scaleup_test.go: All tests |
| 164 | +- limiter_test.go: All tests |
| 165 | + |
| 166 | +### Medium Priority (Secondary Tests) |
| 167 | +- smoke_test.go: Error handling tests |
| 168 | +- scale_to_zero_test.go: Disabled test |
| 169 | +- Load generation jobs |
| 170 | + |
| 171 | +### Low Priority (Infrastructure Tests) |
| 172 | +- pod_scraping_test.go |
| 173 | + |
| 174 | +## Migration Tracking |
| 175 | + |
| 176 | +### Completed |
| 177 | +- ✅ Infrastructure and utilities created |
| 178 | +- ✅ Documentation written |
| 179 | +- ✅ Fixture functions documented |
| 180 | + |
| 181 | +### In Progress |
| 182 | +- 🔄 Migration guide created |
| 183 | +- 🔄 Justified usage documented |
| 184 | + |
| 185 | +### Pending |
| 186 | +- ⏳ Actual test file migrations (54 Ensure* calls across 7 files) |
| 187 | + |
| 188 | +## Adding New Justified Usage |
| 189 | + |
| 190 | +If you need to add new `Ensure*` usage, follow this process: |
| 191 | + |
| 192 | +1. **Document in Code**: Add a comment explaining why `Ensure*` is needed |
| 193 | + ```go |
| 194 | + // JUSTIFIED ENSURE* USAGE: [Reason] |
| 195 | + // Alternative considered: [What was considered and why it doesn't work] |
| 196 | + err := fixtures.EnsureModelService(...) |
| 197 | + ``` |
| 198 | + |
| 199 | +2. **Update This Document**: Add an entry to the "Current Justified Usage" section |
| 200 | + |
| 201 | +3. **Code Review**: Ensure reviewers understand and approve the justification |
| 202 | + |
| 203 | +4. **Periodic Review**: Revisit justified usage quarterly to see if it can be eliminated |
| 204 | + |
| 205 | +## See Also |
| 206 | + |
| 207 | +- [Migration Guide](./e2e-test-isolation-migration-guide.md) |
| 208 | +- [Refactoring Plan](./e2e-test-isolation-refactoring.md) |
| 209 | +- [Implementation Summary](./e2e-test-isolation-implementation-summary.md) |
0 commit comments