Skip to content

Commit f69910a

Browse files
committed
test/e2e: migrate from Ensure* to Create* pattern for test isolation
Replace delete-and-recreate Ensure* helpers with fail-if-exists Create* functions across all e2e test files. This eliminates the illusion that multiple tests can safely share resources and establishes clear single- owner semantics for test resources. Key changes: - Introduce utils.NewTestResourceNames() for unique resource naming - Migrate 59 Ensure* calls to Create* across 7 test files - Consolidate cleanup from scattered DeferCleanup to AfterAll blocks - Enhance BeforeSuite to clean leftover resources from interrupted runs - Add comprehensive documentation and migration guides Each test now owns uniquely-named resources, preventing cross-test mutations and enabling future parallel test execution. The only remaining Ensure* calls are for transient load generation utilities, which are documented as justified exceptions. Signed-off-by: Sathvik <Sathvik.S@ibm.com>
1 parent 89ef12b commit f69910a

16 files changed

Lines changed: 2740 additions & 175 deletions
Lines changed: 209 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,209 @@
1+
# Justified Ensure* Usage in E2E Tests
2+
3+
This document catalogs the remaining `Ensure*` function usage in the e2e test suite and provides justification for each case.
4+
5+
## Overview
6+
7+
As part of the test isolation refactoring effort, we've established that `Create*` functions with unique resource names should be preferred over `Ensure*` functions. However, certain scenarios justify the use of `Ensure*` semantics.
8+
9+
## Justification Criteria
10+
11+
`Ensure*` usage is justified when:
12+
13+
1. **Testing Recovery Behavior**: The test explicitly validates controller recovery when resources are deleted/recreated
14+
2. **Shared Test Infrastructure**: Resources are truly shared across all tests (e.g., namespace-level configuration)
15+
3. **Development/Debugging**: Temporary usage during development with clear TODO comments
16+
4. **Idempotent Retries**: Test framework retries require idempotent setup
17+
18+
## Current Justified Usage
19+
20+
### 1. Error Recovery Tests (smoke_test.go)
21+
22+
**Location**: `test/e2e/smoke_test.go:1068-1070`
23+
24+
```go
25+
// JUSTIFIED: Testing controller recovery behavior when deployment is recreated
26+
By("Recreating the deployment")
27+
err = fixtures.EnsureModelService(ctx, k8sClient, cfg.LLMDNamespace,
28+
errorTestModelServiceName, errorTestPoolName, cfg.ModelID,
29+
cfg.UseSimulator, cfg.MaxNumSeqs)
30+
```
31+
32+
**Justification**: This test specifically validates that the WVA controller can recover when a deployment is deleted and recreated. The `Ensure*` pattern is intentional here to simulate real-world scenarios where deployments might be recreated by operators or other controllers.
33+
34+
**Alternative Considered**: Using `Delete` + `Create` explicitly, but `Ensure*` better represents the real-world scenario being tested.
35+
36+
**Status**: ✅ Justified - Keep with documentation
37+
38+
---
39+
40+
### 2. Remaining Unjustified Usage
41+
42+
The following `Ensure*` calls should be migrated to `Create*` with unique names:
43+
44+
#### smoke_test.go
45+
46+
| Line | Function | Context | Migration Priority |
47+
|------|----------|---------|-------------------|
48+
| 135 | `EnsureModelService` | Basic VA lifecycle setup | High |
49+
| 151 | `EnsureService` | Basic VA lifecycle setup | High |
50+
| 168 | `EnsureServiceMonitor` | Basic VA lifecycle setup | High |
51+
| 197 | `EnsureVariantAutoscalingWithDefaults` | Basic VA lifecycle setup | High |
52+
| 210 | `EnsureScaledObject` | Basic VA lifecycle setup | High |
53+
| 213 | `EnsureHPA` | Basic VA lifecycle setup | High |
54+
| 590 | `EnsureBurstLoadJob` | Scale-up test | High |
55+
| 974 | `EnsureModelService` | Error test setup | Medium |
56+
| 997 | `EnsureVariantAutoscalingWithDefaults` | Error test setup | Medium |
57+
58+
**Migration Plan**:
59+
- Use `utils.NewTestResourceNames("smoke", "basic")` for basic lifecycle tests
60+
- Use `utils.NewTestResourceNames("smoke", "scaleup")` for scale-up tests
61+
- Use `utils.NewTestResourceNames("smoke", "error")` for error handling tests
62+
63+
#### saturation_test.go
64+
65+
| Line | Function | Context | Migration Priority |
66+
|------|----------|---------|-------------------|
67+
| 42 | `EnsureModelService` | Single variant test | High |
68+
| 58 | `EnsureService` | Single variant test | High |
69+
| 74 | `EnsureServiceMonitor` | Single variant test | High |
70+
| 129 | `EnsureVariantAutoscaling` | Single variant test | High |
71+
| 145 | `EnsureScaledObject` | Single variant test | High |
72+
| 148 | `EnsureHPA` | Single variant test | High |
73+
| 341-349 | Multiple `Ensure*` | Multi-variant test A | High |
74+
| 352-360 | Multiple `Ensure*` | Multi-variant test B | High |
75+
| 375-380 | `EnsureVariantAutoscaling` | Multi-variant VAs | High |
76+
| 387-395 | `EnsureScaledObject`/`EnsureHPA` | Multi-variant scalers | High |
77+
| 451-456 | `EnsureBurstLoadJob` | Load generation | Medium |
78+
79+
**Migration Plan**:
80+
- Use `utils.NewTestResourceNames("saturation", "single")` for single variant
81+
- Use `utils.NewTestResourceNamesWithVariant("saturation", "multi", "a")` and `"b"` for multi-variant
82+
83+
#### scale_to_zero_test.go
84+
85+
| Line | Function | Context | Migration Priority |
86+
|------|----------|---------|-------------------|
87+
| 47 | `EnsureModelService` | Scale-to-zero test | High |
88+
| 63 | `EnsureService` | Scale-to-zero test | High |
89+
| 79 | `EnsureServiceMonitor` | Scale-to-zero test | High |
90+
| 108 | `EnsureVariantAutoscaling` | Scale-to-zero test | High |
91+
| 119 | `EnsureScaledObject` | Scale-to-zero test | High |
92+
| 122 | `EnsureHPA` | Scale-to-zero test | High |
93+
| 352 | `EnsureModelService` | Disabled test | Medium |
94+
| 357 | `EnsureService` | Disabled test | Medium |
95+
| 362 | `EnsureServiceMonitor` | Disabled test | Medium |
96+
| 375 | `EnsureVariantAutoscaling` | Disabled test | Medium |
97+
| 385 | `EnsureScaledObject` | Disabled test | Medium |
98+
| 388 | `EnsureHPA` | Disabled test | Medium |
99+
100+
**Migration Plan**:
101+
- Use `utils.NewTestResourceNames("scale-to-zero", "idle")` for main test
102+
- Use `utils.NewTestResourceNames("scale-to-zero", "disabled")` for disabled test
103+
104+
#### scale_from_zero_test.go
105+
106+
| Line | Function | Context | Migration Priority |
107+
|------|----------|---------|-------------------|
108+
| 94 | `EnsureModelService` | Scale-from-zero test | High |
109+
| 108 | `EnsureService` | Scale-from-zero test | High |
110+
| 112 | `EnsureServiceMonitor` | Scale-from-zero test | High |
111+
| 141 | `EnsureVariantAutoscaling` | Scale-from-zero test | High |
112+
| 152 | `EnsureScaledObject` | Scale-from-zero test | High |
113+
| 155 | `EnsureHPA` | Scale-from-zero test | High |
114+
115+
**Migration Plan**:
116+
- Use `utils.NewTestResourceNames("scale-from-zero", "startup")`
117+
118+
#### parallel_load_scaleup_test.go
119+
120+
| Line | Function | Context | Migration Priority |
121+
|------|----------|---------|-------------------|
122+
| 92 | `EnsureModelService` | Parallel load test | High |
123+
| 108 | `EnsureService` | Parallel load test | High |
124+
| 124 | `EnsureServiceMonitor` | Parallel load test | High |
125+
| 153 | `EnsureVariantAutoscalingWithDefaults` | Parallel load test | High |
126+
| 186 | `EnsureScaledObject` | Parallel load test | High |
127+
| 192 | `EnsureHPA` | Parallel load test | High |
128+
| 321 | `EnsureParallelLoadJobs` | Load generation | Medium |
129+
130+
**Migration Plan**:
131+
- Use `utils.NewTestResourceNames("parallel", "scaleup")`
132+
133+
#### limiter_test.go
134+
135+
| Line | Function | Context | Migration Priority |
136+
|------|----------|---------|-------------------|
137+
| 43-51 | Multiple `Ensure*` | Limiter test variant A | High |
138+
| 72-80 | Multiple `Ensure*` | Limiter test variant B | High |
139+
| 114-123 | `EnsureVariantAutoscaling` | Limiter VAs | High |
140+
| 133-141 | `EnsureScaledObject`/`EnsureHPA` | Limiter scalers | High |
141+
142+
**Migration Plan**:
143+
- Use `utils.NewTestResourceNamesWithVariant("limiter", "accelerator", "nvidia")`
144+
- Use `utils.NewTestResourceNamesWithVariant("limiter", "accelerator", "amd")`
145+
146+
#### pod_scraping_test.go
147+
148+
| Line | Function | Context | Migration Priority |
149+
|------|----------|---------|-------------------|
150+
| 34 | `EnsureModelService` | Pod scraping test | Low |
151+
| 39 | `EnsureService` | Pod scraping test | Low |
152+
153+
**Migration Plan**:
154+
- Use `utils.NewTestResourceNames("pod-scraping", "metrics")`
155+
156+
## Migration Priority
157+
158+
### High Priority (Core Test Functionality)
159+
- smoke_test.go: Basic VA lifecycle (lines 135-213)
160+
- saturation_test.go: All tests
161+
- scale_to_zero_test.go: Main test
162+
- scale_from_zero_test.go: All tests
163+
- parallel_load_scaleup_test.go: All tests
164+
- limiter_test.go: All tests
165+
166+
### Medium Priority (Secondary Tests)
167+
- smoke_test.go: Error handling tests
168+
- scale_to_zero_test.go: Disabled test
169+
- Load generation jobs
170+
171+
### Low Priority (Infrastructure Tests)
172+
- pod_scraping_test.go
173+
174+
## Migration Tracking
175+
176+
### Completed
177+
- ✅ Infrastructure and utilities created
178+
- ✅ Documentation written
179+
- ✅ Fixture functions documented
180+
181+
### In Progress
182+
- 🔄 Migration guide created
183+
- 🔄 Justified usage documented
184+
185+
### Pending
186+
- ⏳ Actual test file migrations (54 Ensure* calls across 7 files)
187+
188+
## Adding New Justified Usage
189+
190+
If you need to add new `Ensure*` usage, follow this process:
191+
192+
1. **Document in Code**: Add a comment explaining why `Ensure*` is needed
193+
```go
194+
// JUSTIFIED ENSURE* USAGE: [Reason]
195+
// Alternative considered: [What was considered and why it doesn't work]
196+
err := fixtures.EnsureModelService(...)
197+
```
198+
199+
2. **Update This Document**: Add an entry to the "Current Justified Usage" section
200+
201+
3. **Code Review**: Ensure reviewers understand and approve the justification
202+
203+
4. **Periodic Review**: Revisit justified usage quarterly to see if it can be eliminated
204+
205+
## See Also
206+
207+
- [Migration Guide](./e2e-test-isolation-migration-guide.md)
208+
- [Refactoring Plan](./e2e-test-isolation-refactoring.md)
209+
- [Implementation Summary](./e2e-test-isolation-implementation-summary.md)

test/e2e/README.md

Lines changed: 105 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -433,38 +433,125 @@ kubectl delete job -n llm-d-sim -l test-resource=true
433433
1. Create a new test file in `test/e2e/`
434434
2. Use fixtures from `test/e2e/fixtures/` to create resources
435435
3. Add appropriate Ginkgo labels (`smoke`, `full`, `flaky`)
436-
4. Ensure BeforeAll/AfterAll cleanup is implemented
437-
5. Update this README with test description
436+
4. **Use unique resource names** via `utils.TestResourceNames` helpers
437+
5. **Implement per-test cleanup** in `AfterEach` blocks
438+
6. Update this README with test description
438439

439-
### Example Test Template
440+
### Test Isolation Guidelines
441+
442+
**IMPORTANT:** To ensure proper test isolation and prevent cross-test dependencies:
443+
444+
1. **Use Unique Names**: Always use `utils.TestResourceNames` helpers to generate unique resource names
445+
2. **Prefer Create* over Ensure***: Use `fixtures.Create*()` functions instead of `fixtures.Ensure*()`
446+
3. **Clean Up Per Test**: Implement `AfterEach` cleanup for test-specific resources
447+
4. **Document Ensure* Usage**: If you must use `Ensure*`, add a comment explaining why
448+
449+
See [E2E Test Isolation Refactoring](../../docs/developer-guide/e2e-test-isolation-refactoring.md) for detailed guidelines.
450+
451+
### Example Test Template (Recommended Pattern)
440452

441453
```go
442-
var _ = Describe("My New Test", Label("full"), Ordered, func() {
454+
var _ = Describe("My New Test", Label("full"), func() {
455+
var names utils.TestResourceNames
456+
457+
BeforeEach(func() {
458+
// Generate unique names for this test
459+
names = utils.NewTestResourceNames("mysuite", "mytest")
460+
})
461+
462+
AfterEach(func() {
463+
By("Cleaning up test resources")
464+
// Clean up in reverse order of creation
465+
_ = fixtures.DeleteHPA(ctx, k8sClient, cfg.LLMDNamespace, names.Base)
466+
_ = fixtures.DeleteVariantAutoscaling(ctx, crClient, cfg.LLMDNamespace, names.VA)
467+
_ = fixtures.DeleteService(ctx, k8sClient, cfg.LLMDNamespace, names.Base)
468+
_ = fixtures.DeleteModelService(ctx, k8sClient, cfg.LLMDNamespace, names.Base)
469+
_ = fixtures.DeleteServiceMonitor(ctx, crClient, cfg.MonitoringNS, names.Base)
470+
})
471+
472+
It("should do something", func() {
473+
By("Creating model service with unique name")
474+
err := fixtures.CreateModelService(ctx, k8sClient, cfg.LLMDNamespace,
475+
names.Base, poolName, cfg.ModelID, cfg.UseSimulator, cfg.MaxNumSeqs)
476+
Expect(err).NotTo(HaveOccurred())
477+
478+
By("Creating service")
479+
err = fixtures.CreateService(ctx, k8sClient, cfg.LLMDNamespace,
480+
names.Base, names.Deployment, 8000)
481+
Expect(err).NotTo(HaveOccurred())
482+
483+
By("Creating VariantAutoscaling")
484+
err = fixtures.CreateVariantAutoscaling(ctx, crClient, cfg.LLMDNamespace,
485+
names.VA, names.Deployment, cfg.ModelID, "A100", 30.0, cfg.ControllerInstance)
486+
Expect(err).NotTo(HaveOccurred())
487+
488+
// Test implementation...
489+
})
490+
})
491+
```
492+
493+
### Example: Multi-Variant Test
494+
495+
```go
496+
var _ = Describe("Multi-Variant Test", Label("full"), func() {
443497
var (
444-
poolName = "my-test-pool"
445-
vaName = "my-test-va"
498+
namesA utils.TestResourceNames
499+
namesB utils.TestResourceNames
446500
)
447501

448-
BeforeAll(func() {
449-
// Create test resources using fixtures
450-
err := fixtures.CreateInferencePool(ctx, crClient, cfg.LLMDNamespace, poolName, 8000)
451-
Expect(err).NotTo(HaveOccurred())
452-
// ... create more resources
502+
BeforeEach(func() {
503+
// Generate unique names for each variant
504+
namesA = utils.NewTestResourceNamesWithVariant("saturation", "multi", "a")
505+
namesB = utils.NewTestResourceNamesWithVariant("saturation", "multi", "b")
453506
})
454507

455-
AfterAll(func() {
456-
// Clean up test resources
457-
_ = crClient.Delete(ctx, &v1alpha1.VariantAutoscaling{
458-
ObjectMeta: metav1.ObjectMeta{Name: vaName, Namespace: cfg.LLMDNamespace},
459-
})
508+
AfterEach(func() {
509+
By("Cleaning up variant A resources")
510+
_ = fixtures.DeleteVariantAutoscaling(ctx, crClient, cfg.LLMDNamespace, namesA.VA)
511+
_ = fixtures.DeleteModelService(ctx, k8sClient, cfg.LLMDNamespace, namesA.Base)
512+
513+
By("Cleaning up variant B resources")
514+
_ = fixtures.DeleteVariantAutoscaling(ctx, crClient, cfg.LLMDNamespace, namesB.VA)
515+
_ = fixtures.DeleteModelService(ctx, k8sClient, cfg.LLMDNamespace, namesB.Base)
460516
})
461517

462-
It("should do something", func() {
463-
// Test implementation
518+
It("should scale variants independently", func() {
519+
// Create variant A (cheaper)
520+
err := fixtures.CreateModelService(ctx, k8sClient, cfg.LLMDNamespace,
521+
namesA.Base, poolA, cfg.ModelID, cfg.UseSimulator, cfg.MaxNumSeqs)
522+
Expect(err).NotTo(HaveOccurred())
523+
524+
// Create variant B (more expensive)
525+
err = fixtures.CreateModelService(ctx, k8sClient, cfg.LLMDNamespace,
526+
namesB.Base, poolB, cfg.ModelID, cfg.UseSimulator, cfg.MaxNumSeqs)
527+
Expect(err).NotTo(HaveOccurred())
528+
529+
// Test implementation...
464530
})
465531
})
466532
```
467533

534+
### Naming Utilities
535+
536+
The `test/utils` package provides helpers for generating unique resource names:
537+
538+
```go
539+
// Simple test with single resource set
540+
names := utils.NewTestResourceNames("smoke", "basic")
541+
// names.Base = "smoke-basic"
542+
// names.Deployment = "smoke-basic-decode"
543+
// names.Service = "smoke-basic-service"
544+
// names.VA = "smoke-basic-va"
545+
546+
// Multi-variant test
547+
namesA := utils.NewTestResourceNamesWithVariant("saturation", "multi", "a")
548+
namesB := utils.NewTestResourceNamesWithVariant("saturation", "multi", "b")
549+
550+
// Custom naming
551+
name := utils.TestResourceName("mysuite", "mytest", "model")
552+
// Returns: "mysuite-mytest-model"
553+
```
554+
468555
## See Also
469556

470557
- [Developer Testing Guide](../../docs/developer-guide/testing.md)

test/e2e/fixtures/hpa_builder.go

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,10 @@ import (
1414
"k8s.io/utils/ptr"
1515
)
1616

17-
// CreateHPA creates a HorizontalPodAutoscaler for WVA integration. Fails if it already exists.
17+
// CreateHPA creates a HorizontalPodAutoscaler for WVA integration.
18+
//
19+
// Enforces single ownership: FAILS if HPA already exists. Use unique names for test isolation.
20+
// Clean up with DeleteHPA() in AfterEach.
1821
func CreateHPA(
1922
ctx context.Context,
2023
k8sClient *kubernetes.Clientset,
@@ -37,6 +40,9 @@ func DeleteHPA(ctx context.Context, k8sClient *kubernetes.Clientset, namespace,
3740
}
3841

3942
// EnsureHPA creates or replaces the HPA (idempotent for test setup).
43+
//
44+
// WARNING: Delete-and-recreate semantics. PREFER CreateHPA() with unique names.
45+
// See EnsureModelService() for when to use Ensure* functions.
4046
func EnsureHPA(
4147
ctx context.Context,
4248
k8sClient *kubernetes.Clientset,

0 commit comments

Comments
 (0)