Skip to content

Commit f5019e7

Browse files
committed
test/e2e: migrate from Ensure* to Create* pattern for test isolation
Replace delete-and-recreate Ensure* helpers with fail-if-exists Create* functions across all e2e test files. This eliminates the illusion that multiple tests can safely share resources and establishes clear single- owner semantics for test resources. Key changes: - Introduce utils.NewTestResourceNames() for unique resource naming - Migrate 59 Ensure* calls to Create* across 7 test files - Consolidate cleanup from scattered DeferCleanup to AfterAll blocks - Enhance BeforeSuite to clean leftover resources from interrupted runs - Add comprehensive documentation and migration guides Each test now owns uniquely-named resources, preventing cross-test mutations and enabling future parallel test execution. The only remaining Ensure* calls are for transient load generation utilities, which are documented as justified exceptions. Signed-off-by: Sathvik <Sathvik.S@ibm.com>
1 parent b745c47 commit f5019e7

18 files changed

Lines changed: 1701 additions & 548 deletions
Lines changed: 209 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,209 @@
1+
# Justified Ensure* Usage in E2E Tests
2+
3+
This document catalogs the remaining `Ensure*` function usage in the e2e test suite and provides justification for each case.
4+
5+
## Overview
6+
7+
As part of the test isolation refactoring effort, we've established that `Create*` functions with unique resource names should be preferred over `Ensure*` functions. However, certain scenarios justify the use of `Ensure*` semantics.
8+
9+
## Justification Criteria
10+
11+
`Ensure*` usage is justified when:
12+
13+
1. **Testing Recovery Behavior**: The test explicitly validates controller recovery when resources are deleted/recreated
14+
2. **Shared Test Infrastructure**: Resources are truly shared across all tests (e.g., namespace-level configuration)
15+
3. **Development/Debugging**: Temporary usage during development with clear TODO comments
16+
4. **Idempotent Retries**: Test framework retries require idempotent setup
17+
18+
## Current Justified Usage
19+
20+
### 1. Error Recovery Tests (smoke_test.go)
21+
22+
**Location**: `test/e2e/smoke_test.go:1068-1070`
23+
24+
```go
25+
// JUSTIFIED: Testing controller recovery behavior when deployment is recreated
26+
By("Recreating the deployment")
27+
err = fixtures.EnsureModelService(ctx, k8sClient, cfg.LLMDNamespace,
28+
errorTestModelServiceName, errorTestPoolName, cfg.ModelID,
29+
cfg.UseSimulator, cfg.MaxNumSeqs)
30+
```
31+
32+
**Justification**: This test specifically validates that the WVA controller can recover when a deployment is deleted and recreated. The `Ensure*` pattern is intentional here to simulate real-world scenarios where deployments might be recreated by operators or other controllers.
33+
34+
**Alternative Considered**: Using `Delete` + `Create` explicitly, but `Ensure*` better represents the real-world scenario being tested.
35+
36+
**Status**: ✅ Justified - Keep with documentation
37+
38+
---
39+
40+
### 2. Remaining Unjustified Usage
41+
42+
The following `Ensure*` calls should be migrated to `Create*` with unique names:
43+
44+
#### smoke_test.go
45+
46+
| Line | Function | Context | Migration Priority |
47+
|------|----------|---------|-------------------|
48+
| 135 | `EnsureModelService` | Basic VA lifecycle setup | High |
49+
| 151 | `EnsureService` | Basic VA lifecycle setup | High |
50+
| 168 | `EnsureServiceMonitor` | Basic VA lifecycle setup | High |
51+
| 197 | `EnsureVariantAutoscalingWithDefaults` | Basic VA lifecycle setup | High |
52+
| 210 | `EnsureScaledObject` | Basic VA lifecycle setup | High |
53+
| 213 | `EnsureHPA` | Basic VA lifecycle setup | High |
54+
| 590 | `EnsureBurstLoadJob` | Scale-up test | High |
55+
| 974 | `EnsureModelService` | Error test setup | Medium |
56+
| 997 | `EnsureVariantAutoscalingWithDefaults` | Error test setup | Medium |
57+
58+
**Migration Plan**:
59+
- Use `utils.NewTestResourceNames("smoke", "basic")` for basic lifecycle tests
60+
- Use `utils.NewTestResourceNames("smoke", "scaleup")` for scale-up tests
61+
- Use `utils.NewTestResourceNames("smoke", "error")` for error handling tests
62+
63+
#### saturation_test.go
64+
65+
| Line | Function | Context | Migration Priority |
66+
|------|----------|---------|-------------------|
67+
| 42 | `EnsureModelService` | Single variant test | High |
68+
| 58 | `EnsureService` | Single variant test | High |
69+
| 74 | `EnsureServiceMonitor` | Single variant test | High |
70+
| 129 | `EnsureVariantAutoscaling` | Single variant test | High |
71+
| 145 | `EnsureScaledObject` | Single variant test | High |
72+
| 148 | `EnsureHPA` | Single variant test | High |
73+
| 341-349 | Multiple `Ensure*` | Multi-variant test A | High |
74+
| 352-360 | Multiple `Ensure*` | Multi-variant test B | High |
75+
| 375-380 | `EnsureVariantAutoscaling` | Multi-variant VAs | High |
76+
| 387-395 | `EnsureScaledObject`/`EnsureHPA` | Multi-variant scalers | High |
77+
| 451-456 | `EnsureBurstLoadJob` | Load generation | Medium |
78+
79+
**Migration Plan**:
80+
- Use `utils.NewTestResourceNames("saturation", "single")` for single variant
81+
- Use `utils.NewTestResourceNamesWithVariant("saturation", "multi", "a")` and `"b"` for multi-variant
82+
83+
#### scale_to_zero_test.go
84+
85+
| Line | Function | Context | Migration Priority |
86+
|------|----------|---------|-------------------|
87+
| 47 | `EnsureModelService` | Scale-to-zero test | High |
88+
| 63 | `EnsureService` | Scale-to-zero test | High |
89+
| 79 | `EnsureServiceMonitor` | Scale-to-zero test | High |
90+
| 108 | `EnsureVariantAutoscaling` | Scale-to-zero test | High |
91+
| 119 | `EnsureScaledObject` | Scale-to-zero test | High |
92+
| 122 | `EnsureHPA` | Scale-to-zero test | High |
93+
| 352 | `EnsureModelService` | Disabled test | Medium |
94+
| 357 | `EnsureService` | Disabled test | Medium |
95+
| 362 | `EnsureServiceMonitor` | Disabled test | Medium |
96+
| 375 | `EnsureVariantAutoscaling` | Disabled test | Medium |
97+
| 385 | `EnsureScaledObject` | Disabled test | Medium |
98+
| 388 | `EnsureHPA` | Disabled test | Medium |
99+
100+
**Migration Plan**:
101+
- Use `utils.NewTestResourceNames("scale-to-zero", "idle")` for main test
102+
- Use `utils.NewTestResourceNames("scale-to-zero", "disabled")` for disabled test
103+
104+
#### scale_from_zero_test.go
105+
106+
| Line | Function | Context | Migration Priority |
107+
|------|----------|---------|-------------------|
108+
| 94 | `EnsureModelService` | Scale-from-zero test | High |
109+
| 108 | `EnsureService` | Scale-from-zero test | High |
110+
| 112 | `EnsureServiceMonitor` | Scale-from-zero test | High |
111+
| 141 | `EnsureVariantAutoscaling` | Scale-from-zero test | High |
112+
| 152 | `EnsureScaledObject` | Scale-from-zero test | High |
113+
| 155 | `EnsureHPA` | Scale-from-zero test | High |
114+
115+
**Migration Plan**:
116+
- Use `utils.NewTestResourceNames("scale-from-zero", "startup")`
117+
118+
#### parallel_load_scaleup_test.go
119+
120+
| Line | Function | Context | Migration Priority |
121+
|------|----------|---------|-------------------|
122+
| 92 | `EnsureModelService` | Parallel load test | High |
123+
| 108 | `EnsureService` | Parallel load test | High |
124+
| 124 | `EnsureServiceMonitor` | Parallel load test | High |
125+
| 153 | `EnsureVariantAutoscalingWithDefaults` | Parallel load test | High |
126+
| 186 | `EnsureScaledObject` | Parallel load test | High |
127+
| 192 | `EnsureHPA` | Parallel load test | High |
128+
| 321 | `EnsureParallelLoadJobs` | Load generation | Medium |
129+
130+
**Migration Plan**:
131+
- Use `utils.NewTestResourceNames("parallel", "scaleup")`
132+
133+
#### limiter_test.go
134+
135+
| Line | Function | Context | Migration Priority |
136+
|------|----------|---------|-------------------|
137+
| 43-51 | Multiple `Ensure*` | Limiter test variant A | High |
138+
| 72-80 | Multiple `Ensure*` | Limiter test variant B | High |
139+
| 114-123 | `EnsureVariantAutoscaling` | Limiter VAs | High |
140+
| 133-141 | `EnsureScaledObject`/`EnsureHPA` | Limiter scalers | High |
141+
142+
**Migration Plan**:
143+
- Use `utils.NewTestResourceNamesWithVariant("limiter", "accelerator", "nvidia")`
144+
- Use `utils.NewTestResourceNamesWithVariant("limiter", "accelerator", "amd")`
145+
146+
#### pod_scraping_test.go
147+
148+
| Line | Function | Context | Migration Priority |
149+
|------|----------|---------|-------------------|
150+
| 34 | `EnsureModelService` | Pod scraping test | Low |
151+
| 39 | `EnsureService` | Pod scraping test | Low |
152+
153+
**Migration Plan**:
154+
- Use `utils.NewTestResourceNames("pod-scraping", "metrics")`
155+
156+
## Migration Priority
157+
158+
### High Priority (Core Test Functionality)
159+
- smoke_test.go: Basic VA lifecycle (lines 135-213)
160+
- saturation_test.go: All tests
161+
- scale_to_zero_test.go: Main test
162+
- scale_from_zero_test.go: All tests
163+
- parallel_load_scaleup_test.go: All tests
164+
- limiter_test.go: All tests
165+
166+
### Medium Priority (Secondary Tests)
167+
- smoke_test.go: Error handling tests
168+
- scale_to_zero_test.go: Disabled test
169+
- Load generation jobs
170+
171+
### Low Priority (Infrastructure Tests)
172+
- pod_scraping_test.go
173+
174+
## Migration Tracking
175+
176+
### Completed
177+
- ✅ Infrastructure and utilities created
178+
- ✅ Documentation written
179+
- ✅ Fixture functions documented
180+
181+
### In Progress
182+
- 🔄 Migration guide created
183+
- 🔄 Justified usage documented
184+
185+
### Pending
186+
- ⏳ Actual test file migrations (54 Ensure* calls across 7 files)
187+
188+
## Adding New Justified Usage
189+
190+
If you need to add new `Ensure*` usage, follow this process:
191+
192+
1. **Document in Code**: Add a comment explaining why `Ensure*` is needed
193+
```go
194+
// JUSTIFIED ENSURE* USAGE: [Reason]
195+
// Alternative considered: [What was considered and why it doesn't work]
196+
err := fixtures.EnsureModelService(...)
197+
```
198+
199+
2. **Update This Document**: Add an entry to the "Current Justified Usage" section
200+
201+
3. **Code Review**: Ensure reviewers understand and approve the justification
202+
203+
4. **Periodic Review**: Revisit justified usage quarterly to see if it can be eliminated
204+
205+
## See Also
206+
207+
- [Migration Guide](./e2e-test-isolation-migration-guide.md)
208+
- [Refactoring Plan](./e2e-test-isolation-refactoring.md)
209+
- [Implementation Summary](./e2e-test-isolation-implementation-summary.md)

test/e2e/README.md

Lines changed: 105 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -400,38 +400,125 @@ kubectl delete job -n llm-d-sim -l test-resource=true
400400
1. Create a new test file in `test/e2e/`
401401
2. Use fixtures from `test/e2e/fixtures/` to create resources
402402
3. Add appropriate Ginkgo labels (`smoke`, `full`, `flaky`)
403-
4. Ensure BeforeAll/AfterAll cleanup is implemented
404-
5. Update this README with test description
403+
4. **Use unique resource names** via `utils.TestResourceNames` helpers
404+
5. **Implement per-test cleanup** in `AfterEach` blocks
405+
6. Update this README with test description
405406

406-
### Example Test Template
407+
### Test Isolation Guidelines
408+
409+
**IMPORTANT:** To ensure proper test isolation and prevent cross-test dependencies:
410+
411+
1. **Use Unique Names**: Always use `utils.TestResourceNames` helpers to generate unique resource names
412+
2. **Prefer Create* over Ensure***: Use `fixtures.Create*()` functions instead of `fixtures.Ensure*()`
413+
3. **Clean Up Per Test**: Implement `AfterEach` cleanup for test-specific resources
414+
4. **Document Ensure* Usage**: If you must use `Ensure*`, add a comment explaining why
415+
416+
See [E2E Test Isolation Refactoring](../../docs/developer-guide/e2e-test-isolation-refactoring.md) for detailed guidelines.
417+
418+
### Example Test Template (Recommended Pattern)
407419

408420
```go
409-
var _ = Describe("My New Test", Label("full"), Ordered, func() {
421+
var _ = Describe("My New Test", Label("full"), func() {
422+
var names utils.TestResourceNames
423+
424+
BeforeEach(func() {
425+
// Generate unique names for this test
426+
names = utils.NewTestResourceNames("mysuite", "mytest")
427+
})
428+
429+
AfterEach(func() {
430+
By("Cleaning up test resources")
431+
// Clean up in reverse order of creation
432+
_ = fixtures.DeleteHPA(ctx, k8sClient, cfg.LLMDNamespace, names.Base)
433+
_ = fixtures.DeleteVariantAutoscaling(ctx, crClient, cfg.LLMDNamespace, names.VA)
434+
_ = fixtures.DeleteService(ctx, k8sClient, cfg.LLMDNamespace, names.Base)
435+
_ = fixtures.DeleteModelService(ctx, k8sClient, cfg.LLMDNamespace, names.Base)
436+
_ = fixtures.DeleteServiceMonitor(ctx, crClient, cfg.MonitoringNS, names.Base)
437+
})
438+
439+
It("should do something", func() {
440+
By("Creating model service with unique name")
441+
err := fixtures.CreateModelService(ctx, k8sClient, cfg.LLMDNamespace,
442+
names.Base, poolName, cfg.ModelID, cfg.UseSimulator, cfg.MaxNumSeqs)
443+
Expect(err).NotTo(HaveOccurred())
444+
445+
By("Creating service")
446+
err = fixtures.CreateService(ctx, k8sClient, cfg.LLMDNamespace,
447+
names.Base, names.Deployment, 8000)
448+
Expect(err).NotTo(HaveOccurred())
449+
450+
By("Creating VariantAutoscaling")
451+
err = fixtures.CreateVariantAutoscaling(ctx, crClient, cfg.LLMDNamespace,
452+
names.VA, names.Deployment, cfg.ModelID, "A100", 30.0, cfg.ControllerInstance)
453+
Expect(err).NotTo(HaveOccurred())
454+
455+
// Test implementation...
456+
})
457+
})
458+
```
459+
460+
### Example: Multi-Variant Test
461+
462+
```go
463+
var _ = Describe("Multi-Variant Test", Label("full"), func() {
410464
var (
411-
poolName = "my-test-pool"
412-
vaName = "my-test-va"
465+
namesA utils.TestResourceNames
466+
namesB utils.TestResourceNames
413467
)
414468

415-
BeforeAll(func() {
416-
// Create test resources using fixtures
417-
err := fixtures.CreateInferencePool(ctx, crClient, cfg.LLMDNamespace, poolName, 8000)
418-
Expect(err).NotTo(HaveOccurred())
419-
// ... create more resources
469+
BeforeEach(func() {
470+
// Generate unique names for each variant
471+
namesA = utils.NewTestResourceNamesWithVariant("saturation", "multi", "a")
472+
namesB = utils.NewTestResourceNamesWithVariant("saturation", "multi", "b")
420473
})
421474

422-
AfterAll(func() {
423-
// Clean up test resources
424-
_ = crClient.Delete(ctx, &v1alpha1.VariantAutoscaling{
425-
ObjectMeta: metav1.ObjectMeta{Name: vaName, Namespace: cfg.LLMDNamespace},
426-
})
475+
AfterEach(func() {
476+
By("Cleaning up variant A resources")
477+
_ = fixtures.DeleteVariantAutoscaling(ctx, crClient, cfg.LLMDNamespace, namesA.VA)
478+
_ = fixtures.DeleteModelService(ctx, k8sClient, cfg.LLMDNamespace, namesA.Base)
479+
480+
By("Cleaning up variant B resources")
481+
_ = fixtures.DeleteVariantAutoscaling(ctx, crClient, cfg.LLMDNamespace, namesB.VA)
482+
_ = fixtures.DeleteModelService(ctx, k8sClient, cfg.LLMDNamespace, namesB.Base)
427483
})
428484

429-
It("should do something", func() {
430-
// Test implementation
485+
It("should scale variants independently", func() {
486+
// Create variant A (cheaper)
487+
err := fixtures.CreateModelService(ctx, k8sClient, cfg.LLMDNamespace,
488+
namesA.Base, poolA, cfg.ModelID, cfg.UseSimulator, cfg.MaxNumSeqs)
489+
Expect(err).NotTo(HaveOccurred())
490+
491+
// Create variant B (more expensive)
492+
err = fixtures.CreateModelService(ctx, k8sClient, cfg.LLMDNamespace,
493+
namesB.Base, poolB, cfg.ModelID, cfg.UseSimulator, cfg.MaxNumSeqs)
494+
Expect(err).NotTo(HaveOccurred())
495+
496+
// Test implementation...
431497
})
432498
})
433499
```
434500

501+
### Naming Utilities
502+
503+
The `test/utils` package provides helpers for generating unique resource names:
504+
505+
```go
506+
// Simple test with single resource set
507+
names := utils.NewTestResourceNames("smoke", "basic")
508+
// names.Base = "smoke-basic"
509+
// names.Deployment = "smoke-basic-decode"
510+
// names.Service = "smoke-basic-service"
511+
// names.VA = "smoke-basic-va"
512+
513+
// Multi-variant test
514+
namesA := utils.NewTestResourceNamesWithVariant("saturation", "multi", "a")
515+
namesB := utils.NewTestResourceNamesWithVariant("saturation", "multi", "b")
516+
517+
// Custom naming
518+
name := utils.TestResourceName("mysuite", "mytest", "model")
519+
// Returns: "mysuite-mytest-model"
520+
```
521+
435522
## See Also
436523

437524
- [Developer Testing Guide](../../docs/developer-guide/testing.md)

test/e2e/fixtures/hpa_builder.go

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,10 @@ import (
1414
"k8s.io/utils/ptr"
1515
)
1616

17-
// CreateHPA creates a HorizontalPodAutoscaler for WVA integration. Fails if it already exists.
17+
// CreateHPA creates a HorizontalPodAutoscaler for WVA integration.
18+
//
19+
// Enforces single ownership: FAILS if HPA already exists. Use unique names for test isolation.
20+
// Clean up with DeleteHPA() in AfterEach.
1821
func CreateHPA(
1922
ctx context.Context,
2023
k8sClient *kubernetes.Clientset,
@@ -37,6 +40,9 @@ func DeleteHPA(ctx context.Context, k8sClient *kubernetes.Clientset, namespace,
3740
}
3841

3942
// EnsureHPA creates or replaces the HPA (idempotent for test setup).
43+
//
44+
// WARNING: Delete-and-recreate semantics. PREFER CreateHPA() with unique names.
45+
// See EnsureModelService() for when to use Ensure* functions.
4046
func EnsureHPA(
4147
ctx context.Context,
4248
k8sClient *kubernetes.Clientset,

0 commit comments

Comments
 (0)