Performance Requirements for thv-registry-api
Overview
Define performance requirements and a testing methodology for thv-registry-api following the refactoring in stacklok/toolhive#2301.
Performance Targets [EXAMPLE TARGETS TO BE DISCUSSED]
Synchronization Performance
- Git source sync: < 30s for 1000 servers
- ConfigMap source sync: < 5s for 1000 servers
- Registry API source sync: < 15s for 1000 servers
- Hash-based change detection: < 1s
- Filter application: < 2s for 1000 servers
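The hash-based change detection target could be met with a single digest comparison per sync cycle. The following is a minimal sketch, assuming the source payload is available as raw bytes and that the digest of the last successful sync is stored somewhere; `contentHash` and `needsSync` are hypothetical names, not existing functions in thv-registry-api.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex-encoded SHA-256 digest of a raw registry payload.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// needsSync reports whether the fetched payload differs from the one that
// produced lastHash, and returns the digest of the fetched payload.
// lastHash is assumed to be persisted after each successful sync.
func needsSync(lastHash string, fetched []byte) (bool, string) {
	h := contentHash(fetched)
	return h != lastHash, h
}

func main() {
	payload := []byte(`{"servers":{"example":{}}}`)

	// First fetch: no stored hash yet, so a sync is required.
	changed, h := needsSync("", payload)
	fmt.Println("first fetch changed:", changed)

	// Second fetch with identical content: the sync can be skipped.
	changed, _ = needsSync(h, payload)
	fmt.Println("second fetch changed:", changed)
}
```

Hashing is linear in payload size, so the < 1s target mostly constrains how the payload is fetched rather than the comparison itself.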
HTTP API Performance
Latency targets are given as p95 / p99 at the stated sustained request rate:
- GET /v1/registry: < 100ms / < 200ms @ 100 req/s
- GET /v1/registry/{id}: < 50ms / < 100ms @ 200 req/s
- GET /registry/status: < 50ms / < 100ms @ 50 req/s
- POST /registry/sync: < 100ms / < 200ms @ 10 req/s
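To check collected latencies against the p95 / p99 targets above, a test could compute percentiles directly. This sketch uses the nearest-rank method, which is one of several common percentile definitions; `percentile`, `meetsTargets`, and `millis` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// millis converts integer milliseconds into durations (test convenience).
func millis(ms ...int) []time.Duration {
	out := make([]time.Duration, len(ms))
	for i, m := range ms {
		out[i] = time.Duration(m) * time.Millisecond
	}
	return out
}

// percentile returns the nearest-rank p-th percentile (1 <= p <= 100) of
// samples, or 0 when there are no samples. The input slice is not modified.
func percentile(samples []time.Duration, p int) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	// Integer ceiling of p/100 * n avoids floating-point rounding issues.
	rank := (p*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// meetsTargets reports whether p95 and p99 are strictly below the given limits.
func meetsTargets(samples []time.Duration, p95Max, p99Max time.Duration) bool {
	return percentile(samples, 95) < p95Max && percentile(samples, 99) < p99Max
}

func main() {
	samples := millis(12, 18, 25, 31, 40, 44, 52, 60, 75, 140)
	fmt.Println("p95:", percentile(samples, 95))
	fmt.Println("p99:", percentile(samples, 99))
	// Targets for GET /v1/registry from the list above.
	fmt.Println("within target:", meetsTargets(samples, 100*time.Millisecond, 200*time.Millisecond))
}
```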
Resource Limits
- Memory: < 256MB steady state, < 512MB during sync
- CPU: < 0.5 cores idle, < 1.5 cores during sync
Configuration Reload
- Config change detection: < 3s after file write
- Reload and sync trigger: < 5s total
- Zero downtime during reload
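One way to approach zero downtime during reload is to treat the configuration as an immutable snapshot and swap it atomically, so in-flight requests keep reading the old snapshot until the new one is published. This is a sketch under that assumption; the `Config` fields and `configHolder` type are hypothetical and do not reflect the actual thv-registry-api types. It requires Go 1.19 or later for `atomic.Pointer`.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical immutable snapshot of the loaded configuration.
type Config struct {
	Source       string
	SyncInterval string
}

// configHolder lets request handlers read the current config without locks
// while a reload publishes a new snapshot.
type configHolder struct {
	current atomic.Pointer[Config]
}

// Load returns the active snapshot, or nil if none has been published yet.
func (h *configHolder) Load() *Config {
	return h.current.Load()
}

// Reload validates the candidate and publishes it atomically. On a
// validation error the previous snapshot stays active.
func (h *configHolder) Reload(next *Config) error {
	if next == nil || next.Source == "" {
		return errors.New("invalid config: source must be set")
	}
	h.current.Store(next)
	return nil
}

func main() {
	h := &configHolder{}
	fmt.Println(h.Reload(&Config{Source: "git", SyncInterval: "5m"}))

	// A broken config file is rejected and the old snapshot keeps serving.
	fmt.Println(h.Reload(&Config{}))
	fmt.Println(h.Load().Source)
}
```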
Test Scenarios
- Large Registry Sync (Git)
  - 1000 servers with tag filtering
  - Measure: sync time, memory, CPU
- HTTP API Load Test
  - 100 req/s for 60s using vegeta or hey
  - Verify p95/p99 latency targets
- Concurrent Sync and Queries
  - Continuous query load during periodic sync
  - Verify < 50% latency increase during sync
- Configuration Reload
  - Modify config file, measure detection and sync trigger time
- Long-Running Stability
  - 24-hour test with periodic sync
  - Monitor for memory leaks and goroutine leaks
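For the goroutine leak part of the stability scenario, a short-running check can complement the 24-hour test by comparing goroutine counts before and after a sync cycle. The sketch below uses only `runtime.NumGoroutine`; `goroutineGrowth`, `cleanSync`, and `leakySync` are hypothetical stand-ins, and a real test would call the actual sync function instead.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// goroutineGrowth runs fn, waits up to settleMillis for finished goroutines
// to exit, and returns how many goroutines exist beyond the starting count.
func goroutineGrowth(fn func(), settleMillis int) int {
	before := runtime.NumGoroutine()
	fn()
	deadline := time.Now().Add(time.Duration(settleMillis) * time.Millisecond)
	for runtime.NumGoroutine() > before && time.Now().Before(deadline) {
		time.Sleep(5 * time.Millisecond)
	}
	return runtime.NumGoroutine() - before
}

// cleanSync stands in for a sync cycle whose worker exits when done.
func cleanSync() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { defer wg.Done() }()
	wg.Wait()
}

// leakySync stands in for a sync cycle that strands a worker on a channel
// nobody ever closes or writes to.
func leakySync() {
	block := make(chan struct{})
	go func() { <-block }()
}

func main() {
	fmt.Println("clean sync growth:", goroutineGrowth(cleanSync, 500))
	fmt.Println("leaky sync growth:", goroutineGrowth(leakySync, 50))
}
```

A count-based check like this is coarse: it reports that goroutines were left behind but not where they were started, which is where the pprof goroutine profile becomes useful.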
Implementation Tasks
- Add Go benchmarks for filtering, sync, and config reload
- Create integration tests with performance assertions
- Expose Prometheus metrics (sync duration, HTTP latency, memory, goroutines)
- Enable pprof endpoints for CPU/memory profiling
- Add CI performance regression tests using benchstat
- Document profiling procedures in CONTRIBUTING.md
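A filtering benchmark from the first task could look roughly like the sketch below. `Server`, `filterByTag`, and `makeServers` are hypothetical placeholders for the real registry types and filter code. The benchmark function would normally live in a `_test.go` file and run with `go test -bench`; here `testing.Benchmark` is used so the sketch also runs as a plain program.

```go
package main

import (
	"fmt"
	"testing"
)

// Server is a hypothetical, minimal stand-in for a registry entry.
type Server struct {
	Name string
	Tags []string
}

// filterByTag returns the servers that carry the given tag.
func filterByTag(servers []Server, tag string) []Server {
	var out []Server
	for _, s := range servers {
		for _, t := range s.Tags {
			if t == tag {
				out = append(out, s)
				break
			}
		}
	}
	return out
}

// makeServers builds n synthetic servers; every second one is tagged "prod".
func makeServers(n int) []Server {
	servers := make([]Server, n)
	for i := range servers {
		tags := []string{"dev"}
		if i%2 == 0 {
			tags = append(tags, "prod")
		}
		servers[i] = Server{Name: fmt.Sprintf("server-%d", i), Tags: tags}
	}
	return servers
}

// BenchmarkFilterByTag measures tag filtering over 1000 servers, matching
// the registry size used in the performance targets.
func BenchmarkFilterByTag(b *testing.B) {
	servers := makeServers(1000)
	b.ReportAllocs()
	b.ResetTimer() // exclude fixture setup from the measurement
	for i := 0; i < b.N; i++ {
		_ = filterByTag(servers, "prod")
	}
}

func main() {
	res := testing.Benchmark(BenchmarkFilterByTag)
	fmt.Println(res.String(), res.MemString())
}
```

For the CI regression task, the usual pattern is to run the benchmarks several times on the base and the candidate revision (for example with `go test -bench=. -count=10`), save each output to a file, and compare the two files with `benchstat`.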
Acceptance Criteria
- ✅ All sync operations meet time targets for 1000 servers
- ✅ HTTP API latency targets met under load
- ✅ Memory stays within limits (no leaks over 24h)
- ✅ Config reload < 5s end-to-end
- ✅ No API downtime during sync or reload
- ✅ CI performance tests fail on regressions greater than 20%
Monitoring
Key metrics to expose (Telemetry)
- sync_duration_seconds{source_type}
- sync_server_count{source_type}
- http_request_duration_seconds{endpoint,method}
- process_resident_memory_bytes
- go_goroutines
- config_reload_duration_seconds
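To make the shape of `sync_duration_seconds{source_type}` concrete, here is a dependency-free sketch that tracks a sum and count per source type and renders them in the Prometheus text exposition format as a summary without quantiles. This is an illustration only: a real implementation would more likely use the Prometheus Go client library and a histogram, and `syncStats` is a hypothetical type.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// syncStats accumulates sync durations per source type.
type syncStats struct {
	mu    sync.Mutex
	sum   map[string]float64 // total seconds per source_type
	count map[string]uint64  // number of syncs per source_type
}

func newSyncStats() *syncStats {
	return &syncStats{sum: map[string]float64{}, count: map[string]uint64{}}
}

// observe records one completed sync of the given source type.
func (s *syncStats) observe(sourceType string, seconds float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.sum[sourceType] += seconds
	s.count[sourceType]++
}

// render returns the metrics in Prometheus text exposition format, with
// source types in sorted order so the output is stable.
func (s *syncStats) render() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	types := make([]string, 0, len(s.count))
	for t := range s.count {
		types = append(types, t)
	}
	sort.Strings(types)

	var b strings.Builder
	b.WriteString("# TYPE sync_duration_seconds summary\n")
	for _, t := range types {
		fmt.Fprintf(&b, "sync_duration_seconds_sum{source_type=%q} %g\n", t, s.sum[t])
		fmt.Fprintf(&b, "sync_duration_seconds_count{source_type=%q} %d\n", t, s.count[t])
	}
	return b.String()
}

func main() {
	stats := newSyncStats()
	stats.observe("git", 12.5)
	stats.observe("configmap", 0.8)
	fmt.Print(stats.render())
}
```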