Registry API Performance

# Performance Requirements for thv-registry-api
## Overview
This issue defines performance requirements and a testing methodology for `thv-registry-api` following the stacklok/toolhive#2301 refactoring.
## Performance Targets [EXAMPLE TARGETS TO BE DISCUSSED]

### Synchronization Performance
- Git source sync: < 30s for 1000 servers
- ConfigMap source sync: < 5s for 1000 servers
- Registry API source sync: < 15s for 1000 servers
- Hash-based change detection: < 1s
- Filter application: < 2s for 1000 servers
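
As an illustration of the hash-based change detection target, here is a minimal sketch that compares a SHA-256 digest of the fetched registry payload with the digest from the previous sync. The `changeDetector` type and its `Changed` method are hypothetical names for this example, and SHA-256 is an assumption; the issue does not say which hash the service uses.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// changeDetector remembers the digest of the last registry payload it saw.
// Hypothetical type for illustration; not taken from thv-registry-api.
type changeDetector struct {
	last [sha256.Size]byte
	seen bool
}

// Changed reports whether data differs from the previously seen payload
// and records the new digest for the next comparison.
func (d *changeDetector) Changed(data []byte) bool {
	sum := sha256.Sum256(data)
	if d.seen && sum == d.last {
		return false
	}
	d.last, d.seen = sum, true
	return true
}

func main() {
	var d changeDetector
	// Synthetic payload standing in for a serialized registry.
	payload := bytes.Repeat([]byte(`{"name":"example"}`), 1000)

	fmt.Println(d.Changed(payload))               // first payload counts as a change
	fmt.Println(d.Changed(payload))               // identical bytes: no change
	fmt.Println(d.Changed(append(payload, '\n'))) // one extra byte: change
}
```

Hashing the raw bytes keeps the check proportional to payload size and lets an unchanged source skip filtering and storage entirely.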

### HTTP API Performance
Latency targets (p95 / p99), each measured at the stated sustained request rate:
- GET /v1/registry: p95 < 100ms, p99 < 200ms @ 100 req/s
- GET /v1/registry/{id}: p95 < 50ms, p99 < 100ms @ 200 req/s
- GET /registry/status: p95 < 50ms, p99 < 100ms @ 50 req/s
- POST /registry/sync: p95 < 100ms, p99 < 200ms @ 10 req/s

### Resource Limits

- **Memory**: < 256MB steady state, < 512MB during sync
- **CPU**: < 0.5 cores idle, < 1.5 cores during sync

### Configuration Reload

- Config change detection: < 3s after file write
- Reload and sync trigger: < 5s total
- Zero downtime during reload
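
One way to approach the zero-downtime requirement is to publish each validated configuration through an atomic pointer, so request handlers always read either the old or the new configuration and never block on a reload. This is a sketch under assumptions: the `Config` fields and the `holder` type are hypothetical and do not reflect the actual `thv-registry-api` configuration schema.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical, simplified stand-in for the service configuration.
type Config struct {
	SourceType   string
	SyncInterval string
}

// holder publishes the active configuration to concurrent readers.
type holder struct {
	current atomic.Pointer[Config]
}

// Load returns the active configuration, or nil before the first Reload.
func (h *holder) Load() *Config { return h.current.Load() }

// Reload validates next and swaps it in atomically. On a validation error
// the previous configuration stays active.
func (h *holder) Reload(next *Config) error {
	if next == nil || next.SourceType == "" {
		return errors.New("invalid config: source type is required")
	}
	h.current.Store(next)
	return nil
}

func main() {
	var h holder
	if err := h.Reload(&Config{SourceType: "git", SyncInterval: "30m"}); err != nil {
		fmt.Println("unexpected error:", err)
	}

	// A broken config file must not replace the working configuration.
	err := h.Reload(&Config{})
	fmt.Println(h.Load().SourceType, err != nil)
}
```

Validating before the swap means a malformed file written to disk leaves the running service on its last good configuration.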

## Test Scenarios

1. Large Registry Sync (Git)
   - 1000 servers with tag filtering
   - Measure: sync time, memory, CPU

2. HTTP API Load Test
   - 100 req/s for 60s using vegeta or hey
   - Verify p95/p99 latency targets

3. Concurrent Sync and Queries
   - Continuous query load during periodic sync
   - Verify < 50% latency increase during sync

4. Configuration Reload
   - Modify config file, measure detection and sync trigger time

5. Long-Running Stability
   - 24-hour test with periodic sync
   - Monitor for memory leaks and goroutine leaks
 
## Implementation Tasks

- Add Go benchmarks for filtering, sync, and config reload
- Create integration tests with performance assertions
- Expose Prometheus metrics (sync duration, HTTP latency, memory, goroutines)
- Enable pprof endpoints for CPU/memory profiling
- Add CI performance regression tests using benchstat
- Document profiling procedures in CONTRIBUTING.md

## Acceptance Criteria

- ✅ All sync operations meet time targets for 1000 servers
- ✅ HTTP API latency targets met under load
- ✅ Memory stays within limits (no leaks over 24h)
- ✅ Config reload < 5s end-to-end
- ✅ No API downtime during sync or reload
- ✅ CI performance tests prevent regressions > 20%

## Monitoring
### Key metrics to expose (Telemetry)

- `sync_duration_seconds{source_type}`
- `sync_server_count{source_type}`
- `http_request_duration_seconds{endpoint,method}`
- `process_resident_memory_bytes`
- `go_goroutines`
- `config_reload_duration_seconds`

