Registry API Performance #45

@dmartinol

Performance Requirements for thv-registry-api

Overview

Define performance requirements and a testing methodology for thv-registry-api after the stacklok/toolhive#2301 refactoring.

Performance Targets [EXAMPLE TARGETS TO BE DISCUSSED]

Synchronization Performance

  • Git source sync: < 30s for 1000 servers
  • ConfigMap source sync: < 5s for 1000 servers
  • Registry API source sync: < 15s for 1000 servers
  • Hash-based change detection: < 1s
  • Filter application: < 2s for 1000 servers
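The hash-based change detection target could be checked with something like the sketch below. It assumes the registry payload is available as raw bytes and that the hash from the previous successful sync is stored somewhere; `contentHash` and `hasChanged` are hypothetical names, not existing functions in thv-registry-api.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentHash returns the hex-encoded SHA-256 digest of the raw registry payload.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// hasChanged reports whether the freshly fetched payload differs from the
// payload whose hash was recorded after the previous successful sync.
func hasChanged(previousHash string, fetched []byte) bool {
	return contentHash(fetched) != previousHash
}

func main() {
	old := []byte(`{"servers":{"a":{}}}`)
	prev := contentHash(old)

	fmt.Println(hasChanged(prev, old))                            // false
	fmt.Println(hasChanged(prev, []byte(`{"servers":{"b":{}}}`))) // true
}
```

Hashing the raw bytes keeps the check independent of parsing cost, which is probably what makes a sub-second target realistic for 1000 servers.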

HTTP API Performance

Latency targets per endpoint, listed as p95 / p99 at the given request rate:

  • GET /v1/registry: < 100ms / < 200ms @ 100 req/s
  • GET /v1/registry/{id}: < 50ms / < 100ms @ 200 req/s
  • GET /registry/status: < 50ms / < 100ms @ 50 req/s
  • POST /registry/sync: < 100ms / < 200ms @ 10 req/s
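To make the p95 / p99 targets concrete, here is a minimal sketch of how they might be evaluated from recorded latencies. It assumes the nearest-rank percentile definition and latencies in milliseconds; `percentile` and `meetsTarget` are hypothetical helpers, and load tools such as vegeta or hey report percentiles using their own methods.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-th percentile (1..100) of the samples using the
// nearest-rank method. Samples are latencies in milliseconds.
func percentile(samplesMs []float64, p int) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samplesMs...) // do not mutate the caller's slice
	sort.Float64s(sorted)
	rank := (p*len(sorted) + 99) / 100 // ceil(p * n / 100) in integer arithmetic
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// meetsTarget reports whether both p95 and p99 are strictly below their limits.
func meetsTarget(samplesMs []float64, p95LimitMs, p99LimitMs float64) bool {
	return percentile(samplesMs, 95) < p95LimitMs &&
		percentile(samplesMs, 99) < p99LimitMs
}

func main() {
	// Synthetic latencies: 1ms, 2ms, ..., 100ms.
	samples := make([]float64, 0, 100)
	for i := 1; i <= 100; i++ {
		samples = append(samples, float64(i))
	}
	fmt.Println(percentile(samples, 95), percentile(samples, 99)) // 95 99
	fmt.Println(meetsTarget(samples, 100, 200))                   // true (limits for GET /v1/registry)
}
```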

Resource Limits

  • Memory: < 256MB steady state, < 512MB during sync
  • CPU: < 0.5 cores idle, < 1.5 cores during sync

Configuration Reload

  • Config change detection: < 3s after file write
  • Reload and sync trigger: < 5s total
  • Zero downtime during reload
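One way the zero-downtime requirement might be met is to publish each new configuration as an immutable snapshot behind an atomic pointer, so in-flight requests keep the snapshot they started with. The sketch below assumes Go 1.19+ (for `atomic.Pointer`); `Config`, its fields, and `configStore` are hypothetical and only illustrate the pattern.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical, simplified configuration snapshot.
type Config struct {
	SourceType   string
	SyncInterval string
}

// configStore holds the active configuration behind an atomic pointer so that
// request handlers never block while a reload is in progress.
type configStore struct {
	current atomic.Pointer[Config]
}

func newConfigStore(initial Config) *configStore {
	s := &configStore{}
	s.current.Store(&initial)
	return s
}

// Get returns the active snapshot. Callers must treat it as read-only.
func (s *configStore) Get() *Config { return s.current.Load() }

// Reload publishes a fully built snapshot in a single atomic swap.
func (s *configStore) Reload(next Config) { s.current.Store(&next) }

func main() {
	store := newConfigStore(Config{SourceType: "git", SyncInterval: "5m"})

	inFlight := store.Get() // a request that started before the reload
	store.Reload(Config{SourceType: "configmap", SyncInterval: "1m"})

	fmt.Println(inFlight.SourceType, store.Get().SourceType) // git configmap
}
```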

Test Scenarios

  1. Large Registry Sync (Git)
     • 1000 servers with tag filtering
     • Measure: sync time, memory, CPU
  2. HTTP API Load Test
     • 100 req/s for 60s using vegeta or hey
     • Verify p95/p99 latency targets
  3. Concurrent Sync and Queries
     • Continuous query load during periodic sync
     • Verify < 50% latency increase during sync
  4. Configuration Reload
     • Modify config file, measure detection and sync trigger time
  5. Long-Running Stability
     • 24-hour test with periodic sync
     • Monitor for memory leaks and goroutine leaks
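For the goroutine-leak part of the stability scenario, a coarse check could compare `runtime.NumGoroutine()` before and after a unit of work. This is only a sketch: `goroutineDelta` is a hypothetical helper, the settle delay is an arbitrary assumption, and a real 24-hour test would more likely track the `go_goroutines` metric over time.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// goroutineDelta runs work, waits settleMs milliseconds for short-lived
// goroutines to exit, and returns the change in the live goroutine count.
func goroutineDelta(work func(), settleMs int) int {
	before := runtime.NumGoroutine()
	work()
	time.Sleep(time.Duration(settleMs) * time.Millisecond)
	return runtime.NumGoroutine() - before
}

func main() {
	// A worker that finishes: no goroutine should remain afterwards.
	clean := goroutineDelta(func() {
		done := make(chan struct{})
		go func() { close(done) }()
		<-done
	}, 50)

	// A worker blocked on a channel that is never closed simulates a leak.
	block := make(chan struct{})
	leaky := goroutineDelta(func() {
		go func() { <-block }()
	}, 50)

	fmt.Println("clean delta:", clean, "leaky delta:", leaky)
}
```

A positive delta that keeps growing across sync cycles is the signal to look for; a single sample is too noisy to act on.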

Implementation Tasks

  • Add Go benchmarks for filtering, sync, and config reload
  • Create integration tests with performance assertions
  • Expose Prometheus metrics (sync duration, HTTP latency, memory, goroutines)
  • Enable pprof endpoints for CPU/memory profiling
  • Add CI performance regression tests using benchstat
  • Document profiling procedures in CONTRIBUTING.md
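As a starting point for the first task, a filtering benchmark might look like the sketch below. `Server`, `filterByTag`, and `makeServers` are hypothetical stand-ins for the real registry types and filter code; the benchmark function has the standard signature, so it could be moved into a `_test.go` file unchanged. `testing.Benchmark` is used here only so the sketch runs outside `go test`.

```go
package main

import (
	"fmt"
	"testing"
)

// Server is a hypothetical, minimal registry entry.
type Server struct {
	Name string
	Tags []string
}

// filterByTag keeps the servers that carry the given tag.
func filterByTag(servers []Server, tag string) []Server {
	out := make([]Server, 0, len(servers))
	for _, s := range servers {
		for _, t := range s.Tags {
			if t == tag {
				out = append(out, s)
				break
			}
		}
	}
	return out
}

// makeServers builds n synthetic servers, alternating "even" and "odd" tags.
func makeServers(n int) []Server {
	servers := make([]Server, n)
	for i := range servers {
		tag := "odd"
		if i%2 == 0 {
			tag = "even"
		}
		servers[i] = Server{Name: fmt.Sprintf("server-%d", i), Tags: []string{"mcp", tag}}
	}
	return servers
}

// BenchmarkFilterByTag measures filtering over 1000 servers, matching the
// registry size used in the targets.
func BenchmarkFilterByTag(b *testing.B) {
	servers := makeServers(1000)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		filterByTag(servers, "even")
	}
}

func main() {
	result := testing.Benchmark(BenchmarkFilterByTag)
	fmt.Println(result.String(), result.MemString())
}
```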

Acceptance Criteria

  • ✅ All sync operations meet time targets for 1000 servers
  • ✅ HTTP API latency targets met under load
  • ✅ Memory stays within limits (no leaks over 24h)
  • ✅ Config reload < 5s end-to-end
  • ✅ No API downtime during sync or reload
  • ✅ CI performance tests prevent regressions > 20%
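The 20% regression criterion reduces to a simple relative comparison once baseline and current numbers are available. The sketch below assumes both are ns/op values taken from benchmark output; `regressionPct` and `isRegression` are hypothetical helpers. It deliberately ignores run-to-run noise, which is the part benchstat is meant to handle.

```go
package main

import "fmt"

// regressionPct returns how much slower current is than baseline, in percent.
// Negative values mean current is faster. baselineNsPerOp must be > 0.
func regressionPct(baselineNsPerOp, currentNsPerOp float64) float64 {
	return (currentNsPerOp - baselineNsPerOp) * 100 / baselineNsPerOp
}

// isRegression reports whether the slowdown strictly exceeds thresholdPct.
func isRegression(baselineNsPerOp, currentNsPerOp, thresholdPct float64) bool {
	return regressionPct(baselineNsPerOp, currentNsPerOp) > thresholdPct
}

func main() {
	fmt.Println(regressionPct(1000, 1250))    // 25
	fmt.Println(isRegression(1000, 1250, 20)) // true
	fmt.Println(isRegression(1000, 1100, 20)) // false
}
```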

Monitoring

Key metrics to expose (Telemetry)

  • sync_duration_seconds{source_type}
  • sync_server_count{source_type}
  • http_request_duration_seconds{endpoint,method}
  • process_resident_memory_bytes
  • go_goroutines
  • config_reload_duration_seconds
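To illustrate what exposing these metrics could look like on the wire, here is a standard-library-only sketch that writes two of them in the Prometheus text exposition format. It is an assumption-heavy illustration: `formatSample`, `metricsHandler`, the `/metrics` path, and the hard-coded server count are all hypothetical, label values are assumed to be simple strings, and a real implementation would more likely use a Prometheus client library with histograms for the duration metrics.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime"
	"strconv"
)

// formatSample renders one sample line in the Prometheus text exposition
// format. An empty labelName produces a sample without labels.
func formatSample(name, labelName, labelValue string, value float64) string {
	v := strconv.FormatFloat(value, 'g', -1, 64)
	if labelName == "" {
		return name + " " + v
	}
	return fmt.Sprintf("%s{%s=%q} %s", name, labelName, labelValue, v)
}

// metricsHandler serves two of the metrics listed above. The server count is
// hard-coded for illustration; a real service would record it after each sync.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprintln(w, "# TYPE go_goroutines gauge")
	fmt.Fprintln(w, formatSample("go_goroutines", "", "", float64(runtime.NumGoroutine())))
	fmt.Fprintln(w, "# TYPE sync_server_count gauge")
	fmt.Fprintln(w, formatSample("sync_server_count", "source_type", "git", 1000))
}

func main() {
	// Exercise the handler in-process instead of opening a port.
	rec := httptest.NewRecorder()
	metricsHandler(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```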

Labels: enhancement
