QueryLens is a reproducible local systems project for PostgreSQL performance observability. It collects query telemetry, fingerprints normalized SQL, captures safe plan snapshots, and detects regressions with deterministic rules. It also includes reliability primitives: idempotent ingestion, retry/backoff, and dead-letter queue (DLQ) routing.
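Fingerprinting normalized SQL maps statements that differ only in literal values or whitespace to one identifier. The Python sketch below illustrates the idea. It is not the C++ collector's code: the normalization rules (replace string and numeric literals with `?`, collapse whitespace, lowercase) and the function names `normalize_sql` and `fingerprint` are assumptions for illustration.

```python
import hashlib
import re

# Hypothetical normalization rules; the C++ collector's actual rules may differ.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")    # '...' with '' as an escaped quote
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")  # skips digits inside identifiers like t1
_WHITESPACE = re.compile(r"\s+")


def normalize_sql(sql: str) -> str:
    """Replace literals with '?', collapse whitespace, then lowercase."""
    text = _STRING_LITERAL.sub("?", sql)
    text = _NUMBER_LITERAL.sub("?", text)
    text = _WHITESPACE.sub(" ", text).strip()
    return text.lower()


def fingerprint(sql: str) -> str:
    """SHA-256 hex digest of the normalized statement."""
    return hashlib.sha256(normalize_sql(sql).encode("utf-8")).hexdigest()


a = fingerprint("SELECT * FROM orders WHERE id = 42")
b = fingerprint("select *  from orders where id = 7")  # same shape, different literal
```

Lowercasing runs after literal replacement, so string contents never influence the fingerprint. A production normalizer would also need to handle comments, quoted identifiers (which are case-sensitive), and `IN (...)` lists of varying length.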
PostgreSQL (pg_stat_statements, pgvector)
-> C++ collector (libpqxx, protobuf, Kafka producer)
-> Redpanda/Kafka topics (query-telemetry, collector-heartbeats, telemetry-dlq)
-> FastAPI + aiokafka consumer (idempotent persistence + regression engine)
-> PostgreSQL querylens schema
-> React dashboard
-> Prometheus/Grafana
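The regression engine stage in this pipeline compares snapshots using fixed rules. Below is a minimal Python sketch of what such a deterministic check could look like. The `Snapshot` fields, the 2x latency threshold, and both rule definitions are assumptions: this README names vector index bypass as one of eight regression classes but does not define it, and the other classes are not listed.

```python
from dataclasses import dataclass

# PostgreSQL EXPLAIN node types that indicate an index was used.
INDEX_NODES = frozenset({"Index Scan", "Index Only Scan", "Bitmap Index Scan"})
LATENCY_RATIO_THRESHOLD = 2.0  # hypothetical: flag a 2x slowdown in mean execution time


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical per-fingerprint snapshot; the querylens schema may store more fields."""
    fingerprint: str
    mean_exec_ms: float
    plan_node_types: frozenset[str]
    uses_vector_operator: bool


def detect_regressions(baseline: Snapshot, current: Snapshot) -> list[str]:
    """Apply fixed rules in a fixed order, so identical inputs give identical findings."""
    if baseline.fingerprint != current.fingerprint:
        raise ValueError("snapshots must share a fingerprint")
    findings = []
    # Rule 1 (hypothetical): mean execution time grew past the ratio threshold.
    if baseline.mean_exec_ms > 0 and (
        current.mean_exec_ms / baseline.mean_exec_ms >= LATENCY_RATIO_THRESHOLD
    ):
        findings.append("latency_regression")
    # Rule 2 (hypothetical form of vector index bypass): a vector query whose
    # baseline plan used an index now falls back to a sequential scan.
    used_index_before = bool(baseline.plan_node_types & INDEX_NODES)
    uses_index_now = bool(current.plan_node_types & INDEX_NODES)
    if (
        current.uses_vector_operator
        and used_index_before
        and not uses_index_now
        and "Seq Scan" in current.plan_node_types
    ):
        findings.append("vector_index_bypass")
    return findings


baseline = Snapshot("abc", 10.0, frozenset({"Limit", "Index Scan"}), True)
current = Snapshot("abc", 25.0, frozenset({"Limit", "Sort", "Seq Scan"}), True)
findings = detect_regressions(baseline, current)
```

Keeping each rule a pure function of two snapshots is what makes the results reproducible across runs of the seeded demos.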
See:
- docs/ARCHITECTURE.md
- docs/OPERATIONS.md
- docs/BENCHMARKS.md
- docs/REGRESSION_EVALUATION.md
- C++ telemetry collector with:
  - SQL normalization and SHA-256 fingerprinting
  - vector operator detection (`<=>`, `<->`, `<#>`)
  - safe EXPLAIN gating for SELECT/WITH
  - protobuf telemetry event publishing to Kafka
- FastAPI control plane with:
  - Kafka consumer (`aiokafka`)
  - deterministic regression detection (8 classes incl. vector index bypass)
  - idempotent event ingestion (`event_id` unique key)
  - retry/backoff and DLQ routing
  - Prometheus `/metrics` endpoint
- PostgreSQL schema/migrations with snapshot + regression + DLQ tables
- Prometheus + Grafana provisioning and alert rules
- Demo workflows (`make demo`)
- Benchmark/evaluation harnesses:
  - ingestion benchmark script
  - regression evaluation script
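Two of the collector checks listed above, vector operator detection and safe EXPLAIN gating for SELECT/WITH, can be approximated in Python as follows. This is a hedged sketch, not the C++ implementation. The function names are hypothetical, and the gate is an assumed conservative heuristic: it allows only a single statement starting with SELECT or WITH and rejects data-modifying or DDL keywords anywhere in the text, because a PostgreSQL WITH clause can wrap INSERT/UPDATE/DELETE. It does not strip comments or string literals, so it may reject some safe queries.

```python
import re

# pgvector distance operators named in this README:
# <-> (L2 distance), <=> (cosine distance), <#> (negative inner product).
VECTOR_OPERATORS = ("<=>", "<->", "<#>")

# Keywords that make a statement ineligible for EXPLAIN under this heuristic.
_UNSAFE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|merge|truncate|drop|alter|create|grant|revoke|copy|call|do)\b",
    re.IGNORECASE,
)


def detect_vector_operators(sql: str) -> list[str]:
    """Return the vector operators present in the statement, in a fixed order."""
    return [op for op in VECTOR_OPERATORS if op in sql]


def is_explain_safe(sql: str) -> bool:
    """Conservatively allow only single, read-only SELECT/WITH statements."""
    text = sql.strip().rstrip(";").strip()
    if ";" in text:  # reject multi-statement input
        return False
    first_word = text.split(None, 1)[0].lower() if text else ""
    if first_word not in ("select", "with"):
        return False
    return _UNSAFE_KEYWORDS.search(text) is None
```

Note that `EXPLAIN ANALYZE` actually executes the statement it explains, which is one reason to restrict plan capture to read-only queries.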
Out of scope (not implemented):
- Exactly-once delivery semantics
- Kubernetes deployment manifests
- gRPC service APIs
- Managed cloud production deployment
make setup
make build
make up
make migrate
make seed
make test
make demo

make benchmark N=10000
make benchmark N=50000
make benchmark-100k
make regression-eval

Outputs:
- `benchmark_results/querylens_benchmark_<N>.json/.csv`
- `benchmark_results/regression_eval.json/.csv`
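The regression evaluation reports rule-engine precision, recall, and F1 on seeded scenarios. The standard definitions are precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R). The Python sketch below computes them under an assumption about the data: each seeded scenario is scored as a set of hypothetical `(scenario_id, regression_class)` pairs. The evaluation script's real input format may differ.

```python
def score(expected: set, detected: set) -> dict:
    """Micro-averaged precision, recall, and F1 over (scenario_id, regression_class) pairs."""
    true_positives = len(expected & detected)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical seeded labels vs. rule-engine output.
expected = {("s1", "latency_regression"), ("s2", "vector_index_bypass"), ("s3", "latency_regression")}
detected = {("s1", "latency_regression"), ("s2", "vector_index_bypass"), ("s4", "latency_regression")}
result = score(expected, detected)  # one miss (s3) and one false alarm (s4)
```

Returning 0.0 when nothing is detected or expected is a convention chosen here to avoid division by zero.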
Reliability metrics:
- `querylens_duplicate_events_total`
- `querylens_ingest_retries_total`
- `querylens_dlq_events_total`
- `querylens_telemetry_persist_failures_total`
- `querylens_kafka_consumer_lag`
Prometheus alert rules:
- high consumer lag
- persistence failures
- DLQ events
- critical regressions
- high API p95 latency
Defined in infra/prometheus/alerts.yml.
- Built a PostgreSQL observability platform that streams C++-collected query telemetry into Kafka and applies deterministic regression detection on persisted metric/plan snapshots.
- Hardened ingestion reliability with idempotent event keys, bounded retry/backoff, and DLQ routing to avoid silent event loss during consumer persistence failures.
- Added reproducible systems evaluation harnesses for ingestion throughput/latency/lag recovery and rule-engine precision/recall/F1 on seeded regression scenarios.
- Operationalized the stack with Prometheus metrics, Grafana provisioning, alert rules, and Docker Compose workflows for end-to-end reproducible demos.
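The reliability pattern described in these bullets (idempotent event keys, bounded retry/backoff, DLQ routing) can be sketched with the Python standard library. Assumptions: an in-memory SQLite database stands in for the PostgreSQL `querylens` schema, the table and column names and the `max_attempts`/`base_delay`/`cap` values are hypothetical, and the DLQ is a Python list rather than the `telemetry-dlq` Kafka topic. A unique key on `event_id` turns redelivered events into no-ops. In PostgreSQL the same effect is commonly achieved with `INSERT ... ON CONFLICT (event_id) DO NOTHING`. The counter names loosely mirror the `querylens_*_total` metrics listed earlier.

```python
import random
import sqlite3
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a dropped DB connection)."""


def persist(conn: sqlite3.Connection, event: dict) -> bool:
    """Insert the event once per event_id; return False if it was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO telemetry_events (event_id, fingerprint, mean_exec_ms) "
        "VALUES (?, ?, ?)",
        (event["event_id"], event["fingerprint"], event["mean_exec_ms"]),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the unique key already existed


def ingest(conn, event, dlq, stats, persist_fn=persist,
           max_attempts=4, base_delay=0.05, cap=1.0):
    """Persist with bounded retries and capped, jittered exponential backoff.

    After max_attempts transient failures the event is routed to the DLQ
    instead of being dropped silently.
    """
    for attempt in range(max_attempts):
        try:
            if persist_fn(conn, event):
                stats["persisted"] += 1
            else:
                stats["duplicate_events"] += 1
            return
        except TransientError as exc:
            if attempt == max_attempts - 1:
                dlq.append({"event": event, "error": str(exc), "attempts": max_attempts})
                stats["dlq_events"] += 1
                return
            stats["ingest_retries"] += 1
            # Full jitter: wait a random time up to min(cap, base_delay * 2**attempt).
            time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telemetry_events (event_id TEXT NOT NULL UNIQUE, "
    "fingerprint TEXT, mean_exec_ms REAL)"
)
stats = {"persisted": 0, "duplicate_events": 0, "ingest_retries": 0, "dlq_events": 0}
dlq = []

event = {"event_id": "e-1", "fingerprint": "abc", "mean_exec_ms": 12.5}
ingest(conn, event, dlq, stats)
ingest(conn, event, dlq, stats)  # redelivery of the same event_id: no second row


def always_fails(conn, event):
    raise TransientError("connection reset")


ingest(conn, {"event_id": "e-2", "fingerprint": "def", "mean_exec_ms": 3.0},
       dlq, stats, persist_fn=always_fails, base_delay=0.001)
```

Duplicate suppression happens at the database, so a redelivered Kafka message does not create an extra row. In a real consumer, committing the Kafka offset only after an event is either persisted or published to the DLQ is what prevents silent loss. That ordering is an assumption about the design and is not shown in the sketch.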