This suite measures pg_duckpipe CDC throughput using sysbench workloads.
```sh
# Prerequisites: sysbench, PostgreSQL 18 with pg_duckpipe installed
brew install sysbench   # or: apt install sysbench

# Run the default benchmark (1 thread, 30 s, 1 table x 100k rows, append-only)
./benchmark/suite/bench.sh

# Customize
./benchmark/suite/bench.sh --threads 4 --duration 60 --table-size 100000 --workload oltp_read_write
```

`bench.sh` starts a temporary PostgreSQL instance (port 5556), runs the benchmark, and prints results. No cleanup is needed.
Run all 4 benchmark scenarios with a single command and get an automated analysis report:
```sh
# Full suite (30 s per scenario, ~5 min total)
./benchmark/suite/bench_suite.sh

# Quick smoke test (15 s per scenario)
./benchmark/suite/bench_suite.sh --duration 15

# View the generated report
cat benchmark/suite/results/report.md
```

| # | Name | Tables | Workload | Evaluates |
|---|---|---|---|---|
| 1 | Single-table append | 1 x 100k | oltp_insert | Baseline throughput |
| 2 | Multi-table append | 4 x 100k | oltp_insert | Parallel flush scaling |
| 3 | Single-table mixed DML | 1 x 100k | oltp_read_write | UPDATE/DELETE correctness |
| 4 | Multi-table mixed DML | 4 x 100k | oltp_read_write | Combined stress |
The suite automatically generates benchmark/suite/results/report.md containing:
- Summary Table — all scenarios side-by-side (snapshot rate, TPS, lag, catch-up, consistency)
- Flush Performance — per-scenario flush stats (count, avg/p50/p99 latency, rows/flush, phase breakdown)
- Snapshot Timings — per-table snapshot durations from PG logs
- WAL Processing — WAL cycle time statistics
- Issues & Observations — errors, warnings, backpressure events, anomalies
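The avg/p50/p99 flush latencies in that report can be derived from raw per-flush timings along these lines. This is a sketch with illustrative names; `analyze_results.py`'s actual implementation may differ:

```python
# Sketch: summarize per-flush latencies into the report's count/avg/p50/p99
# columns. Function names are illustrative, not the analyze_results.py API.
from statistics import mean

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile over a list of latency samples (ms)."""
    ranked = sorted(samples)
    k = max(0, min(len(ranked) - 1, round(p / 100 * len(ranked)) - 1))
    return ranked[k]

def flush_stats(latencies_ms: list[float]) -> dict:
    return {
        "count": len(latencies_ms),
        "avg": mean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
    }

# A single slow flush dominates p99 but barely moves p50:
print(flush_stats([12.0, 15.0, 11.0, 90.0, 14.0, 13.0, 12.5, 16.0, 13.5, 12.2]))
```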
You can also run the analysis standalone on existing logs:
```sh
python3 benchmark/suite/analyze_results.py --results-dir benchmark/suite/results
```

| Workload | Description | Flag |
|---|---|---|
| oltp_insert | Append-only inserts (default) | --workload oltp_insert |
| oltp_read_write | Mixed INSERT/UPDATE/DELETE | --workload oltp_read_write |
Each run reports the following metrics:
- Snapshot throughput: time to sync the initial table data to DuckLake
- Streaming throughput: OLTP TPS during continuous CDC
- Replication lag: average and peak WAL lag during the OLTP phase
- Catch-up time: how long after OLTP stops until DuckLake is fully caught up
- Consistency check: verifies source and target data match after sync
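Replication lag is a byte distance between two WAL positions. PostgreSQL's pg_lsn text form (e.g. `0/16B3748`) is two hex halves; the conversion below mirrors the server-side `pg_wal_lsn_diff()`. How `run_sysbench.py` actually samples lag is an assumption here:

```python
# Convert pg_lsn text (e.g. "0/16B3748") to an absolute byte offset and
# diff two positions -- the same arithmetic pg_wal_lsn_diff() performs.
def lsn_to_bytes(lsn: str) -> int:
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def wal_lag_bytes(current_lsn: str, confirmed_lsn: str) -> int:
    """Lag = server's current WAL position minus the slot's confirmed position."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(confirmed_lsn)

print(wal_lag_bytes("0/16B3748", "0/16B0000"))  # 14152
```

On a live instance the same number comes from `SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) FROM pg_replication_slots;`.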
If you prefer to control each step:
```sh
# Start a temporary PostgreSQL instance on port 5556
./benchmark/suite/start_db.sh

# Run benchmark (default: oltp_insert)
python3 benchmark/suite/run_sysbench.py \
  --db-url "host=localhost port=5556 user=postgres dbname=postgres"

# Mixed workload with custom parameters
python3 benchmark/suite/run_sysbench.py \
  --workload oltp_read_write \
  --tables 4 --table-size 50000 \
  --threads 8 --duration 60 \
  --db-url "host=localhost port=5556 user=postgres dbname=postgres"

# Stop
./benchmark/suite/stop_db.sh
```

Control post-run data verification with `--consistency-mode`:
| Mode | Behavior |
|---|---|
| auto (default) | safe for append-only, full for mixed |
| safe | Row-count comparison only |
| full | Row-by-row value comparison |
| off | Skip verification |
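The reason auto escalates to full for mixed workloads: a mis-applied UPDATE leaves row counts unchanged, so only a value-level comparison catches it. A minimal illustration with hypothetical helpers (not the suite's actual verification code):

```python
# Hypothetical helpers illustrating the safe vs. full consistency modes;
# the suite's real verification logic may differ.
def safe_check(source_rows: list[tuple], target_rows: list[tuple]) -> bool:
    """safe: row counts only -- misses in-place value drift from UPDATEs."""
    return len(source_rows) == len(target_rows)

def full_check(source_rows: list[tuple], target_rows: list[tuple]) -> bool:
    """full: order-insensitive row-by-row value comparison."""
    return sorted(source_rows) == sorted(target_rows)

src = [(1, "a"), (2, "b")]
tgt = [(2, "b"), (1, "x")]   # same count, but one value diverged
print(safe_check(src, tgt), full_check(src, tgt))  # True False
```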
Always use --duration 30 (the default) or longer for results committed to benchmark/suite/results/report.md.
Shorter durations (e.g. 15s) are fine for smoke tests but produce unstable numbers —
fewer flush cycles mean higher variance in latency percentiles and throughput measurements.
See benchmark/suite/results/report.md for the latest full suite results.
Long-running tests (hours) that catch problems invisible in short benchmarks: memory leaks, WAL slot growth, throughput degradation, and crash recovery bugs.
```sh
# 1-hour sustained insert (default)
docker compose -f benchmark/soak/docker-compose.soak.yml up --build

# 2-minute smoke test
DURATION=120 docker compose -f benchmark/soak/docker-compose.soak.yml up --build

# Pick a scenario
SCENARIO=sustained-mixed DURATION=3600 docker compose -f benchmark/soak/docker-compose.soak.yml up --build

# Teardown (removes DB volume)
docker compose -f benchmark/soak/docker-compose.soak.yml down -v
```

| Scenario | Workload | Tables | Chaos | Tests for |
|---|---|---|---|---|
| sustained-insert | INSERT only | 4 | None | Baseline stability |
| sustained-mixed | 90% INSERT + 10% UPDATE | 4 | None | Mixed DML correctness |
| multi-table-insert | INSERT only | 8 | None | Flush scaling, commit contention |
| multi-table-mixed | Mixed DML | 8 | None | Combined stress |
| table-lifecycle | INSERT only | 4 | Remove/re-add tables every 2 min | Dynamic table add/remove |
| error-recovery | INSERT only | 4 | Stop/start worker every 2 min | Crash recovery, WAL resumption |
A soak run passes when all of the following hold:

| Criterion | Threshold |
|---|---|
| Throughput CV | < 30% |
| Permanent errors | None |
| Final consistency | PASS (source = target row counts) |
| Slot growth | < 1 MB/hr (tests > 1 hr) |
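The throughput CV criterion can be checked with a few lines over per-interval TPS samples (a sketch; the soak harness's own calculation may differ):

```python
# Coefficient of variation over per-interval TPS samples; values under the
# 30% threshold indicate stable throughput. Sketch only, not the harness code.
from statistics import mean, pstdev

def throughput_cv(tps_samples: list[float]) -> float:
    """CV = population stddev / mean, as a percentage."""
    return pstdev(tps_samples) / mean(tps_samples) * 100

steady = [1000, 1040, 980, 1010, 990]
print(f"{throughput_cv(steady):.1f}%")  # well under the 30% threshold
```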
Results land in `benchmark/soak/soak_results/<scenario>_<timestamp>/` with `metrics.csv`, `events.log`, and `report.md`.
For AI-assisted test execution, see benchmark/soak/RUNBOOK.md.