This suite measures pg_duckpipe CDC throughput using sysbench workloads.
```sh
# Prerequisites: sysbench, PostgreSQL 18 with pg_duckpipe installed
brew install sysbench   # or: apt install sysbench

# Run the default benchmark (1 thread, 30 s, 1 table x 100k rows, append-only)
./benchmark/suite/bench.sh

# Customize
./benchmark/suite/bench.sh --threads 4 --duration 60 --table-size 100000 --workload oltp_read_write
```

`bench.sh` starts a temporary PostgreSQL instance (port 5556), runs the benchmark, and prints results. No cleanup is needed.
Run all 4 benchmark scenarios with a single command and get an automated analysis report:
```sh
# Full suite (30 s per scenario, ~5 min total)
./benchmark/suite/bench_suite.sh

# Quick smoke test (15 s per scenario)
./benchmark/suite/bench_suite.sh --duration 15

# View the generated report
cat benchmark/suite/results/report.md
```

| # | Name | Tables | Workload | Evaluates |
|---|---|---|---|---|
| 1 | Single-table append | 1 x 100k | oltp_insert | Baseline throughput |
| 2 | Multi-table append | 4 x 100k | oltp_insert | Parallel flush scaling |
| 3 | Single-table mixed DML | 1 x 100k | oltp_read_write | UPDATE/DELETE correctness |
| 4 | Multi-table mixed DML | 4 x 100k | oltp_read_write | Combined stress |
The suite automatically generates benchmark/suite/results/report.md containing:
- Summary Table — all scenarios side-by-side (snapshot rate, TPS, lag, catch-up, consistency)
- Flush Performance — per-scenario flush stats (count, avg/p50/p99 latency, rows/flush, phase breakdown)
- Snapshot Timings — per-table snapshot durations from PG logs
- WAL Processing — WAL cycle time statistics
- Issues & Observations — errors, warnings, backpressure events, anomalies
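The avg/p50/p99 flush latencies in that report can be derived from raw per-flush timings along these lines. This is a sketch with illustrative names; `analyze_results.py`'s actual implementation may differ:

```python
# Sketch: summarize per-flush latencies into the report's count/avg/p50/p99
# columns. Function names are illustrative, not the analyze_results.py API.
from statistics import mean

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile over a list of latency samples (ms)."""
    ranked = sorted(samples)
    k = max(0, min(len(ranked) - 1, round(p / 100 * len(ranked)) - 1))
    return ranked[k]

def flush_stats(latencies_ms: list[float]) -> dict:
    return {
        "count": len(latencies_ms),
        "avg": mean(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
    }

# A single slow flush dominates p99 but barely moves p50:
print(flush_stats([12.0, 15.0, 11.0, 90.0, 14.0, 13.0, 12.5, 16.0, 13.5, 12.2]))
```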
You can also run the analysis standalone on existing logs:
```sh
python3 benchmark/suite/analyze_results.py --results-dir benchmark/suite/results
```

| Workload | Description | Flag |
|---|---|---|
| oltp_insert | Append-only inserts (default) | --workload oltp_insert |
| oltp_read_write | Mixed INSERT/UPDATE/DELETE | --workload oltp_read_write |
Each run reports the following metrics:
- Snapshot throughput: time to sync the initial table data to DuckLake
- Streaming throughput: OLTP TPS during continuous CDC
- Replication lag: average and peak WAL lag during the OLTP phase
- Catch-up time: how long after OLTP stops until DuckLake is fully caught up
- Consistency check: verifies source and target data match after sync
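Replication lag is a byte distance between two WAL positions. PostgreSQL's pg_lsn text form (e.g. `0/16B3748`) is two hex halves; the conversion below mirrors the server-side `pg_wal_lsn_diff()`. How `run_sysbench.py` actually samples lag is an assumption here:

```python
# Convert pg_lsn text (e.g. "0/16B3748") to an absolute byte offset and
# diff two positions -- the same arithmetic pg_wal_lsn_diff() performs.
def lsn_to_bytes(lsn: str) -> int:
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def wal_lag_bytes(current_lsn: str, confirmed_lsn: str) -> int:
    """Lag = server's current WAL position minus the slot's confirmed position."""
    return lsn_to_bytes(current_lsn) - lsn_to_bytes(confirmed_lsn)

print(wal_lag_bytes("0/16B3748", "0/16B0000"))  # 14152
```

On a live instance the same number comes from `SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) FROM pg_replication_slots;`.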
If you prefer to control each step:
```sh
# Start a temporary PostgreSQL instance on port 5556
./benchmark/suite/start_db.sh

# Run benchmark (default: oltp_insert)
python3 benchmark/suite/run_sysbench.py \
  --db-url "host=localhost port=5556 user=postgres dbname=postgres"

# Mixed workload with custom parameters
python3 benchmark/suite/run_sysbench.py \
  --workload oltp_read_write \
  --tables 4 --table-size 50000 \
  --threads 8 --duration 60 \
  --db-url "host=localhost port=5556 user=postgres dbname=postgres"

# Stop
./benchmark/suite/stop_db.sh
```

Control post-run data verification with `--consistency-mode`:
| Mode | Behavior |
|---|---|
| auto (default) | safe for append-only, full for mixed |
| safe | Row-count comparison only |
| full | Row-by-row value comparison |
| off | Skip verification |
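The reason auto escalates to full for mixed workloads: a mis-applied UPDATE leaves row counts unchanged, so only a value-level comparison catches it. A minimal illustration with hypothetical helpers (not the suite's actual verification code):

```python
# Hypothetical helpers illustrating the safe vs. full consistency modes;
# the suite's real verification logic may differ.
def safe_check(source_rows: list[tuple], target_rows: list[tuple]) -> bool:
    """safe: row counts only -- misses in-place value drift from UPDATEs."""
    return len(source_rows) == len(target_rows)

def full_check(source_rows: list[tuple], target_rows: list[tuple]) -> bool:
    """full: order-insensitive row-by-row value comparison."""
    return sorted(source_rows) == sorted(target_rows)

src = [(1, "a"), (2, "b")]
tgt = [(2, "b"), (1, "x")]   # same count, but one value diverged
print(safe_check(src, tgt), full_check(src, tgt))  # True False
```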
Always use --duration 30 (the default) or longer for results committed to benchmark/suite/results/report.md.
Shorter durations (e.g. 15s) are fine for smoke tests but produce unstable numbers —
fewer flush cycles mean higher variance in latency percentiles and throughput measurements.
See benchmark/suite/results/report.md for the latest full suite results.
Long-running tests (hours) that catch problems invisible in short benchmarks: memory leaks, WAL slot growth, throughput degradation, and crash recovery bugs.
```sh
# 1-hour sustained insert (default)
docker compose -f benchmark/soak/docker-compose.soak.yml up --build

# 2-minute smoke test
DURATION=120 docker compose -f benchmark/soak/docker-compose.soak.yml up --build

# Pick a scenario
SCENARIO=sustained-mixed DURATION=3600 docker compose -f benchmark/soak/docker-compose.soak.yml up --build

# Teardown (removes DB volume)
docker compose -f benchmark/soak/docker-compose.soak.yml down -v
```

| Scenario | Workload | Tables | Chaos | Tests for |
|---|---|---|---|---|
| sustained-insert | INSERT only | 4 | None | Baseline stability |
| sustained-mixed | 90% INSERT + 10% UPDATE | 4 | None | Mixed DML correctness |
| multi-table-insert | INSERT only | 8 | None | Flush scaling, commit contention |
| multi-table-mixed | Mixed DML | 8 | None | Combined stress |
| table-lifecycle | INSERT only | 4 | Remove/re-add tables every 2 min | Dynamic table add/remove |
| error-recovery | INSERT only | 4 | Stop/start worker every 2 min | Crash recovery, WAL resumption |
A soak run passes when all of the following hold:

| Criterion | Threshold |
|---|---|
| Throughput CV | < 30% |
| Permanent errors | None |
| Final consistency | PASS (source = target row counts) |
| Slot growth | < 1 MB/hr (tests > 1 hr) |
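The throughput CV criterion can be checked with a few lines over per-interval TPS samples (a sketch; the soak harness's own calculation may differ):

```python
# Coefficient of variation over per-interval TPS samples; values under the
# 30% threshold indicate stable throughput. Sketch only, not the harness code.
from statistics import mean, pstdev

def throughput_cv(tps_samples: list[float]) -> float:
    """CV = population stddev / mean, as a percentage."""
    return pstdev(tps_samples) / mean(tps_samples) * 100

steady = [1000, 1040, 980, 1010, 990]
print(f"{throughput_cv(steady):.1f}%")  # well under the 30% threshold
```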
Results land in `benchmark/soak/soak_results/<scenario>_<timestamp>/` with `metrics.csv`, `events.log`, and `report.md`.
For AI-assisted test execution, see benchmark/soak/RUNBOOK.md.