This repository contains benchmarks for various database systems using the TPC-H lineitem table.
Result Recording Database (Optional): If `BENCHMARK_RESULT_DSN` is set, benchmark results are recorded to a PostgreSQL database. If it is not set, results are printed to stdout instead.
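The fallback behavior can be sketched as follows. This is an illustrative helper, not the repository's actual API: the `record_result` function and the `results` table name are assumptions, and `psycopg2` is assumed as the PostgreSQL driver.

```python
import os


def record_result(result: dict) -> None:
    """Record a benchmark result to PostgreSQL, or print it to stdout
    when no BENCHMARK_RESULT_DSN is configured.

    Hypothetical sketch: the function name and the `results` table
    are illustrative, not part of the repository.
    """
    dsn = os.environ.get("BENCHMARK_RESULT_DSN")
    if dsn is None:
        # No recording database configured: fall back to stdout.
        print(result)
        return

    import psycopg2  # assumed PostgreSQL driver

    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO results (name, seconds) VALUES (%s, %s)",
                (result["name"], result["seconds"]),
            )
```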
If using local development, initialize the result recording database:

```bash
./result-recording/init_result_recording_db.sh
```

Then set the connection string:

```bash
export BENCHMARK_RESULT_DSN="postgresql://benchmark:benchmark123@localhost:5433/benchmark-result"
```

If using a remote database, set the connection string:

```bash
export BENCHMARK_RESULT_DSN="postgresql://user:password@host:port/database"
```

Benchmarks are run using pytest. Each runner is a test that spins up Docker Compose services, executes the benchmark, and tears them down.
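The spin-up/tear-down pattern can be sketched as a pytest fixture. This is a sketch only: the compose file path, service flags, and fixture name are assumptions, not the repository's actual code.

```python
import subprocess

import pytest


@pytest.fixture
def compose_services():
    """Bring up the Docker Compose services for one benchmark run,
    then tear them down afterwards, even if the test fails."""
    subprocess.run(
        ["docker", "compose", "-f", "docker-compose.yml", "up", "-d", "--wait"],
        check=True,
    )
    try:
        yield
    finally:
        # Remove containers and volumes so each run starts clean.
        subprocess.run(
            ["docker", "compose", "-f", "docker-compose.yml", "down", "-v"],
            check=True,
        )
```

A benchmark test would then simply depend on `compose_services` to get a fresh environment per run.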
Note: Initial Docker builds may take a long time because turbodbc compiles from source. This is a one-time cost; subsequent builds use the Docker cache and complete in seconds.
Options:
- `--scale <factor>`: TPC-H scale factor (default: 1)
- `--iterations <num>`: Number of iterations per benchmark (default: 1)
- `--run-type <type>`: Run type label stored with results, e.g. `local`, `ci` (default: `local`)
- `--mode <mode>`: Benchmark mode, one of `all`, `ingest`, or `query` (default: `all`)
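Options like these are typically registered through pytest's `pytest_addoption` hook in `conftest.py`. The following is a sketch of how that registration could look; the repository's actual `conftest.py` may differ.

```python
# conftest.py (illustrative sketch)
def pytest_addoption(parser):
    """Register the benchmark command-line options with pytest."""
    parser.addoption("--scale", type=float, default=1,
                     help="TPC-H scale factor")
    parser.addoption("--iterations", type=int, default=1,
                     help="Number of iterations per benchmark")
    parser.addoption("--run-type", default="local",
                     help="Run type label stored with results")
    parser.addoption("--mode", choices=["all", "ingest", "query"],
                     default="all", help="Benchmark mode")
```

Tests can then read the values via `request.config.getoption("--scale")`.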
Examples:
```bash
# List all available tests
pytest --collect-only

# Run all benchmarks
pytest --scale 1

# Run a specific database
pytest -m bigquery --scale 1

# Run all ADBC tests across all databases
pytest -m adbc --scale 0.01

# Run with multiple iterations
pytest --scale 0.01 --iterations 10

# Run query benchmarks only
pytest --scale 0.01 --mode query

# Run ingest benchmarks only
pytest --scale 0.01 --mode ingest
```

Required Environment Variables:
```bash
export BIGQUERY_PROJECT_ID=your-project-id
export BIGQUERY_DATASET_ID=your-dataset-id
```

Authentication: Before running BigQuery benchmarks, authenticate with Google Cloud:
```bash
gcloud auth login
gcloud auth application-default login
```

Run Command:

```bash
pytest -m bigquery --scale 1 --mode query
```

Note: arrow-odbc, turbodbc, and pyodbc require `--mode query` because their ingest operations hang. If run without `--mode query`, these benchmarks will fail.
Run Command:

```bash
pytest -m duckdb --scale 1
```

Note: arrow-odbc and turbodbc benchmarks are skipped because the DuckDB ODBC driver crashes. They can be enabled by removing the `@pytest.mark.skip` decorator from the respective test functions in `test_duckdb.py`.
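The skip pattern looks like this; the test name below is hypothetical, not the actual function in `test_duckdb.py`:

```python
import pytest


@pytest.mark.skip(reason="DuckDB ODBC driver crashes during this benchmark")
def test_duckdb_turbodbc_query():
    # Benchmark body never runs while the skip marker is present.
    ...
```

Deleting the `@pytest.mark.skip(...)` line re-enables the test on the next run.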
These benchmarks represent our current understanding of optimal usage patterns for each driver and method. However, we acknowledge that we may not be implementing the most efficient approach in all cases.
We welcome feedback! These benchmarks are continuously evolving, and community input is invaluable. If you have suggestions, please open an issue. We're committed to making these benchmarks as accurate and representative as possible.