Benchmarks of various genomic ranges operations
Pre-requisites
- pyenv
$ pyenv --version
pyenv 2.5.0
- poetry
$ poetry --version
Poetry (version 2.0.0)
pyenv install 3.12.8
pyenv local 3.12.8
poetry env use 3.12
poetry update
Note that running the full benchmarks requires at least 64 GB of RAM. For the default configuration, 16-32 GB should be enough.
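As a quick pre-flight check, the memory guidance above can be sketched in Python. This is only an illustration: the helper names `total_ram_bytes` and `ram_tier` and the tier labels are hypothetical and not part of this repository, the thresholds are read as decimal gigabytes, and the `os.sysconf` lookup is assumed to run on Linux or macOS.

```python
import os

GB = 10 ** 9  # decimal gigabytes; assumption, the README does not say GB vs GiB


def total_ram_bytes() -> int:
    """Total physical memory via POSIX sysconf (Linux/macOS, not Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def ram_tier(total_bytes: int) -> str:
    """Classify memory against the guidance above (hypothetical labels)."""
    if total_bytes >= 64 * GB:
        return "full"          # enough for the full benchmarks
    if total_bytes >= 16 * GB:
        return "default"       # enough for the default configuration
    return "insufficient"      # below the 16 GB lower bound


if __name__ == "__main__":
    total = total_ram_bytes()
    print(f"{total / GB:.1f} GB RAM -> {ram_tier(total)}")
```

The classification takes the byte count as an argument so it can be checked without touching the host machine.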
All the benchmarking scenarios are defined in the conf/benchmark_*.yaml files. By default, the conf/benchmark_small.yaml file is used.
To run the benchmarks with a different configuration file, specify it using the --bench-config option.
export BENCH_DATA_ROOT=/tmp/polars-bio-bench/
poetry run python src/run-benchmarks.py --help
INFO:polars_bio:Creating BioSessionContext
Usage: run-benchmarks.py [OPTIONS]
Options:
--bench-config TEXT Benchmark config file (default:
conf/benchmark_small.yaml)
--help Show this message and exit.
- conf/benchmark_small.yaml: small dataset, small number of operations for nearest and overlap, native DataFusion input
- conf/benchmark_dataframes.yaml: as above but with DataFrames (Polars/Pandas) as input
- conf/benchmark_large.yaml: large dataset, large number of operations for nearest and overlap, native DataFusion input
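To run several configurations in turn, a small wrapper can discover the conf/benchmark_*.yaml files and assemble the matching command lines. The sketch below is a minimal illustration, not part of the repository: `available_configs` and `build_command` are hypothetical helper names, and only the script path and the --bench-config option shown in the help output above are taken from this README. It prints the commands rather than executing them.

```python
import glob
import os
import shlex


def available_configs(conf_dir: str = "conf") -> list[str]:
    """Return the benchmark config files found in conf_dir, sorted by name."""
    return sorted(glob.glob(os.path.join(conf_dir, "benchmark_*.yaml")))


def build_command(config: str = "conf/benchmark_small.yaml") -> list[str]:
    """Build the argument list for one benchmark run with the given config."""
    return [
        "poetry", "run", "python", "src/run-benchmarks.py",
        "--bench-config", config,
    ]


if __name__ == "__main__":
    # Print one ready-to-paste command per config file in ./conf
    for path in available_configs():
        print(shlex.join(build_command(path)))
```

Run from the repository root, with BENCH_DATA_ROOT exported as shown earlier, each printed line can be pasted into the shell, or the argument list can be passed to `subprocess.run`.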