Bayesian inference for small biological systems. A teaching-oriented Python toolkit that fits classical biological models — Michaelis-Menten enzyme kinetics, Hill dose-response, and a logistic classifier — to noisy experimental data using Markov Chain Monte Carlo (MCMC), with honest uncertainty quantification and convergence diagnostics.
This is an educational portfolio project. It is not clinical software, and the simulated datasets are not drawn from real experiments.
Most biological measurements are noisy: enzyme assays vary between days,
gene-expression replicates disagree, and sample sizes are small. A point
estimate (Vmax = 7.9, full stop) gives a false sense of precision. A
posterior distribution instead answers the more useful question:
Given my prior belief, my model, and these noisy measurements, what values of the parameters are plausible, and how plausible is each?
Concretely:
- A prior
p(θ)encodes what you believed about the parameters before seeing the data — for instance, "enzyme rate constants are positive and rarely larger than 100". - A likelihood
p(data | θ)says how compatible the observed data is with a hypothesized parameter value. - Bayes' rule combines them into the posterior
p(θ | data) ∝ p(data | θ) · p(θ)— the updated belief after the data. - Because the posterior usually has no closed form, we approximate it by drawing samples from it. Markov Chain Monte Carlo is a family of algorithms that produces such samples by random-walking through parameter space in a way that statistically targets the posterior.
The samples can then be summarized in any way you like: posterior means, credible intervals, plots of fitted curves with uncertainty bands, etc.
A drug-response model that says EC50 = 1.0 is much less useful than one
that says EC50 = 1.0 with 95% credible interval [0.6, 1.7]. The interval
tells you whether the next experiment is likely to confirm the result and
whether two compounds are meaningfully different or only nominally so.
For small-n biology — three replicates per dose, twelve enzyme readings
— uncertainty bars are not optional decoration; they are the result.
The Michaelis-Menten equation describes the rate v of an enzyme-catalyzed
reaction as a function of substrate concentration [S]:
v = Vmax · [S] / (Km + [S])
where Vmax is the maximum reaction velocity (achieved when the enzyme
is saturated) and Km is the substrate concentration at which v = Vmax/2.
BayesBio simulates noisy (S, v) pairs and infers Vmax, Km, and the
measurement-noise scale σ.
The Hill equation models a sigmoidal response of expression y to inducer
dose d:
y = baseline + (Emax - baseline) · d^n / (EC50^n + d^n)
EC50 is the dose at half-maximal response, n is the Hill coefficient
(steepness / cooperativity), and baseline/Emax set the floor and ceiling.
BayesBio infers all four together.
A small synthetic dataset of three "biomarker" features per sample and a binary outcome (e.g., tumor responds vs. does not respond). BayesBio fits a Bayesian logistic regression and reports posterior credible intervals on each coefficient — directly showing which features are clearly informative and which are not.
Clone the repo and install in editable mode:
git clone https://github.com/yourname/BayesBio.git
cd BayesBio
python -m pip install -e .Or, if you only want the runtime dependencies:
python -m pip install -r requirements.txtPython 3.10+ is required.
# 1. Generate a synthetic dataset
python -m bayesbio.cli simulate \
--model michaelis_menten \
--out examples/enzyme_kinetics.csv
# 2. Fit the model and write samples + diagnostics + plots + report
python -m bayesbio.cli fit \
--model michaelis_menten \
--input examples/enzyme_kinetics.csv \
--outdir outputs/enzyme_fit \
--truth-json examples/enzyme_kinetics.truth.jsonThe other models work the same way:
python -m bayesbio.cli fit \
--model hill_response \
--input examples/dose_response.csv \
--outdir outputs/dose_response_fit \
--truth-json examples/dose_response.truth.json
python -m bayesbio.cli fit \
--model logistic_regression \
--input examples/cancer_signaling_toy.csv \
--outdir outputs/cancer_model_fit \
--truth-json examples/cancer_signaling_toy.truth.jsonYou can also re-summarize an existing posterior_samples.csv:
python -m bayesbio.cli diagnose \
--samples outputs/enzyme_fit/posterior_samples.csv \
--chains 4 \
--out outputs/enzyme_fit/diagnostics.jsonEvery fit command produces the following inside --outdir:
| File | Contents |
|---|---|
posterior_samples.csv |
One row per draw, columns = chain + each parameter |
parameter_summary.csv |
Mean / sd / quantiles / ESS / R-hat per parameter |
diagnostics.json |
Acceptance rates, ESS, R-hat, free-text warnings |
posterior_histograms.png |
Marginal posterior for each parameter (truth overlaid if provided) |
trace_plots.png |
Per-chain trace, one row per parameter (eyeball convergence) |
model_fit.png |
Observed data + posterior draws of the model curve (continuous models) |
ppc.png |
Posterior predictive check vs. observed (continuous models) |
coefficient_intervals.png |
Forest plot of posterior intervals (logistic regression only) |
final_report.md |
Self-contained Markdown report tying everything together |
BayesBio ships with a from-scratch random-walk Metropolis-Hastings
sampler in bayesbio/samplers.py. The algorithm is
heavily commented so you can read it as a study aid. Briefly:
- Start at an initial parameter vector
θ. - Propose
θ_new = θ + N(0, S)whereSis a per-parameter standard deviation vector (the proposal scale). - Compute the log posterior at
θ_new. - Accept the move with probability
min(1, exp(log_post_new - log_post_old)). - Two-stage warm-up adaptation:
- Stage 1 tunes a single global step size by diminishing-step Robbins-Monro (Andrieu & Thoms, 2008) toward a target acceptance rate of 0.234 (Roberts, Gelman & Gilks, 1997).
- Stage 2 additionally rescales
Sper-parameter using the running standard deviation of the warm-up samples, so parameters on very different scales (e.g.baseline ~ 5andEmax ~ 80) all mix.
- Adaptation is frozen at the end of warm-up — only the post-warmup draws are returned, keeping the sampling phase a valid Markov chain.
- Multiple independent chains are run from different inits so the R-hat convergence diagnostic can flag poor mixing.
Random-walk MH with diagonal-covariance adaptation is still less efficient than gradient-based samplers like NUTS (PyMC, Stan) on strongly correlated posteriors; for those, consider running the same likelihood and priors through PyMC.
The default Normal(0, 1.5) prior on logistic-regression coefficients
is meaningful only for unit-scale features. The CLI therefore
standardizes feature columns (zero mean, unit standard deviation)
before sampling and back-transforms the posterior samples into the
original feature space before saving and plotting them. The user-facing
posterior_samples.csv and intervals are therefore directly
interpretable as "expected change in log-odds per one-unit change in the
raw feature".
diagnostics.json always contains:
- Acceptance rate per chain. Healthy is roughly 0.20-0.50 for random-walk MH.
- Effective sample size (ESS) per parameter, computed via Geyer's initial monotone sequence estimator. ESS far below the raw draw count indicates strong autocorrelation.
- Split-R-hat per parameter (only when ≥2 chains are run). Values meaningfully above 1.01 suggest the chains have not converged to the same distribution.
- A list of free-text warnings when any of the above breach the rules of thumb (R-hat > 1.05, ESS < 100, acceptance outside [0.10, 0.70]).
pytest -qThe tests cover the deterministic models, log-likelihood functions, prior validation, sampler shape/recovery, and CLI argument parsing.
- Synthetic data only. All bundled datasets are drawn from the generative model itself. Real enzyme assays often violate Gaussian noise assumptions (multiplicative / proportional noise, plate effects, background subtraction); the included Gaussian-noise likelihood is the textbook teaching choice but should be revisited before being applied to real assays. A log-normal noise option would be the natural extension.
- Random-walk Metropolis-Hastings is a teaching sampler. It is much less efficient than NUTS (PyMC, Stan) on strongly correlated or higher-dimensional posteriors, even with the per-parameter scale adaptation described above.
- R-hat is the BDA3 split version, not the rank-normalized variant
recommended by Vehtari et al. (2021). For production-grade diagnostics
on harder problems, defer to
arviz.rhat. - No model comparison. WAIC / LOO / Bayes factors are out of scope.
- No hierarchical / partial-pooling models. Each fit is for a single experiment; pooling across plates / batches would be a natural extension.
- Not clinical software. The "cancer signaling" example is an entirely synthetic teaching dataset.
- An optional PyMC backend (NUTS) for the same models, for cross-checking.
- Log-normal observation noise option for enzyme/dose-response data.
- Hierarchical (partial-pooling) version of the dose-response model.
- WAIC / LOO model comparison via
arviz. - A small Snakemake or Nextflow workflow that runs all three fits and collects the reports into a single index.
MIT — see LICENSE.