


ThermoFooty

Status: scaffold (v0.1.0-dev0). The pre-registration is locked at OSF; the data-ingestion and analysis pipelines are under construction following the project's lab-internal development plan. A Zenodo DOI will be minted at the first tagged release.

Does on-pitch player aggression rise with the day-of-match temperature anomaly at the stadium, and does the same heat signal extend from the pitch to the supporters? ThermoFooty pre-registers and executes a natural-experiment test of the heat-aggression hypothesis on European soccer.

The design exploits the fact that fixtures are scheduled before the weather is realised. This removes the outdoor-opportunity confound described by Field (1992), in which hot weather also raises the number of people out and interacting, a confound that limits modern crime-data designs. The same scheduled-fixture identification underlies every analysis in the project, from the Big-5 European league panel (H1: ~150 000+ matches) to the tournament panel (H6/H6b, where Qatar 2022's naturally ventilated Stadium 974 serves as a within-tournament natural control for the cooled venues).
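As an illustration of the exposure variable, here is a minimal sketch of a day-of-match Tmax anomaly. It assumes, by analogy with the ThermoStrife headline (expressed per +1 °C above the local same-month baseline), that the baseline is the mean daily Tmax at the same stadium across all available days of the same calendar month. The function name `tmax_anomaly` and the toy record are hypothetical; the locked definition is the one in the OSF pre-registration.

```python
from datetime import date
from statistics import mean


def tmax_anomaly(match_day: date, tmax_on_day: float,
                 daily_tmax: dict[date, float]) -> float:
    """Match-day Tmax minus the stadium's same-calendar-month mean Tmax.

    ASSUMPTION (illustrative only): the baseline is the mean of every daily
    Tmax in `daily_tmax` that falls in the same calendar month as
    `match_day` (any year), excluding the match day itself.
    """
    same_month = [t for d, t in daily_tmax.items()
                  if d.month == match_day.month and d != match_day]
    if not same_month:
        raise ValueError("no same-month baseline days available")
    return tmax_on_day - mean(same_month)


# Toy daily-Tmax record for one stadium (hypothetical numbers, °C).
record = {
    date(2019, 6, 1): 20.0,
    date(2020, 6, 1): 22.0,
    date(2021, 6, 1): 24.0,
    date(2021, 7, 1): 30.0,  # July: not part of a June baseline
}
print(tmax_anomaly(date(2022, 6, 18), 27.0, record))
```

Excluding the match day from its own baseline keeps the anomaly from being pulled toward zero when the record already contains that day.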

The full pre-registered design (the primary confirmatory test plus 17 auxiliary hypotheses across three independently BH-FDR-corrected batteries) is locked at OSF (10.17605/OSF.IO/YZVAK), with a one-page AsPredicted cross-post covering the H1 confirmatory test (aspredicted.org/av2un9.pdf).

Position in the lab's heat-aggression programme

ThermoFooty is one chapter of a three-track cross-species programme:

  • ThermoKourt: Drosophila track. Behavioural-arena heat-aggression assays under controlled thermal manipulation.
  • ThermoStrife (Zenodo DOI 10.5281/zenodo.20371612): human-data track, historical-uprisings analysis. 112-event case-crossover panel, 1750–2024, with a four-tier weather backfill; headline OR = 1.089 per +1 °C above the local same-month baseline.
  • ThermoFooty (this repo): human-data track, soccer panel. Pre-registered natural-experiment test on scheduled fixtures, 1970–2026, addressing the small-n and selection-bias critiques of the ThermoStrife historical panel.

The three tracks publish separately but share the conceptual hypothesis and (where appropriate) code: the four-tier weather cascade ThermoFooty uses is vendored verbatim from ThermoStrife v0.1.1, and every statistical estimator routes through reRandomStats v0.2.0+ (case_crossover, model_comparison, dose_response).
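The time-stratified case-crossover logic can be sketched in plain Python. This is not the reRandomStats `case_crossover` API; it shows the classic calendar scheme (stratum = same year, same month, same weekday) and the conditional-logit contribution of one stratum, in which the case day's exposure competes against every day in its stratum. Whether ThermoFooty defines its strata exactly this way is an assumption here, since the pre-registration fixes the actual definition; `time_stratified_referents` and `stratum_loglik` are hypothetical names.

```python
import calendar
import math
from datetime import date, timedelta


def time_stratified_referents(event_day: date) -> list[date]:
    """Referent days under the classic time-stratified scheme (illustrative).

    Stratum = same year, same calendar month, same day of week.
    Every other day in the stratum is a referent.
    """
    _, n_days = calendar.monthrange(event_day.year, event_day.month)
    first = date(event_day.year, event_day.month, 1)
    month_days = [first + timedelta(days=i) for i in range(n_days)]
    return [d for d in month_days
            if d.weekday() == event_day.weekday() and d != event_day]


def stratum_loglik(beta: float, case_x: float,
                   referent_xs: list[float]) -> float:
    """Conditional-logit log-likelihood contribution of one stratum.

    The case's exposure is compared against all days in the stratum
    (case + referents): log[ exp(beta*x_case) / sum_j exp(beta*x_j) ].
    """
    all_x = [case_x, *referent_xs]
    return beta * case_x - math.log(sum(math.exp(beta * x) for x in all_x))


# 2022-06-18 was a Saturday; the other Saturdays of June 2022 are referents.
print(time_stratified_referents(date(2022, 6, 18)))
```

Because referents share year, month and weekday with the case, slowly varying confounders such as season, long-term trend and weekday effects cancel within each stratum, so the estimate of `beta` is driven by short-term exposure contrasts.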

Pre-registered hypotheses (summary)

| Battery | Hypothesis | Quick description |
|---|---|---|
| PRIMARY | H1 | Per-match odds of a red card for violent conduct rise with the stadium-day Tmax anomaly. Time-stratified case-crossover conditional logit on the Big-5 leagues, 1970–2026. Single confirmatory test, uncorrected α = 0.05, one-sided. |
| LEAGUE auxiliary (7 tests, BH FDR q = 0.05) | H2 | Crowd-violence arrests (pooled UK Home Office + ZIS-Jahresberichte) rise with the same anomaly exposure. |
| | H3 | The heat coefficient is attenuated in closed-roof / cooled stadia. |
| | H4 / H4b | Heat × stakes interaction on player cards / crowd arrests. |
| | H5 | Within-player fixed effects: the same player is carded more often in hot matches. |
| | H0_spec | Mechanism specificity: aggression-set cards rise faster than non-card fouls. |
| | H_league_het | Likelihood-ratio test (LRT) for cross-league slope heterogeneity. |
| DOSE-RESPONSE (4 tests, BH FDR q = 0.05) | H_break_pop / H_break_player | Segmented regression + Davies test + 4PL Hill rescue; population-level and per-player breakpoints. |
| | H_mobility_transfer / H_mobility_dual | Player-transfer natural experiment on absolute vs anomaly exposure. |
| TOURNAMENT (6 tests, BH FDR q = 0.05) | H6 | Cooled-stadia attenuation in the pooled tournament panel. |
| | H6b | Qatar 2022: Stadium 974 (naturally ventilated, n = 7 matches) vs the seven cooled venues (n = 57 matches). |
| | H7 / H7c | Hot- vs cool-host World Cups (Qatar excluded; Qatar treated as its own descriptive category). |
| | H8 / H_omnibus | Tournament-family / tournament-edition heterogeneity LRTs. |
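Each auxiliary battery is corrected on its own with the Benjamini–Hochberg step-up procedure at q = 0.05, while H1 stays uncorrected. A minimal sketch of that per-battery correction, assuming the standard BH procedure; the p-values below are invented placeholders chosen only to match the battery sizes (7, 4 and 6 tests), not project results, and `benjamini_hochberg` is a hypothetical helper rather than a reRandomStats function.

```python
def benjamini_hochberg(pvals: list[float], q: float = 0.05) -> list[bool]:
    """Reject flag per p-value under the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # Step-up: reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject


# Placeholder p-values; each battery is corrected independently.
batteries = {
    "LEAGUE": [0.001, 0.02, 0.04, 0.30, 0.50, 0.70, 0.90],   # 7 tests
    "DOSE_RESPONSE": [0.010, 0.012, 0.20, 0.60],              # 4 tests
    "TOURNAMENT": [0.003, 0.04, 0.08, 0.10, 0.45, 0.80],      # 6 tests
}
for name, ps in batteries.items():
    print(name, benjamini_hochberg(ps))
```

Correcting per battery keeps each family's false-discovery rate at q without letting the number of tests in one battery tighten the thresholds in another.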

Full specifications are locked in the OSF pre-registration (10.17605/OSF.IO/YZVAK); the lab's internal source draft is mirrored there verbatim.
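The population breakpoint in the DOSE-RESPONSE battery can be illustrated with a least-squares segmented (hinge) regression whose breakpoint is found by grid search. This sketch assumes a continuous outcome and synthetic data, and it deliberately omits both the Davies test for the existence of a breakpoint and the 4PL Hill rescue listed in the table; `fit_breakpoint` is a hypothetical name.

```python
import numpy as np


def fit_breakpoint(x, y, grid):
    """Grid-search the breakpoint c of y = b0 + b1*x + b2*max(x - c, 0).

    Returns (c, SSE, coefficients) for the candidate with the lowest SSE.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for c in grid:
        # Design matrix: intercept, linear term, hinge term active above c.
        X = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((y - X @ coef) ** 2))
        if best is None or sse < best[1]:
            best = (float(c), sse, coef)
    return best


# Synthetic data: flat below a +3 °C anomaly, rising linearly above it.
rng = np.random.default_rng(0)
x = np.linspace(-6.0, 10.0, 200)
y = 0.1 + 0.05 * np.maximum(x - 3.0, 0.0) + rng.normal(0.0, 0.01, x.size)
c_hat, sse, coef = fit_breakpoint(x, y, grid=np.arange(-4.0, 8.01, 0.25))
print(round(c_hat, 2), np.round(coef, 3))
```

The grid stays inside the data range so the hinge column never collapses to all zeros or becomes collinear with the intercept and slope.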

Repository layout

ThermoFooty/
├── pyproject.toml
├── CITATION.cff
├── environment.yml
├── LICENSE                          ← MIT
├── README.md
├── data → $THERMOFOOTY_DATA_ROOT    ← symlink (gitignored)
├── db/
│   ├── schema.sql                   ← committed canonical DDL
│   └── migrations/                  ← alembic-lite NNNN_<slug>.sql
├── thermofooty/                     ← Python package
│   ├── __init__.py
│   ├── constants.py                 ← Wong palette, paths, type aliases
│   ├── config.py                    ← THERMOFOOTY_DATA_ROOT env var
│   ├── db/                          ← SQLite session, schema-version check
│   ├── sources/                     ← football_data_uk, fbref, home_office, zis
│   ├── weather/                     ← vendored cascade from ThermoStrife v0.1.1
│   ├── lookup.py                    ← (stadium, date) → AnomalyFetch
│   ├── panel.py                     ← analysis_panel materialiser
│   ├── inference.py                 ← thin wrapper around reRandomStats
│   └── viz.py                       ← Wong-palette figures
├── scripts/                         ← ingestion + analysis CLI scripts
├── tests/
├── docs/                            ← Sphinx docs
└── .github/workflows/               ← tests + docs + release + network-tests
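The db/migrations/ naming convention (NNNN_<slug>.sql) together with the schema-version check in thermofooty/db/ lends itself to a simple forward-only migration runner. A minimal sketch, assuming the schema version is stored in SQLite's PRAGMA user_version (the project may track it differently, for example in a dedicated table); `apply_migrations` is hypothetical.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

# Migration files look like 0001_init.sql, 0002_add_referee.sql, ...
MIGRATION_RE = re.compile(r"^(\d{4})_[A-Za-z0-9_-]+\.sql$")


def apply_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> int:
    """Apply NNNN_<slug>.sql files newer than PRAGMA user_version, in order.

    Returns the schema version after all pending migrations have run.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    pending = sorted(
        (int(m.group(1)), p)
        for p in migrations_dir.iterdir()
        if (m := MIGRATION_RE.match(p.name))
    )
    for number, path in pending:
        if number <= current:
            continue  # already applied
        conn.executescript(path.read_text())
        # PRAGMA values cannot be bound as parameters; format the int in.
        conn.execute(f"PRAGMA user_version = {number:d}")
        conn.commit()
        current = number
    return current


# Usage with two throwaway migrations in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    mig = Path(tmp)
    (mig / "0001_init.sql").write_text("CREATE TABLE match (id INTEGER PRIMARY KEY);")
    (mig / "0002_add_referee.sql").write_text("ALTER TABLE match ADD COLUMN referee TEXT;")
    print(apply_migrations(sqlite3.connect(":memory:"), mig))
```

Note that `executescript()` commits any pending transaction before it runs, so a migration file and its version bump are not applied atomically here; a production runner would want that guarantee.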

Data layout

All data lives off-repo under $THERMOFOOTY_DATA_ROOT/, exposed via the gitignored data/ symlink. Set the env var to wherever your fast storage lives (an external NVMe, a network mount, the HPC scratch directory, …).

Each entry is tagged with the data family it belongs to: [db] canonical store · [match] football events + lineups · [crowd] crowd-violence reports · [stadium] geometry + metadata · [weather] temperature sources · [derived] materialised analysis tables · [ops] logs & housekeeping.

$THERMOFOOTY_DATA_ROOT/
├── db/
│   └── thermofooty.sqlite           ← [db]      canonical SQLite (built from db/schema.sql)
├── raw/
│   ├── football_data_uk/            ← [match]   season-per-CSV downloads
│   ├── fbref_html/                  ← [match]   scraped match-report HTML cache
│   ├── home_office_pdfs/            ← [crowd]   UK arrests bulletins
│   ├── zis_jahresberichte/          ← [crowd]   Bundespolizei annual reports
│   ├── stadia/                      ← [stadium] coordinate CSVs, lineup overrides
│   └── observatories/hadcet/        ← [weather] HadCET daily totals files
├── cache/
│   ├── meteostat/                   ← [weather] parquet per (station, year-month)
│   ├── era5/                        ← [weather] parquet per (cell, year-month)
│   ├── twentycr/                    ← [weather] parquet per (cell, year)
│   └── fbref_parsed/                ← [match]   parsed JSON per match (dedupe key)
├── derived/
│   └── analysis_panel.parquet       ← [derived] materialised join per ingestion pass
└── logs/                            ← [ops]     ingestion + analysis logs
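The [weather] entries correspond to the tiers of the vendored four-tier cascade. The first-hit-wins idea behind such a cascade can be sketched as follows. The tier order (Meteostat station, then ERA5, then 20CR, then HadCET) and the `CascadeHit` record are assumptions for illustration, not the ThermoStrife or lookup.py implementation, and the stub lambdas stand in for real fetchers.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional

# A tier takes (lat, lon, day) and returns Tmax in °C, or None on a miss.
Tier = Callable[[float, float, date], Optional[float]]


@dataclass(frozen=True)
class CascadeHit:
    """Hypothetical result record: the value plus the tier that supplied it."""
    tmax_c: float
    source: str


def first_hit(tiers: list[tuple[str, Tier]], lat: float, lon: float,
              day: date) -> CascadeHit:
    """Query tiers in order and return the first non-missing Tmax."""
    for name, fetch in tiers:
        value = fetch(lat, lon, day)
        if value is not None:  # 0.0 °C is a valid hit, so test for None
            return CascadeHit(value, name)
    raise LookupError(f"no tier returned Tmax for {day} at ({lat}, {lon})")


# Stub tiers (hypothetical): the station record is missing, ERA5 answers.
tiers = [
    ("meteostat", lambda lat, lon, d: None),
    ("era5", lambda lat, lon, d: 24.3),
    ("twentycr", lambda lat, lon, d: 23.9),
    ("hadcet", lambda lat, lon, d: 22.0),
]
print(first_hit(tiers, 51.556, -0.280, date(2022, 7, 31)))
```

Recording which tier answered preserves the provenance of every exposure value, which matters when tiers differ in spatial resolution.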

Installation

There are two supported routes; pick one. The conda route is the reproducible default, while the pip route is lighter if you already have Python ≥ 3.11 on your $PATH.

Conda env (recommended)

environment.yml pins Python 3.11 and every dependency that the cascade, ingestion and inference layers need, plus rerandomstats at the locked v0.2.0 tag. One command bootstraps the whole stack:

git clone https://github.com/zerotonin/ThermoFooty.git
cd ThermoFooty

conda env create -f environment.yml      # creates the `thermofooty` env
conda activate thermofooty
pip install -e . --no-deps               # adds ThermoFooty itself in editable mode

# Point `data/` at wherever your fast storage lives.
export THERMOFOOTY_DATA_ROOT=/path/to/your/ThermoFooty
ln -sf "$THERMOFOOTY_DATA_ROOT" data

To refresh after a dependency bump: conda env update -f environment.yml --prune.

Pip only (lighter)

git clone https://github.com/zerotonin/ThermoFooty.git
cd ThermoFooty
pip install -e ".[all]"

# Point `data/` at wherever your fast storage lives.
export THERMOFOOTY_DATA_ROOT=/path/to/your/ThermoFooty
ln -sf "$THERMOFOOTY_DATA_ROOT" data

Prefer not to rely on the env var being exported in every shell? Copy the committed template and fill it in — local_paths.json is gitignored so absolute paths never leak into a commit:

cp local_paths.template.json local_paths.json
# edit local_paths.json -> set data_root to your absolute path

Resolution order, first hit wins: env var → local_paths.json → in-repo data/ symlink.
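That resolution order can be sketched as a small function. It assumes local_paths.json is a flat JSON object with a `data_root` key (as the template instructions above suggest) and that `data/` is looked up relative to the repository root; `resolve_data_root` is a hypothetical name, not necessarily what thermofooty/config.py exposes.

```python
import json
import os
from pathlib import Path


def resolve_data_root(repo_root: Path) -> Path:
    """First hit wins: env var, then local_paths.json, then data/ symlink."""
    # 1. Environment variable.
    env = os.environ.get("THERMOFOOTY_DATA_ROOT")
    if env:
        return Path(env).expanduser()
    # 2. Gitignored local_paths.json with a "data_root" entry.
    local = repo_root / "local_paths.json"
    if local.is_file():
        data_root = json.loads(local.read_text()).get("data_root")
        if data_root:
            return Path(data_root).expanduser()
    # 3. In-repo data/ symlink (exists() is False for a dangling link).
    symlink = repo_root / "data"
    if symlink.exists():
        return symlink.resolve()
    raise FileNotFoundError(
        "set THERMOFOOTY_DATA_ROOT, fill in local_paths.json, or create data/"
    )
```

Failing loudly when nothing resolves is deliberate: silently falling back to the working directory would scatter caches across whatever folder a script happened to be launched from.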

Python ≥ 3.11 required (meteostat 2.x dropped 3.10). For the ERA5 fallback tier you additionally need a free Copernicus CDS API key in ~/.cdsapirc (gitignored).

Citation

If you use ThermoFooty in published work, please cite both the software (its version DOI will appear with the first GitHub Release) and the underlying OSF pre-registration:

Geurten, B. R. H. (2026). ThermoFooty: heat as an acute trigger of on-pitch aggression — pre-registered natural-experiment test on European soccer. OSF. https://doi.org/10.17605/OSF.IO/YZVAK

Full citation metadata is in CITATION.cff, which also lists the companion citations for ThermoStrife and reRandomStats under references.

Authors

Bart R. H. Geurten — Department of Zoology, University of Otago, Dunedin, New Zealand. ORCID 0000-0002-1816-3241.

License

MIT — see LICENSE.
