City-Scale Traffic Forecasting on METR-LA

Team City Scale AI: Nengjia Li, Udula Abeykoon, Anirudh Bharadwaj Vangara, Enhe Bai, Ryan Rana

University of Waterloo × Queen's University · Borealis AI / Let's Solve It · 2026

Result

60-min test MAE = 3.283 on METR-LA — #1 among reproducible methods as of May 2026.

Horizon	MAE	RMSE	MAPE
15 min	2.611	4.970	6.78 %
30 min	2.918	5.834	8.06 %
60 min	3.283	6.812	9.61 %

This beats every other public method with working code, including MLCAFormer (3.30), STAEformer (3.34), ST-SSDL (3.37), FUSE-Traffic (3.39), and STD-MAE (3.40); all figures are 60-min test MAE, where lower is better. The lower published numbers (TITAN 3.08, TESTAM+ 2.99) have no working public code; see REPORT.md for the full audit.
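For reference, the three metrics in the table can be computed as in the sketch below. It assumes the common METR-LA convention of treating zero readings as missing and excluding them from every metric; the function name `masked_metrics` and the `null_val` parameter are illustrative and are not taken from this repository.

```python
import numpy as np

def masked_metrics(pred, true, null_val=0.0):
    """Return (MAE, RMSE, MAPE in percent) over entries where true != null_val."""
    pred = np.asarray(pred, dtype=np.float64)
    true = np.asarray(true, dtype=np.float64)
    mask = ~np.isclose(true, null_val)      # assumption: null_val marks a missing reading
    if not mask.any():
        raise ValueError("no valid targets to score")
    err = pred[mask] - true[mask]
    mae = np.abs(err).mean()
    rmse = np.sqrt((err ** 2).mean())
    mape = 100.0 * (np.abs(err) / np.abs(true[mask])).mean()
    return mae, rmse, mape

# Tiny worked example: the zero target is skipped as a missing reading.
true = np.array([60.0, 50.0, 0.0, 40.0])
pred = np.array([57.0, 54.0, 30.0, 40.0])
mae, rmse, mape = masked_metrics(pred, true)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} MAPE={mape:.2f}%")
```

METR-LA is sampled every 5 minutes, so the 15, 30, and 60 min rows would typically correspond to forecast steps 3, 6, and 12; whether this repository scores each step alone or averages up to it is not stated here.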

Architecture

A 4-seed ensemble of STAEformer (CIKM 2023) reproduced on our pipeline, followed by the ST-TTC test-time spectral calibrator (NeurIPS 2025 Spotlight).

4 × STAEformer (seeds 42, 1, 2, 3) ─┐
                                     ├─► uniform-average normalized predictions
                                     │     ↓
                                     │   ST-TTC FFT amplitude+phase calibration
                                     │     (streaming flash-update at test time)
                                     ▼
                          test predictions in raw mph

The full STAEformer architecture (152-dim model, 3 temporal + 3 spatial Transformer layers, 80-dim adaptive embedding, mixed-projection output) is in models/staeformer.py. The ST-TTC calibrator (1 656 parameters total) is in scripts/eval_stae_ensemble.py.
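The data flow in the diagram can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the code in scripts/eval_stae_ensemble.py: the function names, the `(batch, horizon, nodes)` layout, the static per-frequency gain, and the placeholder normalization statistics are all hypothetical, and the streaming test-time update of the real calibrator is omitted.

```python
import numpy as np

def ensemble_mean(preds):
    """Uniformly average normalized predictions, one array per seed."""
    return np.mean(np.stack(preds, axis=0), axis=0)

def spectral_calibrate(pred, amp, phase):
    """Scale and rotate each frequency bin along the horizon axis.

    pred: (batch, horizon, nodes); amp and phase: (horizon // 2 + 1,).
    amp = 0 and phase = 0 leave the prediction unchanged.
    """
    horizon = pred.shape[1]
    spec = np.fft.rfft(pred, axis=1)                  # (batch, bins, nodes)
    gain = (1.0 + amp) * np.exp(1j * phase)           # complex gain per bin
    return np.fft.irfft(spec * gain[None, :, None], n=horizon, axis=1)

def denormalize(pred, mean, std):
    """Undo the z-score so predictions are back in raw mph."""
    return pred * std + mean

rng = np.random.default_rng(0)
# Stand-ins for four seeds' outputs: 12 steps (60 min) over 207 sensors.
seed_preds = [rng.standard_normal((8, 12, 207)) for _ in range(4)]
avg = ensemble_mean(seed_preds)
n_bins = avg.shape[1] // 2 + 1
calibrated = spectral_calibrate(avg, np.zeros(n_bins), np.zeros(n_bins))
mph = denormalize(calibrated, mean=54.0, std=20.0)    # placeholder statistics
print(mph.shape)
```

Averaging in normalized space before calibration, as the diagram shows, means the calibrator sees a single prediction stream rather than one per seed.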

Reproduce the Result (2.5 hours on a single H200)

# 1. Install dependencies
pip install --no-build-isolation \
    "transformers<4.45" causal-conv1d==1.4.0 mamba-ssm==2.2.2 \
    h5py tables pandas scipy einops

# 2. Train 4 STAEformer seeds sequentially (~2 h)
python scripts/train_staeformer.py --tag stae_repro --seed 42 --batch_size 16
bash scripts/run_stae_seeds_v2.sh   # seeds 1, 2, 3

# 3. Final 4-seed ensemble + ST-TTC eval (~5 min)
python scripts/eval_stae_ensemble.py --use_ttc \
    --stae_ckpts "results/staeformer/stae_*/best_stae_s*.pth"
# Expected output: 60-min MAE 3.283
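Before running step 3, it can help to confirm that the quoted glob pattern matches one checkpoint per seed. The helper below is a hypothetical pre-flight check, not a script in this repository; it assumes the evaluation script expands the pattern with Python-style globbing and that four checkpoints are expected.

```python
import glob

# Pattern copied from step 3; resolved relative to the repository root.
PATTERN = "results/staeformer/stae_*/best_stae_s*.pth"

def find_checkpoints(pattern=PATTERN, expected=4):
    """Return the sorted checkpoint paths, or raise if the count is off."""
    paths = sorted(glob.glob(pattern))
    if len(paths) != expected:
        raise FileNotFoundError(
            f"expected {expected} checkpoints matching {pattern!r}, "
            f"found {len(paths)}"
        )
    return paths
```

Calling `find_checkpoints()` from the repository root either returns the four paths or fails early, which is cheaper than discovering a missing seed after the evaluation has started.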

Repository Structure

├── REPORT.md                          full technical report
├── README.md                          this file
├── requirements.txt
│
├── data/                              METR-LA raw files (.h5, .pkl)
├── cache/                             cached GFT artifacts (auto-generated)
│
├── src/                               core pipeline
│   ├── data_utils.py                  H5 + adjacency loaders
│   ├── graph_utils.py                 Laplacian / adjacency utilities
│   ├── gft.py                         Graph Fourier Transform
│   ├── preprocess_v2.py               masked z-score + TOD/DOW features
│   └── dataset_v2.py                  sliding-window dataset
│
├── models/                            architectures
│   ├── staeformer.py                  the headline backbone
│   ├── spectral_ssm.py                our novel SSSM v1–v8 (bi-axis Mamba)
│   ├── graph_wavenet.py               GWNet ensemble member
│   └── hybrid.py                      STAEformer + spectral branch (ablation)
│
├── scripts/                           training & eval
│   ├── train_staeformer.py            STAEformer training (paper-faithful + ablation flags)
│   ├── train_sssm.py                  Spectral State Space Model variants
│   ├── train_gwnet.py                 GraphWaveNet training
│   ├── train_hybrid.py                hybrid training
│   ├── eval_stae_ensemble.py          HEADLINE: 4-seed STAE + ST-TTC eval
│   ├── eval_full_ensemble.py          multi-architecture ensemble with val-weighting
│   ├── eval_ensemble.py               older v4 SSSM ensemble (legacy)
│   └── run_stae_seeds_v2.sh           sequential 4-seed training driver
│
├── legacy/                            preserved earlier work + failed exploration
│   └── README.md                      what's there and why
│
└── docs/
    └── ssh-note.md                    Skynet cluster notes (original team)
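
Two of the pipeline stages listed under src/ are simple enough to sketch. The block below illustrates a masked z-score with time-of-day (TOD) and day-of-week (DOW) features, plus a sliding-window split. It is not the code in preprocess_v2.py or dataset_v2.py: the function names, the use of 0.0 as the missing-value marker, filling masked entries with 0 after scaling, and the 12-step input and output windows are all assumptions chosen for illustration.

```python
import numpy as np

STEPS_PER_DAY = 288  # 24 h of 5-minute readings

def masked_zscore(speed, null_val=0.0):
    """Z-score using statistics of valid entries only (assumed convention)."""
    mask = speed != null_val
    mean = speed[mask].mean()
    std = speed[mask].std()
    scaled = np.where(mask, (speed - mean) / std, 0.0)  # missing -> 0 after scaling
    return scaled, mean, std

def calendar_features(num_steps, start_step=0, start_dow=0):
    """TOD as a fraction of the day in [0, 1); DOW as an integer in 0..6."""
    idx = np.arange(num_steps) + start_step
    tod = (idx % STEPS_PER_DAY) / STEPS_PER_DAY
    dow = (start_dow + idx // STEPS_PER_DAY) % 7
    return tod, dow

def sliding_windows(series, in_len=12, out_len=12):
    """Split (time, ...) into inputs (n, in_len, ...) and targets (n, out_len, ...)."""
    n = series.shape[0] - in_len - out_len + 1
    if n <= 0:
        raise ValueError("series is shorter than one input/output window pair")
    x = np.stack([series[i:i + in_len] for i in range(n)])
    y = np.stack([series[i + in_len:i + in_len + out_len] for i in range(n)])
    return x, y

rng = np.random.default_rng(0)
speed = rng.uniform(20.0, 70.0, size=(600, 207))   # synthetic stand-in for sensor speeds
speed[::50, 0] = 0.0                               # inject a few missing readings
scaled, mean, std = masked_zscore(speed)
tod, dow = calendar_features(speed.shape[0])
x, y = sliding_windows(scaled)
print(x.shape, y.shape)
```

Computing the mean and standard deviation over valid entries only keeps missing readings from dragging the statistics toward zero.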

What This Repository Contains

Active pipeline (gives the 3.283 result):

src/, models/{staeformer,spectral_ssm,graph_wavenet,hybrid}.py, scripts/train_*.py, scripts/eval_*.py.

Novel research contribution (not used by the headline result, but documented in the report):

models/spectral_ssm.py: the bi-axis spectral Mamba family, v1–v8. Mamba scanning along both the time axis and the spectral-mode axis is, to our knowledge, absent from every other published METR-LA model. The family plateaued at a 60-min test MAE of 3.82; we documented this as a structural ceiling and pivoted away from the approach.
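
To make the "spectral-mode axis" concrete: a Graph Fourier Transform projects each time step's node signal onto the eigenvectors of a graph Laplacian, giving one coefficient per spectral mode. The sketch below is a minimal version assuming a symmetric adjacency matrix and the symmetric normalized Laplacian. The function names are illustrative, and src/gft.py may differ; an asymmetric adjacency would need symmetrizing first.

```python
import numpy as np

def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric, non-negative adjacency."""
    deg = adj.sum(axis=1)
    safe = np.where(deg > 0, deg, 1.0)                 # avoid dividing by zero
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe), 0.0)
    return np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def gft_basis(adj):
    """Eigenvalues (ascending graph frequencies) and orthonormal eigenvectors."""
    return np.linalg.eigh(normalized_laplacian(adj))

def gft(x, eigvecs):
    """Forward transform: (time, nodes) -> (time, modes)."""
    return x @ eigvecs

def igft(x_hat, eigvecs):
    """Inverse transform: (time, modes) -> (time, nodes)."""
    return x_hat @ eigvecs.T

# Toy example on a 6-node ring graph.
n = 6
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
eigvals, eigvecs = gft_basis(adj)
x = np.random.default_rng(0).standard_normal((12, n))
x_hat = gft(x, eigvecs)
x_back = igft(x_hat, eigvecs)
print(x_hat.shape)
```

After the transform, a model can scan along the first axis (time) and along the second axis (modes ordered by graph frequency), which is the bi-axis idea described above.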

Preserved earlier work and negative results in legacy/:

The original project's SpectralGRU + SpectralMambaReal pipeline.
Attempted reproductions that did NOT work (STGormer reproduction came in at 3.58 vs paper's 3.10).
Older exploratory scripts (multi-window, calendar prior, etc.).

The Bigger Picture

We document a reproducibility crisis on METR-LA: every published 2024–2025 claim of a 60-min MAE better than 3.30 has a serious issue (the paper was withdrawn from its venue, the code was never released, the baseline numbers disagree with peer papers, or reproducers fail to match the reported result). Our 3.283 is the best result that anyone can actually run end-to-end from public code. See REPORT.md § 6 for the full audit.

Citation

If you use this work, please cite:

City Scale AI / Borealis AI Let's Solve It 2026, "Spectral State Space Models and a Reproducible
SOTA Pipeline for METR-LA". Internal report. University of Waterloo × Queen's University, 2026.
