This file is read automatically by Claude Code at the start of every session. It explains what this project is, how to work in it, and what conventions to follow.
Trading-Crab is a market regime classification and prediction pipeline written in Python.
The core idea: macro-economic time series (quarterly, ~1950–present) are used to label each calendar quarter with a "market regime" (e.g. Stagflation, Growth Boom, Rising-Rate Slowdown) using unsupervised clustering. Those labels then feed supervised models that (a) predict today's regime from currently-available data, (b) predict regime transitions 1–8 quarters forward, and (c) rank asset-class performance within each regime to produce portfolio recommendations.
End goal: a weekly automated report that says "current regime is X, these assets are green, hold / buy / sell."
The reference implementation lives in legacy/. Two layers of reference exist:
- `legacy/unified_script.py` — the original 1249-line monolith; ground truth for every algorithm, formula, and parameter choice.
- `legacy/*.py` modular scripts — a refactored version of the monolith organized into `config.py`, `data_ingestion.py`, `feature_engineering.py`, `clustering.py`, `regime_analysis.py`, `supervised.py`, `asset_returns.py`, `portfolio.py`, `plotting.py`, `pipeline.py`. These are the design reference for the `src/trading_crab_lib/` package. Do not modify legacy files.
The modular pipeline in src/ and pipelines/ should do everything
that script does, organized more cleanly, with checkpointing, CLI flags, and
dedicated plotting notebooks.
```
trading-crab/
├── CLAUDE.md                ← you are here
├── README.md                ← project overview (user-facing)
├── scratch/README.md        ← extended design notes
├── .env.example             ← copy to .env, fill in FRED_API_KEY
├── pyproject.toml           ← pip-installable package (src layout)
│
├── config/
│   ├── settings.yaml        ← ALL tuneable parameters live here
│   └── regime_labels.yaml   ← manually-pinned regime names (edit after clustering)
│
├── data/                    ← gitignored; created at runtime
│   ├── raw/                 ← macro_raw.parquet, asset_prices.parquet
│   ├── processed/           ← features.parquet (after step 02)
│   ├── regimes/             ← cluster_labels.parquet, profiles.parquet, …
│   └── checkpoints/         ← parquet checkpoints + manifest (see CheckpointManager)
│
├── legacy/                  ← reference implementation; do not modify
│   ├── unified_script.py    ← THE reference — all logic must be reachable here
│   └── step{1-5}_*.ipynb    ← original Jupyter notebooks
│
├── notebooks/               ← plotting/exploration notebooks (one per pipeline stage)
│   ├── 01_ingestion.ipynb
│   ├── 02_features.ipynb
│   ├── 03_clustering.ipynb
│   ├── 04_regimes.ipynb
│   ├── 05_prediction.ipynb
│   ├── 06_assets.ipynb
│   ├── 07_pairplot.ipynb
│   ├── 08_diagnostics.ipynb
│   └── 09_raw_series.ipynb
│
├── pipelines/               ← runnable pipeline steps
│   ├── 01_ingest.py
│   ├── 02_features.py
│   ├── 03_cluster.py
│   ├── 04_regime_label.py
│   ├── 05_predict.py
│   ├── 06_asset_returns.py
│   └── 07_dashboard.py
│
├── run_pipeline.py          ← master entry point with --steps / --refresh / --plots
│
├── outputs/                 ← gitignored; created at runtime
│   ├── models/              ← pickled sklearn models
│   ├── plots/               ← saved figures (PNG/PDF)
│   └── reports/             ← dashboard.csv, weekly summaries
│
└── src/trading_crab_lib/    ← installable Python package (`pip install trading-crab-lib`)
    ├── __init__.py          ← ROOT, CONFIG_DIR, DATA_DIR, OUTPUT_DIR; load, CheckpointManager, RunConfig
    ├── paths.py             ← LibraryPaths, resolve_library_paths (consumer installs)
    ├── checkpoints.py       ← CheckpointManager (parquet + manifest under data/checkpoints/)
    ├── config.py            ← load(), setup_logging(), load_portfolio()
    ├── runtime.py           ← RunConfig dataclass (verbose, plots, refresh flags)
    ├── transforms.py        ← ratios, log, select, gap-fill, derivatives, engineer_all
    ├── clustering.py        ← reduce_pca, evaluate_kmeans, fit_clusters, gap statistic, …
    ├── cluster_comparison.py
    ├── gmm.py
    ├── density.py
    ├── spectral.py
    ├── regime.py            ← build_profiles, suggest_names, build_transition_matrix, …
    ├── prediction.py        ← orchestration helpers
    ├── prediction/          ← classifier, TSCV, feature gating, model metrics
    │   └── classifier.py
    ├── ingestion/           ← multpl, fred, assets, grok, …
    ├── asset_returns.py     ← quarterly returns, rank by regime, proxy fallback
    ├── reporting.py         ← dashboard signals, portfolio blending, generate_recommendation, weekly report
    ├── tactics.py
    ├── diagnostics.py
    ├── email.py
    └── plotting.py          ← ALL visualization helpers (used by notebooks + pipelines)
```
```bash
# Full pipeline
python run_pipeline.py --refresh --recompute --plots

# Selected steps only
python run_pipeline.py --steps 3,4,5,6,7 --plots

# Individual steps
python pipelines/01_ingest.py
python pipelines/02_features.py
python pipelines/03_cluster.py
python pipelines/04_regime_label.py
python pipelines/05_predict.py
python pipelines/06_asset_returns.py
python pipelines/07_dashboard.py
```

| Flag | Effect |
|---|---|
| `--refresh` | Re-scrape multpl.com + re-hit FRED API (slow, ~10 min) |
| `--recompute` | Recompute features from cached raw data (skips scraping) |
| `--plots` | Generate all matplotlib figures and save to `outputs/plots/` |
| `--verbose` | Set logging level to DEBUG |
| `--steps 1,3,5` | Run only the listed step numbers |
| `--no-constrained` | Skip k-means-constrained (if not installed) |
| `--market-code NAME` | Load `market_code` from `grok`, `clustered`, `predicted`, or any saved checkpoint |
| `--save-market-code` | After step 3, save `balanced_cluster` as the `market_code_clustered` checkpoint |
| `--show-plots` | Call `plt.show()` in addition to saving (avoid in headless/CI) |
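The flag table above maps onto a fairly conventional `argparse` setup. A minimal sketch of that pattern — the names and defaults here are illustrative, not the actual `run_pipeline.py` parser:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flag table; the real CLI may differ.
    p = argparse.ArgumentParser(description="Trading-Crab pipeline runner")
    p.add_argument("--refresh", action="store_true", help="re-scrape multpl + FRED")
    p.add_argument("--recompute", action="store_true", help="recompute from cached raw data")
    p.add_argument("--plots", action="store_true", help="generate and save figures")
    p.add_argument("--verbose", action="store_true", help="DEBUG logging")
    # "--steps 1,3,5" becomes [1, 3, 5]
    p.add_argument("--steps", type=lambda s: [int(x) for x in s.split(",")],
                   default=None, help="comma-separated step numbers")
    return p

args = build_parser().parse_args(["--steps", "3,4,5", "--plots"])
# args.steps == [3, 4, 5], args.plots is True
```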
```bash
pip install -e ".[dev]"
jupyter lab notebooks/
```

Fresh clone in Cursor / VS Code: create `.venv`, install dev deps, and select the workspace interpreter — see `docs/CURSOR.md` (`.venv` is gitignored by design).
```bash
# 1. Install package + dev extras
pip install -e ".[dev]"

# 2. Optional but recommended for balanced clustering
pip install k-means-constrained

# 3. Set FRED API key (free at fred.stlouisfed.org/docs/api/api_key.html)
cp .env.example .env
# edit .env: FRED_API_KEY=your_key_here

# 4. Verify
python -c "from trading_crab_lib.config import load; print(load()['data'])"
```

| Package | Purpose |
|---|---|
| `fredapi` | FRED macroeconomic data |
| `lxml` | Fast HTML parsing for multpl.com scraper |
| `yfinance` | ETF/equity price history |
| `scipy` | `BPoly.from_derivatives` for gap filling |
| `scikit-learn` | PCA, KMeans, RandomForest |
| `k-means-constrained` | Balanced-size clustering (optional) |
| `matplotlib` / `seaborn` | All visualization |
| `pyarrow` | Parquet checkpoint I/O |
Every pipeline step checks `CheckpointManager.is_fresh(name)` before recomputing.
Checkpoints are stored as parquet files under `data/checkpoints/` with a manifest
tracking creation timestamp and config hash. Pass `--refresh` or `--recompute` to
force regeneration. This is the most important usability feature for day-to-day
development — scraping 46 URLs every run takes ~10 minutes.
Preservation secondaries (`macro_raw_secondary`, `features_secondary`,
`features_supervised_secondary`) are wide snapshots written from steps 1–2 so a
downstream `dropna(axis=1)` in memory does not erase column history. They update
when missing, on `--refresh` / `--recompute` (or `--refresh-preservation` to force),
and are skipped after partial macro column repair; `CheckpointManager.clear_all()`
keeps them unless you `clear()` them by name.
All runtime behaviour is controlled by a RunConfig dataclass (not hardcoded in
modules). Construct it once in run_pipeline.py or any pipeline step, and pass it
through. Key flags mirror the legacy script:
```python
@dataclass
class RunConfig:
    verbose: bool = False
    generate_plots: bool = False
    generate_pairplot: bool = False           # seaborn pairplot (slow)
    generate_scatter_matrix: bool = False     # pandas scatter_matrix (slow)
    refresh_source_datasets: bool = False     # re-scrape multpl + FRED
    recompute_derived_datasets: bool = False  # recompute features from cached raw
    save_plots: bool = True                   # save figures to outputs/plots/
    show_plots: bool = False                  # plt.show() (use False in CI/headless)
```

GDP (`fred_gdp`) and GNP (`fred_gnp`) are shifted +1 quarter in `fred.py` to prevent
look-ahead bias. The raw BEA release comes ~30 days after quarter end, so at the end
of Q1 you cannot know Q1 GDP. This is set per-series in `config/settings.yaml`
(`shift: true`).
1. Cross-asset ratios (10 derived columns: `div_yield2`, `price_gdp`, `credit_spread`, etc.)
2. Log transforms (23 columns → `log_{col}`)
3. Narrow to `initial_features` (36 columns + `market_code`)
4. Bernstein polynomial gap filling (interior NaNs) + Taylor extrapolation (edges)
5. Smoothed derivatives via `np.gradient` on a day-number time axis (`d1`, `d2`, `d3` per column)
6. Narrow to `clustering_features` (69 columns + `market_code`)
Steps 3 and 6 are controlled by the `initial_features` and `clustering_features` lists in
`config/settings.yaml`. Edit those lists there — not in the Python code.
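The smoothed-derivative step can be sketched in a few lines of numpy. This is a simplification: the real `engineer_all` smooths both before and after each gradient call and stacks `d1`/`d2`/`d3`, and the function name below is illustrative:

```python
import numpy as np

def smoothed_gradient(y: np.ndarray, x_days: np.ndarray, window: int = 5) -> np.ndarray:
    """Centered rolling mean, then np.gradient on a day-number axis.

    window=5 matches features.derivative_window; the gradient is taken
    per day of the time axis, not per row.
    """
    kernel = np.ones(window) / window
    smoothed = np.convolve(y, kernel, mode="same")  # centered moving average
    return np.gradient(smoothed, x_days)            # dy per day

x = np.arange(10) * 91.0   # roughly quarterly spacing, in days
y = 2.0 * x                # a perfectly linear series
d1 = smoothed_gradient(y, x)
# interior points recover the true slope of 2.0 per day
```

Note the zero-padding in `np.convolve(..., mode="same")` distorts the first and last couple of points, which is one reason the real pipeline pairs this with edge extrapolation.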
The legacy analysis established 5 PCA components as the working baseline:
`n_pca_components: 5` in `settings.yaml`. Do not switch to variance-threshold
PCA without benchmarking first — it changes the cluster geometry.
`fit_clusters()` always returns both `cluster` (best-k from silhouette, capped at
`k_cap`) and `balanced_cluster` (size-constrained at `balanced_k`). Downstream
steps default to `balanced_cluster` for regime labeling because equal-size clusters
are better for per-regime statistics with limited data.
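To make the sample-size argument concrete, here is a pure-Python sketch of the `size_min`/`size_max` bounds the legacy parameters hand to `KMeansConstrained` (`bucket ± 2`; the helper name is illustrative):

```python
def balanced_size_bounds(n_samples: int, k: int, slack: int = 2) -> tuple[int, int]:
    """Each cluster holds roughly n/k quarters, +/- a small slack,
    mirroring KMeansConstrained(size_min=bucket-2, size_max=bucket+2).
    """
    bucket = n_samples // k
    return bucket - slack, bucket + slack

# ~300 quarters (1950-present) split into balanced_k = 5 regimes:
size_min, size_max = balanced_size_bounds(300, 5)
# each regime gets 58-62 quarters -> comparable per-regime statistics
```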
All visualization helpers live in `src/trading_crab_lib/plotting.py`. Notebooks import
from there — they do not define plotting logic inline. Every plot function accepts
`run_cfg: RunConfig` and honours `save_plots` / `show_plots`. Output filenames are
standardized as `outputs/plots/{step}_{description}.png`.
Five-regime color palette from the legacy script:

```python
CUSTOM_COLORS = ["#0000d0", "#d00000", "#f48c06", "#8338ec", "#50a000"]
```

Use `plotting.REGIME_CMAP` everywhere for consistency.
Scraped via lxml `cssselect` from `#datatable`. All URLs and `value_type` metadata
are in `config/settings.yaml` under `multpl.datasets`. Do not hardcode URLs in Python.
Rate-limited to 2 seconds per request (`RATE_LIMIT_SECONDS`).
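The throttle amounts to a small "minimum interval between requests" helper. A self-contained sketch of that pattern (illustrative class, not the actual scraper code):

```python
import time

class RateLimiter:
    """Enforce a minimum interval between calls, like the 2-second
    RATE_LIMIT_SECONDS throttle on multpl.com requests.
    Call wait() immediately before each HTTP request.
    """

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous call

    def wait(self) -> None:
        now = time.monotonic()
        remaining = self.min_interval - (now - self._last)
        if remaining > 0:
            time.sleep(remaining)   # pause until the interval has elapsed
        self._last = time.monotonic()
```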
Current: GDP (shifted +1Q), GNP (shifted +1Q), BAA, AAA, CPI (CPIAUCSL), GS10, TB3MS.
Planned additions (see ROADMAP.md Tier 1):
- VIXCLS (VIX, 1990+), UNRATE (unemployment, 1948+), M2NS (money supply, 1959+)
- T10Y2Y (10Y-2Y spread), GS2 (2Y Treasury), HOUST (housing starts), UMCSENT
Requires FRED_API_KEY in .env. Free registration at fred.stlouisfed.org.
Gold spot price back to 1915, WTI crude oil back to 1946, silver, copper.
See ROADMAP.md Tier 1 item 1.5 and src/trading_crab_lib/ingestion/macrotrends.py (to be created).
Scraping approach: extract embedded JSON from <script>var rawData={...}</script> tags.
SPY, GLD, TLT, USO, QQQ, IWM, VNQ, AGG — monthly adjusted close, resampled to
quarterly. Fetched in ingestion/assets.py. No API key required.
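The monthly-to-quarterly resample keeps the last available month of each quarter. A pandas-free sketch of that rule (the real code does this with a pandas resample on yfinance adjusted-close data; the prices below are made up):

```python
def quarterly_last(monthly: dict[str, float]) -> dict[str, float]:
    """Keep the last available month of each quarter.
    Keys are "YYYY-MM" strings; output keys are "YYYYQn".
    """
    out: dict[str, float] = {}
    for month, price in sorted(monthly.items()):
        year, mm = month.split("-")
        out[f"{year}Q{(int(mm) - 1) // 3 + 1}"] = price  # later months overwrite
    return out

spy_monthly = {"2024-01": 470.0, "2024-02": 480.1, "2024-03": 491.3, "2024-04": 500.2}
spy_quarterly = quarterly_last(spy_monthly)
# {"2024Q1": 491.3, "2024Q2": 500.2}
```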
data/grok_quarter_classifications_20260216.pickle — an external LLM-assisted
classification of quarters used as a visual reference overlay in notebooks. Not used
for model training. Loaded via ingestion/grok.py (or directly in notebooks).
All tuneable parameters are in config/settings.yaml. Key sections:
| Section | Key parameters |
|---|---|
| `data` | `start_date`, `end_date`, `frequency` |
| `fred.series` | per-series name + `shift` flag |
| `multpl.datasets` | list of `[name, description, url, value_type]` |
| `features.log_columns` | columns to log-transform |
| `features.initial_features` | columns retained before gap fill |
| `features.clustering_features` | final columns fed to PCA |
| `features.derivative_window` | rolling mean window for `np.gradient` smoothing |
| `clustering.n_pca_components` | fixed at 5 |
| `clustering.n_clusters_search` | upper bound for k-sweep (default 12) |
| `clustering.k_cap` | max k accepted from silhouette (default 5) |
| `clustering.balanced_k` | k for size-constrained KMeans (default 5) |
| `prediction.forward_horizons_quarters` | `[1, 2, 4, 8]` |
| `prediction.cv_splits` | 5 (TimeSeriesSplit folds) |
| `prediction.dt_max_depth` | 8 (DecisionTree depth) |
| `prediction.rf_max_depth` | 12 (RandomForest max depth) |
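The `prediction.cv_splits = 5` walk-forward pattern can be sketched in pure Python. This shows the causal shape only — sklearn's actual `TimeSeriesSplit` sizes folds slightly differently (the first training set absorbs the remainder):

```python
def walk_forward_splits(n_samples: int, n_splits: int = 5):
    """Each fold trains only on quarters that precede the test window,
    so no future data leaks into training.
    """
    fold = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train = list(range(0, i * fold))
        test = list(range(i * fold, min((i + 1) * fold, n_samples)))
        yield train, test

splits = list(walk_forward_splits(300, n_splits=5))  # ~300 quarters since 1950
# every fold satisfies max(train) < min(test)
```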
- The feature pipeline order — cross-ratios → log → select → gap-fill → deriv → select. The Bernstein gap fill must happen AFTER the log transform so it interpolates in log space.
- Publication-lag shifts — GDP and GNP must always be shifted. Do not remove without explicit approval.
- The `clustering_features` list — this is analytically determined. Changes here change the clustering geometry and invalidate any manually pinned `regime_labels.yaml`.
- `n_pca_components = 5` — changing this changes which regimes you find. Benchmark first.
- Submodules are reference-only — do not push commits into submodules; treat them as read-only sources for alignment checks. You may `git pull`/update submodules locally, but any code changes must be made in the main repo unless we explicitly discuss otherwise.
- Saving to `.env` or committing API keys — never. Use `.env.example` only.
Cross-reference legacy/unified_script.py and the legacy/*.py modules for
ground truth. Items marked ✓ are verified as matching in src/. Items marked
✗ are known gaps that still need to be implemented or aligned.
- Scraping — lxml `cssselect("#datatable tr")`, user-agent string, 2s rate limit
- FRED — per-series `shift`, quarterly resample with `.last()`
- Cross-ratios — exact 10 formulas (`div_yield2`, `price_div`, `price_gdp`, `price_gdp2`, `price_gnp2`, `div_minus_baa`, `credit_spread`, `real_price2`, `real_price3`, `real_price_gdp2`)
- Log transform — `np.log(col.clip(lower=1e-9))`
- Gap filling — `BPoly.from_derivatives` with 4 boundary conditions per side (value + d1 + d2 + d3); Taylor extrapolation for leading/trailing edges
- Derivatives — `np.gradient` on matplotlib day-number axis + centered rolling mean of window=5 before and after each gradient call
- PCA — `StandardScaler` → `PCA(n_components=5)` → re-`StandardScaler` before KMeans
- K-sweep — `range(2, 13)` with `n_init=50`, silhouette + CH + DB
- Balanced clustering — `KMeansConstrained(size_min=bucket-2, size_max=bucket+2)`
- Color palette — `["#0000d0", "#d00000", "#f48c06", "#8338ec", "#50a000"]`
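The clip-then-log entry can be illustrated in two lines of numpy. The clip guards against zeros and negatives so `np.log` never emits `-inf`/NaN (the legacy formula uses pandas `col.clip(lower=1e-9)`; `np.clip` is the array equivalent, and the data values here are made up):

```python
import numpy as np

raw = np.array([12.5, 0.0, -3.0, 250.0])
logged = np.log(np.clip(raw, 1e-9, None))
# zeros and negatives map to log(1e-9) ~= -20.72 instead of -inf / NaN
```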
- ✓ `DecisionTreeClassifier` training for interpretability — implemented in `classifier.py`
- ✓ `TimeSeriesSplit` cross-validation — implemented via the `_tscv_scores()` helper
- ✓ Portfolio construction — `generate_recommendation()` / blended portfolios in `reporting.py`
- ✓ Macro-data fallback for asset returns — `compute_proxy_returns()` in `asset_returns.py`
- ✓ Confusion matrix heatmap (walk-forward CV) — `plot_regime_confusion_matrix()` in `plotting.py`; data from `outputs/reports/model_metrics/confusion_matrices.parquet`; PNG `outputs/plots/05_confusion_matrix.png` when step 5 runs with plots (`run_pipeline.py --steps 5 --plots` or `pipelines/05_predict.py --plots`). Legacy `generate_classification_report()` console formatting is not duplicated; the sklearn `classification_report()` string remains in logs only.
- ✓ Real ETF price data via yfinance (SPY, GLD, TLT, USO, QQQ, IWM, VNQ, AGG) instead of macro-data proxies
- ✓ `CheckpointManager` with parquet + manifest (vs. ad-hoc pickle/CSV)
- ✓ `RunConfig` dataclass for clean flag management
- ✓ All config in `settings.yaml` (vs. hardcoded Python constants)
- ✓ Full CLI in `run_pipeline.py` with `--steps`, `--refresh`, `--recompute`, etc.
- ✓ Dedicated exploration notebooks (01–09)
- Python 3.10+ — use `match`, `|` union types, `X | None` not `Optional[X]`
- Type hints on all public functions
- `logging` everywhere; no `print()` in library code (only in `pipelines/` and `run_pipeline.py`)
- No bare `except:` — always catch specific exception types
- All file paths via `pathlib.Path`, never string concatenation
- DataFrames: noun describing contents (`features`, `pca_df`, `clustered`, `returns`)
- Series: noun describing the single variable (`labels`, `cluster`)
- Functions: verb_noun (`fetch_all`, `apply_log_transforms`, `build_profiles`)
- Config keys: `snake_case` throughout YAML
- Stored under `data/checkpoints/{name}.parquet` (DataFrames) or `{name}.pkl` (models)
- Preservation secondaries: see Checkpoint system above; names in `PRESERVATION_CHECKPOINT_NAMES` (`trading_crab_lib.checkpoints`)
- Always prefer parquet over pickle for DataFrames (smaller, typed, readable)
- Pickle only for sklearn models (no parquet-serializable alternative)
- Never commit data files — `data/` and `outputs/` are in `.gitignore`
```bash
pytest tests/ -v
make lint   # Ruff: lint + format check (same paths as CI)
```

Tests live under `tests/`. Unit tests should not require network access — mock
`requests.get` for scraping tests and FRED API calls. Use fixtures from `tests/conftest.py`.
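The mocking pattern looks roughly like this. To keep the sketch runnable without the `requests` dependency, it patches `urllib.request.urlopen` instead of `requests.get`, and the fetcher function is hypothetical — the real tests patch the actual scraper's network call:

```python
from unittest.mock import patch

def fetch_table(url: str) -> str:
    """Stand-in for the real scraper's fetch; the test below never hits the network."""
    import urllib.request
    with urllib.request.urlopen(url) as resp:  # patched out in tests
        return resp.read().decode()

def test_fetch_table_is_mocked():
    fake_html = "<table id='datatable'><tr><td>2024</td></tr></table>"
    with patch("urllib.request.urlopen") as mock_open:
        # Configure the context manager the function opens.
        mock_open.return_value.__enter__.return_value.read.return_value = fake_html.encode()
        html = fetch_table("https://www.multpl.com/anything")
    assert "datatable" in html

test_fetch_table_is_mocked()
```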
- Conventional format: `feat:`, `fix:`, `refactor:`, `docs:`, `test:`, `chore:`
- Example: `feat: add yfinance asset price ingestion (step 06)`
- Branch: always `claude/description-sessionID` — never push directly to `main`
See STATE.md for a full breakdown of what runs, what's tested, and what output
files are produced. See ROADMAP.md for prioritized feature backlog.
See ARCHITECTURE.md for design decisions. See PITFALLS.md for known gotchas.
- Steps 01–07 run end-to-end; pipeline verified on real data (all 7 steps)
- `CheckpointManager` — fully implemented; parquet + manifest
- `RunConfig` — fully implemented, including a `from_args()` factory
- `run_pipeline.py` — master runner with full CLI (all flags implemented)
- `ingestion/assets.py` — yfinance ETF price fetcher + 3-phase fallback chain (stooq → OpenBB → macro proxy)
- `plotting.py` — 18 visualization helpers covering all 7 pipeline steps (includes `plot_regime_confusion_matrix`)
- `notebooks/01–09` — all notebooks present; 03_clustering expanded with 28 investigation cells
- Requirements — minimum-bound strategy, Python 3.10+ compatible
- `from __future__ import annotations` — present in all source files using `X | Y` syntax
- Unit tests — 377+ collected tests (8 skipped: HDBSCAN) covering all modules, including the clustering investigation suite
- Gap 1 — `TimeSeriesSplit` CV in `classifier.py` (5-fold walk-forward)
- Gap 2 — `DecisionTreeClassifier` in `classifier.py` (`max_depth=8`)
- Gap 3 — `reporting/portfolio.py` — simple + blended portfolios + BUY/SELL/HOLD
- Gap 4 — `compute_proxy_returns()` fallback in `assets/returns.py`
- Causal smoothing — `engineer_all(causal=True/False)` + dual parquet outputs from step 2
- Clustering investigation suite — GMM (`gmm.py`), DBSCAN/HDBSCAN (`density.py`), Spectral (`spectral.py`), multi-method comparison (`cluster_comparison.py`), gap statistic + SVD + PCA sweep in `clustering.py`
- Config-driven var lists — `plotting.sample_series`, `plotting.key_indicators`, `assets.etfs` all in `settings.yaml`
- 16 ETFs — expanded from 8 to 16 (added HYG, XLK, XLP, XLE, GDX, TIP, BIL, EDV)
- `pythonpath = ["src", "."]` in `pyproject.toml` pytest config — tests run without `pip install -e .` (repo root on path for `run_pipeline` imports in pipeline tests)
- Additional FRED series — VIX (VIXCLS), unemployment (UNRATE), M2 (M2NS), yield spreads (T10Y2Y, T10Y3M, GS2), housing starts (HOUST), consumer sentiment (UMCSENT)
- macrotrends.net scraper — gold/oil spot prices back to 1915/1946
- LightGBM classifier — alongside RF + DT in `classifier.py`
- Expand test suite — classifier, portfolio, dashboard
- `end_date: null` in settings.yaml → use today's date at runtime
- Per-asset regime probability models ("Putting it all together — Part I")
- Weekly automated report with AI-written narrative via Claude API
Note: Yield-curve spreads (`yc_*`) and `build_forward_window_probabilities()` are shipped — see `transforms.py`, `regime.py`, and the root `ROADMAP.md` §1.3–1.4.

- `profiler.py` naming heuristics silently skip 4 features (`10yr_ustreas`, `fred_gs10`, `fred_tb3ms`, `div_minus_baa`) because only their derivatives are in `clustering_features`. Graceful fallback is intentional.
- Only 7 FRED series currently ingested; many high-value series (VIX, unemployment, M2, yield curve) are not yet fetched.
- ETF data starts between 1993 and 2006 (varying by fund); pre-1993 gold and oil regime analysis uses proxy columns only. A macrotrends.net backfill would extend gold coverage to 1915+.
- Clustering uses KMeans which treats each quarter independently; HMM would model temporal autocorrelation natively (Tier 2 roadmap item).
Full comparison of legacy/*.py vs src/trading_crab_lib/ completed March 2026.
- ✓ Gap 1 — TimeSeriesSplit CV (`legacy/supervised.py` → `classifier.py`)
- ✓ Gap 2 — DecisionTreeClassifier (`legacy/supervised.py` → `classifier.py`)
- ✓ Gap 3 — Portfolio construction (`legacy/portfolio.py` → `reporting/portfolio.py`)
- ✓ Gap 4 — Macro-data proxy returns fallback (`legacy/asset_returns.py` → `assets/returns.py`)
- ✓ Gap 5 — Causal/backward rolling windows for supervised learning (`transforms.py`)
- ✓ Gap 6 — Empirical forward-window probabilities — `build_forward_window_probabilities()` in `regime.py`; `pipelines/04_regime_label.py` writes `data/regimes/forward_window_probabilities.parquet`; legacy name `compute_forward_probabilities()` in `legacy/regime_analysis.py`
- ✓ Gap 7 — Confusion matrix visualization — `plot_regime_confusion_matrix()` in `plotting.py` (aggregated out-of-fold counts from `confusion_matrices.parquet`); wired into `run_pipeline.py` step 5 when `generate_plots`; optional `pipelines/05_predict.py --plots`
Legacy `generate_classification_report()` prints a per-class confusion matrix in
text form. We log sklearn's `classification_report()` string for the current-regime
model but do not replicate the legacy print layout. Heatmap visualization of
CV confusion counts is implemented (Gap 7).
```bash
# Check what checkpoints exist
ls data/checkpoints/

# Run just the clustering step with plots
python run_pipeline.py --steps 3 --plots --verbose

# Smoke test step 5 after steps 1–4 (checks models + outputs/reports/model_metrics/*)
bash scripts/smoke_step5.sh

# Optional: run steps 1–5 in one invocation (needs data ingest prerequisites)
# SMOKE_FULL_PIPELINE=1 bash scripts/smoke_step5.sh

# Reload raw data from pickles (skip re-scraping) and recompute everything
python run_pipeline.py --recompute --plots

# Start fresh (re-scrape multpl + FRED, recompute all)
python run_pipeline.py --refresh --recompute --plots

# Launch notebooks
jupyter lab notebooks/

# Quick sanity check (no network, loads a checkpoint)
python -c "
from trading_crab_lib.checkpoints import CheckpointManager
cm = CheckpointManager()
print(cm.list())
"

# Print current dashboard (requires steps 01-06 to have run)
python pipelines/07_dashboard.py
```