CLAUDE.md — Project Guide for Claude Code

This file is read automatically by Claude Code at the start of every session. It explains what this project is, how to work in it, and what conventions to follow.


What This Project Is

Trading-Crab is a market regime classification and prediction pipeline written in Python.

The core idea: macro-economic time series (quarterly, ~1950–present) are used to label each calendar quarter with a "market regime" (e.g. Stagflation, Growth Boom, Rising-Rate Slowdown) using unsupervised clustering. Those labels then feed supervised models that (a) predict today's regime from currently-available data, (b) predict regime transitions 1–8 quarters forward, and (c) rank asset-class performance within each regime to produce portfolio recommendations.

End goal: a weekly automated report that says "current regime is X, these assets are green, hold / buy / sell."

The reference implementation lives in legacy/. Two layers of reference exist:

  • legacy/unified_script.py — the original 1249-line monolith; ground truth for every algorithm, formula, and parameter choice.
  • legacy/*.py modular scripts — a refactored version of the monolith organized into: config.py, data_ingestion.py, feature_engineering.py, clustering.py, regime_analysis.py, supervised.py, asset_returns.py, portfolio.py, plotting.py, pipeline.py. These are used as the design reference for the src/trading_crab_lib/ package. Do not modify legacy files.

The modular pipeline in src/ and pipelines/ should do everything legacy/unified_script.py does, organized more cleanly, with checkpointing, CLI flags, and dedicated plotting notebooks.


Repository Layout

trading-crab/
├── CLAUDE.md                      ← you are here
├── README.md                      ← project overview (user-facing)
├── scratch/README.md              ← extended design notes
├── .env.example                   ← copy to .env, fill in FRED_API_KEY
├── pyproject.toml                 ← pip-installable package (src layout)
│
├── config/
│   ├── settings.yaml              ← ALL tuneable parameters live here
│   └── regime_labels.yaml         ← manually-pinned regime names (edit after clustering)
│
├── data/                          ← gitignored; created at runtime
│   ├── raw/                       ← macro_raw.parquet, asset_prices.parquet
│   ├── processed/                 ← features.parquet (after step 02)
│   ├── regimes/                   ← cluster_labels.parquet, profiles.parquet, …
│   └── checkpoints/               ← parquet checkpoints + manifest; pickle only for models (see CheckpointManager)
│
├── legacy/                        ← reference implementation; do not modify
│   ├── unified_script.py          ← THE reference — all logic must be reachable here
│   └── step{1-5}_*.ipynb          ← original Jupyter notebooks
│
├── notebooks/                     ← plotting/exploration notebooks (one per pipeline stage)
│   ├── 01_ingestion.ipynb
│   ├── 02_features.ipynb
│   ├── 03_clustering.ipynb
│   ├── 04_regimes.ipynb
│   ├── 05_prediction.ipynb
│   ├── 06_assets.ipynb
│   ├── 07_pairplot.ipynb
│   ├── 08_diagnostics.ipynb
│   └── 09_raw_series.ipynb
│
├── pipelines/                     ← runnable pipeline steps
│   ├── 01_ingest.py
│   ├── 02_features.py
│   ├── 03_cluster.py
│   ├── 04_regime_label.py
│   ├── 05_predict.py
│   ├── 06_asset_returns.py
│   └── 07_dashboard.py
│
├── run_pipeline.py                ← master entry point with --steps / --refresh / --plots
│
├── outputs/                       ← gitignored; created at runtime
│   ├── models/                    ← pickled sklearn models
│   ├── plots/                     ← saved figures (PNG/PDF)
│   └── reports/                   ← dashboard.csv, weekly summaries
│
└── src/trading_crab_lib/          ← installable Python package (`pip install trading-crab-lib`)
    ├── __init__.py                ← ROOT, CONFIG_DIR, DATA_DIR, OUTPUT_DIR; load, CheckpointManager, RunConfig
    ├── paths.py                   ← LibraryPaths, resolve_library_paths (consumer installs)
    ├── checkpoints.py             ← CheckpointManager (parquet + manifest under data/checkpoints/)
    ├── config.py                  ← load(), setup_logging(), load_portfolio()
    ├── runtime.py                 ← RunConfig dataclass (verbose, plots, refresh flags)
    ├── transforms.py              ← ratios, log, select, gap-fill, derivatives, engineer_all
    ├── clustering.py              ← reduce_pca, evaluate_kmeans, fit_clusters, gap statistic, …
    ├── cluster_comparison.py
    ├── gmm.py
    ├── density.py
    ├── spectral.py
    ├── regime.py                  ← build_profiles, suggest_names, build_transition_matrix, …
    ├── prediction.py              ← orchestration helpers
    ├── prediction/                ← classifier, TSCV, feature gating, model metrics
    │   └── classifier.py
    ├── ingestion/                 ← multpl, fred, assets, grok, …
    ├── asset_returns.py           ← quarterly returns, rank by regime, proxy fallback
    ├── reporting.py               ← dashboard signals, portfolio blending, generate_recommendation, weekly report
    ├── tactics.py
    ├── diagnostics.py
    ├── email.py
    └── plotting.py                ← ALL visualization helpers (used by notebooks + pipelines)

How to Run

Full pipeline (scrape fresh data, recompute everything, generate plots)

python run_pipeline.py --refresh --recompute --plots

Load from checkpoints, skip re-scraping and feature recomputation, and re-run clustering plus all downstream steps

python run_pipeline.py --steps 3,4,5,6,7 --plots

Run individual steps

python pipelines/01_ingest.py
python pipelines/02_features.py
python pipelines/03_cluster.py
python pipelines/04_regime_label.py
python pipelines/05_predict.py
python pipelines/06_asset_returns.py
python pipelines/07_dashboard.py

CLI flag reference (run_pipeline.py)

| Flag | Effect |
| --- | --- |
| --refresh | Re-scrape multpl.com + re-hit FRED API (slow, ~10 min) |
| --recompute | Recompute features from cached raw data (skips scraping) |
| --plots | Generate all matplotlib figures and save to outputs/plots/ |
| --verbose | Set logging level to DEBUG |
| --steps 1,3,5 | Run only the listed step numbers |
| --no-constrained | Skip k-means-constrained (if not installed) |
| --market-code NAME | Load market_code from grok, clustered, predicted, or any saved checkpoint |
| --save-market-code | After step 3, save balanced_cluster as market_code_clustered checkpoint |
| --show-plots | Call plt.show() in addition to saving (avoid in headless/CI) |

Jupyter notebooks (exploration / plotting)

pip install -e ".[dev]"
jupyter lab notebooks/

Environment Setup

Fresh clone in Cursor / VS Code: create .venv, install dev deps, and select the workspace interpreter — see docs/CURSOR.md (.venv is gitignored by design).

# 1. Install package + dev extras
pip install -e ".[dev]"

# 2. Optional but recommended for balanced clustering
pip install k-means-constrained

# 3. Set FRED API key (free at fred.stlouisfed.org/docs/api/api_key.html)
cp .env.example .env
# edit .env: FRED_API_KEY=your_key_here

# 4. Verify
python -c "from trading_crab_lib.config import load; print(load()['data'])"

Key dependencies

| Package | Purpose |
| --- | --- |
| fredapi | FRED macroeconomic data |
| lxml | Fast HTML parsing for multpl.com scraper |
| yfinance | ETF/equity price history |
| scipy | BPoly.from_derivatives for gap filling |
| scikit-learn | PCA, KMeans, RandomForest |
| k-means-constrained | Balanced-size clustering (optional) |
| matplotlib / seaborn | All visualization |
| pyarrow | Parquet checkpoint I/O |

Key Design Decisions

Checkpoint system

Every pipeline step checks CheckpointManager.is_fresh(name) before recomputing. Checkpoints are stored as parquet files under data/checkpoints/ with a manifest tracking creation timestamp and config hash. Pass --refresh or --recompute to force regeneration. This is the most important usability feature for day-to-day development — scraping 46 URLs every run is ~10 minutes.

Preservation secondaries (macro_raw_secondary, features_secondary, features_supervised_secondary) are wide snapshots written by steps 1–2 so that a downstream in-memory dropna(axis=1) does not erase column history. They are rewritten when missing and on --refresh / --recompute (use --refresh-preservation to force a rewrite), and they are skipped after a partial macro column repair. CheckpointManager.clear_all() leaves them in place; remove one explicitly with clear() by name.
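
To make the freshness idea concrete, here is a minimal sketch of a manifest keyed by checkpoint name that stores a creation timestamp and a config hash. It is not the real CheckpointManager: the function names, the JSON manifest layout, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def config_hash(cfg: dict) -> str:
    """Stable hash of a config dict (key order does not matter)."""
    payload = json.dumps(cfg, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record(manifest_path: Path, name: str, cfg: dict) -> None:
    """Store the creation time and config hash for `name` (hypothetical layout)."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    manifest[name] = {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "config_hash": config_hash(cfg),
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))


def is_fresh(manifest_path: Path, name: str, cfg: dict) -> bool:
    """True only if `name` was recorded with the same config hash."""
    if not manifest_path.exists():
        return False
    entry = json.loads(manifest_path.read_text()).get(name)
    return entry is not None and entry["config_hash"] == config_hash(cfg)


with tempfile.TemporaryDirectory() as tmp:
    manifest_file = Path(tmp) / "manifest.json"
    cfg = {"clustering": {"n_pca_components": 5}}
    fresh_before = is_fresh(manifest_file, "features", cfg)
    record(manifest_file, "features", cfg)
    fresh_after = is_fresh(manifest_file, "features", cfg)
    changed_cfg = {"clustering": {"n_pca_components": 6}}
    fresh_after_change = is_fresh(manifest_file, "features", changed_cfg)
```

Hashing the config means an edit to settings.yaml invalidates the affected checkpoints without any manual cleanup, which is presumably why the manifest tracks it.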

Global runtime flags (RunConfig)

All runtime behaviour is controlled by a RunConfig dataclass (not hardcoded in modules). Construct it once in run_pipeline.py or any pipeline step, and pass it through. Key flags mirror the legacy script:

@dataclass
class RunConfig:
    verbose: bool = False
    generate_plots: bool = False
    generate_pairplot: bool = False          # seaborn pairplot (slow)
    generate_scatter_matrix: bool = False    # pandas scatter_matrix (slow)
    refresh_source_datasets: bool = False    # re-scrape multpl + FRED
    recompute_derived_datasets: bool = False # recompute features from cached raw
    save_plots: bool = True                  # save figures to outputs/plots/
    show_plots: bool = False                 # plt.show() (use False in CI/headless)
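
As a rough illustration of "construct it once and pass it through", the sketch below maps a subset of the CLI flags onto a trimmed copy of RunConfig. The helper names build_parser and run_config_from_args are hypothetical, and the flag-to-field mapping is inferred from the field comments above; the project's own from_args() factory may differ.

```python
import argparse
from dataclasses import dataclass


@dataclass
class RunConfig:  # trimmed copy of the dataclass above
    verbose: bool = False
    generate_plots: bool = False
    refresh_source_datasets: bool = False
    recompute_derived_datasets: bool = False
    save_plots: bool = True
    show_plots: bool = False


def build_parser() -> argparse.ArgumentParser:
    """Illustrative subset of the run_pipeline.py flags (all boolean switches)."""
    parser = argparse.ArgumentParser(description="pipeline flags (subset)")
    for flag in ("--refresh", "--recompute", "--plots", "--verbose", "--show-plots"):
        parser.add_argument(flag, action="store_true")
    return parser


def run_config_from_args(args: argparse.Namespace) -> RunConfig:
    """Assumed mapping from CLI flags to RunConfig fields."""
    return RunConfig(
        verbose=args.verbose,
        generate_plots=args.plots,
        refresh_source_datasets=args.refresh,
        recompute_derived_datasets=args.recompute,
        show_plots=args.show_plots,
    )


run_cfg = run_config_from_args(build_parser().parse_args(["--refresh", "--plots"]))
```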

Publication-lag shift

GDP (fred_gdp) and GNP (fred_gnp) are shifted +1 quarter in fred.py to prevent look-ahead bias. The raw BEA release comes ~30 days after quarter end, so at the end of Q1 you cannot know Q1 GDP. This is set per-series in config/settings.yaml (shift: true).
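
A small sketch of what a one-quarter shift does to a quarterly series. The numbers are synthetic, and the real logic lives in fred.py; only the use of a +1 shift is taken from the text above.

```python
import pandas as pd

quarters = pd.period_range("2023Q1", periods=4, freq="Q")
gdp = pd.Series([100.0, 101.0, 102.5, 103.0], index=quarters, name="fred_gdp")

# Shift by one quarter: the row for 2023Q2 now holds the 2023Q1 figure,
# which is the latest value that had actually been published by then.
gdp_available = gdp.shift(1)
```

After the shift, the first quarter has no value, which is the honest state of knowledge at that point in time.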

Feature pipeline order (transforms.py — engineer_all)

  1. Cross-asset ratios (10 derived columns: div_yield2, price_gdp, credit_spread, etc.)
  2. Log transforms (23 columns → log_{col})
  3. Narrow to initial_features (36 columns + market_code)
  4. Bernstein polynomial gap filling (interior NaNs) + Taylor extrapolation (edges)
  5. Smoothed derivatives via np.gradient on day-number time axis (d1, d2, d3 per column)
  6. Narrow to clustering_features (69 columns + market_code)

Steps 3 and 6 are controlled by initial_features and clustering_features lists in config/settings.yaml. Edit those lists there — not in the Python code.
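
Steps 2 and 5 can be sketched on a synthetic series as follows. This is an approximation, not the engineer_all code: the function names are hypothetical, min_periods=1 is an assumption about edge handling, and Timestamp.toordinal() stands in for matplotlib day numbers (the two differ only by a constant offset, which does not affect a gradient).

```python
import numpy as np
import pandas as pd


def log_transform(col: pd.Series) -> pd.Series:
    """Clip at a tiny positive floor before taking logs."""
    return np.log(col.clip(lower=1e-9))


def smoothed_derivative(col: pd.Series, window: int = 5) -> pd.Series:
    """Centered rolling mean, gradient on a day-number axis, rolling mean again."""
    days = np.array([ts.toordinal() for ts in col.index], dtype=float)
    smooth = col.rolling(window, center=True, min_periods=1).mean()
    slope = pd.Series(np.gradient(smooth.to_numpy(), days), index=col.index)
    return slope.rolling(window, center=True, min_periods=1).mean()


# Synthetic series growing at 0.1% per day, sampled every 91 days.
dates = pd.date_range("2000-01-01", periods=12, freq="91D")
elapsed_days = np.arange(12) * 91.0
price = pd.Series(np.exp(0.001 * elapsed_days), index=dates, name="price")

log_price = log_transform(price).rename("log_price")
d1 = smoothed_derivative(log_price)
```

Away from the edges, d1 recovers the 0.001 per-day growth rate; the first and last few points are biased by the partial smoothing windows, which is one reason the real pipeline fills gaps and extrapolates edges before differentiating.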

PCA is fixed at 5 components

The legacy analysis established 5 PCA components as the working baseline. n_pca_components: 5 in settings.yaml. Do not switch to variance-threshold PCA without benchmarking first — it changes the cluster geometry.
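
The scale, reduce, re-scale sequence can be sketched with scikit-learn on random data. The matrix here is a synthetic stand-in for the clustering features, and the variable names are illustrative rather than taken from clustering.py.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))  # synthetic stand-in: 120 quarters x 20 features

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=5)
components = pca.fit_transform(X_scaled)

# Re-scale so each retained component has unit variance before KMeans.
components_scaled = StandardScaler().fit_transform(components)
explained = pca.explained_variance_ratio_.sum()
```

Re-scaling after PCA gives every retained component equal weight in the Euclidean distances KMeans uses, so changing the number of components changes the geometry the clusters are found in.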

Two clusterings are always produced

fit_clusters() always returns both cluster (best-k from silhouette, capped at k_cap) and balanced_cluster (size-constrained at balanced_k). Downstream steps default to balanced_cluster for regime labeling because equal-size clusters are better for per-regime statistics with limited data.
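
A minimal sketch of the silhouette-based choice of k, under the assumption that "capped at k_cap" means only candidates with k <= k_cap are considered. The function name is hypothetical, n_init is reduced to keep the example fast, and the size-constrained clustering is omitted because k-means-constrained is an optional dependency.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, k_cap, random_state=0):
    """Return (best_k, scores), scoring only candidates with k <= k_cap."""
    scores = {}
    for k in k_values:
        if k > k_cap:
            continue
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


# Three well-separated synthetic blobs of 40 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])

best_k, scores = best_k_by_silhouette(X, k_values=range(2, 13), k_cap=5)
```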

Plotting convention

All visualization helpers live in src/trading_crab_lib/plotting.py. Notebooks import from there — they do not define plotting logic inline. Every plot function accepts run_cfg: RunConfig and honours save_plots / show_plots. Output filenames are standardized as outputs/plots/{step}_{description}.png.
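
One way a helper could honour save_plots / show_plots and the filename convention is sketched below. finish_figure is a hypothetical name, not a function from plotting.py, and RunConfig is trimmed to the two relevant fields.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch also runs in CI
import matplotlib.pyplot as plt


@dataclass
class RunConfig:  # trimmed copy: only the fields used here
    save_plots: bool = True
    show_plots: bool = False


def finish_figure(fig, step: str, description: str, run_cfg: RunConfig, plots_dir: Path) -> Path:
    """Save and/or show a figure, then close it. Returns the target path."""
    target = plots_dir / f"{step}_{description}.png"
    if run_cfg.save_plots:
        plots_dir.mkdir(parents=True, exist_ok=True)
        fig.savefig(target, dpi=150, bbox_inches="tight")
    if run_cfg.show_plots:
        plt.show()
    plt.close(fig)
    return target


with tempfile.TemporaryDirectory() as tmp:
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [1, 3, 2])
    saved_path = finish_figure(fig, "03", "silhouette_sweep", RunConfig(), Path(tmp) / "plots")
    saved_name = saved_path.name
    was_saved = saved_path.exists()
```

Keeping the save/show decision in one place means notebooks and pipeline steps behave the same way for a given RunConfig.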

Custom color palette

Five-regime color palette from the legacy script:

CUSTOM_COLORS = ["#0000d0", "#d00000", "#f48c06", "#8338ec", "#50a000"]

Use plotting.REGIME_CMAP everywhere for consistency.


Data Sources

multpl.com (46 series)

Scraped via lxml cssselect from #datatable. All URLs and value_type metadata are in config/settings.yaml under multpl.datasets. Do not hardcode URLs in Python. Rate-limited to 2 seconds per request (RATE_LIMIT_SECONDS).
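
The rate limit can be sketched independently of the HTML parsing. The function name, the injected fetch and sleep callables, and the example.com URLs are all hypothetical; injecting them keeps the sketch runnable without network access.

```python
import time
from typing import Callable, Iterable

RATE_LIMIT_SECONDS = 2.0


def fetch_with_rate_limit(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    rate_limit_seconds: float = RATE_LIMIT_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch each URL in order, pausing between consecutive requests."""
    pages: dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(rate_limit_seconds)  # no pause before the first request
        pages[url] = fetch(url)
    return pages


# Demo with a fake fetcher and a recorded sleep, so nothing touches the network.
recorded_sleeps: list[float] = []
demo_urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
pages = fetch_with_rate_limit(
    demo_urls,
    fetch=lambda url: f"<html>{url}</html>",
    sleep=recorded_sleeps.append,
)
```

At 2 seconds between requests, 46 URLs spend about 90 seconds waiting on the rate limit alone, before any download or parsing time.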

FRED API (7 series, more planned)

Current: GDP (shifted +1Q), GNP (shifted +1Q), BAA, AAA, CPI (CPIAUCSL), GS10, TB3MS.

Planned additions (see ROADMAP.md Tier 1):

  • VIXCLS (VIX, 1990+), UNRATE (unemployment, 1948+), M2NS (money supply, 1959+)
  • T10Y2Y (10Y-2Y spread), GS2 (2Y Treasury), HOUST (housing starts), UMCSENT

Requires FRED_API_KEY in .env. Free registration at fred.stlouisfed.org.

macrotrends.net (planned — not yet implemented)

Gold spot price back to 1915, WTI crude oil back to 1946, silver, copper. See ROADMAP.md Tier 1 item 1.5 and src/trading_crab_lib/ingestion/macrotrends.py (to be created). Scraping approach: extract embedded JSON from <script>var rawData={...}</script> tags.

ETF price history (yfinance)

SPY, GLD, TLT, USO, QQQ, IWM, VNQ, AGG — monthly adjusted close, resampled to quarterly. Fetched in ingestion/assets.py. No API key required.
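
A sketch of the monthly-to-quarterly step using pandas only. The prices are synthetic, and grouping by calendar quarter and taking the last observation is an assumption about how ingestion/assets.py resamples.

```python
import pandas as pd

# Synthetic month-end closes for a single asset (not real market data).
dates = pd.to_datetime(
    [
        "2022-01-31", "2022-02-28", "2022-03-31",
        "2022-04-29", "2022-05-31", "2022-06-30",
        "2022-07-29", "2022-08-31", "2022-09-30",
    ]
)
monthly_close = pd.Series(
    [100.0, 102.0, 104.0, 101.0, 103.0, 110.0, 108.0, 112.0, 121.0],
    index=dates,
    name="asset",
)

# Last observation in each calendar quarter, then simple quarterly returns.
quarterly_close = monthly_close.groupby(monthly_close.index.to_period("Q")).last()
quarterly_return = quarterly_close / quarterly_close.shift(1) - 1.0
```

Grouping on to_period("Q") avoids the resample frequency alias, which changed spelling between pandas releases.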

Grok baseline labels

data/grok_quarter_classifications_20260216.pickle — an external LLM-assisted classification of quarters used as a visual reference overlay in notebooks. Not used for model training. Loaded via ingestion/grok.py (or directly in notebooks).


Config Reference (settings.yaml)

All tuneable parameters are in config/settings.yaml. Key sections:

| Section | Key parameters |
| --- | --- |
| data | start_date, end_date, frequency |
| fred.series | per-series name + shift flag |
| multpl.datasets | list of [name, description, url, value_type] |
| features.log_columns | columns to log-transform |
| features.initial_features | columns retained before gap fill |
| features.clustering_features | final columns fed to PCA |
| features.derivative_window | rolling mean window for np.gradient smoothing |
| clustering.n_pca_components | fixed at 5 |
| clustering.n_clusters_search | upper bound for k-sweep (default 12) |
| clustering.k_cap | max k accepted from silhouette (default 5) |
| clustering.balanced_k | k for size-constrained KMeans (default 5) |
| prediction.forward_horizons_quarters | [1, 2, 4, 8] |
| prediction.cv_splits | 5 (TimeSeriesSplit folds) |
| prediction.dt_max_depth | 8 (DecisionTree depth) |
| prediction.rf_max_depth | 12 (RandomForest max depth) |
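
The prediction.* parameters map onto scikit-learn roughly as in this sketch. The data is random, the dict stands in for the parsed YAML section, and n_estimators is an arbitrary choice; classifier.py may wire things differently.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the `prediction` section of settings.yaml.
prediction_cfg = {"cv_splits": 5, "dt_max_depth": 8, "rf_max_depth": 12}

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))     # synthetic features, rows in time order
y = rng.integers(0, 3, size=120)  # synthetic regime labels

cv = TimeSeriesSplit(n_splits=prediction_cfg["cv_splits"])
models = {
    "decision_tree": DecisionTreeClassifier(
        max_depth=prediction_cfg["dt_max_depth"], random_state=0
    ),
    "random_forest": RandomForestClassifier(
        max_depth=prediction_cfg["rf_max_depth"], n_estimators=50, random_state=0
    ),
}

# Walk-forward evaluation: every fold trains on the past and tests on the future.
fold_scores = {name: [] for name in models}
for train_idx, test_idx in cv.split(X):
    for name, model in models.items():
        model.fit(X[train_idx], y[train_idx])
        fold_scores[name].append(model.score(X[test_idx], y[test_idx]))
```

TimeSeriesSplit never shuffles, so each test fold lies strictly after its training data, which matches the look-ahead concerns behind the publication-lag shift.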

What Must NOT Change Without Discussion

  • The feature pipeline order — cross-ratios → log → select → gap-fill → deriv → select. The Bernstein gap fill must happen AFTER log transform so it interpolates in log space.
  • Publication-lag shifts — GDP and GNP must always be shifted. Do not remove without explicit approval.
  • clustering_features list — this is analytically determined. Changes here change the clustering geometry and invalidate any manually pinned regime_labels.yaml.
  • n_pca_components = 5 — changing this changes which regimes you find. Benchmark first.
  • Submodules are reference-only — do not push commits into submodules; treat them as read-only sources for alignment checks. You may git pull/update submodules locally, but any code changes must be made in the main repo unless we explicitly discuss otherwise.
  • Saving to .env or committing API keys — never. Use .env.example only.

What the Legacy Code Does That Must Be Matched

Cross-reference legacy/unified_script.py and the legacy/*.py modules for ground truth. Items marked ✓ are verified as matching in src/. Items marked ✗ are known gaps that still need to be implemented or aligned.

Algorithms (all ✓ — fully matched in src/)

  1. Scraping: lxml cssselect("#datatable tr"), user-agent string, 2s rate limit
  2. FRED: per-series shift, quarterly resample with .last()
  3. Cross-ratios: exact 10 formulas (div_yield2, price_div, price_gdp, price_gdp2, price_gnp2, div_minus_baa, credit_spread, real_price2, real_price3, real_price_gdp2)
  4. Log transform: np.log(col.clip(lower=1e-9))
  5. Gap filling: BPoly.from_derivatives with 4 boundary conditions per side (value + d1 + d2 + d3); Taylor extrapolation for leading/trailing edges
  6. Derivatives: np.gradient on matplotlib day-number axis + centered rolling mean of window=5 before and after each gradient call
  7. PCA: StandardScaler → PCA(n_components=5) → re-StandardScaler before KMeans
  8. K-sweep: range(2, 13) with n_init=50, silhouette + CH + DB
  9. Balanced clustering: KMeansConstrained(size_min=bucket-2, size_max=bucket+2)
  10. Color palette: ["#0000d0", "#d00000", "#f48c06", "#8338ec", "#50a000"]
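
Item 5 can be illustrated with SciPy directly. The sketch fills a gap between two known points using value + d1 + d2 + d3 on each side; the fill_gap name is hypothetical, and the derivatives are taken analytically from f(x) = x^3 here, whereas a real pipeline would have to estimate them from neighbouring data.

```python
import numpy as np
from scipy.interpolate import BPoly


def fill_gap(x_left, left_derivs, x_right, right_derivs, x_missing):
    """Interpolate across a gap from [value, d1, d2, d3] at each boundary."""
    poly = BPoly.from_derivatives([x_left, x_right], [left_derivs, right_derivs])
    return poly(x_missing)


# Boundary conditions of f(x) = x**3 at x = 1 and x = 4.
left = [1.0, 3.0, 6.0, 6.0]       # f, f', f'', f''' at x = 1
right = [64.0, 48.0, 24.0, 6.0]   # f, f', f'', f''' at x = 4

filled = fill_gap(1.0, left, 4.0, right, np.array([2.0, 3.0]))
```

Four conditions per side determine a degree-7 polynomial, so a cubic is reproduced exactly and the filled values match 2^3 and 3^3 up to floating-point error.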

Legacy features previously missing from src/ (all now implemented; see Legacy Alignment Gaps section)

  • ✓ DecisionTreeClassifier training for interpretability: implemented in classifier.py
  • ✓ TimeSeriesSplit cross-validation: implemented via the _tscv_scores() helper
  • ✓ Portfolio construction: generate_recommendation() / blended portfolios in reporting.py
  • ✓ Macro-data fallback for asset returns: compute_proxy_returns() in asset_returns.py
  • ✓ Confusion matrix heatmap (walk-forward CV): plot_regime_confusion_matrix() in plotting.py. Data comes from outputs/reports/model_metrics/confusion_matrices.parquet; the PNG is written to outputs/plots/05_confusion_matrix.png when step 5 runs with plots (run_pipeline.py --steps 5 --plots or pipelines/05_predict.py --plots). The legacy generate_classification_report() console formatting is not duplicated; the sklearn classification_report() string appears in logs only.

Things src/ does better than legacy (do not regress)

  • ✓ Real ETF price data via yfinance (SPY, GLD, TLT, USO, QQQ, IWM, VNQ, AGG) instead of macro-data proxies
  • ✓ CheckpointManager with parquet + manifest (vs. ad-hoc pickle/CSV)
  • ✓ RunConfig dataclass for clean flag management
  • ✓ All config in settings.yaml (vs. hardcoded Python constants)
  • ✓ Full CLI in run_pipeline.py with --steps, --refresh, --recompute, etc.
  • ✓ Dedicated exploration notebooks (01–09)

Conventions

General Python style

  • Python 3.10+ — use match, | union types, X | None not Optional[X]
  • Type hints on all public functions
  • logging everywhere, no print() in library code (only in pipelines/ and run_pipeline.py)
  • No bare except: — always catch specific exception types
  • All file paths via pathlib.Path, never string concatenation

Naming

  • DataFrames: noun describing contents (features, pca_df, clustered, returns)
  • Series: noun describing the single variable (labels, cluster)
  • Functions: verb_noun (fetch_all, apply_log_transforms, build_profiles)
  • Config keys: snake_case throughout YAML

Checkpoint files

  • Stored under data/checkpoints/{name}.parquet (DataFrames) or {name}.pkl (models)
  • Preservation secondaries: see Checkpoint system above; names in PRESERVATION_CHECKPOINT_NAMES (trading_crab_lib.checkpoints).
  • Always prefer parquet over pickle for DataFrames (smaller, typed, readable)
  • Pickle only for sklearn models (no parquet-serializable alternative)
  • Never commit data files — data/ and outputs/ are in .gitignore

Testing

pytest tests/ -v
make lint          # Ruff: lint + format check (same paths as CI)

Tests live under tests/. Unit tests should not require network access — mock requests.get for scraping tests and FRED API calls. Use fixtures from tests/conftest.py.

Commits

  • Conventional format: feat:, fix:, refactor:, docs:, test:, chore:
  • Example: feat: add yfinance asset price ingestion (step 06)
  • Branch: always claude/description-sessionID — never push directly to main

Current Status (as of March 2026)

See STATE.md for a full breakdown of what runs, what's tested, and what output files are produced. See ROADMAP.md for prioritized feature backlog. See ARCHITECTURE.md for design decisions. See PITFALLS.md for known gotchas.

Complete ✓

  • Steps 01–07 run end-to-end; pipeline verified on real data (all 7 steps)
  • CheckpointManager — fully implemented; parquet + manifest
  • RunConfig — fully implemented, including from_args() factory
  • run_pipeline.py — master runner with full CLI (all flags implemented)
  • ingestion/assets.py — yfinance ETF price fetcher + 3-phase fallback chain (stooq → OpenBB → macro proxy)
  • plotting.py — 18 visualization helpers covering all 7 pipeline steps (includes plot_regime_confusion_matrix)
  • notebooks/01–09 — all notebooks present; 03_clustering expanded with 28 investigation cells
  • Requirements — minimum-bound strategy, Python 3.10+ compatible
  • from __future__ import annotations — present in all source files using X | Y syntax
  • Unit tests — 377+ collected tests (8 skipped: HDBSCAN) covering all modules including clustering investigation suite
  • Gap 1: TimeSeriesSplit CV in classifier.py (5-fold walk-forward)
  • Gap 2: DecisionTreeClassifier in classifier.py (max_depth=8)
  • Gap 3: reporting/portfolio.py (simple + blended portfolio + BUY/SELL/HOLD)
  • Gap 4: compute_proxy_returns() fallback in assets/returns.py
  • Causal smoothing: engineer_all(causal=True/False) + dual parquet outputs from step 2
  • Clustering investigation suite: GMM (gmm.py), DBSCAN/HDBSCAN (density.py), Spectral (spectral.py), multi-method comparison (cluster_comparison.py), gap statistic + SVD + PCA sweep in clustering.py
  • Config-driven var lists: plotting.sample_series, plotting.key_indicators, assets.etfs all in settings.yaml
  • 16 ETFs — expanded from 8 to 16 (added HYG, XLK, XLP, XLE, GDX, TIP, BIL, EDV)
  • pythonpath = ["src", "."] in pyproject.toml pytest config — tests run without pip install -e . (repo root on path for run_pipeline imports in pipeline tests).

Next Priority (implement in upcoming sessions)

  1. Additional FRED series — VIX (VIXCLS), unemployment (UNRATE), M2 (M2NS), yield spreads (T10Y2Y, T10Y3M, GS2), housing starts (HOUST), consumer sentiment (UMCSENT)
  2. macrotrends.net scraper — gold/oil spot prices back to 1915/1946
  3. LightGBM classifier — alongside RF + DT in classifier.py
  4. Expand test suite — classifier, portfolio, dashboard
  5. end_date: null in settings.yaml → use today's date at runtime
  6. Per-asset regime probability models ("Putting it all together — Part I")
  7. Weekly automated report with AI-written narrative via Claude API

Note: Yield-curve spreads yc_* and build_forward_window_probabilities() are shipped — see transforms.py, regime.py, and root ROADMAP.md §1.3–1.4.

Known Limitations

  • profiler.py naming heuristics silently skip 4 features (10yr_ustreas, fred_gs10, fred_tb3ms, div_minus_baa) because only their derivatives are in clustering_features. Graceful fallback is intentional.
  • Only 7 FRED series currently ingested; many high-value series (VIX, unemployment, M2, yield curve) are not yet fetched.
  • ETF data starts 1993-2006; pre-1993 gold and oil regime analysis uses proxy columns only. macrotrends.net backfill would extend coverage to 1915+ for gold.
  • Clustering uses KMeans which treats each quarter independently; HMM would model temporal autocorrelation natively (Tier 2 roadmap item).

Legacy Alignment Gaps

Full comparison of legacy/*.py vs src/trading_crab_lib/ completed March 2026.

Closed Gaps (all implemented)

  • Gap 1: TimeSeriesSplit CV (legacy/supervised.py → classifier.py)
  • Gap 2: DecisionTreeClassifier (legacy/supervised.py → classifier.py)
  • Gap 3: Portfolio construction (legacy/portfolio.py → reporting/portfolio.py)
  • Gap 4: Macro-data proxy returns fallback (legacy/asset_returns.py → assets/returns.py)
  • Gap 5: Causal/backward rolling windows for supervised learning (transforms.py)
  • Gap 6: Empirical forward-window probabilities. build_forward_window_probabilities() in regime.py; pipelines/04_regime_label.py writes data/regimes/forward_window_probabilities.parquet. The legacy name is compute_forward_probabilities() in legacy/regime_analysis.py.
  • Gap 7: Confusion matrix visualization. plot_regime_confusion_matrix() in plotting.py (aggregated out-of-fold counts from confusion_matrices.parquet); wired into run_pipeline.py step 5 when generate_plots is set; also available via pipelines/05_predict.py --plots.
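
For orientation on Gap 6, the sketch below estimates empirical probabilities of the regime h quarters ahead given the current regime. This is one simplified reading (a fixed horizon rather than a window), the function name is hypothetical, and it is not the implementation of build_forward_window_probabilities().

```python
import pandas as pd


def forward_probabilities(labels: pd.Series, horizon: int = 1) -> pd.DataFrame:
    """Row i, column j: share of quarters in regime i followed by regime j after `horizon` quarters."""
    pairs = pd.DataFrame({"current": labels, "future": labels.shift(-horizon)}).dropna()
    pairs["future"] = pairs["future"].astype(labels.dtype)  # shift() turned ints into floats
    return pd.crosstab(pairs["current"], pairs["future"], normalize="index")


# Synthetic regime labels for eight consecutive quarters.
labels = pd.Series([0, 0, 1, 1, 0, 2, 2, 0], name="balanced_cluster")

one_step = forward_probabilities(labels, horizon=1)
two_step = forward_probabilities(labels, horizon=2)
```

With horizon=1 this reduces to an ordinary transition matrix; each row sums to 1, and rows built from only a handful of quarters should be read with caution.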

Remaining Gaps

Classification report console parity (legacy/supervised.py)

Legacy generate_classification_report() prints a per-class confusion matrix in text form. We log sklearn’s classification_report() string for the current-regime model but do not replicate the legacy print layout. Heatmap visualization of CV confusion counts is implemented (Gap 7).


Frequently Needed Commands

# Check what checkpoints exist
ls data/checkpoints/

# Run just the clustering step with plots
python run_pipeline.py --steps 3 --plots --verbose

# Smoke test step 5 after steps 1–4 (checks models + outputs/reports/model_metrics/*)
bash scripts/smoke_step5.sh
# Optional: run steps 1–5 in one invocation (needs data ingest prerequisites)
# SMOKE_FULL_PIPELINE=1 bash scripts/smoke_step5.sh

# Reload raw data from cached checkpoints (skip re-scraping) and recompute everything
python run_pipeline.py --recompute --plots

# Start fresh (re-scrape multpl + FRED, recompute all)
python run_pipeline.py --refresh --recompute --plots

# Launch notebooks
jupyter lab notebooks/

# Quick sanity check (no network, loads a checkpoint)
python -c "
from trading_crab_lib.checkpoints import CheckpointManager
cm = CheckpointManager()
print(cm.list())
"

# Print current dashboard (requires steps 01-06 to have run)
python pipelines/07_dashboard.py