This file breaks a large planning/documentation effort into small, independent phases (P1–P8) that can each be completed in a single Claude Code session without hitting token limits. Each phase produces specific markdown files that are committed before the next phase begins.
Created: 2026-03-30
Branch: claude/implement-phase-a2-SmHEA
The following issues were discovered during a codebase audit on 2026-03-30. They inform the planning phases below.
Symptom: Running the pipeline twice with different `--market-code` flags (or after running `pytest`) produces different email output.
Root causes identified:
- `_fill_column()` in `transforms.py:145`: When `market_code` is present, gap-fill uses `df[[col, "market_code"]].dropna()` to find valid rows. Different `market_code` sources (grok vs predicted vs clustered) have different NaN patterns, so gap-fill boundaries change, altering derivative values.
- `step5_predict()` uses `.dropna(axis=1, how="any")`: This drops entire columns that have any NaN. Different gap-fill results → different NaN patterns → different columns survive → different feature sets fed to RF → different model.
- No global numpy/random seed: While individual sklearn models use `random_state=42`, there is no `np.random.seed()` or `random.seed()` at pipeline startup. Any stochastic operation not explicitly seeded is non-deterministic.
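The second root cause can be reproduced on a toy frame. This is a minimal sketch that assumes pandas is installed; the column names `gdp_d1` and `cpi_d1` are hypothetical stand-ins for derived features, not names taken from the codebase.

```python
import pandas as pd

NAN = float("nan")

# Two runs over the same raw data whose gap-fill boundaries differ,
# so the NaNs land in different columns (hypothetical feature names).
run_a = pd.DataFrame({"gdp_d1": [1.0, 2.0, 3.0], "cpi_d1": [0.1, NAN, 0.3]})
run_b = pd.DataFrame({"gdp_d1": [1.0, NAN, 3.0], "cpi_d1": [0.1, 0.2, 0.3]})

# dropna(axis=1, how="any") removes every column that contains a NaN,
# so each run hands a different feature set to the model.
cols_a = list(run_a.dropna(axis=1, how="any").columns)
cols_b = list(run_b.dropna(axis=1, how="any").columns)

print(cols_a)  # ['gdp_d1']
print(cols_b)  # ['cpi_d1']
```

Neither run raises an error, which is why the drift only shows up downstream in the model output.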
Fix strategy:
- Remove `market_code` from gap-fill valid-row logic (it's a label, not a feature)
- Set global seeds at pipeline startup: `np.random.seed(42)`, `random.seed(42)`
- Add determinism tests that verify identical outputs given identical inputs
- Consider pinning column lists rather than relying on `dropna(axis=1)`
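The seeding step in the fix strategy could look roughly like the sketch below. The helper name `seed_everything` is hypothetical, and numpy is treated as optional so the snippet also runs where it is not installed.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed the global RNGs once, before any pipeline step runs."""
    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        # numpy is optional in this sketch; the real pipeline depends on it.
        return
    # Seeds numpy's legacy global RandomState only.
    np.random.seed(seed)


seed_everything(42)
first_draw = random.random()
seed_everything(42)
second_draw = random.random()
print(first_draw == second_draw)  # True
```

Note that `np.random.seed()` does not affect generators created with `np.random.default_rng()`; any code using those needs its own explicit seed.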
- 17x `RuntimeWarning: invalid value encountered in reduce`: numpy overflow during statsmodels MarkovRegression optimization. Harmless but noisy.
- 1x `ValueWarning: A date index has no associated frequency`: statsmodels wants `freq` on the DatetimeIndex.
Fix: Add `@pytest.mark.filterwarnings` or `pytestmark` to suppress known warnings in `test_markov.py`. Optionally set `.freq` on test fixture indices.
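A module-level `pytestmark` for this fix might look like the sketch below. It assumes the frequency warning is raised as `statsmodels.tools.sm_exceptions.ValueWarning`, which is why that filter matches on category instead of on the message text quoted above.

```python
import pytest

# Module-level marks apply to every test in the file (e.g. test_markov.py).
pytestmark = [
    # Filter format is "action:message:category"; the message is a regex
    # matched against the start of the warning text.
    pytest.mark.filterwarnings(
        "ignore:invalid value encountered in reduce:RuntimeWarning"
    ),
    # Assumption: the frequency warning is statsmodels' ValueWarning class,
    # so it is filtered by category rather than by message.
    pytest.mark.filterwarnings(
        "ignore::statsmodels.tools.sm_exceptions.ValueWarning"
    ),
]
```

Keeping the filters in the test module, not globally, means the same warnings would still surface if they started appearing in other tests.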
`src/trading_crab_lib/transforms.py` is the only module missing the `from __future__ import annotations` import.
`reporting.py:print_dashboard()` uses `print()` directly (12 instances). Per CLAUDE.md convention, library code should use logging. The `print_dashboard()` function is user-facing output, so this is semi-intentional, but it should be refactored to return a string and let the caller decide how to display it.
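One possible shape for that refactor is sketched below. The function name `format_dashboard`, its parameters, and the sample regime and tickers are all hypothetical; the real `print_dashboard()` signature is not shown in this plan.

```python
import logging

logger = logging.getLogger(__name__)


def format_dashboard(regime: str, confidence: float, signals: dict[str, str]) -> str:
    """Build the dashboard text without printing it."""
    lines = [f"Regime: {regime} (confidence {confidence:.0%})"]
    for asset, action in sorted(signals.items()):
        lines.append(f"  {asset:<6} {action}")
    return "\n".join(lines)


# The caller picks the sink: logging inside the library, print() in a CLI.
text = format_dashboard("expansion", 0.82, {"SPY": "BUY", "TLT": "SELL"})
logger.info("%s", text)
```

Returning the string also makes the output easy to assert on in tests, with no need to capture stdout.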
`src/trading_crab/pipeline.py` (1375 lines) and `src/trading_crab/cli.py` have zero test coverage. Pipeline step functions are tested indirectly via library module tests, but the orchestration logic, argument parsing, and step dispatch are untested.
`tests/integration/` and `tests/fixtures/` directories don't exist. There are no end-to-end tests that run multiple pipeline steps in sequence.
No `mypy.ini`, `pyrightconfig.json`, or `[tool.mypy]` in `pyproject.toml`. Type hints exist on most public functions but are never validated by CI.
No `.pre-commit-config.yaml`. Linting is only enforced in CI (flake8).
6 workflow files exist: `publish.yml`, `publish-app.yml`, `publish-lib.yml`, `python-app.yml`, `python-package.yml`, `python-publish.yml`. Some are likely duplicates from before the 2-package split.
Current `write_weekly_report_md()` produces:
- `## Current Regime` (regime + confidence)
- `## Recommendations` (BUY/SELL bullets)
- `## Risk & regime transition` (top 3 transition probabilities)
- `## Tactics` (buy_hold / swing / stand_aside; only if `tactics_signals.parquet` exists)
Missing from GSD version:
- `## Diagnostics` section (ratio snapshots by |z|, RRG quadrant counts)
- HTML rendering of markdown in email body
- Inline plot attachments (partially implemented in Phase E email work but not wired into the report generation)
- Strongest BUY/SELL ideas showing target % vs current (currently shows delta_pct which is allocation change, not price target)
No Dockerfile. Would be useful for reproducible pipeline runs and CI.
Config is loaded as a raw dict with `.get()` defaults scattered across modules. There is no centralized validation that required keys exist or have correct types.
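Centralized validation could be as small as one typed parser per config section. The sketch below uses only the standard library; `PipelineConfig` and `parse_pipeline_config` are hypothetical names, and the only key it checks is the `pipeline.random_state` key that P2 proposes.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class PipelineConfig:
    random_state: int


def parse_pipeline_config(raw: Mapping[str, Any]) -> PipelineConfig:
    """Validate the 'pipeline' section once, at startup."""
    section = raw.get("pipeline")
    if not isinstance(section, Mapping):
        raise ValueError("config is missing the 'pipeline' section")
    if "random_state" not in section:
        raise ValueError("config is missing 'pipeline.random_state'")
    seed = section["random_state"]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise ValueError("'pipeline.random_state' must be an integer")
    return PipelineConfig(random_state=seed)


config = parse_pipeline_config({"pipeline": {"random_state": 42}})
print(config.random_state)  # 42
```

Modules would then receive a `PipelineConfig` and stop calling `.get()` with their own defaults, so a missing or mistyped key fails at startup, not midway through a run.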
Each phase is a single Claude Code session. Commit output before moving to next.
Scope: Create a comprehensive guide that documents how to rebuild this project from scratch, assuming no prior code exists. Covers architecture, module order, testing strategy, and lessons learned.
Output: REBUILD-FROM-SCRATCH-GUIDE.md committed to repo.
Estimated size: ~500 lines of markdown. Split into 4 sub-sessions to avoid timeouts:
- P1a — Sections 1–4: Introduction, two-package architecture, repository layout, build order
- P1b — Sections 5–7: Feature pipeline deep-dive, clustering, prediction
- P1c — Sections 8–10: Testing strategy, packaging/distribution, lessons learned
- P1d — Section 11 (critical invariants) + update META_PLAN.md + commit/push
Scope: Fix the three root causes of non-deterministic pipeline results.
Tasks:
- Remove `market_code` from `_fill_column()` valid-row logic in `transforms.py`
- Remove `market_code` from `apply_derivatives()` valid-row logic in `transforms.py`
- Set global seeds (`np.random.seed(42)`, `random.seed(42)`) in `pipeline.py:main()`
- Add `random_state` config key to `settings.yaml` under a top-level `pipeline` section
- Pin the feature column list in step 5 instead of using `dropna(axis=1)`, or at minimum log which columns are dropped so the user can see it
- Add a determinism test: run `engineer_all()` twice on the same input, assert identical output
- Add `from __future__ import annotations` to `transforms.py`
Output: Code changes + tests. Single session.
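The determinism test in the task list could follow the pattern below. Since the signature of `engineer_all()` is not given here, `noisy_transform` is a hypothetical stand-in that draws from the global RNG; `run_twice` is also an illustrative helper.

```python
import random
from typing import Callable, Sequence


def run_twice(
    transform: Callable[[Sequence[float]], list[float]],
    data: Sequence[float],
    seed: int = 42,
) -> tuple[list[float], list[float]]:
    """Run the transform twice on the same input, reseeding before each run."""
    random.seed(seed)
    first = transform(data)
    random.seed(seed)
    second = transform(data)
    return first, second


def noisy_transform(data: Sequence[float]) -> list[float]:
    """Hypothetical stand-in for engineer_all(): adds RNG-driven jitter."""
    return [x + random.random() for x in data]


def test_transform_is_deterministic() -> None:
    first, second = run_twice(noisy_transform, [1.0, 2.0, 3.0])
    assert first == second
```

For DataFrame outputs the final comparison would more likely use `pandas.testing.assert_frame_equal(first, second)`, which also checks dtypes and column order.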
Scope: Eliminate all pytest warnings and document local setup.
Tasks:
- Suppress known statsmodels warnings in `test_markov.py` via `filterwarnings`
- Set `.freq` on test fixture DatetimeIndex to eliminate frequency warning
- Add `hmmlearn` and `statsmodels` to dev requirements so no tests are skipped
- Add `hdbscan` to dev requirements (currently 11 tests skipped)
- Update `requirements-dev.txt` with all optional deps needed for full test suite
- Add a `[tool.pytest.ini_options]` `filterwarnings` section to pyproject.toml
- Document in README.md how to run tests with zero skips/warnings
Output: Config changes + test fixes. Single session.
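For the `.freq` task, one way to build a fixture index that carries a frequency is sketched below. It assumes pandas is installed and uses monthly (`"MS"`) dates purely for illustration; the real fixtures may use a different frequency.

```python
import pandas as pd

# An index built from explicit dates carries no frequency metadata.
dates = ["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]
index = pd.DatetimeIndex(dates)
print(index.freq)  # None

# Passing freq= validates the spacing and attaches the frequency.
index_with_freq = pd.DatetimeIndex(dates, freq="MS")
series = pd.Series([0.1, 0.2, 0.15, 0.3], index=index_with_freq)
print(series.index.freqstr)  # MS
```

If the dates do not conform to the requested frequency, the constructor raises a `ValueError`, which is a useful check on the fixture itself.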
Scope: Add the missing report sections and HTML rendering.
Tasks:
- Add `## Diagnostics` section to `write_weekly_report_md()`:
  - Ratio snapshots (top by |z-score|) from diagnostics.py
  - RRG quadrant counts from diagnostics.py
- Wire diagnostics data into step 7 dashboard so it's available for the report
- Render email body as HTML (markdown → HTML conversion)
- Attach key plots inline (already partially implemented)
- Update `config/email.example.yaml` with new `attach_plots` defaults
- Add tests for the new report sections
Output: Code changes + tests. Single session.
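A rough sketch of the new report section follows. The function name `render_diagnostics`, its dict-based inputs, and the sample ratios are assumptions for illustration; the actual data structures produced by diagnostics.py are not described in this plan.

```python
def render_diagnostics(
    ratio_z: dict[str, float],
    quadrant_counts: dict[str, int],
    top_n: int = 3,
) -> str:
    """Render a '## Diagnostics' markdown section from plain dicts."""
    # Rank ratios by absolute z-score, largest first.
    top = sorted(ratio_z.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    lines = ["## Diagnostics", "", "Ratio snapshots (top by |z|):"]
    lines += [f"- {name}: z = {z:+.2f}" for name, z in top]
    lines += ["", "RRG quadrant counts:"]
    lines += [f"- {quadrant}: {count}" for quadrant, count in quadrant_counts.items()]
    return "\n".join(lines)


section = render_diagnostics(
    {"SPY/TLT": 1.4, "GLD/SPY": -2.1, "XLE/XLK": 0.3},
    {"Leading": 4, "Weakening": 2, "Lagging": 3, "Improving": 1},
    top_n=2,
)
print(section)
```

Because the function returns markdown text, the same string can feed both the `.md` report and the markdown → HTML conversion for the email body.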
Scope: Replace Phase F with more granular phases based on audit findings.
New phases to define:
- Phase G — Determinism & reproducibility (P2 implementation details)
- Phase H — Test hardening (integration tests, pipeline.py tests, mypy)
- Phase I — Email & reporting enhancements (P4 implementation details)
- Phase J — CI/CD cleanup (deduplicate workflows, add mypy, pre-commit)
- Phase K — Migration prep (config independence, Docker, artifact publishing)
Also update: ROADMAP.md status, STATE.md with current test count, CLAUDE.md with any new conventions discovered.
Output: Updated markdown files. Single session.
Scope: Clean up CI workflows and add developer tooling.
Tasks:
- Audit and deduplicate the 6 GitHub Actions workflow files
- Add `[tool.mypy]` to pyproject.toml with basic strict settings
- Create `.pre-commit-config.yaml` (flake8 + mypy + trailing whitespace)
- Add mypy to CI
- Consider: is Docker useful? (Answer: yes for reproducible weekly runs, but not urgent. Document as future work.)
Output: CI config changes. Single session.
Scope: Fill test coverage gaps.
Tasks:
- Add smoke tests for `trading_crab.pipeline` (test `build_parser()`, test `main()` with `--steps 3 --help` or minimal mocked steps)
- Add smoke test for `trading_crab.cli` entry points
- Create `tests/integration/` with a mini end-to-end test using synthetic data
- Add determinism regression tests (run pipeline twice, compare outputs)
- Reach 100% module coverage (every .py file has a corresponding test file)
Output: New test files. Single session.
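The parser smoke tests might look like the sketch below. The `build_parser()` defined here is a hypothetical stand-in so the example runs on its own; the flag types and defaults are assumptions, and a real test would import the function from `trading_crab.pipeline`.

```python
import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for trading_crab.pipeline.build_parser()."""
    parser = argparse.ArgumentParser(prog="pipeline")
    parser.add_argument("--steps", type=int, nargs="+", default=[])
    parser.add_argument("--market-code", default="grok")
    return parser


def test_parser_reads_steps() -> None:
    args = build_parser().parse_args(["--steps", "3", "--market-code", "predicted"])
    assert args.steps == [3]
    assert args.market_code == "predicted"


def test_help_exits_cleanly() -> None:
    buffer = io.StringIO()
    exit_code = None
    # argparse prints help to stdout and then raises SystemExit(0).
    with contextlib.redirect_stdout(buffer):
        try:
            build_parser().parse_args(["--steps", "3", "--help"])
        except SystemExit as exc:
            exit_code = exc.code
    assert exit_code == 0
    assert "--market-code" in buffer.getvalue()
```

Tests like these exercise argument parsing without running any pipeline step, so they stay fast and need no fixtures.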
Scope: Additional documentation that supports the rebuild guide.
Tasks:
- Create `LESSONS_LEARNED.md`: pitfalls discovered, what we'd do differently
- Update README.md: refresh layout tree, add badges, fix any stale references
- Update CLAUDE.md: add new ADRs for decisions made in P2-P7
- Answer user's questions about alternative artifact formats (Maven, npm, Docker) in a `DISTRIBUTION.md` or section in REBUILD guide
Output: Documentation files. Single session.
- P1 (REBUILD guide): can start immediately, no code changes
- P2 (determinism fix): CRITICAL, do ASAP after P1
- P3 (pytest warnings): quick win, do alongside or after P2
- P5 (update NEXT_STEPS): do after P2+P3 so new phases reflect fixes
- P4 (email enhancements): after P5 (needs plan finalized)
- P6 (CI/CD cleanup): independent, any time after P3
- P7 (test hardening): after P2 (determinism tests) and P4 (email tests)
- P8 (documentation): last, after all code changes settle
Recommended session order: P1 → P2 → P3 → P5 → P4 → P6 → P7 → P8
Each session should:
- Read this META_PLAN.md for context
- Complete the specified phase
- Commit and push before ending
- Update this file's status section (below)
| Phase | Status | Branch/Commit | Notes |
|---|---|---|---|
| P1a | DONE | claude/review-meta-plan-Sqot2 | Sections 1–4 |
| P1b | DONE | claude/review-meta-plan-Sqot2 | Sections 5–7 |
| P1c | DONE | claude/review-meta-plan-Sqot2 | Sections 8–10 |
| P1d | DONE | claude/review-meta-plan-Sqot2 | Section 11 + commit/push |
| P2 | DONE | claude/review-meta-plan-Sqot2 | Critical determinism fix |
| P3 | DONE | claude/review-meta-plan-Sqot2 | Pytest warnings + local setup |
| P4 | DONE | claude/review-meta-plan-Sqot2 | Email enhancements |
| P5 | DONE | claude/p5-doc-updates-1774982009 | Update NEXT_STEPS.md |
| P6 | DONE | claude/p5-doc-updates-1774982009 | CI/CD cleanup |
| P7 | DONE | claude/p5-doc-updates-1774982009 | Test hardening |
| P8 | DONE | claude/p5-doc-updates-1774982009 | Documentation supplements |