diff --git a/{{cookiecutter.project_slug}}/.github/workflows/test.yml b/{{cookiecutter.project_slug}}/.github/workflows/test.yml index 0c12862..59f9f97 100644 --- a/{{cookiecutter.project_slug}}/.github/workflows/test.yml +++ b/{{cookiecutter.project_slug}}/.github/workflows/test.yml @@ -6,9 +6,6 @@ on: - main pull_request: -env: - UV_SYSTEM_PYTHON: "1" - jobs: quality: runs-on: ubuntu-latest @@ -30,8 +27,15 @@ jobs: curl -LsSf https://astral.sh/uv/install.sh | sh echo "$HOME/.local/bin" >> "$GITHUB_PATH" - - name: Sync dependencies - run: uv sync --dev + - name: Verify uv installation + run: | + uv --version + which uv + + - name: Install dependencies (workspace) + run: | + uv sync --dev --all-packages + uv pip list - name: Ruff lint run: uv run ruff check src/ tests/ diff --git a/{{cookiecutter.project_slug}}/CLAUDE.md b/{{cookiecutter.project_slug}}/CLAUDE.md index 027854d..bcb549e 100644 --- a/{{cookiecutter.project_slug}}/CLAUDE.md +++ b/{{cookiecutter.project_slug}}/CLAUDE.md @@ -1,144 +1,780 @@ -# Development Rules +# CellSem Agentic Workflow - Development Guide -## Test-Driven Development (MANDATORY) -1. Write unit and integration tests FIRST -2. Tests must fail initially (red) -3. Commit tests before implementation -4. Write minimal code to pass tests (green) -5. Refactor while keeping tests green, commit +**Template for building robust agentic workflows with integrated validation** -## TDD Workflow Commands (using uv) +This CLAUDE.md should be copied to each new agentic workflow project and customized for that project's specifics. + +--- + +## Development Philosophy + +### Core Principle: Scope Rings + +**Every project follows this sequence:** + +``` +Ring 0 (MVP - Ship First): Core value proposition +Ring 1 (After validation): User-requested enhancements +Ring 2 (If valuable): Advanced features +Ring 3 (Speculative): Experiments +``` + +**RULE: Cannot work on Ring N+1 until Ring N is shipped and validated with users** + +**Timeline:** +(treat week numbers as relative timings/durations here - actual agentic dev may be faster) +- Week 0: Validate constraints +- Week 1-2: Build Ring 0 +- Week 2-3: Ship & get user feedback +- Week 4+: Iterate based on feedback + +--- + +## Understanding the Scaffold + +This template provides **infrastructure** and **optional patterns**. See `SCAFFOLD_GUIDE.md` for complete decision trees. + +### Infrastructure (Keep Always) + +These prevent technical debt and ensure consistency: + +- ✅ **JSON schemas in `schemas/`** - Schema-first design (generate Pydantic models programmatically) +- ✅ **YAML prompts co-located with agents/services** - Declarative, versionable prompts +- ✅ **Prompt naming convention**: `{agent_name}.prompt.yaml` or `{service_name}.{purpose}.prompt.yaml` + - Examples: `agents/annotator.prompt.yaml`, `services/deepsearch.query.prompt.yaml` + - Easy to find: `find . 
-name "*.prompt.yaml"` + - Easy to review: `git diff **/*.prompt.yaml` +- ✅ **Test structure** - `unit/` and `integration/` with pytest markers +- ✅ **Tooling configs** - pytest, ruff, mypy, sphinx in `pyproject.toml` +- ✅ **Dotenv bootstrap** - Environment management via `.env` files + +### Optional Components (Evaluate for Ring 0) + +These are proven patterns - use if Ring 0 needs them, otherwise DELETE: + +- **`graphs/`** - Multi-step workflow orchestration + - Keep if: Complex branching workflows needed + - Delete if: Single agent or linear flow sufficient + - See: `src/{{cookiecutter.package_name}}/graphs/README.md` + +- **`validation/`** - Cross-cutting validation logic + - Keep if: Shared validation across 2+ services + - Delete if: Simple Pydantic validation sufficient + - See: `src/{{cookiecutter.package_name}}/validation/README.md` + +- **`agents/example_agent.py`** - Example agent with infrastructure patterns + - EXAMPLE code: Replace with your domain logic + - INFRASTRUCTURE patterns: Keep schema-first, prompt-first approach + +- **`deep-research-client` dependency** + - Keep if: Using Perplexity/deep research in Ring 0 + - Delete if: Not needed for MVP (remove from `pyproject.toml`) + +### Week 0 Includes Scaffold Review + +1. Define Ring 0 scope (update sections below) +2. Review `SCAFFOLD_GUIDE.md` decision trees +3. **Delete unused optional components** (graphs/, validation/, etc.) +4. **Remove unused dependencies** (deep-research-client if not needed) +5. Keep infrastructure, replace example code with domain logic +6. Update README.md with your project description + +**Key Principle**: Infrastructure ≠ premature abstraction. Schema files and prompt files are infrastructure that prevent technical debt. + +### Prompt File Organization + +**INFRASTRUCTURE**: Always store prompts in separate YAML files (not hardcoded in code). + +**Location**: Co-locate prompts with the agents/services that use them. + +**Naming Convention**: Use `.prompt.yaml` suffix for easy discoverability: +- `{agent_name}.prompt.yaml` - For single-purpose agents +- `{service_name}.{purpose}.prompt.yaml` - For services with multiple prompts + +**Examples**: +- `src/{{cookiecutter.package_name}}/agents/annotator.prompt.yaml` +- `src/{{cookiecutter.package_name}}/services/deepsearch.query.prompt.yaml` +- `src/{{cookiecutter.package_name}}/services/deepsearch.summary.prompt.yaml` + +**Benefits**: +- Easy discovery: `find . -name "*.prompt.yaml"` +- Clear ownership: prompt lives next to implementation +- Easy review: `git diff **/*.prompt.yaml` +- Grepable: search for prompt changes across project +- Version controlled: track prompt evolution in git + +**Pattern**: +```yaml +# Co-located with agent/service that uses it +system_prompt: | + You are an AI assistant specialized in {domain}. + +user_prompt: | + Process this {task_type}: {input_data} + +presets: + openai-gpt4: + provider: openai + model: gpt-4 + temperature: 0.1 +``` + +**Load in code**: +```python +from pathlib import Path +import yaml + +def load_prompt(prompt_file: str) -> dict: + """Load co-located prompt file.""" + prompt_path = Path(__file__).parent / prompt_file + return yaml.safe_load(prompt_path.read_text()) + +# Usage +prompt_config = load_prompt("my_agent.prompt.yaml") +``` + +--- + +## Project-Specific Configuration + +**[CUSTOMIZE THIS SECTION FOR EACH PROJECT]** + +### Ring 0 Scope (MVP) + +- [ ] Core feature 1 +- [ ] Core feature 2 +- [ ] Basic output format + +**STOP after Ring 0. Share with users. 
Get feedback.** + +### Ring 1 Scope (After User Validation) + +- [ ] TBD based on user feedback + +### Architecture Vision + + +**Core design:** +- Service pattern: [describe] +- Schema-first: Pydantic models from JSON schema +- Configuration: [describe preset system] + +**What NOT to do yet:** +- ❌ Don't add multi-provider support unless clearly stated use case (wait for 2nd provider need) +- ❌ Don't build abstract base classes (wait for 2+ concrete cases) +- ❌ Don't optimize for scale (wait for scale problems) BUT do warn about any poorly scaling or costly anti-patterns (e.g. multiple LLM calls passing the same info) + +### Known Constraints + + + +**Example:** +```markdown +## Perplexity deep research API (Tested YYYY-MM-DD) +- ❌ Does NOT respect JSON schema in system prompt +- ✅ DOES work with schema in user message as part of request for larger report +- Tested: 10/10 successful parses +``` + +--- + +## Week 0: Validation Phase (REQUIRED) + +**Before writing production code:** + +### 1. Test External Services (1-2 days) +- Write 5-10 simple test scripts for each external API +- Document behavior quirks in CONSTRAINTS.md +- Test edge cases, error modes, rate limits +- **Deliverable:** CONSTRAINTS.md with tested facts + +### 2. Create Scope Rings (1 day) +- Define Ring 0 (MVP) clearly - what's the minimum value? +- Identify Ring 1 candidates (defer until feedback) +- **Deliverable:** SCOPE_RINGS.md + +### 3. Update This CLAUDE.md +- Fill in Ring 0 scope above +- Document architectural decisions +- List what NOT to do yet +- **Deliverable:** Project-specific CLAUDE.md + +**Week 0 prevents:** Building elaborate systems on wrong assumptions + +--- + +## Test-Driven Development + +### Integration Tests: ALWAYS Required + +**From Day 1:** +- Integration tests with REAL external services +- Tests FAIL HARD if no API keys +- Forces validation against real behavior +- Catches API quirks immediately + +**Example:** +```python +@pytest.mark.integration +def test_perplexity_json_output(): + """Test real Perplexity API with JSON schema.""" + if not os.getenv("PERPLEXITY_API_KEY"): + pytest.fail("PERPLEXITY_API_KEY required for integration tests") + + # Test with real API + result = query_perplexity(...) + assert valid_json(result) +``` + +### TDD: When to Use + +**Use TDD for:** +- ✅ Parsers, validators, data transformers (clear inputs/outputs) +- ✅ Bug fixes (red → green → refactor) +- ✅ Core domain logic (once understood) + +**Don't use TDD for:** +- ❌ Exploratory prototypes +- ❌ Trying different prompt strategies +- ❌ Initial API integration experiments + +**TDD Workflow:** ```bash -# Install dependencies and sync environment -uv sync --dev # Install all dependencies including dev tools +# 1. Red: Write failing test +uv run pytest tests/test_parser.py -k test_new_feature # Should fail -# Run tests -uv run pytest # All tests -uv run pytest -m unit # Unit tests only -uv run pytest -m integration # Integration tests only -uv run pytest --cov # With coverage +# 2. Green: Minimal implementation +# ... write code ... +uv run pytest tests/test_parser.py -k test_new_feature # Should pass + +# 3. Refactor: Improve while tests stay green +``` + +### Coverage Targets + +**MVP Phase (Week 1-3):** +- Target: 60% coverage +- Focus on critical paths +- Integration tests > unit test coverage + +**Post-MVP (Week 4+):** +- Target: 80%+ coverage +- Comprehensive test suite +- Add edge cases -# Code quality (run before committing!) 
+--- + +## Code Quality: Phase-Appropriate Standards + +### MVP Phase (Week 1-3): Relaxed + +**Focus:** Deliver value, validate approach + +```bash +# Run these manually (not blocking) +uv run mypy src/ # Type checking (encouraged) +uv run ruff check --fix src/ tests/ # Lint uv run ruff format src/ tests/ # Format code -uv run ruff check --fix src/ tests/ # Lint and auto-fix -uv run mypy src/ # Type checking +``` -# Documentation (run before committing!) -python scripts/check-docs.py # Build docs and check for errors -cd docs && uv run sphinx-build . _build/html -W # Alternative direct command +**Standards:** +- ✅ Integration tests (required) +- ✅ Type hints (encouraged, not enforced) +- ✅ Linting (run manually, don't block commits) +- ✅ Coverage: 60% target +- ❌ NO pre-commit hooks yet (patterns not stable) -# Pre-commit hooks (recommended) -uv run pre-commit install # Install git hooks -uv run pre-commit run --all-files # Run on all files +### Post-MVP Phase (Week 4+): Strict -# Add new dependencies -uv add requests # Add runtime dependency -uv add --dev pytest # Add development dependency +**Focus:** Sustainable, maintainable code + +```bash +# Install pre-commit hooks +uv run pre-commit install + +# These now run automatically on commit +uv run pytest --cov --cov-fail-under=80 +uv run mypy src/ +uv run ruff check src/ tests/ +uv run ruff format src/ tests/ -# Environment management -uv sync # Sync dependencies (production only) -uv sync --dev # Sync with development dependencies ``` -## Code Quality Strategy -- **Pre-commit hooks**: Auto-run ruff to lint AND format /mypy to type check before each commit -- **Local checks**: Always run `ruff format` and `ruff check --fix` before pushing -- **GitHub Actions**: Runs same checks on PRs - should never fail if run locally -- **IDE integration**: Configure your editor to run formatters on save +**Standards:** +- ✅ Pre-commit hooks enforced +- ✅ Type checking enforced (mypy) +- ✅ Linting enforced (ruff) +- ✅ Coverage: 80%+ required +- ✅ CI/CD checks -## Environment Configuration -- **ALWAYS use dotenv**: Use `from dotenv import load_dotenv; load_dotenv()` for environment variables, never use `os.getenv()` directly -- **Never hardcode secrets**: All API keys, emails, and sensitive data must come from .env files -- **Environment precedence**: Constructor params > environment variables > sensible defaults +**When to transition:** After Ring 0 shipped, user feedback received, code patterns stabilizing -## FORBIDDEN Patterns -- Mock data generation for integration tests -- Simulated API responses -- Dummy database connections -- Placeholder implementations -- Integration tests that pass without real integration -- Skipping failing tests with pytest.mark.skip +--- + +## Testing Commands (using uv) + +```bash +# Environment setup +uv sync --dev # Install all dependencies including dev tools + +# Running tests +uv run pytest # All tests +uv run pytest -m unit # Unit tests only (CI uses this) +uv run pytest -m integration # Integration tests (local only) +uv run pytest --cov # With coverage +uv run pytest --cov --cov-fail-under=60 # MVP phase +uv run pytest --cov --cov-fail-under=80 # Post-MVP phase + +# Code quality +uv run mypy src/ # Type check +uv run ruff check --fix src/ tests/ # Lint + auto-fix +uv run ruff format src/ tests/ # Format + +# Documentation +python scripts/check-docs.py # Build and check docs +cd docs && uv run sphinx-build . 
_build/html -W # Alternative + +# Pre-commit (Post-MVP) +uv run pre-commit install # Install hooks +uv run pre-commit run --all-files # Run on all files + +# Dependencies +uv add requests # Runtime dependency +uv add --dev pytest # Dev dependency +``` + +--- ## Required Test Structure -- Unit tests: tests/unit/ (fast, isolated, no external deps) -- Integration tests: tests/integration/ (environment-dependent behavior) -- Fixtures with real connection setup/teardown -- Coverage minimum: 80% -- All tests must use pytest markers (@pytest.mark.unit or @pytest.mark.integration) + +``` +tests/ +├── unit/ # Fast, isolated, no external deps +│ ├── test_parsers.py +│ ├── test_validators.py +│ └── ... +├── integration/ # Real external services +│ ├── test_perplexity.py +│ ├── test_deepsearch.py +│ └── ... +└── conftest.py # Shared fixtures +``` + +**All tests must use markers:** +```python +@pytest.mark.unit # Unit test +@pytest.mark.integration # Integration test +``` + +**Integration test requirements:** +- Real API connections (no mocks) +- Fail hard if no credentials +- Document expected behavior +- Test error modes (rate limits, network failures) + +--- ## Integration Testing Strategy -**Local Development (Real APIs Only):** -- Integration tests REQUIRE real API keys (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY) -- Tests FAIL HARD if no API keys are present -- Forces developers to test against real APIs before pushing -- Pre-commit hooks enforce integration test passage -- Command: `uv run pytest -m integration` - -**CI/GitHub Actions (Unit Tests Only):** -- NO integration tests run in CI to avoid mock complexity -- Only unit tests run (fast, no external dependencies) -- Command: `uv run pytest -m unit` + +### Local Development (Real APIs) +- Integration tests REQUIRE real API keys +- Tests FAIL HARD if credentials missing +- Forces validation against real services +- Run: `uv run pytest -m integration` + +### CI/GitHub Actions (Unit Tests Only) +- NO integration tests in CI (avoid API costs/mocking) +- Only unit tests (fast, reliable) +- Run: `uv run pytest -m unit` **Rationale:** -- Local: Mandatory real API testing ensures integration quality -- CI: Simple, fast, reliable unit test validation -- Avoids complex mock maintenance and false confidence -- Forces developers to have working API access +- Local: Mandatory real API testing ensures quality +- CI: Simple, fast validation +- Avoids mock complexity and false confidence + +--- + +## FORBIDDEN Patterns + +**Never:** +- ❌ Mock data for integration tests (use real APIs) +- ❌ Simulated API responses in integration tests +- ❌ Skipping tests with `pytest.mark.skip` (fix or remove) +- ❌ Ring 1+ features before Ring 0 ships +- ❌ Building generic architecture before specific case works +- ❌ Rewriting existing code without documented reason + +**Required:** +- ✅ Real API integration tests from Day 1 +- ✅ Ship Ring 0 within 2-3 weeks +- ✅ Get user feedback before Ring 1 +- ✅ Extend existing code when possible + +--- + +## Architecture Requirements + +### Schema-First Pattern + +**JSON Schema is the source of truth: use it to generate Pydantic models** +```python +# 1. Define JSON schema (not Pydantic) +schema = { + "type": "object", + "properties": {...}, + "additionalProperties": False +} + +# 2. Generate Pydantic model from JSON schema +# Use cellsem-llm-client utilities +Model = create_model_from_json_schema(schema) + +# 3. 
Validate and correct LLM outputs +result = Model.model_validate(llm_output) # Strict validation +# OR +result = Model.model_validate(llm_output, strict=False) # Drop extra fields with warning +``` + +**Modular schemas:** +- Separate business logic from domain (biology) +- Reusable components +- Shared between core and validation packages + +### Core Libraries + +**Use:** +- `cellsem-llm-client` for LLM agents and generation of pydantic models +- `deepsearch-client` for DeepSearch queries +- `pydantic-ai` for orchestration graphs + +**DeepSearch calls belong in services layer, not scattered through code** + +### Orchestration: PydanticAI Graphs + +```python +# Define workflow as graph +workflow = Graph() +workflow.add_node("query", query_agent) +workflow.add_node("validate", validation_agent) +workflow.add_edge("query", "validate") + +# Declarative, inspectable, debuggable +result = workflow.run(input_data) +``` + +### Declarative Workflows + +**Prefer declarative over imperative:** + +**Prompts in YAML:** +```yaml +# prompts/gene_annotation.yaml +system_prompt: | + You are a gene program annotator. + {instructions} + +user_prompt: | + Analyze these genes: {gene_list} + +presets: + perplexity-sonar: + provider: perplexity + model: sonar-pro + temperature: 0.1 +``` + +**Benefits:** +- Easy to modify without code changes +- Versioned separately from logic +- Testable in isolation +- Self-documenting + +### Transparency & Debuggability + +**Required:** +- ✅ Save intermediate outputs at each step +- ✅ Structured output directory: `outputs/{project}/{query}/{timestamp}/` +- ✅ Ability to resume from any step +- ✅ Dry-run mode (show all prompts/calls without executing) + +**Example:** +```python +# Save intermediate results +def run_workflow(input_data, output_dir, start_step=None): + if start_step is None or start_step <= 1: + result1 = step1(input_data) + save_json(result1, f"{output_dir}/step1_output.json") + else: + result1 = load_json(f"{output_dir}/step1_output.json") + + if start_step is None or start_step <= 2: + result2 = step2(result1) + save_json(result2, f"{output_dir}/step2_output.json") + # ... 
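+    # Resume example (sketch): pass start_step to reuse saved outputs, e.g.
+    # run_workflow(input_data, output_dir, start_step=2)  # reruns from step 2, reloading step 1's saved JSON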
+``` + +--- + +## Scripts & CLI + +### Core Runner Script + +**Every workflow needs a runner supporting:** + +```bash +# Single run +workflow-runner --input genes.txt --output results/ + +# Batch mode +workflow-runner --batch inputs/ --output results/ + +# Dry run (show plan without executing) +workflow-runner --input genes.txt --dry-run + +# Resume from step +workflow-runner --input genes.txt --output results/ --resume-from step3 +``` + +**Requirements:** +- Distributed with package (installed as console script) +- Single run, batch, and dry-run modes +- Clear progress output +- Error handling with helpful messages + +**Anti-pattern:** Encoding workflow logic in scripts instead of core package + +--- + +## Repository Structure + +### Two-Package Architecture (UV Workspace) + +This project uses **UV workspace** to manage two separately publishable packages: + +``` +{{cookiecutter.project_slug}}/ +├── pyproject.toml # UV workspace configuration +├── src/ +│ ├── {{cookiecutter.package_name}}/ # CORE PACKAGE +│ │ ├── pyproject.toml # Core package config +│ │ └── {{cookiecutter.package_name}}/ # Source code +│ │ ├── __init__.py # Bootstrap with dotenv +│ │ ├── agents/ # Agent orchestration +│ │ │ └── *.prompt.yaml # Co-located prompts +│ │ ├── graphs/ # Workflow graphs (OPTIONAL) +│ │ ├── schemas/ # JSON schemas (source of truth) +│ │ ├── services/ # LLM + API integrations +│ │ │ └── *.prompt.yaml # Co-located prompts +│ │ ├── utils/ # Supporting utilities +│ │ └── validation/ # Cross-cutting validations (OPTIONAL) +│ └── {{cookiecutter.package_name}}_validation_tools/ # VALIDATION PACKAGE (OPTIONAL) +│ ├── pyproject.toml # Validation package config +│ └── {{cookiecutter.package_name}}_validation_tools/ +│ ├── comparisons/ # Compare workflow runs +│ ├── metrics/ # Quality metrics +│ └── visualizations/ # Analysis plots +├── tests/ +│ ├── unit/ +│ └── integration/ +├── docs/ +├── scripts/ +│ └── check-docs.py +├── SCAFFOLD_GUIDE.md # Scaffold decision guide +└── CLAUDE.md # This file +``` + +**Core package** (`{{cookiecutter.package_name}}`): +- **Always keep**: Contains all workflow logic +- **Owns schemas**: Only location for JSON schemas +- **Prompts co-located**: `*.prompt.yaml` files next to agents/services +- Publish: `pip install {{cookiecutter.package_name}}` + +**Validation package** (`{{cookiecutter.package_name}}_validation_tools`): +- **OPTIONAL**: Delete entire directory if Ring 0 doesn't need validation tooling +- **Depends on core**: Imports schemas and models from core package +- **No schema duplication**: Uses `from {{cookiecutter.package_name}}.schemas import ...` +- Publish: `pip install {{cookiecutter.package_name}}-validation-tools` + +**UV Workspace benefits:** +- Single `uv sync` installs both packages in development mode +- Shared lockfile (`uv.lock`) for reproducibility +- Independent publishing to PyPI +- Clear dependency graph (validation → core) + +--- + +## Environment Configuration + +**ALWAYS use dotenv:** +```python +from dotenv import load_dotenv +load_dotenv() + +# Then use os.getenv() +api_key = os.getenv("PERPLEXITY_API_KEY") +``` + +**Precedence:** +1. Constructor parameters (explicit) +2. Environment variables (.env file) +3. Sensible defaults + +**Never:** +- ❌ Hardcode secrets +- ❌ Commit .env files +- ❌ Use `os.getenv()` without `load_dotenv()` + +--- ## Documentation Requirements -- Google-style docstrings for all public functions/classes -- **RST syntax in docstrings**: Use `.. 
code-block:: python` instead of markdown ```python -- Auto-generated API docs via Sphinx + AutoAPI -- Manual docs in docs/ using MyST markdown -- Build docs: `python scripts/check-docs.py` (recommended) or `sphinx-build docs docs/_build` -- **Always run docs check before committing** to catch RST syntax errors - -## MVP Definition -For each feature: + +**Google-style docstrings:** +```python +def query_deepsearch(gene_list: list[str], model: str = "sonar-pro") -> dict: + """Query DeepSearch API for gene program annotation. + + Args: + gene_list: List of gene symbols to annotate + model: DeepSearch model to use (default: sonar-pro) + + Returns: + Dictionary containing annotation results with keys: + - programs: List of identified gene programs + - citations: Supporting references + - confidence: Confidence scores + + Raises: + APIError: If DeepSearch request fails + ValidationError: If response doesn't match schema + + Example: + .. code-block:: python + + result = query_deepsearch(["TP53", "BRCA1"]) + programs = result["programs"] + """ +``` + +**RST syntax in docstrings:** +- Use `.. code-block:: python` (not markdown backticks) +- Check with: `python scripts/check-docs.py` + +**Documentation structure:** +- Auto-generated API docs (Sphinx + AutoAPI) +- Manual guides in docs/ (MyST markdown) +- ALWAYS run docs check before committing + +--- + +## Planning Requirements + +### For Each Feature + +**Include:** 1. Clear, testable goal -2. Integration test demonstrating real API connection -3. Error handling for actual failure modes (network, malformed data, rate limits) -4. No feature complete until real integration test passes +2. Integration test demonstrating real API behavior +3. Error handling for actual failure modes: + - Network failures + - Malformed data + - Rate limits + - Authentication errors +4. Critique: Potential issues/risks with approach + +**Template:** +```markdown +## Feature: [Name] + +### Goal +[What value does this provide?] + +### Integration Test +[How will we test with real APIs?] + +### Error Modes +- Network failure: [handling] +- Malformed response: [handling] +- Rate limit: [handling] + +### Critique +- Risk 1: [mitigation] +- Risk 2: [mitigation] +``` + +### MVP Definition + +**Each feature is not complete until:** +- ✅ Real integration test passes +- ✅ Error handling implemented +- ✅ Documented in code and docs + +--- + +## Decision Checklist + +**Before writing production code, verify:** + +- [ ] **Week 0 complete?** CONSTRAINTS.md, SCOPE_RINGS.md, CLAUDE.md updated +- [ ] **Is this Ring 0?** If no, STOP until Ring 0 ships +- [ ] **Have we tested the external API?** Integration test first +- [ ] **Can we extend existing code?** Don't rewrite without reason +- [ ] **Is architecture documented in CLAUDE.md?** Agent needs guidance +- [ ] **Are we in Week 3+ without user feedback?** Time to share + +--- + +## Red Flags: Stop and Review + +**Warning signs:** +- [ ] Ring 1+ features before Ring 0 ships +- [ ] Rewriting existing code without documented reason +- [ ] Building custom generic architecture (use/contribute to template) +- [ ] Week 3+ without sharing with users +- [ ] No integration tests with real APIs +- [ ] Missing architectural vision in CLAUDE.md + +**If any checked:** Pause. Review [[development-principles]]. Refocus on Ring 0. + +--- -## Planning +## Success Metrics -Each proposed plan of work should include an MVP and a critique section detailing potential issues/risks with the approach. 
+**Ship fast:** +- Ring 0 shipped: Week 2-3 (not Week 5+) +- User feedback: Week 3 +- Ring 1 decisions: Based on feedback -## Architecture +**Build right:** +- Integration tests from Day 1 +- Real API validation (no mocks) +- Coverage: 60% (MVP) → 80% (Post-MVP) +- Sustainable patterns (template infrastructure) -Use cellsem-llm-client for agents. -Use deepsearch-client for deepsearch (deepsearch calls belong in services) +**Iterate smart:** +- Rapid experiments + user feedback +- Extend existing code when possible +- Contribute patterns back to template -### Schema first pattern: - - Schema is ALWAYS defined as JSON schema NOT pydantic schema. - - pydantic models are ALWAYS derived from JSON schema - using cellsem-llm-client - - All JSON schema content generated by agents/deepsearch is validated and corrected by pydantic - - where schema specified additionalFields: False, pydantic must be set to drop any additional fields with a warning rather than hard failing. - - JSON schema should be modular with modules strictly separating business logic from biology +--- -### Orchestration: - - Agents are orchestrated via pydanticAI graphs +## References -### Declarative workflows: - - As far as possible, prefer declarative structure: - - Prompts (including system prompts) + interpolation defined in YAML - - YAML files define preset combinations: prompts, provider model, other API metadata (keys and values) - - If declarative use systems for orchestration via pydanticAI graphs are possible, then please use and document them. +- [[development-principles]] - Lessons from Langpa retrospective +- [[workflows]] - Integration with research workflows +- CellSemAgenticWorkflow template repository - [URL] -### Transparency: - - It must be possible to save and inspect intermediate files in workflows - - It must be possible to run workflows starting from multiple steps +--- -### Scripts: - - All workflows should have a core runner script supporting: - - single run - - batch mode - - dry-run (a single run returning a report of all prompts, agents, API calls in order.) - - The runner script must be distributed with the package - - Be careful to avoid encoding workflows in scripts rather than in the core package. +## Customization Checklist -### Repo structure: -The repo must consist of two packages: A core package + a validation package. Where relevant these MUST share schemas. They MUST also share an outputs directory and standard ways to structure these. For example, outputs/{project}/{query}/{timestamp}/(results_files). +**When starting new project from this template:** -Validation package must be capable of basic stats - precision, recall, F1, ROC curves, heat map bubble plots... +- [ ] Fill in "Ring 0 Scope" section above +- [ ] Document architectural decisions +- [ ] List "What NOT to do yet" +- [ ] Complete Week 0 validation +- [ ] Create CONSTRAINTS.md +- [ ] Create SCOPE_RINGS.md +- [ ] Update this CLAUDE.md with project specifics +- [ ] Share this with agent for each session +**This CLAUDE.md guides the agent. Keep it updated as decisions evolve.** diff --git a/{{cookiecutter.project_slug}}/README.md b/{{cookiecutter.project_slug}}/README.md index ada96b2..139a84d 100644 --- a/{{cookiecutter.project_slug}}/README.md +++ b/{{cookiecutter.project_slug}}/README.md @@ -10,6 +10,15 @@ ## 🚀 Quick Start +### Understanding This Scaffold + +This project was generated from a standardized template. 
**See `SCAFFOLD_GUIDE.md` for**: +- What's **infrastructure** (keep always) vs **optional** (evaluate for your Ring 0) +- What's **example code** (replace with your domain logic) +- **Decision trees** for each component (graphs/, validation/, etc.) + +**Week 0 Task**: Review scaffold and remove components not needed for your Ring 0 MVP. + ### Installation ```bash @@ -61,41 +70,93 @@ bootstrap() Documentation lives in `docs/` and is built with Sphinx + MyST. Run `python scripts/check-docs.py` to build with warnings-as-errors before each commit. Publish the rendered HTML via GitHub Pages or your preferred static host. +## 📦 Package Structure + +This project contains **two independently publishable packages** managed as a UV workspace: + +### Core Package +```bash +pip install {{cookiecutter.package_name}} +``` +Main workflow package with agents, services, and orchestration. **Always keep this package.** + +### Validation Tools (Optional) +```bash +pip install {{cookiecutter.package_name}}-validation-tools +``` +Tools for comparing runs, computing metrics, and visualizing results. + +**Note**: Validation package is **OPTIONAL**. Delete `src/{{cookiecutter.package_name}}_validation_tools/` if not needed for your Ring 0 MVP. See `SCAFFOLD_GUIDE.md` for guidance. + +## 🛠️ Development + +This is a **UV workspace** - a single `uv sync` installs both packages: + +```bash +# Install both packages in development mode +uv sync --dev + +# Run tests for all packages +uv run pytest + +# Lint all packages +uv run ruff check src/ tests/ +``` + ## ✨ Current Features +- ✅ **Two-package architecture** - Core + optional validation tools +- ✅ **UV workspace** - Modern multi-package management - ✅ **Agentic workflow scaffold** with strict TDD guardrails (`CLAUDE.md`) - ✅ **Unit & integration test suites** pre-configured with pytest markers - ✅ **Docs + automation scripts** for Sphinx builds - ✅ **Environment bootstrap** handled via `python-dotenv` -- ✅ **uv-first packaging** (`pyproject.toml` with Ruff, MyPy, pytest config) - ✅ **Integrated clients**: [`cellsem_llm_client`](https://github.com/Cellular-Semantics/cellsem_llm_client) for LLMs and [`deep-research-client`](https://github.com/monarch-initiative/deep-research-client) for Deepsearch workflows - ✅ **Pydantic AI graph orchestration**: `pydantic-ai` agent surfaces graph nodes safely with typed deps +- ✅ **Schema-first design**: JSON schemas → Pydantic models +- ✅ **Prompt co-location**: `*.prompt.yaml` files next to agents/services ## 🏗️ Architecture ``` {{cookiecutter.project_slug}}/ -├── src/{{cookiecutter.package_name}}/ -│ ├── agents/ # Agent classes coordinating workflows -│ ├── graphs/ # Optional workflow graphs powered by Pydantic -│ ├── schemas/ # Shared IO models and contracts -│ └── services/ # LLM + Deepsearch integration layers -├── tests/unit/ # Fast, isolated tests -├── tests/integration/ # Real API + IO validation (no mocks) -├── docs/ # Sphinx configuration and content -└── scripts/ # Tooling helpers (docs, chores, etc.) 
+├── pyproject.toml # UV workspace config +├── src/ +│ ├── {{cookiecutter.package_name}}/ # CORE PACKAGE +│ │ ├── pyproject.toml +│ │ └── {{cookiecutter.package_name}}/ +│ │ ├── agents/ # Agent orchestration +│ │ ├── graphs/ # Workflow graphs (OPTIONAL) +│ │ ├── schemas/ # JSON schemas (source of truth) +│ │ ├── services/ # LLM + API integrations +│ │ ├── utils/ # Supporting utilities +│ │ └── validation/ # Cross-cutting validations (OPTIONAL) +│ └── {{cookiecutter.package_name}}_validation_tools/ # VALIDATION PACKAGE (OPTIONAL) +│ ├── pyproject.toml +│ └── {{cookiecutter.package_name}}_validation_tools/ +│ ├── comparisons/ # Compare workflow runs +│ ├── metrics/ # Quality metrics +│ └── visualizations/ # Analysis plots +├── tests/ +│ ├── unit/ # Fast, isolated tests +│ └── integration/ # Real API validation (no mocks) +├── docs/ # Sphinx configuration and content +└── scripts/ # Tooling helpers (docs, chores, etc.) ``` -Optional workflow graphs powered by Pydantic ensure orchestration definitions are validated before agents execute them, keeping schema and runtime behaviors aligned. - -- `src/{{cookiecutter.package_name}}/agents`: Agent entrypoints coordinating services and schemas -- `src/{{cookiecutter.package_name}}/graphs`: Optional workflow graphs powered by Pydantic + pydantic-ai -- `src/{{cookiecutter.package_name}}/schemas`: JSON Schema contracts describing outputs + business rules -- `src/{{cookiecutter.package_name}}/services`: Concrete integrations (CellSem LLM client, Deepsearch) -- `src/{{cookiecutter.package_name}}/utils`: Repo-specific tooling/helpers that support workflows without being agents -- `src/{{cookiecutter.package_name}}/validation`: Cross-cutting workflow validations (schema checks, service registration guards) - -Workflow validations live in src/{{cookiecutter.package_name}}/validation. Use this module to centralize logic that inspects graphs, schemas, or services before workflows execute. +**Core package** (always keep): +- `agents/`: Agent classes coordinating workflows (prompts co-located as `*.prompt.yaml`) +- `graphs/`: Optional workflow graphs powered by Pydantic + pydantic-ai +- `schemas/`: JSON Schema contracts (source of truth for data models) +- `services/`: LLM and API integrations (CellSem LLM client, Deepsearch) +- `utils/`: Supporting utilities +- `validation/`: Cross-cutting validations (OPTIONAL - delete if not needed) + +**Validation package** (optional - delete if Ring 0 doesn't need): +- `comparisons/`: Tools for comparing workflow runs +- `metrics/`: Quality metrics (precision, recall, F1, etc.) +- `visualizations/`: Analysis plots (heatmaps, ROC curves, etc.) +- Imports schemas and models from core package (no duplication) ### Graph Agents with pydantic-ai diff --git a/{{cookiecutter.project_slug}}/SCAFFOLD_GUIDE.md b/{{cookiecutter.project_slug}}/SCAFFOLD_GUIDE.md new file mode 100644 index 0000000..a2586ea --- /dev/null +++ b/{{cookiecutter.project_slug}}/SCAFFOLD_GUIDE.md @@ -0,0 +1,275 @@ +# Project Scaffold Guide + +This guide helps you understand the generated project structure and decide what to keep for your Ring 0 MVP. + +## Two-Package Architecture + +This project uses a **two-package structure** managed as a UV workspace for separation of concerns: + +### 1. Core Package: `{{cookiecutter.package_name}}` + +**Always Keep**: This is your main workflow package. 
+ +**Location**: `src/{{cookiecutter.package_name}}/` + +**Contains**: +- Agents, services, orchestration +- Core business logic +- Schemas (source of truth - **only location for schemas**) +- Prompts (co-located with agents/services) + +**Publish**: `pip install {{cookiecutter.package_name}}` + +### 2. Validation Package: `{{cookiecutter.package_name}}_validation_tools` + +**OPTIONAL**: Delete entire directory if not needed for Ring 0. + +**Location**: `src/{{cookiecutter.package_name}}_validation_tools/` + +**Contains**: +- Workflow output comparison tools (`comparisons/`) +- Quality metrics (`metrics/` - precision, recall, F1, etc.) +- Visualizations (`visualizations/` - heatmaps, ROC curves, plots) + +**Depends on**: Core package (imports schemas and models from core) + +**Publish**: `pip install {{cookiecutter.package_name}}-validation-tools` + +**Keep if**: +- Need to compare workflow runs +- Need quality metrics for evaluation +- Need visualizations for analysis +- Building tools for workflow validation + +**Delete if**: +- No comparison/analysis needed in Ring 0 +- Simple workflows without evaluation needs +- Not building validation tooling + +### UV Workspace Benefits + +- **Single `uv sync`**: Installs both packages in development mode +- **Shared lockfile**: `uv.lock` ensures reproducibility across team +- **Local development**: Edit either package, changes reflected immediately +- **Independent publishing**: Each package can be published to PyPI separately +- **Clear dependencies**: Validation depends on core, not vice versa + +--- + +## Infrastructure (Always Keep) + +These components prevent technical debt and ensure consistency across all projects: + +- **`src/{{cookiecutter.package_name}}/__init__.py`** - Bootstrap with dotenv loading +- **`src/{{cookiecutter.package_name}}/schemas/`** - JSON schemas directory (schema-first design) +- **`*.prompt.yaml` files** - Co-located with agents/services that use them +- **`tests/unit/`** and **`tests/integration/`** - Testing structure with pytest markers +- **`pyproject.toml`**, tooling configs - Development infrastructure (ruff, mypy, pytest, sphinx) +- **`.github/workflows/`** - CI/CD pipeline +- **`.githooks/`** - Pre-commit quality checks + +### Prompt File Naming Convention + +**Always use `.prompt.yaml` suffix** for easy identification and discoverability: + +**Examples:** +- `src/{{cookiecutter.package_name}}/agents/annotator.prompt.yaml` - Single prompt for agent +- `src/{{cookiecutter.package_name}}/services/deepsearch.query.prompt.yaml` - Specific purpose +- `src/{{cookiecutter.package_name}}/services/deepsearch.summary.prompt.yaml` - Another purpose + +**Benefits:** +- Easy to find all prompts: `find . 
-name "*.prompt.yaml"` +- Clear ownership: prompt lives next to code that uses it +- Easy to review: `git diff **/*.prompt.yaml` +- Grepable: search for prompt-related changes across project +- Version controlled: track prompt evolution in git + +--- + +## Optional Components (Evaluate for Ring 0) + +### `graphs/` - Workflow Orchestration + +**Keep if** your Ring 0 needs: +- Multi-step workflows with branching logic +- Complex dependencies between steps +- Dynamic routing based on runtime conditions +- Type-safe workflow definitions + +**Delete if**: +- Single agent or linear flow sufficient +- Simple sequential operations +- No branching or complex routing needed + +**Provided**: Working example of pydantic-ai graph orchestration with typed dependencies + +--- + +### `validation/` - Cross-cutting Validations + +**Keep if** your Ring 0 has: +- Complex validation logic used across multiple services +- Business rules that span multiple components +- Schema validations beyond simple Pydantic models + +**Delete if**: +- Simple validation in service layer is sufficient +- No shared validation logic across components + +**Ring 0 guidance**: Likely not needed. Add in Ring 1+ if you discover duplicated validation logic. + +**Alternative**: Keep validation in service layer until pattern emerges (don't premature abstract). + +**Provided**: Empty directory with README explaining usage patterns + +--- + +### `agents/` - Agent Classes + +**Keep if** your Ring 0 has: +- Multiple agents with shared patterns +- Complex agent coordination +- Agent orchestration needs + +**Delete if**: +- Single simple agent is sufficient +- No shared patterns between agents yet + +**Provided**: Example agent demonstrating schema-first and prompt-first patterns + +--- + +### `deep-research-client` Integration + +**Keep if** your Ring 0 needs: +- Deep research workflows +- Perplexity API integration +- Literature search and synthesis + +**Delete if**: +- Not using deep research capabilities in Ring 0 +- Different research tool needed + +**Action**: Remove from `pyproject.toml` dependencies if not needed + +--- + +## Example Code (Replace with Domain Logic) + +Files and code marked with `# EXAMPLE` comments are working demonstrations: + +- **Purpose**: Show proven patterns (schema-first, prompt-first, co-located prompts) +- **Action**: Replace with your domain-specific logic +- **Keep**: The patterns and infrastructure +- **Maintain**: Test structure and documentation style + +--- + +## Week 0 Checklist + +Use this checklist during your Week 0 validation phase: + +- [ ] **Define Ring 0 scope** in CLAUDE.md (update "Ring 0 Scope" section) +- [ ] **Review each directory** against your Ring 0 requirements +- [ ] **Delete unused optional components** (graphs/, validation/, agents/ if not needed) +- [ ] **Remove unused dependencies** from pyproject.toml (e.g., deep-research-client) +- [ ] **Replace example code** with first real use case +- [ ] **Update README.md** with your project description and purpose +- [ ] **Create .env file** with required API keys +- [ ] **Run integration tests** to validate API access +- [ ] **Document architectural decisions** in CLAUDE.md + +--- + +## Decision Tree + +``` +Is your Ring 0 a single, simple agent? +├─ YES → Delete: graphs/, validation/, example agents/ +│ Keep: schemas/, tests/, tooling, one real agent +│ +└─ NO → Multi-step workflow? + ├─ YES → Keep: graphs/, agents/ + │ Evaluate: validation/ (probably defer to Ring 1) + │ + └─ NO → Linear multi-agent? 
+ Keep: agents/ + Delete: graphs/, validation/ +``` + +--- + +## Common Patterns to Follow + +### 1. Schema-First Design + +```python +# schemas/my_input.schema.json +{ + "$comment": "Define business logic in JSON schema first", + "type": "object", + "properties": { + "query": {"type": "string"}, + "max_results": {"type": "integer", "default": 10} + }, + "required": ["query"] +} + +# Then generate Pydantic model programmatically +# (using cellsem-llm-client utilities) +``` + +### 2. Prompt-First Design + +```yaml +# agents/my_agent.prompt.yaml +system_prompt: | + You are an AI assistant specialized in {domain}. + +user_prompt: | + {task_description} + +presets: + openai-gpt4: + provider: openai + model: gpt-4 + temperature: 0.1 +``` + +```python +# agents/my_agent.py +def load_prompt(prompt_file: str) -> dict: + """Load co-located prompt file.""" + prompt_path = Path(__file__).parent / prompt_file + return yaml.safe_load(prompt_path.read_text()) +``` + +### 3. Test Structure + +```python +# tests/unit/test_parser.py +@pytest.mark.unit +def test_parser_logic(): + """Fast, isolated, no external dependencies.""" + pass + +# tests/integration/test_api.py +@pytest.mark.integration +def test_real_api_connection(): + """Real API, fail hard if no credentials.""" + if not os.getenv("API_KEY"): + pytest.fail("API_KEY required for integration tests") + # Test with real API +``` + +--- + +## Questions? + +- Review `CLAUDE.md` for full development philosophy and ring-based approach +- See directory-specific `README.md` files for detailed component guidance +- Check `README.md` for quick start and architecture overview + +--- + +**Remember**: Infrastructure ≠ premature abstraction. Schema files, prompt files, and testing patterns are infrastructure that prevent technical debt. 
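+---
+
+## Appendix: Hand-Checking a Schema
+
+A minimal sketch for sanity-checking sample inputs against a JSON schema during Week 0, using the `jsonschema` dependency directly. This is only a manual check; the real workflow generates Pydantic models from these schemas via `cellsem-llm-client` utilities. The file path below assumes the bundled `example_input.schema.json` in the default workspace layout.
+
+```python
+import json
+from pathlib import Path
+
+from jsonschema import ValidationError, validate
+
+# Assumed path to the bundled example schema (adjust for your package name/layout)
+schema_path = Path("src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/schemas/example_input.schema.json")
+schema = json.loads(schema_path.read_text())
+
+try:
+    validate(instance={"query": "What is agentic AI?", "max_results": 5}, schema=schema)
+    print("valid")
+except ValidationError as exc:
+    print(f"invalid: {exc.message}")
+```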
diff --git a/{{cookiecutter.project_slug}}/pyproject.toml b/{{cookiecutter.project_slug}}/pyproject.toml index 26eecbd..3766b58 100644 --- a/{{cookiecutter.project_slug}}/pyproject.toml +++ b/{{cookiecutter.project_slug}}/pyproject.toml @@ -1,20 +1,12 @@ -[project] -name = "{{cookiecutter.project_slug}}" -version = "0.1.0" -description = "{{cookiecutter.description}}" -readme = "README.md" -requires-python = ">= {{cookiecutter.python_version}}" -dependencies = [ - "python-dotenv>=1.0.1", - "pydantic>=2.7.0", - "pydantic-ai>=1.16.0", - "jsonschema>=4.22.0", - "cellsem-llm-client @ git+https://github.com/Cellular-Semantics/cellsem_llm_client.git@main", - "deep-research-client @ git+https://github.com/monarch-initiative/deep-research-client.git@main", +[tool.uv.workspace] +members = [ + "src/{{cookiecutter.package_name}}", + "src/{{cookiecutter.package_name}}_validation_tools", ] -[project.optional-dependencies] -dev = [ +[tool.uv] +managed = true +dev-dependencies = [ "pytest>=8.0.0", "pytest-cov>=4.1.0", "ruff>=0.5.0", @@ -25,17 +17,8 @@ dev = [ "pre-commit>=3.7.0", ] -[build-system] -requires = ["setuptools>=68.0"] -build-backend = "setuptools.build_meta" - -[tool.setuptools.packages.find] -where = ["src"] - -[tool.setuptools.package-data] -"{{cookiecutter.package_name}}" = ["schemas/*.schema.json"] - [tool.pytest.ini_options] +testpaths = ["tests"] addopts = "-m 'unit'" markers = [ "unit: fast, isolated tests", @@ -53,14 +36,14 @@ select = ["E", "F", "I", "UP", "B", "SIM"] quote-style = "double" [tool.mypy] -packages = ["{{cookiecutter.package_name}}"] +packages = ["{{cookiecutter.package_name}}", "{{cookiecutter.package_name}}_validation_tools"] python_version = "{{cookiecutter.python_version}}" disallow_untyped_defs = true strict_optional = true warn_unused_ignores = true [tool.coverage.run] -source = ["src/{{cookiecutter.package_name}}"] - -[tool.uv] -managed = true +source = [ + "src/{{cookiecutter.package_name}}", + "src/{{cookiecutter.package_name}}_validation_tools", +] diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/README.md b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/README.md new file mode 100644 index 0000000..e3b4fd7 --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/README.md @@ -0,0 +1,26 @@ +# {{cookiecutter.project_name}} - Core Package + +Main agentic workflow package containing agents, services, and orchestration logic. + +## Installation + +```bash +pip install {{cookiecutter.package_name}} +``` + +## Usage + +See main project README at repository root for complete documentation. + +## Package Contents + +- **agents/** - Agent classes coordinating workflow execution +- **graphs/** - Workflow orchestration (optional, delete if not needed) +- **schemas/** - JSON schemas (source of truth for data models) +- **services/** - LLM and API integration layer +- **utils/** - Supporting utilities +- **validation/** - Cross-cutting validations (optional) + +## Development + +This package is part of a UV workspace. See repository root for development instructions. 
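+
+## Quick Example
+
+A minimal sketch using the bundled example agent (EXAMPLE code; replace it with your own domain agents). `ExampleInput` and `run_example_agent` live in `agents/example_agent.py`:
+
+```python
+from {{cookiecutter.package_name}}.agents.example_agent import ExampleInput, run_example_agent
+
+# Input model mirrors schemas/example_input.schema.json
+result = run_example_agent(ExampleInput(query="What is agentic AI?", max_results=5))
+print(result.status)  # "completed"
+print(result.result)
+```
+
+Integration tests against real APIs remain the source of truth: `uv run pytest -m integration`.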
diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/pyproject.toml b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/pyproject.toml new file mode 100644 index 0000000..c68dded --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/pyproject.toml @@ -0,0 +1,33 @@ +[project] +name = "{{cookiecutter.package_name}}" +version = "0.1.0" +description = "{{cookiecutter.description}}" +readme = "README.md" +requires-python = ">= {{cookiecutter.python_version}}" +dependencies = [ + "python-dotenv>=1.0.1", + "pydantic>=2.7.0", + "pydantic-ai>=1.16.0", + "jsonschema>=4.22.0", + "pyyaml>=6.0", + "cellsem-llm-client @ git+https://github.com/Cellular-Semantics/cellsem_llm_client.git@main", + "deep-research-client @ git+https://github.com/monarch-initiative/deep-research-client.git@main", +] + +[project.optional-dependencies] +dev = [ + "pytest>=8.0.0", + "pytest-cov>=4.1.0", + "ruff>=0.5.0", + "mypy>=1.10.0", +] + +[build-system] +requires = ["setuptools>=68.0"] +build-backend = "setuptools.build_meta" + +[tool.setuptools.packages.find] +where = ["."] + +[tool.setuptools.package-data] +"{{cookiecutter.package_name}}" = ["schemas/*.schema.json", "**/*.prompt.yaml"] diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/__init__.py similarity index 100% rename from {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/__init__.py rename to {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/__init__.py diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/agents/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/agents/__init__.py similarity index 100% rename from {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/agents/__init__.py rename to {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/agents/__init__.py diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/agents/example_agent.prompt.yaml b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/agents/example_agent.prompt.yaml new file mode 100644 index 0000000..b52193d --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/agents/example_agent.prompt.yaml @@ -0,0 +1,48 @@ +# EXAMPLE: Replace with your domain-specific prompts +# INFRASTRUCTURE: Prompts co-located with agents/services that use them +# Naming convention: {agent_name}.prompt.yaml or {service_name}.{purpose}.prompt.yaml + +system_prompt: | + You are an AI assistant helping with agentic workflow tasks. + + Your role is to process user queries and provide helpful, accurate responses + following the configured parameters. + + EXAMPLE: Replace this system prompt with your domain-specific instructions. + +user_prompt: | + Process the following query: + + Query: {query} + + Provide a comprehensive response based on the available context and your knowledge. + + EXAMPLE: Replace this user prompt template with your domain-specific format. + Use {variable_name} for template variables that will be filled at runtime. 
+ +# Preset configurations for different LLM providers/models +# INFRASTRUCTURE: Define model presets here for easy switching +presets: + openai-gpt4: + provider: openai + model: gpt-4 + temperature: 0.1 + max_tokens: 1000 + + openai-gpt4-turbo: + provider: openai + model: gpt-4-turbo-preview + temperature: 0.1 + max_tokens: 2000 + + anthropic-claude: + provider: anthropic + model: claude-3-sonnet-20240229 + temperature: 0.1 + max_tokens: 1000 + +# EXAMPLE: Add your domain-specific presets +# Choose appropriate models for your use case: +# - Fast/cheap models for simple tasks +# - Powerful models for complex reasoning +# - Specialized models for specific domains diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/agents/example_agent.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/agents/example_agent.py new file mode 100644 index 0000000..6aedea9 --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/agents/example_agent.py @@ -0,0 +1,138 @@ +"""Example agent demonstrating infrastructure patterns. + +EXAMPLE: Replace this entire file with your domain-specific agent logic. +INFRASTRUCTURE: The patterns shown here (schema-first, prompt-first, co-located prompts) are standard. +""" + +from __future__ import annotations + +from pathlib import Path +from typing import Any + +import yaml +from pydantic import BaseModel + + +class ExampleInput(BaseModel): + """Example input model. + + EXAMPLE: This would be generated from schemas/example_input.schema.json. + INFRASTRUCTURE: Always generate Pydantic models from JSON schemas programmatically. + """ + + query: str + max_results: int = 10 + + +class ExampleOutput(BaseModel): + """Example output model. + + EXAMPLE: Replace with your domain output structure. + """ + + status: str + result: str + metadata: dict[str, Any] + + +def load_prompt(prompt_file: str) -> dict[str, Any]: + """Load co-located prompt file. + + INFRASTRUCTURE: Always load prompts from YAML files, never hardcode them. + Co-locate prompts with the agents/services that use them. + + Args: + prompt_file: Name of .prompt.yaml file in same directory (e.g., "example_agent.prompt.yaml") + + Returns: + Dictionary containing prompt configuration (system_prompt, user_prompt, presets) + + Example: + .. code-block:: python + + prompt_config = load_prompt("example_agent.prompt.yaml") + system_prompt = prompt_config["system_prompt"] + user_prompt = prompt_config["user_prompt"].format(query="test") + """ + prompt_path = Path(__file__).parent / prompt_file + if not prompt_path.exists(): + raise FileNotFoundError(f"Prompt file not found: {prompt_path}") + return yaml.safe_load(prompt_path.read_text()) + + +def run_example_agent(input_data: ExampleInput) -> ExampleOutput: + """Execute example agent workflow. + + EXAMPLE: Replace with your domain-specific agent logic. + INFRASTRUCTURE: Keep the pattern: + 1. Load co-located prompt + 2. Validate input with Pydantic (from JSON schema) + 3. Execute workflow + 4. Return validated output + + Args: + input_data: Validated input model (generated from JSON schema) + + Returns: + Validated output model + + Example: + .. 
code-block:: python + + from {{cookiecutter.package_name}}.agents.example_agent import run_example_agent, ExampleInput + + result = run_example_agent( + ExampleInput(query="What is agentic AI?", max_results=5) + ) + print(result.status) # "completed" + """ + # INFRASTRUCTURE: Load co-located prompt + prompt_config = load_prompt("example_agent.prompt.yaml") + + # EXAMPLE: Your actual workflow logic would go here + # - Use prompt_config to build LLM request + # - Call external services (LLM, APIs, etc.) + # - Process results + # - Return validated output + + # For now, just demonstrate the pattern + system_prompt = prompt_config["system_prompt"] + user_prompt = prompt_config["user_prompt"].format(query=input_data.query) + + # Simulated processing (replace with real logic) + result = ExampleOutput( + status="completed", + result=f"Processed query: {input_data.query}", + metadata={ + "max_results": input_data.max_results, + "prompt_used": "example_agent.prompt.yaml", + "model": prompt_config.get("presets", {}).get("openai-gpt4", {}).get("model", "unknown"), + }, + ) + + return result + + +# EXAMPLE: Additional helper functions for your domain +# INFRASTRUCTURE: Keep functions small, testable, well-documented + + +def validate_agent_prerequisites() -> bool: + """Validate agent has required configuration and dependencies. + + INFRASTRUCTURE: Good pattern for startup validation. + + Returns: + True if prerequisites met, False otherwise + """ + # Check prompt file exists + prompt_path = Path(__file__).parent / "example_agent.prompt.yaml" + if not prompt_path.exists(): + return False + + # Add other prerequisite checks as needed + # - API keys present + # - Required services available + # - Configuration valid + + return True diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/graphs/README.md b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/graphs/README.md new file mode 100644 index 0000000..5b68e6a --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/graphs/README.md @@ -0,0 +1,124 @@ +# Graphs - Workflow Orchestration + +**Status**: OPTIONAL + +## Purpose + +This directory contains workflow orchestration logic using `pydantic-ai` for type-safe, declarative multi-step workflows. 
+ +## Keep This Directory If + +Your Ring 0 MVP needs: + +- ✅ **Multi-step workflows** with branching logic +- ✅ **Complex dependencies** between workflow steps +- ✅ **Dynamic routing** based on runtime conditions +- ✅ **Type-safe workflow definitions** validated at parse time +- ✅ **Inspectable orchestration** for debugging and visualization + +## Delete This Directory If + +Your Ring 0 MVP has: + +- ❌ **Single agent** with no workflow orchestration +- ❌ **Simple linear flow** (step 1 → step 2 → step 3) +- ❌ **No branching** or conditional routing needed + +## What's Provided + +### `definitions.py` +- `GraphNode`: Atomic unit representing a workflow step +- `WorkflowGraph`: Declarative workflow definition with entrypoint and nodes +- `route()` method: Navigate the graph by node ID + +### `graph_agent.py` +- Pydantic AI agent for graph-based orchestration +- Typed dependencies (`GraphDependencies`) +- Structured output (`GraphNode`) +- Tool registration for graph navigation + +## Example Usage + +```python +from {{cookiecutter.package_name}}.graphs import ( + WorkflowGraph, + GraphNode, + build_graph_agent, + GraphDependencies +) + +# Define workflow declaratively +workflow = WorkflowGraph( + name="research_workflow", + entrypoint="query", + nodes=[ + GraphNode( + id="query", + description="Query knowledge base", + service="deepsearch_service", + next=["analyze"] + ), + GraphNode( + id="analyze", + description="Analyze results", + service="analysis_service", + next=["summarize"] + ), + GraphNode( + id="summarize", + description="Generate summary", + service="summary_service" + ), + ], +) + +# Execute with pydantic-ai agent +agent = build_graph_agent() +result = agent.run_sync( + "Navigate to next node", + deps=GraphDependencies(graph=workflow, current_node="query") +) +``` + +## When to Add This in Ring 0 + +**Add immediately if**: +- Your core value proposition involves multi-step orchestration +- You're building a workflow engine or pipeline system +- Branching logic is fundamental to your MVP + +**Defer to Ring 1+ if**: +- You can ship Ring 0 with a linear flow +- Orchestration complexity isn't core to initial value +- You're still discovering the workflow structure + +## Alternatives for Simple Cases + +For linear workflows, consider: + +```python +# Simple function composition (no graphs needed) +def simple_workflow(input_data: str) -> dict: + step1_result = query_service(input_data) + step2_result = analyze_service(step1_result) + return summarize_service(step2_result) +``` + +Only reach for graph orchestration when you have **proven need** for: +- Branching/conditional logic +- Dynamic routing +- Workflow reusability +- Complex dependencies + +## Architecture Notes + +- **Declarative**: Workflows defined as data (inspectable, serializable) +- **Type-safe**: Pydantic validation at definition time +- **Testable**: Mock individual nodes, test routing logic independently +- **Observable**: Easy to add logging, metrics at node transitions + +## See Also + +- `SCAFFOLD_GUIDE.md` - Full decision tree for optional components +- `CLAUDE.md` - Ring-based development philosophy +- [Pydantic AI Docs](https://github.com/pydantic/pydantic-ai) - Framework documentation diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/graphs/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/graphs/__init__.py similarity index 100% rename from {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/graphs/__init__.py 
rename to {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/graphs/__init__.py diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/graphs/definitions.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/graphs/definitions.py similarity index 76% rename from {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/graphs/definitions.py rename to {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/graphs/definitions.py index 3933985..7d10522 100644 --- a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/graphs/definitions.py +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/graphs/definitions.py @@ -1,4 +1,10 @@ -"""Pydantic-powered graph primitives for orchestrating workflows.""" +"""Pydantic-powered graph primitives for orchestrating workflows. + +OPTIONAL: Delete entire graphs/ directory if Ring 0 doesn't need workflow orchestration. +INFRASTRUCTURE: If you keep this, the pattern (Pydantic models, typed dependencies) is standard. + +See: src/{{cookiecutter.package_name}}/graphs/README.md for guidance on when to use. +""" from __future__ import annotations diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/graphs/graph_agent.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/graphs/graph_agent.py similarity index 100% rename from {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/graphs/graph_agent.py rename to {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/graphs/graph_agent.py diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/schemas/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/schemas/__init__.py similarity index 100% rename from {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/schemas/__init__.py rename to {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/schemas/__init__.py diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/schemas/example_input.schema.json b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/schemas/example_input.schema.json new file mode 100644 index 0000000..a509017 --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/schemas/example_input.schema.json @@ -0,0 +1,28 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "example_input.schema.json", + "$comment": "EXAMPLE: Replace with your domain-specific input schema. 
INFRASTRUCTURE: Always define schemas in separate JSON files, then generate Pydantic models programmatically using cellsem-llm-client utilities.", + "title": "ExampleInput", + "description": "Example input schema demonstrating schema-first design pattern", + "type": "object", + "properties": { + "query": { + "type": "string", + "description": "The user query to process", + "minLength": 1, + "examples": [ + "What is agentic AI?", + "Explain multi-agent systems" + ] + }, + "max_results": { + "type": "integer", + "description": "Maximum number of results to return", + "minimum": 1, + "maximum": 100, + "default": 10 + } + }, + "required": ["query"], + "additionalProperties": false +} diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/schemas/workflow_output.schema.json b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/schemas/workflow_output.schema.json similarity index 85% rename from {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/schemas/workflow_output.schema.json rename to {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/schemas/workflow_output.schema.json index 0f145be..33e2dbf 100644 --- a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/schemas/workflow_output.schema.json +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/schemas/workflow_output.schema.json @@ -1,5 +1,6 @@ { "$schema": "https://json-schema.org/draft/2020-12/schema", + "$comment": "EXAMPLE: Replace with your domain-specific workflow output schema. INFRASTRUCTURE: Always define schemas in separate JSON files, then generate Pydantic models programmatically using cellsem-llm-client utilities.", "title": "WorkflowOutput", "description": "Canonical structured response produced by {{cookiecutter.project_name}} agents.", "type": "object", diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/services/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/services/__init__.py similarity index 100% rename from {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/services/__init__.py rename to {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/services/__init__.py diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/utils/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/utils/__init__.py similarity index 100% rename from {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/utils/__init__.py rename to {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/utils/__init__.py diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/validation/README.md b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/validation/README.md new file mode 100644 index 0000000..1606914 --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/validation/README.md @@ -0,0 +1,148 @@ +# Validation - Cross-cutting Concerns + +**Status**: OPTIONAL (currently empty) + +## Purpose + +This directory is for validation logic that is: +- **Shared** across multiple services/agents +- **Cross-cutting** (not specific to one component) +- **Complex** enough to warrant 
centralization + +## Keep This Directory If + +Your Ring 0 MVP has: + +- ✅ **Complex business rules** used across multiple components +- ✅ **Schema validations** beyond simple Pydantic models +- ✅ **Data quality checks** applied to multiple data sources +- ✅ **Compliance validations** (HIPAA, GDPR, etc.) affecting multiple services + +## Delete This Directory If + +Your Ring 0 MVP has: + +- ❌ **Simple validations** handled by Pydantic models +- ❌ **Service-specific validation** (keep in service layer) +- ❌ **No duplicated validation logic** across components + +## Ring 0 Guidance + +**Most projects should DELETE this directory for Ring 0.** + +Why? Because: +1. You likely don't have duplicated validation logic yet +2. Premature abstraction adds complexity +3. Better to keep validation in service layer until patterns emerge + +**Add validation/ in Ring 1+** when you discover: +- Same validation logic copy-pasted across 2+ components +- Complex validation rules that deserve their own module +- Clear separation between business logic and validation needed + +## Example Use Cases (Ring 1+) + +### Service Registration Validation + +```python +# validation/service_registry.py +def ensure_services_registered( + service_names: list[str], + available: list[str] +) -> None: + """Validate all required services are registered.""" + missing = set(service_names) - set(available) + if missing: + raise ValueError(f"Missing services: {missing}") +``` + +### Workflow Output Validation + +```python +# validation/workflow_output.py +from jsonschema import validate +from {{cookiecutter.package_name}}.schemas import load_schema + +def validate_workflow_output(data: dict) -> None: + """Validate output against workflow_output.schema.json.""" + schema = load_schema("workflow_output.schema.json") + validate(instance=data, schema=schema) +``` + +### Cross-service Data Quality + +```python +# validation/data_quality.py +def validate_gene_list(genes: list[str]) -> list[str]: + """Validate and normalize gene symbols across services.""" + # Complex validation shared by multiple services + # - Format checking + # - Normalization + # - Duplicate removal + pass +``` + +## Alternative Approaches for Ring 0 + +### Option 1: Validation in Pydantic Models + +```python +# Simple validations in domain models +from pydantic import BaseModel, field_validator + +class GeneQuery(BaseModel): + genes: list[str] + + @field_validator('genes') + @classmethod + def validate_gene_format(cls, v: list[str]) -> list[str]: + # Validation logic here + return v +``` + +### Option 2: Validation in Service Layer + +```python +# Keep validation close to where it's used +class DeepSearchService: + def query(self, genes: list[str]) -> dict: + # Validate inputs + self._validate_genes(genes) + # Execute query + return self._execute_query(genes) + + def _validate_genes(self, genes: list[str]) -> None: + # Service-specific validation + pass +``` + +## When to Centralize Validation + +Wait until you see these patterns: + +1. **Duplication**: Same validation in 2+ places +2. **Complexity**: Validation logic is complex enough to test independently +3. **Reuse**: Multiple services need identical validation +4. **Compliance**: Regulatory requirements span multiple components + +## Decision Tree + +``` +Do you have validation logic used in 2+ components? +├─ NO → DELETE validation/ directory +│ Keep validation in Pydantic models or service methods +│ +└─ YES → Is it complex enough to test independently? 
+ ├─ NO → Consider extracting to shared utility first + │ Don't need full validation/ module yet + │ + └─ YES → Move to validation/ directory + Add comprehensive tests + Document validation rules +``` + +## See Also + +- `SCAFFOLD_GUIDE.md` - Full decision tree for optional components +- `CLAUDE.md` - Ring-based development philosophy (defer abstraction) +- [Pydantic Validators](https://docs.pydantic.dev/latest/concepts/validators/) - Built-in validation patterns diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/validation/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/validation/__init__.py similarity index 100% rename from {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/validation/__init__.py rename to {{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}/{{cookiecutter.package_name}}/validation/__init__.py diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/README.md b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/README.md new file mode 100644 index 0000000..8c02d35 --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/README.md @@ -0,0 +1,40 @@ +# {{cookiecutter.project_name}} - Validation Tools + +**Status**: OPTIONAL + +Validation and analysis tools for comparing runs, computing metrics, and visualizing results. + +## Delete This Package If + +Your Ring 0 MVP doesn't need: +- Workflow output comparison +- Quality metrics (precision, recall, etc.) +- Visualizations and analysis + +See `SCAFFOLD_GUIDE.md` for guidance. + +## Installation + +```bash +pip install {{cookiecutter.package_name}}-validation-tools +``` + +## Structure + +- **comparisons/** - Compare workflow outputs across runs +- **metrics/** - Quality metrics (precision, recall, F1, etc.) +- **visualizations/** - Plots, heatmaps, ROC curves + +## Usage + +This package imports schemas and models from the core package: + +```python +from {{cookiecutter.package_name}}.schemas import load_schema +from {{cookiecutter.package_name}}_validation_tools.metrics import calculate_f1 +from {{cookiecutter.package_name}}_validation_tools.visualizations import plot_heatmap +``` + +## Development + +This package is part of a UV workspace. See repository root for development instructions. 
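## Putting It Together (Illustrative)

A sketch of how the subpackages are intended to combine once you add your domain logic. The helper names (`compare_runs`, `calculate_metrics`) follow the subpackage READMEs and are placeholders, not shipped implementations; the file paths are examples only.

```python
# Illustrative only: these helpers are described in the subpackage READMEs
# and remain placeholders until implemented for your domain.
import json
from pathlib import Path

from {{cookiecutter.package_name}}_validation_tools.comparisons import compare_runs
from {{cookiecutter.package_name}}_validation_tools.metrics import calculate_metrics

# Diff a candidate run against a baseline run
comparison = compare_runs(
    run1_output="outputs/baseline/final.json",
    run2_output="outputs/candidate/final.json",
)
print(f"Similarity: {comparison.similarity_score:.2f}")

# Score the candidate run against a gold standard
candidate = json.loads(Path("outputs/candidate/final.json").read_text())
gold = json.loads(Path("data/gold_standard.json").read_text())
metrics = calculate_metrics(predictions=candidate, ground_truth=gold)
print(f"F1: {metrics.f1:.3f}")
```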
diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/pyproject.toml b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/pyproject.toml new file mode 100644 index 0000000..5ca0389 --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/pyproject.toml @@ -0,0 +1,27 @@ +[project] +name = "{{cookiecutter.package_name}}-validation-tools" +version = "0.1.0" +description = "Validation and analysis tools for {{cookiecutter.project_name}}" +readme = "README.md" +requires-python = ">= {{cookiecutter.python_version}}" +dependencies = [ + "{{cookiecutter.package_name}}", + "pandas>=2.0.0", + "numpy>=1.24.0", + "matplotlib>=3.7.0", + "seaborn>=0.12.0", + "scikit-learn>=1.3.0", +] + +[project.optional-dependencies] +dev = [ + "pytest>=8.0.0", + "pytest-cov>=4.1.0", +] + +[build-system] +requires = ["setuptools>=68.0"] +build-backend = "setuptools.build_meta" + +[tool.setuptools.packages.find] +where = ["."] diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/__init__.py new file mode 100644 index 0000000..9b4191f --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/__init__.py @@ -0,0 +1,11 @@ +"""Validation and analysis tools for {{cookiecutter.project_name}}. + +OPTIONAL: Delete this entire package if not needed for your Ring 0 MVP. + +This package provides tools for validating, comparing, and analyzing +workflow outputs from the {{cookiecutter.package_name}} core package. + +Schemas are imported from the core package - no duplication. +""" + +__version__ = "0.1.0" diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/comparisons/README.md b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/comparisons/README.md new file mode 100644 index 0000000..71ac22a --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/comparisons/README.md @@ -0,0 +1,24 @@ +# Comparisons + +Tools for comparing workflow runs and outputs. 
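The `compare_runs` helper used in the example below is not implemented by the scaffold. A minimal sketch, assuming workflow outputs are flat JSON objects and a key-by-key diff is sufficient, could look like:

```python
# Minimal sketch of compare_runs (illustrative; replace with domain logic).
# Assumption: each output file contains a flat, top-level JSON object.
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ComparisonResult:
    differences: dict[str, tuple[object, object]] = field(default_factory=dict)
    similarity_score: float = 0.0


def compare_runs(run1_output: str, run2_output: str) -> ComparisonResult:
    """Compare two workflow output files key by key."""
    run1 = json.loads(Path(run1_output).read_text())
    run2 = json.loads(Path(run2_output).read_text())

    keys = set(run1) | set(run2)
    differences = {
        key: (run1.get(key), run2.get(key))
        for key in keys
        if run1.get(key) != run2.get(key)
    }
    # Fraction of keys that agree; 1.0 when both outputs are empty
    similarity = (1.0 - len(differences) / len(keys)) if keys else 1.0
    return ComparisonResult(differences=differences, similarity_score=similarity)
```

Replace the naive key diff with comparison logic appropriate to your domain (for example, field-level tolerances or set comparison of annotations).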
+ +## Use Cases + +- Compare results from different workflow versions +- Diff analysis between parameter configurations +- Side-by-side output comparison +- Regression testing (current vs baseline) + +## Example + +```python +from {{cookiecutter.package_name}}_validation_tools.comparisons import compare_runs + +result = compare_runs( + run1_output="outputs/run1/final.json", + run2_output="outputs/run2/final.json" +) + +print(result.differences) +print(result.similarity_score) +``` diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/comparisons/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/comparisons/__init__.py new file mode 100644 index 0000000..c841ff4 --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/comparisons/__init__.py @@ -0,0 +1,9 @@ +"""Tools for comparing workflow runs and outputs. + +EXAMPLE: Add your domain-specific comparison logic here. + +Common patterns: +- Compare outputs from different workflow runs +- Diff analysis between versions +- Side-by-side result comparison +""" diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/metrics/README.md b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/metrics/README.md new file mode 100644 index 0000000..8b59045 --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/metrics/README.md @@ -0,0 +1,27 @@ +# Metrics + +Quality metrics for evaluating workflow performance. + +## Common Metrics + +- **Precision**: True positives / (True positives + False positives) +- **Recall**: True positives / (True positives + False negatives) +- **F1 Score**: Harmonic mean of precision and recall +- **Accuracy**: Correct predictions / Total predictions +- **ROC-AUC**: Area under ROC curve +- **PR-AUC**: Area under precision-recall curve + +## Example + +```python +from {{cookiecutter.package_name}}_validation_tools.metrics import calculate_metrics + +metrics = calculate_metrics( + predictions=workflow_output, + ground_truth=gold_standard +) + +print(f"Precision: {metrics.precision:.3f}") +print(f"Recall: {metrics.recall:.3f}") +print(f"F1: {metrics.f1:.3f}") +``` diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/metrics/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/metrics/__init__.py new file mode 100644 index 0000000..b564f5d --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/metrics/__init__.py @@ -0,0 +1,10 @@ +"""Metrics for evaluating workflow quality. + +EXAMPLE: Add your domain-specific metrics here. 
+ +Common metrics: +- Precision, recall, F1 score +- Accuracy, specificity, sensitivity +- ROC-AUC, PR-AUC +- Custom domain metrics +""" diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/visualizations/README.md b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/visualizations/README.md new file mode 100644 index 0000000..47310c1 --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/visualizations/README.md @@ -0,0 +1,26 @@ +# Visualizations + +Analysis visualizations for workflow results. + +## Available Visualizations + +- **Heatmaps**: Result matrices, correlation analysis +- **ROC Curves**: Classification performance +- **Precision-Recall Curves**: Threshold analysis +- **Confusion Matrices**: Classification breakdown +- **Time Series**: Performance over time +- **Comparison Charts**: Side-by-side analysis + +## Example + +```python +from {{cookiecutter.package_name}}_validation_tools.visualizations import plot_roc_curve + +fig = plot_roc_curve( + y_true=ground_truth, + y_scores=predictions, + title="Workflow Performance" +) + +fig.savefig("outputs/roc_curve.png") +``` diff --git a/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/visualizations/__init__.py b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/visualizations/__init__.py new file mode 100644 index 0000000..22fe86d --- /dev/null +++ b/{{cookiecutter.project_slug}}/src/{{cookiecutter.package_name}}_validation_tools/{{cookiecutter.package_name}}_validation_tools/visualizations/__init__.py @@ -0,0 +1,12 @@ +"""Visualizations for workflow analysis. + +EXAMPLE: Add your domain-specific visualizations here. + +Common visualizations: +- Heatmaps of results +- ROC curves +- Confusion matrices +- Precision-recall curves +- Time series plots +- Comparison charts +"""
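# --- Illustrative sketch (not part of the scaffold contract) -----------------
# The README for this subpackage references a plot_roc_curve helper; the sketch
# below shows one way it could look. It assumes binary labels plus probability
# scores, and uses matplotlib and scikit-learn, both declared dependencies of
# this package. Replace or extend for your domain.

import matplotlib.pyplot as plt
from matplotlib.figure import Figure
from sklearn.metrics import auc, roc_curve


def plot_roc_curve(y_true, y_scores, title: str = "ROC Curve") -> Figure:
    """Plot a ROC curve for binary predictions and return the figure."""
    fpr, tpr, _ = roc_curve(y_true, y_scores)
    roc_auc = auc(fpr, tpr)

    fig, ax = plt.subplots()
    ax.plot(fpr, tpr, label=f"AUC = {roc_auc:.3f}")
    ax.plot([0, 1], [0, 1], linestyle="--", label="Chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.set_title(title)
    ax.legend(loc="lower right")
    return fig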