From a3150be2a2ac99cf301d5bcba8cc794db6a7d23b Mon Sep 17 00:00:00 2001
From: Richard Vanderpool <49568690+rvanderp3@users.noreply.github.com>
Date: Fri, 1 May 2026 09:50:49 -0400
Subject: [PATCH 1/4] Add AI-optimized documentation (ai-docs)

Introduce AI-optimized documentation for the Splat Team, following the
agentic-docs pattern from OpenShift enhancements. This provides structured,
retrieval-first documentation for AI coding agents.

Structure:
- AGENTS.md: Master navigation index (211 lines)
- ai-docs/TEAM_PHILOSOPHY.md: Core team principles and methodology
- ai-docs/architecture/: Projects overview and tech stack
- ai-docs/statuses/: Complete status workflow state machine
- ai-docs/roles/: Hat-switching guide for superman role
- ai-docs/workflows/: Process guides (indexes created, content coming)
- ai-docs/practices/: Engineering standards (indexes created)
- ai-docs/decisions/: ADR template and index
- ai-docs/references/: Glossary and quick reference

Key Features:
- Retrieval-first guidance (read docs, not training data)
- Task-based navigation flows
- Explicit hat-switching for superman role
- Human gate documentation
- vSphere platform focus
- OpenShift CI/CD integration

Total files: 14 new markdown files
Total lines: ~2,400 lines

Generated using prototype ai-docs generation skill based on scrum-compact
profile and codebase analysis.

Co-Authored-By: Claude Sonnet 4.5
---
 AGENTS.md                         | 211 ++++++++++
 ai-docs/TEAM_PHILOSOPHY.md        | 221 ++++++++++
 ai-docs/architecture/index.md     |  23 ++
 ai-docs/architecture/projects.md  | 327 +++++++++++++++
 ai-docs/decisions/adr-template.md |  85 ++++
 ai-docs/decisions/index.md        |  57 +++
 ai-docs/practices/index.md        |  34 ++
 ai-docs/references/glossary.md    | 221 ++++++++++
 ai-docs/references/index.md       |  31 ++
 ai-docs/roles/index.md            |  36 ++
 ai-docs/roles/responsibilities.md | 459 +++++++++++++++++++++
 ai-docs/statuses/index.md         |  36 ++
 ai-docs/statuses/transitions.md   | 642 ++++++++++++++++++++++++++++++
 ai-docs/workflows/index.md        |  24 ++
 14 files changed, 2407 insertions(+)
 create mode 100644 AGENTS.md
 create mode 100644 ai-docs/TEAM_PHILOSOPHY.md
 create mode 100644 ai-docs/architecture/index.md
 create mode 100644 ai-docs/architecture/projects.md
 create mode 100644 ai-docs/decisions/adr-template.md
 create mode 100644 ai-docs/decisions/index.md
 create mode 100644 ai-docs/practices/index.md
 create mode 100644 ai-docs/references/glossary.md
 create mode 100644 ai-docs/references/index.md
 create mode 100644 ai-docs/roles/index.md
 create mode 100644 ai-docs/roles/responsibilities.md
 create mode 100644 ai-docs/statuses/index.md
 create mode 100644 ai-docs/statuses/transitions.md
 create mode 100644 ai-docs/workflows/index.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..1e1eff1
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,211 @@
+# Splat Team - Agent Navigation
+
+**Profile:** scrum-compact
+**Repository:** https://github.com/openshift-splat-team/splat-team
+**Last Updated:** 2026-05-01
+
+---
+
+## CRITICAL: Retrieval Strategy
+
+**IMPORTANT**: Prefer retrieval-led reasoning over pre-training-led reasoning.
+
+When working on Splat Team:
+- ✅ **DO**: Read relevant docs from `./ai-docs/` first
+- ✅ **DO**: Check team workflows in `./ai-docs/workflows/`
+- ✅ **DO**: Verify status transitions in `./ai-docs/statuses/`
+- ✅ **DO**: Review project-specific context in `./projects//`
+- ❌ **DON'T**: Rely solely on training data
+- ❌ **DON'T**: Guess at team processes or status meanings
+
+---
+
+## AI Navigation: DON'T Read All Docs
+
+**Read 4-5 docs per task, not everything.**
+
+### Common Task Flows
+
+**Starting new epic?**
+→ `ai-docs/workflows/epic-breakdown.md` → `ai-docs/statuses/transitions.md` → `PROCESS.md`
+
+**Implementing story?**
+→ `ai-docs/workflows/sprint-process.md` → `ai-docs/practices/coding-standards.md` → `ai-docs/architecture/projects.md`
+
+**Working on specific project (e.g., installer)?**
+→ `projects/installer/CLAUDE.md` → `ai-docs/architecture/projects.md` → `ai-docs/practices/testing.md`
+
+**Triaging issue or regression?**
+→ `ai-docs/workflows/triage-process.md` → `coding-agent/skills/triage-regression/`
+
+**Need role context?**
+→ `ai-docs/roles/responsibilities.md`
+
+---
+
+## Quick Navigation by Task
+
+| Task | Start Here | Then Read |
+|------|-----------|-----------|
+| **Epic breakdown** | `ai-docs/workflows/epic-breakdown.md` | `ai-docs/statuses/transitions.md` |
+| **Story implementation** | `ai-docs/workflows/sprint-process.md` | `projects//CLAUDE.md` |
+| **Code review** | `ai-docs/workflows/review-process.md` | `ai-docs/practices/coding-standards.md` |
+| **Triage regression** | `ai-docs/workflows/triage-process.md` | `coding-agent/skills/triage-regression/` |
+| **Process improvement** | `ai-docs/roles/team-manager.md` | `PROCESS.md` |
+
+---
+
+## Team Focus
+
+**Mission:** OpenShift vSphere/VMware platform engineering and CI/CD
+
+**Key Projects:**
+- **vcf-migration-operator** - VMware Cloud Foundation migration tooling
+- **installer** - OpenShift installer (vSphere provider support)
+- **machine-api-operator** - vSphere machine provisioning
+- **cluster-cloud-controller-manager-operator** - vSphere CCM integration
+- **cloud-credential-operator** - vSphere credential management
+- **vsphere-problem-detector** - Platform health diagnostics
+- **opct** - OpenShift Provider Certification Tool
+
+See `ai-docs/architecture/projects.md` for full project list and details.
+
+---
+
+## Technology Stack
+
+**Primary Languages:** Go (OpenShift operators), Python (automation/skills)
+**Testing:** Go testing, pytest, Jest
+**CI/CD:** Prow, GitHub Actions
+**Documentation:** MkDocs
+
+**See:** `ai-docs/architecture/tech-stack.md` for details
+
+---
+
+## Team Structure
+
+### Roles
+
+| Role | Description | When Active |
+|------|-------------|-------------|
+| **superman** | All-in-one member (PO, architect, dev, QE, SRE, content writer) | All tasks |
+| **team-manager** | Process improvement and team coordination | Process tasks only |
+
+See `ai-docs/roles/responsibilities.md` for detailed hat-switching guide.
+
+---
+
+## Status Workflow
+
+**Epic Flow:**
+```
+po:triage → po:backlog → arch:design → lead:design-review →
+arch:plan → lead:plan-review → arch:breakdown → po:ready →
+arch:in-progress → po:accept → done
+```
+
+**Story Flow:**
+```
+dev:ready → qe:test-design → dev:implement → dev:code-review →
+qe:verify → arch:sign-off → po:merge → done
+```
+
+**Human Gates:**
+- `lead:design-review` - Human must approve design doc
+- `lead:plan-review` - Human must approve story breakdown
+- `dev:code-review` - Human must approve PR
+
+See `ai-docs/statuses/transitions.md` for full state machine.
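The epic and story flows above can be sketched as ordered lists. The following Python is a minimal illustration, not the team's tooling: it assumes each flow is strictly linear, and the names `EPIC_FLOW`, `STORY_FLOW`, `next_status`, and `is_valid_transition` are hypothetical. The full state machine in `ai-docs/statuses/transitions.md` may define transitions (for example, rejections back to an earlier status) that this sketch does not model.

```python
# Hypothetical sketch: statuses are copied from the flow diagrams above,
# and a strictly linear order is assumed.
EPIC_FLOW = [
    "po:triage", "po:backlog", "arch:design", "lead:design-review",
    "arch:plan", "lead:plan-review", "arch:breakdown", "po:ready",
    "arch:in-progress", "po:accept", "done",
]
STORY_FLOW = [
    "dev:ready", "qe:test-design", "dev:implement", "dev:code-review",
    "qe:verify", "arch:sign-off", "po:merge", "done",
]


def next_status(flow, current):
    """Return the status that follows `current`, or None at the end of the flow."""
    index = flow.index(current)  # raises ValueError for an unknown status
    return flow[index + 1] if index + 1 < len(flow) else None


def is_valid_transition(flow, source, target):
    """True only when `target` is the immediate successor of `source`."""
    return source in flow and next_status(flow, source) == target
```

Under these assumptions, `next_status(STORY_FLOW, "dev:implement")` yields `"dev:code-review"`, which is one of the human gates listed above.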
+
+---
+
+## Core Documentation
+
+| Topic | File | Description |
+|-------|------|-------------|
+| **Team principles** | `ai-docs/TEAM_PHILOSOPHY.md` | Methodology and values |
+| **Sprint process** | `ai-docs/workflows/sprint-process.md` | Ceremonies and cadence |
+| **Epic breakdown** | `ai-docs/workflows/epic-breakdown.md` | Design → Stories flow |
+| **Review process** | `ai-docs/workflows/review-process.md` | PR and acceptance criteria |
+| **Triage process** | `ai-docs/workflows/triage-process.md` | Issue triage and regression handling |
+| **Projects overview** | `ai-docs/architecture/projects.md` | Project descriptions and context |
+| **Coding standards** | `ai-docs/practices/coding-standards.md` | Code conventions |
+| **Testing guide** | `ai-docs/practices/testing.md` | Test strategy |
+| **Custom skills** | `ai-docs/architecture/skills.md` | Team automation tools |
+
+---
+
+## Custom Skills
+
+The team has specialized skills for OpenShift CI/CD:
+
+| Skill | Purpose | Location |
+|-------|---------|----------|
+| **triage-regression** | Triage CI failures and regressions | `coding-agent/skills/triage-regression/` |
+| **summarize-jiras** | Summarize related Jira issues | `coding-agent/skills/summarize-jiras/` |
+| **suggest-reviewers** | Suggest PR reviewers via git blame | `coding-agent/skills/suggest-reviewers/` |
+| **prow-job-analyze-resource** | Analyze Prow job resources | `coding-agent/skills/prow-job-analyze-resource/` |
+| **prow-job-extract-must-gather** | Extract must-gather from Prow jobs | `coding-agent/skills/prow-job-extract-must-gather/` |
+
+See `ai-docs/architecture/skills.md` for usage details.
+
+---
+
+## Documentation Structure
+
+```
+ai-docs/
+├── TEAM_PHILOSOPHY.md           # Core principles
+├── architecture/                # System structure
+│   ├── index.md
+│   ├── projects.md              # Project descriptions
+│   ├── tech-stack.md            # Technologies used
+│   └── skills.md                # Custom automation tools
+├── workflows/                   # Process guides
+│   ├── index.md
+│   ├── sprint-process.md        # Sprint ceremonies
+│   ├── epic-breakdown.md        # Epic → Stories
+│   ├── review-process.md        # PR review
+│   └── triage-process.md        # Issue triage
+├── roles/                       # Role definitions
+│   ├── index.md
+│   ├── responsibilities.md      # Hat-switching guide
+│   └── team-manager.md          # Process coordination
+├── statuses/                    # Status system
+│   ├── index.md
+│   └── transitions.md           # State machine
+├── practices/                   # Engineering practices
+│   ├── index.md
+│   ├── coding-standards.md      # Code conventions
+│   ├── testing.md               # Test strategy
+│   └── ci-cd.md                 # Prow and GitHub Actions
+├── decisions/                   # Architectural Decision Records
+│   ├── index.md
+│   └── adr-template.md          # ADR template
+└── references/                  # Quick reference
+    ├── index.md
+    ├── glossary.md              # Team terminology
+    └── shortcuts.md             # Common commands
+```
+
+---
+
+## Project-Specific Context
+
+Each forked project has its own context:
+
+```
+projects//
+├── CLAUDE.md           # Project-specific agent guidance
+├── .context.md         # Additional context (if present)
+└── ...project files...
+```
+
+**Always read `projects//CLAUDE.md` before working on that project.**
+
+---
+
+**Navigation**: Start with `ai-docs/TEAM_PHILOSOPHY.md` for team context.
+
+**Feedback**: Report issues via GitHub issues with `kind/docs` label.
diff --git a/ai-docs/TEAM_PHILOSOPHY.md b/ai-docs/TEAM_PHILOSOPHY.md
new file mode 100644
index 0000000..1916b79
--- /dev/null
+++ b/ai-docs/TEAM_PHILOSOPHY.md
@@ -0,0 +1,221 @@
+# Splat Team Philosophy
+
+**Profile:** scrum-compact
+**Last Updated:** 2026-05-01
+
+---
+
+## Mission
+
+Build and maintain OpenShift's vSphere/VMware platform integration, ensuring reliable installation, operation, and troubleshooting capabilities for enterprise customers running OpenShift on VMware infrastructure.
+
+---
+
+## Core Principles
+
+### 1. Solo Operator, Multiple Hats
+
+**Principle:** One agent (superman) wears all hats: PO, architect, developer, QE, SRE, and content writer.
+
+**Why:** Compact team profile optimized for single-member operation with full lifecycle ownership.
+
+**How to Apply:**
+- Explicitly note which hat you're wearing when transitioning tasks
+- Follow the status workflow to self-advance through states
+- Wait for human gates at design review, plan review, and PR approval
+
+---
+
+### 2. GitHub-Centric Process
+
+**Principle:** Everything happens via GitHub: issues, projects, PRs, reviews, and status tracking.
+
+**Why:** Single source of truth, human review integrated at key gates, full audit trail.
+
+**How to Apply:**
+- All work tracked as GitHub issues on team repo
+- Status via GitHub Projects v2 Status field
+- Human reviews via PR reviews and issue comments
+- Use `gh` skill for all GitHub operations
+
+---
+
+### 3. Human-in-the-Loop Design
+
+**Principle:** AI proposes, human approves at critical gates.
+
+**Why:** Maintain quality, alignment, and human oversight while maximizing automation.
+
+**Human Gates:**
+- `lead:design-review` - Design doc approval
+- `lead:plan-review` - Story breakdown approval
+- `dev:code-review` - PR approval before merge
+
+**Auto-Advance:**
+- `arch:sign-off` - Auto-advance after tests pass
+- `po:merge` - Auto-merge after human PR approval
+
+---
+
+### 4. OpenShift CI/CD Native
+
+**Principle:** Deeply integrated with OpenShift development workflows (Prow, must-gather, Jira).
+
+**Why:** We maintain OpenShift operators, so we must follow upstream conventions and tooling.
+
+**How to Apply:**
+- Use Prow for CI/CD (not GitHub Actions for operator testing)
+- Follow OpenShift enhancement process for design docs
+- Use must-gather for debugging and diagnostics
+- Link work to Jira issues when working on upstream bugs
+
+---
+
+### 5. Forked Project Model
+
+**Principle:** Work happens in forked repos under `openshift-splat-team` org.
+
+**Why:** Isolate team work from upstream, enable independent testing, prepare PRs for upstream.
+
+**Workflow:**
+1. Fork upstream OpenShift repo → `openshift-splat-team/`
+2. Work in fork (issues, branches, PRs)
+3. Test in fork's Prow environment
+4. Submit PR to upstream when ready
+
+**See:** `architecture/projects.md` for active forks.
+
+---
+
+### 6. Test-First vSphere Focus
+
+**Principle:** All changes must include vSphere-specific tests and validation.
+
+**Why:** The vSphere platform has unique failure modes, so generic tests alone are not sufficient.
+
+**How to Apply:**
+- Add vSphere e2e tests for new features
+- Test against real vSphere environments (not just mocks)
+- Include must-gather diagnostics for debugging
+- Validate upgrade paths (N → N+1)
+
+---
+
+### 7. Continuous Documentation
+
+**Principle:** Document as you build: design docs, ADRs, code comments, and user guides.
+
+**Why:** Complex platform integration requires clear documentation for troubleshooting and knowledge transfer.
+
+**Artifacts:**
+- Design docs (in epics, linked from `lead:design-review` status)
+- ADRs for architectural decisions (`ai-docs/decisions/`)
+- MkDocs for user-facing guides (`docs/`)
+- Code comments for non-obvious vSphere behavior
+
+---
+
+## Team Workflow Patterns
+
+### Epic → Stories Breakdown
+
+1. **Epic created** (`po:triage`)
+2. **Design doc written** (`arch:design`) - Architecture, approach, risks
+3. **Human reviews design** (`lead:design-review`) - **Human gate**
+4. **Story breakdown proposed** (`arch:plan`) - List of implementation stories
+5. **Human reviews plan** (`lead:plan-review`) - **Human gate**
+6. **Stories created** (`arch:breakdown`) - Individual GitHub issues
+7. **Stories enter backlog** (`po:ready`) - Ready for sprint planning
+
+### Story → Merge Flow
+
+1. **Story ready** (`dev:ready`)
+2. **Tests designed** (`qe:test-design`) - QE hat writes test plan
+3. **Code implemented** (`dev:implement`) - Dev hat writes code
+4. **PR opened** (`dev:code-review`) - Awaits human review
+5. **Human approves PR** - **Human gate**
+6. **Tests verified** (`qe:verify`) - QE hat confirms tests pass
+7. **Architect signs off** (`arch:sign-off`) - Auto-advance
+8. **Merge** (`po:merge`) - Auto-merge after approval
+
+---
+
+## Anti-Patterns to Avoid
+
+❌ **Starting implementation before design approval**
+- Wait for `lead:design-review` → approved
+
+❌ **Skipping test design phase**
+- QE hat must write test plan in `qe:test-design`
+
+❌ **Merging without human PR review**
+- `dev:code-review` requires human approval (blocking)
+
+❌ **Generic tests for vSphere features**
+- Always include vSphere-specific validation
+
+❌ **Working directly in upstream repos**
+- Use forked repos under `openshift-splat-team/`
+
+❌ **Forgetting must-gather updates**
+- New features need must-gather collection logic
+
+---
+
+## Decision-Making Framework
+
+**Tier 1: Autonomous (no human approval)**
+- Test selection and design
+- Code structure and naming
+- Dependency choices within approved stack
+- Documentation structure
+- Refactoring (no behavior change)
+
+**Tier 2: Proposed (human approves)**
+- Epic design approach
+- Story breakdown
+- API changes (even minor)
+- New dependencies outside approved stack
+- Process changes
+
+**Tier 3: Escalated (human decides)**
+- Architectural pivots
+- Breaking changes
+- Security implications
+- Upstream policy changes
+
+---
+
+## Success Metrics
+
+**Velocity:**
+- Target: 13-21 story points per 2-week sprint
+- Sustainable pace, not maximum throughput
+
+**Quality:**
+- Zero regressions in vSphere e2e tests
+- All PRs pass upstream CI before submission
+- Design docs reviewed within 1 business day
+
+**Responsiveness:**
+- Regressions triaged within 4 hours
+- Blockers escalated within 1 business day
+- PR reviews completed within 2 business days
+
+---
+
+## Team Culture
+
+- **Explicit over implicit** - State assumptions, document decisions
+- **Retrieval over memory** - Read docs, don't rely on training data
+- **Testing over confidence** - Prove it works, don't assume
+- **Iteration over perfection** - Ship incrementally, refine later
+- **Human feedback over autonomy** - Ask when unsure
+
+---
+
+**See Also:**
+- [Sprint Process](workflows/sprint-process.md) - Ceremonies and cadence
+- [Epic Breakdown](workflows/epic-breakdown.md) - Design → Stories details
+- [Status Transitions](statuses/transitions.md) - Full state machine
+- [Role Responsibilities](roles/responsibilities.md) - Hat-switching guide
diff --git a/ai-docs/architecture/index.md b/ai-docs/architecture/index.md
new file mode 100644
index 0000000..61fccf5
--- /dev/null
+++ b/ai-docs/architecture/index.md
@@ -0,0 +1,23 @@
+# Architecture
+
+Documentation of the Splat Team's architecture, projects, and technical stack.
+
+---
+
+## Contents
+
+- **[projects.md](projects.md)** - Active forked projects and their purposes
+- **[tech-stack.md](tech-stack.md)** - Technologies and frameworks used (coming soon)
+- **[skills.md](skills.md)** - Custom automation tools and skills (coming soon)
+
+---
+
+## Quick Links
+
+**Project Overview:** See [projects.md](projects.md) for all 11 active forks
+**vSphere Focus:** Team maintains OpenShift vSphere/VMware platform integration
+**Fork Model:** Work in `openshift-splat-team/*` forks, submit PRs to upstream
+
+---
+
+**Start Here:** Read [projects.md](projects.md) to understand active projects.
diff --git a/ai-docs/architecture/projects.md b/ai-docs/architecture/projects.md
new file mode 100644
index 0000000..b8d9227
--- /dev/null
+++ b/ai-docs/architecture/projects.md
@@ -0,0 +1,327 @@
+# Active Projects
+
+**Team:** Splat Team
+**Last Updated:** 2026-05-01
+
+---
+
+## Overview
+
+The Splat Team maintains forks of OpenShift components related to vSphere/VMware platform integration. All projects are forked under the `openshift-splat-team` GitHub organization.
+
+**Fork Strategy:** Work in team forks, submit PRs to upstream when ready.
+
+---
+
+## Core Projects
+
+### 1. vcf-migration-operator
+
+**Fork:** https://github.com/openshift-splat-team/vcf-migration-operator
+**Upstream:** (new project, no upstream yet)
+**Language:** Go
+**Purpose:** Migrate workloads from VMware Cloud Foundation to OpenShift
+
+**Key Responsibilities:**
+- VM-to-container migration tooling
+- VCF API integration
+- Migration workflow orchestration
+
+**Context:** `projects/vcf-migration-operator/CLAUDE.md`
+
+---
+
+### 2. installer
+
+**Fork:** https://github.com/openshift-splat-team/installer
+**Upstream:** https://github.com/openshift/installer
+**Language:** Go
+**Purpose:** OpenShift cluster installation (vSphere provider support)
+
+**Key Responsibilities:**
+- vSphere platform provider implementation
+- IPI (Installer-Provisioned Infrastructure) for vSphere
+- Terraform provider integration for vSphere
+- Pre-flight validation for vSphere environments
+
+**Common Tasks:**
+- Adding new vSphere configuration options
+- Fixing installation failures on specific vSphere versions
+- Updating terraform-provider-vsphere integration
+
+**Context:** `projects/installer/CLAUDE.md`
+
+---
+
+### 3. machine-api-operator
+
+**Fork:** https://github.com/openshift-splat-team/machine-api-operator
+**Upstream:** https://github.com/openshift/machine-api-operator
+**Language:** Go
+**Purpose:** Kubernetes-native machine provisioning for vSphere
+
+**Key Responsibilities:**
+- vSphere machine controller
+- Machine health checks
+- Auto-scaling integration
+- Node provisioning and deprovisioning
+
+**Common Tasks:**
+- VM creation/deletion lifecycle
+- Handling vSphere API errors gracefully
+- Supporting new vSphere VM configurations
+
+**Context:** `projects/machine-api-operator/CLAUDE.md`
+
+---
+
+### 4. cluster-cloud-controller-manager-operator
+
+**Fork:** https://github.com/openshift-splat-team/cluster-cloud-controller-manager-operator
+**Upstream:** https://github.com/openshift/cluster-cloud-controller-manager-operator
+**Language:** Go
+**Purpose:** vSphere Cloud Controller Manager (CCM) integration
+
+**Key Responsibilities:**
+- Deploy and manage vSphere CCM
+- Node initialization and tagging
+- vSphere cloud provider integration
+- Migration from in-tree to out-of-tree provider
+
+**Common Tasks:**
+- CCM version updates
+- Node providerID management
+- Cloud provider configuration
+
+**Context:** `projects/cluster-cloud-controller-manager-operator/CLAUDE.md`
+
+---
+
+### 5. cloud-credential-operator
+
+**Fork:** https://github.com/openshift-splat-team/cloud-credential-operator
+**Upstream:** https://github.com/openshift/cloud-credential-operator
+**Language:** Go
+**Purpose:** vSphere credential management
+
+**Key Responsibilities:**
+- vSphere credential provisioning for operators
+- IAM-style permissions for vCenter
+- Credential rotation and lifecycle
+- Secure credential storage
+
+**Common Tasks:**
+- Adding new credential types
+- Supporting different vSphere authentication methods
+- Credential validation and pre-checks
+
+**Context:** `projects/cloud-credential-operator/CLAUDE.md`
+
+---
+
+### 6. cluster-storage-operator
+
+**Fork:** https://github.com/openshift-splat-team/cluster-storage-operator
+**Upstream:** https://github.com/openshift/cluster-storage-operator
+**Language:** Go
+**Purpose:** vSphere CSI driver deployment and management
+
+**Key Responsibilities:**
+- Deploy vSphere CSI driver
+- StorageClass provisioning
+- PersistentVolume lifecycle
+- CSI driver version management
+
+**Common Tasks:**
+- CSI driver updates
+- StorageClass configuration
+- Volume provisioning troubleshooting
+
+**Context:** `projects/cluster-storage-operator/CLAUDE.md`
+
+---
+
+### 7. vsphere-problem-detector
+
+**Fork:** https://github.com/openshift-splat-team/vsphere-problem-detector
+**Upstream:** https://github.com/openshift/vsphere-problem-detector
+**Language:** Go
+**Purpose:** Proactive vSphere platform health monitoring
+
+**Key Responsibilities:**
+- Detect vSphere misconfigurations
+- Validate vSphere prerequisites
+- Alert on vSphere infrastructure issues
+- Generate diagnostic reports
+
+**Common Tasks:**
+- Adding new health checks
+- Improving diagnostic messages
+- Supporting new vSphere versions
+
+**Context:** `projects/vsphere-problem-detector/CLAUDE.md`
+
+---
+
+### 8. govmomi
+
+**Fork:** https://github.com/openshift-splat-team/govmomi
+**Upstream:** https://github.com/vmware/govmomi
+**Language:** Go
+**Purpose:** VMware vSphere API Go client library
+
+**Key Responsibilities:**
+- vSphere API bindings for Go
+- Used by all other vSphere-related projects
+- API wrapper and utilities
+
+**Note:** Upstream is VMware, not OpenShift. Submit PRs to vmware/govmomi.
+
+**Context:** `projects/govmomi/CLAUDE.md`
+
+---
+
+### 9. provider-certification-plugins
+
+**Fork:** https://github.com/openshift-splat-team/provider-certification-plugins
+**Upstream:** https://github.com/openshift/provider-certification-plugins
+**Language:** Go
+**Purpose:** vSphere provider certification tooling
+
+**Key Responsibilities:**
+- Certification test plugins for vSphere
+- Platform validation checks
+- Compliance verification
+
+**Context:** `projects/provider-certification-plugins/CLAUDE.md`
+
+---
+
+### 10. opct
+
+**Fork:** https://github.com/openshift-splat-team/opct
+**Upstream:** (new project)
+**Language:** Go
+**Purpose:** OpenShift Provider Certification Tool
+
+**Key Responsibilities:**
+- End-to-end provider certification workflow
+- Test suite execution and reporting
+- Certification artifact generation
+
+**Context:** `projects/opct/CLAUDE.md`
+
+---
+
+### 11. release
+
+**Fork:** https://github.com/openshift-splat-team/release
+**Upstream:** https://github.com/openshift/release
+**Language:** YAML, Go
+**Purpose:** OpenShift CI/CD configuration (Prow jobs)
+
+**Key Responsibilities:**
+- vSphere-specific Prow job definitions
+- CI configuration for vSphere tests
+- Test infrastructure as code
+
+**Common Tasks:**
+- Adding new vSphere e2e test jobs
+- Updating vSphere test cluster configuration
+- Debugging Prow job failures
+
+**Context:** `projects/release/CLAUDE.md`
+
+---
+
+## Project Categories
+
+### Installation & Provisioning
+- `installer` - Cluster installation
+- `machine-api-operator` - Node provisioning
+- `cluster-cloud-controller-manager-operator` - Cloud provider integration
+
+### Storage & Credentials
+- `cluster-storage-operator` - CSI driver management
+- `cloud-credential-operator` - Credential management
+
+### Monitoring & Troubleshooting
+- `vsphere-problem-detector` - Health checks
+- `opct` - Certification testing
+
+### Migration & Integration
+- `vcf-migration-operator` - VCF migration
+- `govmomi` - vSphere API client
+
+### CI/CD
+- `release` - Prow configuration
+- `provider-certification-plugins` - Test plugins
+
+---
+
+## Cross-Project Dependencies
+
+```
+installer
+  ├── depends on: govmomi, machine-api-operator
+  └── used by: (entry point for cluster creation)
+
+machine-api-operator
+  ├── depends on: govmomi, cluster-cloud-controller-manager-operator
+  └── used by: installer, cluster-autoscaler
+
+cluster-cloud-controller-manager-operator
+  ├── depends on: govmomi
+  └── used by: machine-api-operator, other operators
+
+cloud-credential-operator
+  ├── depends on: govmomi
+  └── used by: all operators needing vSphere credentials
+
+cluster-storage-operator
+  ├── depends on: cloud-credential-operator
+  └── used by: workloads needing PersistentVolumes
+
+vsphere-problem-detector
+  ├── depends on: govmomi
+  └── used by: cluster operators (health monitoring)
+
+opct / provider-certification-plugins
+  ├── depends on: all above operators
+  └── used by: certification process
+```
+
+---
+
+## Upstream Contribution Flow
+
+1. **Work in fork** - `openshift-splat-team/`
+2. **Test in fork's Prow** - Use `release` repo config
+3. **Create upstream PR** - From fork to `openshift/`
+4. **Upstream CI tests** - Must pass before merge
+5. **Upstream review** - OpenShift maintainers approve
+6. **Backport if needed** - Cherry-pick to release branches
+
+---
+
+## Project Selection Guide
+
+| If you need to... | Work in... |
+|-------------------|------------|
+| Fix cluster installation on vSphere | `installer` |
+| Fix node provisioning or scaling | `machine-api-operator` |
+| Fix vSphere credential issues | `cloud-credential-operator` |
+| Fix persistent volume provisioning | `cluster-storage-operator` |
+| Add vSphere health check | `vsphere-problem-detector` |
+| Fix vSphere API client bug | `govmomi` |
+| Add vSphere e2e test | `release` |
+| Migrate VMs from VCF | `vcf-migration-operator` |
+| Fix cloud controller manager | `cluster-cloud-controller-manager-operator` |
+| Add certification check | `opct` or `provider-certification-plugins` |
+
+---
+
+**See Also:**
+- Individual project CLAUDE.md files in `projects//`
+- [Tech Stack](tech-stack.md) - Common technologies across projects
+- [Skills](skills.md) - Automation tools for project work
diff --git a/ai-docs/decisions/adr-template.md b/ai-docs/decisions/adr-template.md
new file mode 100644
index 0000000..69346b6
--- /dev/null
+++ b/ai-docs/decisions/adr-template.md
@@ -0,0 +1,85 @@
+# ADR-NNNN: Title
+
+**Status:** Proposed | Accepted | Deprecated | Superseded
+**Date:** YYYY-MM-DD
+**Deciders:** [List who made this decision]
+**Epic:** # (if applicable)
+
+---
+
+## Context
+
+What is the issue motivating this decision or change?
+
+Describe the context and problem statement:
+- What forces are at play?
+- What constraints exist?
+- Why does this decision matter?
+
+---
+
+## Decision
+
+What is the change or approach we're proposing or have agreed to?
+
+Be specific and concrete:
+- What will we do?
+- What technology/pattern will we use?
+- What are the key implementation points?
+
+---
+
+## Alternatives Considered
+
+What other options were evaluated?
+
+For each alternative:
+
+### Alternative 1: [Name]
+
+**Description:** Brief description
+**Pros:**
+- Advantage 1
+- Advantage 2
+
+**Cons:**
+- Disadvantage 1
+- Disadvantage 2
+
+**Why Rejected:** Reason for not choosing this
+
+---
+
+## Consequences
+
+What becomes easier or more difficult as a result of this decision?
+
+### Positive Consequences
+- Benefit 1
+- Benefit 2
+
+### Negative Consequences
+- Trade-off 1
+- Trade-off 2
+
+### Neutral Consequences
+- Change 1 (neither better nor worse)
+
+---
+
+## Implementation Notes
+
+Any important details about implementation:
+- Migration path from old approach
+- Rollout strategy
+- Backward compatibility considerations
+- Testing requirements
+
+---
+
+## References
+
+- Link to epic: #
+- Link to design doc: URL
+- Link to relevant PRs: #
+- External documentation: URL
diff --git a/ai-docs/decisions/index.md b/ai-docs/decisions/index.md
new file mode 100644
index 0000000..ca15fae
--- /dev/null
+++ b/ai-docs/decisions/index.md
@@ -0,0 +1,57 @@
+# Architectural Decision Records
+
+This directory contains ADRs (Architectural Decision Records) documenting significant architectural decisions made by the Splat Team.
+
+---
+
+## What is an ADR?
+
+An ADR captures:
+- **Context** - The situation requiring a decision
+- **Decision** - What was decided
+- **Consequences** - Trade-offs and implications
+
+---
+
+## When to Create an ADR
+
+Create an ADR when you make decisions about:
+- Architecture patterns or approaches
+- Technology selection outside standard stack
+- Cross-project design choices
+- Process changes with long-term impact
+
+**Do NOT create ADRs for:**
+- Individual story implementation details
+- Temporary workarounds
+- Standard technology choices (Go, pytest, etc.)
+
+---
+
+## ADR Naming Convention
+
+```
+adr-NNNN-short-title.md
+```
+
+Examples:
+- `adr-0001-credential-rotation-polling.md`
+- `adr-0002-vSphere-7-minimum-version.md`
+
+---
+
+## ADR Template
+
+See [adr-template.md](adr-template.md) for the standard template.
+
+---
+
+## Active ADRs
+
+(None yet - ADRs created as architectural decisions are made)
+
+---
+
+**See Also:**
+- [ADR Template](adr-template.md) - Use this template for new ADRs
+- [Epic Breakdown](../workflows/epic-breakdown.md) - When to extract ADRs from completed epics
diff --git a/ai-docs/practices/index.md b/ai-docs/practices/index.md
new file mode 100644
index 0000000..de8ce13
--- /dev/null
+++ b/ai-docs/practices/index.md
@@ -0,0 +1,34 @@
+# Engineering Practices
+
+Engineering standards and practices for the Splat Team.
+ +--- + +## Contents + +- **[coding-standards.md](coding-standards.md)** - Code conventions and style (coming soon) +- **[testing.md](testing.md)** - Test strategy and requirements (coming soon) +- **[ci-cd.md](ci-cd.md)** - Prow and GitHub Actions usage (coming soon) + +--- + +## Quick References + +**Go Style:** +- Follow [Effective Go](https://golang.org/doc/effective_go.html) +- Follow [OpenShift coding conventions](https://github.com/openshift/openshift-docs/blob/main/contributing_to_docs/doc_guidelines.adoc) + +**Testing:** +- Unit tests required for all new code +- vSphere e2e tests required for platform features +- Test coverage target: >70% + +**CI:** +- All PRs must pass Prow presubmits +- vSphere e2e tests run on multiple vSphere versions (7.0, 8.0) + +--- + +**See Also:** +- [OpenShift Contributor Guide](https://github.com/openshift/community/blob/master/CONTRIBUTING.md) +- [Kubernetes Code Conventions](https://github.com/kubernetes/community/blob/master/contributors/guide/coding-conventions.md) diff --git a/ai-docs/references/glossary.md b/ai-docs/references/glossary.md new file mode 100644 index 0000000..cf32cba --- /dev/null +++ b/ai-docs/references/glossary.md @@ -0,0 +1,221 @@ +# Glossary + +**Team:** Splat Team +**Last Updated:** 2026-05-01 + +--- + +## Team-Specific Terms + +### Epic +A large body of work spanning multiple stories. Tracked as GitHub issue with `kind/epic` label. Goes through design → breakdown → implementation → acceptance workflow. + +### Story +A single deliverable unit of work. Child of an epic. Tracked as GitHub issue with `kind/story` label and `parent/` label. + +### Hat +A role persona that the superman agent wears while performing different responsibilities. Examples: PO hat, Architect hat, Dev hat, QE hat. + +### Superman +The all-in-one team member role that wears all hats throughout the workflow. + +### Team Manager +The role responsible for process improvement and team coordination. 
Handles retrospective action items. + +### Human Gate +A status where human approval is required before proceeding. Examples: `lead:design-review`, `lead:plan-review`, `dev:code-review`, `po:accept`. + +### Auto-Advance +A status that automatically transitions to the next status without human intervention. Examples: `arch:sign-off`, `po:merge`. + +--- + +## vSphere Platform Terms + +### vSphere +VMware's virtualization platform. OpenShift can run on vSphere infrastructure. + +### vCenter +VMware's centralized management platform for vSphere environments. + +### govmomi +Go library for interacting with VMware vSphere APIs. Used by all OpenShift vSphere operators. + +### IPI (Installer-Provisioned Infrastructure) +OpenShift installation method where the installer provisions VMs on vSphere. Contrast with UPI (User-Provisioned Infrastructure). + +### CCM (Cloud Controller Manager) +Kubernetes component that integrates with cloud providers. vSphere CCM manages node lifecycle and providerID. + +### CSI (Container Storage Interface) +Kubernetes standard for storage plugins. vSphere CSI driver provisions persistent volumes from vSphere datastores. + +### must-gather +OpenShift debugging tool that collects cluster diagnostics. vSphere-specific must-gather collects vCenter logs and configuration. + +### VCF (VMware Cloud Foundation) +VMware's integrated cloud infrastructure platform. We're building migration tooling from VCF to OpenShift. + +--- + +## OpenShift Terms + +### Prow +Kubernetes-based CI/CD system used by OpenShift. Runs e2e tests, presubmit checks, and periodic jobs. + +### Presubmit +Prow job that runs on a PR before merge. Must pass for the PR to be mergeable. + +### Periodic +Prow job that runs on a schedule (e.g., nightly). Used for long-running e2e tests. + +### E2E Test +End-to-end test that validates full cluster functionality on real infrastructure. vSphere e2e tests run on actual vSphere clusters. 
+ +### Release Payload +OpenShift release artifact containing all operator images. Built from component repos. + +### Operator +Kubernetes controller that manages custom resources. OpenShift is composed of many operators (machine-api-operator, cloud-credential-operator, etc.). + +### ClusterOperator +OpenShift custom resource that reports operator health status (Available, Progressing, Degraded). + +### Machine API +OpenShift abstraction for provisioning nodes. machine-api-operator manages vSphere VMs as Kubernetes Machine resources. + +--- + +## GitHub / Process Terms + +### gh Skill +BotMinter skill that wraps GitHub CLI (`gh`). Used for all GitHub operations (issues, PRs, projects). + +### Projects v2 +GitHub's project management tool. Splat Team uses Projects v2 with a custom Status field for tracking workflow states. + +### Status Field +GitHub Projects v2 single-select field used to track issue state (e.g., `po:triage`, `dev:implement`, `done`). + +### Kind Label +GitHub label indicating issue type. Values: `kind/epic`, `kind/story`, `kind/docs`, `kind/process-improvement`. + +### Parent Label +GitHub label linking a story to its epic. Format: `parent/`. + +### Sprint Milestone +GitHub milestone representing a 2-week sprint. Stories are assigned to sprint milestones. + +--- + +## Workflow States + +See [Status Transitions](../statuses/transitions.md) for full definitions. Key states: + +### po:triage +New epic awaiting PO evaluation. + +### lead:design-review +Design doc awaiting human review (human gate). + +### lead:plan-review +Story breakdown awaiting human review (human gate). + +### dev:code-review +PR awaiting human review (human gate). + +### po:accept +Completed epic awaiting human acceptance (human gate). + +### arch:sign-off +Final architect verification (auto-advance). + +### po:merge +Final merge gate (auto-advance after human PR approval). 
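The split between human gates and auto-advance states described in this glossary can be sketched in a few lines of Python. This is a minimal illustration, assuming only the status names listed above; `HUMAN_GATES`, `AUTO_ADVANCE`, `requires_human`, and `classify` are hypothetical names chosen for the example and are not part of BotMinter or the gh skill.

```python
# Hypothetical helper (not a BotMinter API): classify a workflow status
# using only the status names given in the glossary.
HUMAN_GATES = frozenset({
    "lead:design-review",
    "lead:plan-review",
    "dev:code-review",
    "po:accept",
})
AUTO_ADVANCE = frozenset({"arch:sign-off", "po:merge"})


def requires_human(status: str) -> bool:
    """Return True when the status is a human gate, so the agent must wait."""
    return status in HUMAN_GATES


def classify(status: str) -> str:
    """Label a status as 'human-gate', 'auto-advance', or 'other'."""
    if status in HUMAN_GATES:
        return "human-gate"
    if status in AUTO_ADVANCE:
        return "auto-advance"
    # Everything else (for example po:triage or dev:implement) is left
    # unclassified here; the full definitions live in transitions.md.
    return "other"
```

An agent could call `requires_human(status)` before attempting a self-transition and stop with a comment when it returns `True`.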
+ +--- + +## Common Abbreviations + +| Abbreviation | Full Term | +|--------------|-----------| +| **PO** | Product Owner | +| **QE** | Quality Engineering | +| **SRE** | Site Reliability Engineering | +| **CW** | Content Writer | +| **PR** | Pull Request | +| **CI** | Continuous Integration | +| **CD** | Continuous Delivery | +| **E2E** | End-to-End | +| **API** | Application Programming Interface | +| **ADR** | Architectural Decision Record | +| **IPI** | Installer-Provisioned Infrastructure | +| **UPI** | User-Provisioned Infrastructure | +| **CCM** | Cloud Controller Manager | +| **CSI** | Container Storage Interface | +| **VCF** | VMware Cloud Foundation | +| **VM** | Virtual Machine | + +--- + +## Tech Stack Terms + +### Go +Primary programming language for OpenShift operators. + +### Python +Scripting language used for BotMinter skills and automation. + +### MkDocs +Documentation framework used for team docs (`docs/` directory). + +### pytest +Python testing framework used in skills. + +### Jest +JavaScript testing framework (used if any JS tooling present). + +### YAML +Configuration file format used extensively (Prow jobs, Kubernetes manifests). + +--- + +## Project-Specific Terms + +### vcf-migration-operator +Operator for migrating VMs from VMware Cloud Foundation to OpenShift. New project with no upstream yet. + +### installer +OpenShift installer. vSphere provider support lives here. Upstream: `openshift/installer`. + +### machine-api-operator +Kubernetes Machine API implementation for vSphere. Upstream: `openshift/machine-api-operator`. + +### cloud-credential-operator +Manages vSphere credentials for operators. Upstream: `openshift/cloud-credential-operator`. + +### vsphere-problem-detector +Proactive health monitoring for vSphere platform. Upstream: `openshift/vsphere-problem-detector`. + +--- + +## Anti-Patterns (Terms to Avoid) + +❌ **"The agent"** - Be specific: which hat? 
(PO, Architect, Dev, QE) +✅ **"[Architect Hat]"** + +❌ **"Review the code"** - Ambiguous: QE verification or human PR review? +✅ **"[QE Hat] verifying tests pass"** or **"Awaiting human code review"** + +❌ **"VMware"** - Be specific: vSphere, vCenter, or VCF? +✅ **"vSphere platform"** + +❌ **"The project"** - Which project? 11 active repos +✅ **"installer project"** + +--- + +**See Also:** +- [Status Transitions](../statuses/transitions.md) - Full status definitions +- [Role Responsibilities](../roles/responsibilities.md) - Hat definitions +- [Projects](../architecture/projects.md) - Project descriptions diff --git a/ai-docs/references/index.md b/ai-docs/references/index.md new file mode 100644 index 0000000..d62c1a0 --- /dev/null +++ b/ai-docs/references/index.md @@ -0,0 +1,31 @@ +# References + +Quick reference materials for the Splat Team. + +--- + +## Contents + +- **[glossary.md](glossary.md)** - Team terminology and abbreviations +- **[shortcuts.md](shortcuts.md)** - Common commands and shortcuts (coming soon) + +--- + +## External References + +**OpenShift:** +- [OpenShift Documentation](https://docs.openshift.com/) +- [OpenShift Enhancement Process](https://github.com/openshift/enhancements) +- [Prow Documentation](https://docs.prow.k8s.io/) + +**vSphere:** +- [vSphere Documentation](https://docs.vmware.com/en/VMware-vSphere/index.html) +- [govmomi GitHub](https://github.com/vmware/govmomi) + +**Team:** +- [Splat Team GitHub](https://github.com/openshift-splat-team/splat-team) +- [Team Projects](../architecture/projects.md) + +--- + +**Start Here:** Read [glossary.md](glossary.md) to familiarize yourself with team terminology. diff --git a/ai-docs/roles/index.md b/ai-docs/roles/index.md new file mode 100644 index 0000000..d65d1ed --- /dev/null +++ b/ai-docs/roles/index.md @@ -0,0 +1,36 @@ +# Roles + +Role definitions and responsibilities for the Splat Team. 
+ +--- + +## Active Roles + +### Superman +All-in-one team member wearing multiple hats (PO, Architect, Dev, QE, SRE, Content Writer). + +**See:** [responsibilities.md](responsibilities.md) for hat-switching guide. + +### Team Manager +Process improvement and team coordination role. + +**See:** [team-manager.md](team-manager.md) (coming soon) + +--- + +## Hat-Switching + +The superman role switches between different "hats" throughout the workflow: + +| Hat | Primary Statuses | Key Responsibilities | +|-----|------------------|---------------------| +| **PO** | `po:triage`, `po:backlog`, `po:accept` | Evaluation, prioritization, acceptance | +| **Architect** | `arch:design`, `arch:plan`, `arch:breakdown` | Design, breakdown, coordination | +| **Dev** | `dev:implement`, `dev:code-review` | Implementation, PR maintenance | +| **QE** | `qe:test-design`, `qe:verify` | Test planning, verification | +| **SRE** | `sre:infra-setup` | Infrastructure provisioning | +| **Content Writer** | `cw:write`, `cw:review` | Documentation | + +--- + +**Start Here:** Read [responsibilities.md](responsibilities.md) to understand hat-switching patterns. diff --git a/ai-docs/roles/responsibilities.md b/ai-docs/roles/responsibilities.md new file mode 100644 index 0000000..8813d6b --- /dev/null +++ b/ai-docs/roles/responsibilities.md @@ -0,0 +1,459 @@ +# Role Responsibilities - Hat-Switching Guide + +**Team:** Splat Team +**Profile:** scrum-compact +**Last Updated:** 2026-05-01 + +--- + +## Overview + +The superman role wears multiple hats throughout the workflow. This guide explains when to switch hats and what each hat is responsible for. + +**Key Principle:** Explicitly state which hat you're wearing when transitioning states. + +--- + +## Hat-Switching Pattern + +``` +PO Hat → Architect Hat → QE Hat → Dev Hat → QE Hat → Architect Hat → PO Hat +``` + +**Example Issue Comment:** +```markdown +## [PO Hat] Moving to backlog + +Evaluated this epic - aligns with Q2 roadmap priorities. Moving to backlog. 
+ +Next: Waiting for capacity to start design (will switch to Architect hat). +``` + +--- + +## Superman Role Hats + +### Product Owner (PO) Hat + +**When Active:** +- `po:triage` - Evaluating new epics +- `po:backlog` - Managing backlog priority +- `po:accept` - Accepting completed epics +- `po:merge` - Final merge gate +- Sprint planning + +**Responsibilities:** +- Evaluate epic value and alignment with roadmap +- Prioritize backlog +- Accept or reject completed work +- Define success criteria +- Sprint goal definition + +**Decision Authority:** +- Accept/reject epics +- Backlog prioritization +- Sprint scope + +**Communication Style:** +"As PO, I'm prioritizing this epic higher because..." + +--- + +### Architect Hat + +**When Active:** +- `arch:design` - Writing design docs +- `arch:plan` - Creating story breakdown +- `arch:breakdown` - Creating story issues +- `arch:in-progress` - Monitoring implementation +- `arch:sign-off` - Final verification + +**Responsibilities:** +- Technical design and approach +- Story breakdown and estimation +- Cross-story coordination +- Architecture alignment +- Design pattern consistency + +**Decision Authority:** +- Technical approach selection +- Story breakdown strategy +- Architecture patterns +- Dependency ordering + +**Communication Style:** +"As Architect, I'm proposing approach X because..." 
+ +**Design Doc Checklist:** +- [ ] Problem statement clear +- [ ] Proposed approach documented +- [ ] Alternatives considered +- [ ] Risks identified and mitigated +- [ ] Testing strategy defined +- [ ] Rollout plan specified + +--- + +### Developer (Dev) Hat + +**When Active:** +- `dev:implement` - Writing code +- `dev:code-review` - Addressing PR feedback + +**Responsibilities:** +- Implementation of story +- Code quality and conventions +- Unit test coverage +- PR creation and maintenance +- Documentation updates (code comments, ADRs) + +**Decision Authority:** +- Code structure and naming +- Refactoring approaches +- Dependency choices (within approved stack) +- Test implementation details + +**Communication Style:** +"As Developer, I'm implementing this using..." + +**Implementation Checklist:** +- [ ] Code follows team standards +- [ ] Unit tests written and passing +- [ ] vSphere-specific logic validated +- [ ] Documentation updated +- [ ] PR description complete with links to story/epic + +--- + +### Quality Engineering (QE) Hat + +**When Active:** +- `qe:test-design` - Designing test strategy +- `qe:verify` - Verifying implementation + +**Responsibilities:** +- Test plan creation +- Test coverage verification +- vSphere-specific test scenarios +- E2E test implementation +- CI validation + +**Decision Authority:** +- Test strategy and coverage +- Test data and environments +- Pass/fail criteria +- Test automation approach + +**Communication Style:** +"As QE, I'm designing tests to cover..." 
+ +**Test Plan Checklist:** +- [ ] Unit test cases defined +- [ ] Integration test cases defined +- [ ] vSphere e2e test cases defined (multiple versions) +- [ ] Edge cases identified +- [ ] Negative test cases included +- [ ] Test environments specified + +--- + +### Site Reliability Engineering (SRE) Hat + +**When Active:** +- `sre:infra-setup` - Infrastructure provisioning + +**Responsibilities:** +- Test cluster provisioning +- CI/CD infrastructure +- Prow job configuration +- Monitoring and alerting setup +- Infrastructure as code + +**Decision Authority:** +- Infrastructure configuration +- Test environment setup +- Monitoring thresholds +- Alert routing + +**Communication Style:** +"As SRE, I'm provisioning test cluster with..." + +--- + +### Content Writer (CW) Hat + +**When Active:** +- `cw:write` - Writing documentation +- `cw:review` - Reviewing documentation + +**Responsibilities:** +- User-facing documentation +- MkDocs content +- Troubleshooting guides +- Examples and tutorials +- Diagrams and visualizations + +**Decision Authority:** +- Documentation structure +- Content organization +- Example selection +- Diagram format + +**Communication Style:** +"As Content Writer, I'm documenting..." 
+ +**Documentation Checklist:** +- [ ] Audience identified (admin, developer, user) +- [ ] Prerequisites listed +- [ ] Step-by-step instructions clear +- [ ] Examples tested and working +- [ ] Links verified +- [ ] Screenshots/diagrams included + +--- + +## Team Manager Role + +**When Active:** +- `mgr:todo` - Process improvement queued +- `mgr:in-progress` - Process improvement in progress +- Sprint retrospectives + +**Responsibilities:** +- Process improvement initiatives +- Team coordination +- Retrospective facilitation +- Process documentation updates +- Tooling improvements + +**Decision Authority:** +- Process changes (propose to human for approval) +- Tooling selection +- Automation priorities + +**Communication Style:** +"As Team Manager, I'm proposing process change..." + +--- + +## Hat-Switching Examples + +### Example 1: Epic Triage → Design + +```markdown +## [PO Hat] Epic Accepted - Moving to Backlog + +This epic aligns with our Q2 vSphere platform goals. Success criteria are clear. +Moving to `po:backlog` and prioritizing as P1. + +--- + +## [PO Hat] Activating Epic for Design + +Capacity available. Assigning to Architect hat for design work. +Moving to `arch:design`. + +--- + +## [Architect Hat] Starting Design + +Researching technical approach for vSphere credential rotation. +Will evaluate alternatives: +1. Polling approach (simple, higher latency) +2. Event-driven approach (complex, lower latency) + +Design doc in progress... +``` + +### Example 2: Story Implementation + +```markdown +## [QE Hat] Test Plan Complete + +Test plan written covering: +- Unit tests for credential validation +- Integration tests for vSphere API interaction +- E2E tests on vSphere 7.0, 8.0 + +Moving to `dev:implement`. + +--- + +## [Dev Hat] Implementation Started + +Creating PR for credential rotation logic. +Following design from epic #123. 
+ +Implementation notes: +- Using polling approach per design doc +- Credential cache TTL: 15 minutes +- Error handling for vSphere API timeouts + +PR: #456 + +--- + +## [Dev Hat] Addressing Code Review Feedback + +Human reviewer requested: +- Add logging for credential rotation events +- Increase test coverage for error cases + +Updated PR with changes. Re-requesting review. + +--- + +## [QE Hat] Verification Complete + +All tests passing: +- Unit tests: 15/15 ✅ +- Integration tests: 8/8 ✅ +- vSphere e2e (v7.0): 12/12 ✅ +- vSphere e2e (v8.0): 12/12 ✅ + +Moving to `arch:sign-off`. + +--- + +## [Architect Hat] Sign-Off + +Implementation aligns with epic design. Tests comprehensive. +Auto-advancing to merge. +``` + +--- + +## Hat-Switching Decision Tree + +``` +New Epic Created + ↓ +[PO Hat] → Evaluate → Accept/Reject + ↓ (Accept) +[PO Hat] → Prioritize → Move to backlog + ↓ (Capacity available) +[Architect Hat] → Design → Write design doc + ↓ (Design complete) + (Human reviews) + ↓ (Approved) +[Architect Hat] → Plan → Break into stories + ↓ (Breakdown complete) + (Human reviews) + ↓ (Approved) +[Architect Hat] → Breakdown → Create story issues + ↓ (Stories created) +[PO Hat] → Ready → Wait for sprint planning + ↓ (Sprint starts) +[QE Hat] → Test Design → Write test plan + ↓ (Test plan complete) +[Dev Hat] → Implement → Write code, open PR + ↓ (PR opened) + (Human reviews) + ↓ (Approved) +[QE Hat] → Verify → Run tests + ↓ (Tests pass) +[Architect Hat] → Sign-Off → Verify alignment + ↓ (Auto) +[PO Hat] → Merge → Auto-merge + ↓ +Done +``` + +--- + +## Common Hat-Switching Mistakes + +❌ **Skipping Hat Declaration** +- Wrong: "Moving to design phase" +- Right: "[Architect Hat] Starting design phase" + +❌ **Wearing Wrong Hat for Status** +- Wrong: [Dev Hat] writing design doc in `arch:design` +- Right: [Architect Hat] writing design doc in `arch:design` + +❌ **Not Switching Hats at Gates** +- Wrong: [Architect Hat] approving own design in `lead:design-review` +- Right: 
Wait for human approval, then [Architect Hat] proceeds to `arch:plan` + +❌ **Mixing Concerns Across Hats** +- Wrong: [Dev Hat] making architecture decisions +- Right: [Dev Hat] implements per architecture, escalates if design needs change + +--- + +## When to Escalate vs. Decide + +### Autonomous Decisions (No Escalation) + +**PO Hat:** +- Backlog prioritization within sprint +- Minor scope adjustments + +**Architect Hat:** +- Implementation approach within design boundaries +- Refactoring decisions + +**Dev Hat:** +- Code structure and naming +- Dependency versions (within approved stack) + +**QE Hat:** +- Test case selection +- Test data generation + +--- + +### Propose for Human Approval + +**PO Hat:** +- Epic acceptance/rejection +- Cross-sprint scope changes + +**Architect Hat:** +- Major design decisions (in design doc) +- Story breakdown (in plan review) + +**Dev Hat:** +- New dependencies outside approved stack +- Breaking API changes + +--- + +### Immediate Escalation + +**Any Hat:** +- Security vulnerabilities discovered +- Blocking issues affecting sprint goal +- Process failures (automation broken) +- Upstream policy changes + +--- + +## Hat Responsibilities by Status + +| Status | Active Hat | Responsibility | +|--------|-----------|----------------| +| `po:triage` | PO | Evaluate epic | +| `po:backlog` | PO | Manage priority | +| `arch:design` | Architect | Write design doc | +| `lead:design-review` | (Human) | Review design | +| `arch:plan` | Architect | Break into stories | +| `lead:plan-review` | (Human) | Review breakdown | +| `arch:breakdown` | Architect | Create story issues | +| `po:ready` | PO | Manage ready backlog | +| `dev:ready` | (Queue) | Await test design | +| `qe:test-design` | QE | Write test plan | +| `dev:implement` | Dev | Write code | +| `dev:code-review` | (Human) | Review PR | +| `qe:verify` | QE | Run tests | +| `arch:sign-off` | Architect | Verify alignment | +| `po:merge` | PO (auto) | Merge PR | +| `arch:in-progress` | 
Architect | Monitor stories | +| `po:accept` | (Human) | Accept epic | + +--- + +**See Also:** +- [Status Transitions](../statuses/transitions.md) - When to switch hats +- [Sprint Process](../workflows/sprint-process.md) - Hat-switching in sprint context +- [Epic Breakdown](../workflows/epic-breakdown.md) - Architect hat deep-dive diff --git a/ai-docs/statuses/index.md b/ai-docs/statuses/index.md new file mode 100644 index 0000000..6d2b3fd --- /dev/null +++ b/ai-docs/statuses/index.md @@ -0,0 +1,36 @@ +# Status System + +Status tracking and workflow state machine for the Splat Team. + +--- + +## Contents + +- **[transitions.md](transitions.md)** - Full status workflow and state machine + +--- + +## Quick Reference + +**Epic Flow:** +``` +po:triage → po:backlog → arch:design → lead:design-review → +arch:plan → lead:plan-review → arch:breakdown → po:ready → +arch:in-progress → po:accept → done +``` + +**Story Flow:** +``` +dev:ready → qe:test-design → dev:implement → dev:code-review → +qe:verify → arch:sign-off → po:merge → done +``` + +**Human Gates (require human approval):** +- `lead:design-review` (< 1 business day SLA) +- `lead:plan-review` (< 1 business day SLA) +- `dev:code-review` (4-8 hours SLA) +- `po:accept` (< 2 business days SLA) + +--- + +**Start Here:** Read [transitions.md](transitions.md) for full state machine and definitions. diff --git a/ai-docs/statuses/transitions.md b/ai-docs/statuses/transitions.md new file mode 100644 index 0000000..f827140 --- /dev/null +++ b/ai-docs/statuses/transitions.md @@ -0,0 +1,642 @@ +# Status Transitions + +**Team:** Splat Team +**Profile:** scrum-compact +**Last Updated:** 2026-05-01 + +--- + +## Overview + +Status is tracked via GitHub Projects v2 "Status" field (single-select dropdown). Status values follow the pattern: `:`. + +**Key Principle:** Superman agent self-transitions through states by wearing different hats. 
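The epic and story flows in the Quick Reference above can be encoded as a small transition table. The following Python sketch is illustrative only: the forward edges come from the two flows, the rework edges come from the Transition Rules section later in this file, and every name (`EPIC_FLOW`, `build_transitions`, `is_valid_transition`) is hypothetical rather than an existing BotMinter function. Closing a rejected epic from `po:triage` is not modeled, because closing is an issue action rather than a status.

```python
# Hypothetical transition table; names are illustrative, not BotMinter APIs.
EPIC_FLOW = [
    "po:triage", "po:backlog", "arch:design", "lead:design-review",
    "arch:plan", "lead:plan-review", "arch:breakdown", "po:ready",
    "arch:in-progress", "po:accept", "done",
]
STORY_FLOW = [
    "dev:ready", "qe:test-design", "dev:implement", "dev:code-review",
    "qe:verify", "arch:sign-off", "po:merge", "done",
]

# Rework edges: a human gate can send work back to the preceding working status.
EPIC_REWORK = {
    "lead:design-review": "arch:design",
    "lead:plan-review": "arch:plan",
    "po:accept": "arch:in-progress",
}
STORY_REWORK = {"dev:code-review": "dev:implement"}


def build_transitions(flow, rework):
    """Build {status: set(next statuses)} from a linear flow plus rework edges."""
    allowed = {}
    for current, nxt in zip(flow, flow[1:]):
        allowed.setdefault(current, set()).add(nxt)
    for gate, target in rework.items():
        allowed.setdefault(gate, set()).add(target)
    return allowed


TRANSITIONS = {
    "epic": build_transitions(EPIC_FLOW, EPIC_REWORK),
    "story": build_transitions(STORY_FLOW, STORY_REWORK),
}


def is_valid_transition(kind, current, target):
    """Return True if `current` -> `target` is allowed for an 'epic' or 'story'."""
    if kind not in TRANSITIONS:
        raise ValueError(f"unknown issue kind: {kind!r}")
    return target in TRANSITIONS[kind].get(current, set())
```

Checking a proposed move with `is_valid_transition("epic", "arch:design", "arch:plan")` returns `False`, since the design must pass through `lead:design-review` first.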
+ +--- + +## Epic Status Workflow + +```mermaid +graph TD + A[po:triage] -->|PO evaluates| B[po:backlog] + B -->|Architect picks| C[arch:design] + C -->|Design doc complete| D[lead:design-review] + D -->|Human approves| E[arch:plan] + D -->|Human requests changes| C + E -->|Breakdown proposed| F[lead:plan-review] + F -->|Human approves| G[arch:breakdown] + F -->|Human requests changes| E + G -->|Stories created| H[po:ready] + H -->|Sprint planning| I[arch:in-progress] + I -->|All stories done| J[po:accept] + J -->|Human accepts| K[done] + J -->|Human requests changes| I +``` + +--- + +## Epic Status Definitions + +### po:triage + +**Hat:** Product Owner +**Description:** New epic, awaiting evaluation + +**Entry:** Epic issue created +**Exit:** PO evaluates value and feasibility + +**Actions:** +- Review epic description and success criteria +- Evaluate alignment with team roadmap +- Estimate rough size (small/medium/large) +- Decide: accept or reject + +**Next States:** +- → `po:backlog` (accepted) +- → close issue (rejected) + +**Duration:** < 1 business day + +--- + +### po:backlog + +**Hat:** Product Owner +**Description:** Accepted, prioritized, awaiting activation + +**Entry:** PO accepts epic +**Exit:** PO assigns to architect for design + +**Actions:** +- Prioritize relative to other backlog epics +- Ensure epic has clear success criteria +- Wait for capacity to start design work + +**Next States:** +- → `arch:design` (PO activates) + +**Duration:** Days to weeks (depends on priority) + +--- + +### arch:design + +**Hat:** Architect +**Description:** Architect producing design doc + +**Entry:** Architect starts design work +**Exit:** Design doc complete and ready for review + +**Actions:** +- Research technical approach +- Write design doc (approach, alternatives, risks, rollout plan) +- Add design doc link to epic issue body +- Create preliminary story list (rough breakdown) + +**Design Doc Template:** +```markdown +## Approach +[Proposed technical solution] 
+ +## Alternatives Considered +[Other approaches and why rejected] + +## Risks & Mitigations +[Known risks and how to address] + +## Rollout Plan +[Phased rollout if applicable] + +## Testing Strategy +[How to validate] +``` + +**Next States:** +- → `lead:design-review` (design doc complete) + +**Duration:** 2-5 days + +--- + +### lead:design-review + +**Hat:** (Human review gate) +**Description:** Design doc awaiting human review + +**Entry:** Architect completes design doc +**Exit:** Human approves or requests changes + +**Actions (Human):** +- Review design doc +- Provide feedback via issue comments +- Approve or request changes + +**Next States:** +- → `arch:plan` (human approves) +- → `arch:design` (human requests changes) + +**Duration:** < 1 business day + +--- + +### arch:plan + +**Hat:** Architect +**Description:** Architect proposing story breakdown + +**Entry:** Design approved +**Exit:** Story breakdown complete and ready for review + +**Actions:** +- Break epic into implementable stories +- Write story descriptions (As a..., I want..., so that...) +- Define acceptance criteria for each story +- Estimate story points +- Identify dependencies between stories +- Add story list to epic issue (table format) + +**Story List Format:** +```markdown +## Proposed Stories + +| # | Title | Acceptance Criteria | Points | Dependencies | +|---|-------|---------------------|--------|--------------| +| 1 | Story A | - AC1<br>- AC2 | 3 | None | +| 2 | Story B | - AC1<br>- AC2 | 5 | Story 1 | +``` + +**Next States:** +- → `lead:plan-review` (breakdown complete) + +**Duration:** 1-3 days + +--- + +### lead:plan-review + +**Hat:** (Human review gate) +**Description:** Story breakdown awaiting human review + +**Entry:** Architect completes story breakdown +**Exit:** Human approves or requests changes + +**Actions (Human):** +- Review proposed stories +- Verify acceptance criteria are clear +- Check dependencies make sense +- Approve or request changes + +**Next States:** +- → `arch:breakdown` (human approves) +- → `arch:plan` (human requests changes) + +**Duration:** < 1 business day + +--- + +### arch:breakdown + +**Hat:** Architect +**Description:** Architect creating story issues + +**Entry:** Breakdown approved +**Exit:** All story issues created + +**Actions:** +- Create GitHub issue for each story +- Add `kind/story` label +- Add `parent/` label +- Link to parent epic in body +- Set initial status to `dev:ready` +- Update epic with links to created stories + +**Next States:** +- → `po:ready` (all stories created) + +**Duration:** < 1 day + +--- + +### po:ready + +**Hat:** Product Owner +**Description:** Stories created, epic in ready backlog + +**Entry:** All stories created +**Exit:** Sprint planning selects stories + +**Actions:** +- Ensure stories are prioritized +- Wait for sprint planning +- Track as "ready for implementation" + +**Next States:** +- → `arch:in-progress` (sprint starts, stories activated) + +**Duration:** Until next sprint planning + +--- + +### arch:in-progress + +**Hat:** Architect +**Description:** Architect monitoring story execution + +**Entry:** Stories enter sprint +**Exit:** All stories complete + +**Actions:** +- Monitor story progress +- Unblock stories as needed +- Ensure alignment with design +- Adjust course if issues discovered + +**Next States:** +- → `po:accept` (all stories done) + +**Duration:** 1-3 sprints (depends on epic size) + +--- + +### po:accept + +**Hat:** (Human acceptance gate) 
+**Description:** Epic awaiting human acceptance + +**Entry:** All stories complete +**Exit:** Human accepts or requests changes + +**Actions (Human):** +- Review completed work +- Verify success criteria met +- Test in demo environment if needed +- Accept or request changes + +**Next States:** +- → `done` (human accepts) +- → `arch:in-progress` (human requests changes) + +**Duration:** < 2 business days + +--- + +### done + +**Hat:** (Terminal state) +**Description:** Complete + +**Entry:** Human accepts epic +**Exit:** None (terminal) + +**Actions:** +- Close epic issue +- Extract learnings to ADR if architectural decisions made +- Update documentation if needed + +--- + +## Story Status Workflow + +```mermaid +graph LR + A[dev:ready] --> B[qe:test-design] + B --> C[dev:implement] + C --> D[dev:code-review] + D -->|Human approves PR| E[qe:verify] + D -->|Changes requested| C + E --> F[arch:sign-off] + F --> G[po:merge] + G --> H[done] +``` + +--- + +## Story Status Definitions + +### dev:ready + +**Hat:** (Initial state) +**Description:** Story ready for development + +**Entry:** Story created from epic breakdown +**Exit:** QE hat starts test design + +**Actions:** +- Ensure story has clear acceptance criteria +- Verify parent epic design is approved +- Check no blocking dependencies + +**Next States:** +- → `qe:test-design` + +--- + +### qe:test-design + +**Hat:** QE (Quality Engineering) +**Description:** QE designing tests + +**Entry:** QE hat starts test design +**Exit:** Test plan complete + +**Actions:** +- Write test plan (unit, integration, e2e) +- Identify vSphere-specific test scenarios +- Define test data and environments needed +- Add test plan to story issue + +**Test Plan Format:** +```markdown +## Test Plan + +**Unit Tests:** +- Test case 1 +- Test case 2 + +**Integration Tests:** +- Test case 1 + +**vSphere E2E Tests:** +- Test case 1 (vSphere 7.0) +- Test case 2 (vSphere 8.0) +``` + +**Next States:** +- → `dev:implement` + +**Duration:** < 1 day 
+ +--- + +### dev:implement + +**Hat:** Developer +**Description:** Developer implementing + +**Entry:** Test plan complete +**Exit:** Implementation complete and PR opened + +**Actions:** +- Write code following test plan +- Implement vSphere-specific logic +- Write unit tests +- Update documentation +- Open PR to upstream (or fork if testing first) + +**PR Description Template:** +```markdown +## What + +[Brief description] + +## Why + +[Rationale - link to story/epic] + +## Testing + +- [ ] Unit tests pass +- [ ] Integration tests pass +- [ ] vSphere e2e tests pass + +Fixes # +Parent: # +``` + +**Next States:** +- → `dev:code-review` + +**Duration:** 1-3 days + +--- + +### dev:code-review + +**Hat:** (Human review gate) +**Description:** Code review + +**Entry:** PR opened +**Exit:** Human approves or requests changes + +**Actions (Human):** +- Review code quality +- Check test coverage +- Verify vSphere-specific logic correct +- Approve or request changes + +**Actions (Agent while waiting):** +- Address CI failures +- Answer review questions +- Update PR based on feedback + +**Next States:** +- → `qe:verify` (human approves PR) +- → `dev:implement` (changes requested - keep PR open) + +**Duration:** 4-8 hours human SLA + +--- + +### qe:verify + +**Hat:** QE +**Description:** QE verifying implementation + +**Entry:** PR approved +**Exit:** Tests pass and verification complete + +**Actions:** +- Verify all tests pass (unit, integration, e2e) +- Run manual vSphere tests if needed +- Check test coverage meets standards +- Confirm vSphere-specific scenarios validated + +**Next States:** +- → `arch:sign-off` + +**Duration:** < 1 day (mostly automated) + +--- + +### arch:sign-off + +**Hat:** Architect (Auto-advance) +**Description:** Architect sign-off + +**Entry:** Tests pass +**Exit:** Auto-advance (no human action) + +**Actions:** +- Verify implementation aligns with epic design +- Auto-advance to merge (no blocking) + +**Next States:** +- → `po:merge` + 
+**Duration:** Immediate (auto) + +--- + +### po:merge + +**Hat:** Product Owner (Auto-advance) +**Description:** Merge gate + +**Entry:** Architect sign-off +**Exit:** PR merged (auto after human approval) + +**Actions:** +- Auto-merge PR (human already approved in `dev:code-review`) +- Update story status +- Close story issue + +**Next States:** +- → `done` + +**Duration:** Immediate (auto) + +--- + +### done + +**Hat:** (Terminal state) +**Description:** Complete + +**Entry:** PR merged, story closed +**Exit:** None (terminal) + +--- + +## Specialist Statuses + +### sre:infra-setup + +**Hat:** SRE (Site Reliability Engineering) +**Description:** SRE infrastructure setup + +**Used For:** Stories requiring infrastructure changes (test clusters, CI config) + +**Actions:** +- Provision vSphere test environments +- Configure Prow jobs +- Set up monitoring/alerting + +--- + +### cw:write + +**Hat:** Content Writer +**Description:** Content writer writing + +**Used For:** Documentation stories (`kind/docs`) + +**Actions:** +- Write user-facing documentation +- Update MkDocs content +- Create diagrams or examples + +--- + +### cw:review + +**Hat:** Content Writer +**Description:** Content writer reviewing + +**Used For:** Documentation review + +**Actions:** +- Review docs for accuracy +- Check formatting and links +- Verify examples work + +--- + +### mgr:todo + +**Hat:** Team Manager +**Description:** Task awaiting team manager + +**Used For:** Process improvement tasks (`kind/process-improvement`) + +**Actions:** +- Queue task for team-manager role +- Prioritize process improvements + +--- + +### mgr:in-progress + +**Hat:** Team Manager +**Description:** Team manager working on task + +**Used For:** Active process improvement + +**Actions:** +- Implement process change +- Update documentation +- Create retrospective action items + +--- + +### error + +**Hat:** (Error state) +**Description:** Issue failed processing 3 times + +**Used For:** Issues that hit repeated 
automation failures + +**Actions:** +- Human investigation required +- Fix underlying automation issue +- Manually recover or close + +--- + +## Transition Rules + +### Valid Transitions + +**Epics:** +- `po:triage` → `po:backlog` | close +- `po:backlog` → `arch:design` +- `arch:design` → `lead:design-review` +- `lead:design-review` → `arch:plan` | `arch:design` +- `arch:plan` → `lead:plan-review` +- `lead:plan-review` → `arch:breakdown` | `arch:plan` +- `arch:breakdown` → `po:ready` +- `po:ready` → `arch:in-progress` +- `arch:in-progress` → `po:accept` +- `po:accept` → `done` | `arch:in-progress` + +**Stories:** +- `dev:ready` → `qe:test-design` +- `qe:test-design` → `dev:implement` +- `dev:implement` → `dev:code-review` +- `dev:code-review` → `qe:verify` | `dev:implement` +- `qe:verify` → `arch:sign-off` +- `arch:sign-off` → `po:merge` +- `po:merge` → `done` + +### Invalid Transitions + +❌ `arch:design` → `arch:plan` (must go through `lead:design-review`) +❌ `dev:implement` → `qe:verify` (must go through `dev:code-review`) +❌ `po:backlog` → `done` (must complete full workflow) + +--- + +## Human Gates Summary + +| Status | Human Action Required | SLA | +|--------|----------------------|-----| +| `lead:design-review` | Approve/reject design doc | < 1 business day | +| `lead:plan-review` | Approve/reject story breakdown | < 1 business day | +| `dev:code-review` | Approve/request changes on PR | 4-8 hours | +| `po:accept` | Accept/reject completed epic | < 2 business days | + +--- + +**See Also:** +- [Sprint Process](../workflows/sprint-process.md) - How statuses flow during sprints +- [Epic Breakdown](../workflows/epic-breakdown.md) - Detailed epic workflow +- [Role Responsibilities](../roles/responsibilities.md) - Which hat owns each status diff --git a/ai-docs/workflows/index.md b/ai-docs/workflows/index.md new file mode 100644 index 0000000..20a8696 --- /dev/null +++ b/ai-docs/workflows/index.md @@ -0,0 +1,24 @@ +# Workflows + +Team processes and workflow guides 
for the Splat Team. + +--- + +## Core Workflows + +- **[sprint-process.md](sprint-process.md)** - Sprint ceremonies and cadence (coming soon) +- **[epic-breakdown.md](epic-breakdown.md)** - Epic → Stories workflow (coming soon) +- **[review-process.md](review-process.md)** - PR review and acceptance criteria (coming soon) +- **[triage-process.md](triage-process.md)** - Issue triage and regression handling (coming soon) + +--- + +## Process References + +For authoritative process documentation, see: +- **[PROCESS.md](../../PROCESS.md)** - Full team process specification +- **[Status Transitions](../statuses/transitions.md)** - Workflow state machine + +--- + +**Start Here:** Read [../TEAM_PHILOSOPHY.md](../TEAM_PHILOSOPHY.md) for process principles, then consult specific workflow docs for details. From c501b9b180421ad059fbf6671f09ccc59b9a25ba Mon Sep 17 00:00:00 2001 From: Richard Vanderpool <49568690+rvanderp3@users.noreply.github.com> Date: Fri, 1 May 2026 11:46:33 -0400 Subject: [PATCH 2/4] Update ai-docs using bm enrich templates Generated using the new scrum-compact profile templates from bm enrich implementation. This updates the team-level ai-docs to use the standardized template structure. 
Changes: - Updated AGENTS.md with improved navigation - Refreshed TEAM_PHILOSOPHY.md with template structure - Updated workflows, roles, and statuses documentation - Added sprint-process.md with comprehensive sprint workflow Generated with: bm enrich generate splat --team-only Co-Authored-By: Minty --- AGENTS.md | 150 ++++----------- ai-docs/TEAM_PHILOSOPHY.md | 106 ++++------- ai-docs/roles/index.md | 223 ++++++++++++++++++++-- ai-docs/statuses/index.md | 247 ++++++++++++++++++++++-- ai-docs/workflows/index.md | 79 ++++++-- ai-docs/workflows/sprint-process.md | 279 ++++++++++++++++++++++++++++ 6 files changed, 857 insertions(+), 227 deletions(-) create mode 100644 ai-docs/workflows/sprint-process.md diff --git a/AGENTS.md b/AGENTS.md index 1e1eff1..6518b65 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,7 +1,7 @@ -# Splat Team - Agent Navigation +# splat Team - Agent Navigation -**Profile:** scrum-compact -**Repository:** https://github.com/openshift-splat-team/splat-team +**Profile:** scrum-compact +**Repository:** https://github.com/openshift-splat-team/splat-team **Last Updated:** 2026-05-01 --- @@ -10,11 +10,11 @@ **IMPORTANT**: Prefer retrieval-led reasoning over pre-training-led reasoning. 
-When working on Splat Team: +When working on splat Team: - ✅ **DO**: Read relevant docs from `./ai-docs/` first - ✅ **DO**: Check team workflows in `./ai-docs/workflows/` - ✅ **DO**: Verify status transitions in `./ai-docs/statuses/` -- ✅ **DO**: Review project-specific context in `./projects//` +- ✅ **DO**: Review project-specific context in `./projects//` (if applicable) - ❌ **DON'T**: Rely solely on training data - ❌ **DON'T**: Guess at team processes or status meanings @@ -27,19 +27,16 @@ When working on Splat Team: ### Common Task Flows **Starting new epic?** -→ `ai-docs/workflows/epic-breakdown.md` → `ai-docs/statuses/transitions.md` → `PROCESS.md` +→ `ai-docs/workflows/sprint-process.md` → `ai-docs/statuses/index.md` → `PROCESS.md` **Implementing story?** -→ `ai-docs/workflows/sprint-process.md` → `ai-docs/practices/coding-standards.md` → `ai-docs/architecture/projects.md` - -**Working on specific project (e.g., installer)?** -→ `projects/installer/CLAUDE.md` → `ai-docs/architecture/projects.md` → `ai-docs/practices/testing.md` - -**Triaging issue or regression?** -→ `ai-docs/workflows/triage-process.md` → `coding-agent/skills/triage-regression/` +→ `ai-docs/workflows/sprint-process.md` → `ai-docs/roles/index.md` **Need role context?** -→ `ai-docs/roles/responsibilities.md` +→ `ai-docs/roles/index.md` → `ai-docs/TEAM_PHILOSOPHY.md` + +**Understanding workflow?** +→ `ai-docs/statuses/index.md` → `ai-docs/workflows/index.md` --- @@ -47,39 +44,16 @@ When working on Splat Team: | Task | Start Here | Then Read | |------|-----------|-----------| -| **Epic breakdown** | `ai-docs/workflows/epic-breakdown.md` | `ai-docs/statuses/transitions.md` | -| **Story implementation** | `ai-docs/workflows/sprint-process.md` | `projects//CLAUDE.md` | -| **Code review** | `ai-docs/workflows/review-process.md` | `ai-docs/practices/coding-standards.md` | -| **Triage regression** | `ai-docs/workflows/triage-process.md` | `coding-agent/skills/triage-regression/` | -| **Process 
improvement** | `ai-docs/roles/team-manager.md` | `PROCESS.md` | - ---- - -## Team Focus - -**Mission:** OpenShift vSphere/VMware platform engineering and CI/CD - -**Key Projects:** -- **vcf-migration-operator** - VMware Cloud Foundation migration tooling -- **installer** - OpenShift installer (vSphere provider support) -- **machine-api-operator** - vSphere machine provisioning -- **cluster-cloud-controller-manager-operator** - vSphere CCM integration -- **cloud-credential-operator** - vSphere credential management -- **vsphere-problem-detector** - Platform health diagnostics -- **opct** - OpenShift Provider Certification Tool - -See `ai-docs/architecture/projects.md` for full project list and details. +| **Epic breakdown** | `ai-docs/workflows/sprint-process.md` | `ai-docs/statuses/index.md` | +| **Story implementation** | `ai-docs/workflows/sprint-process.md` | `ai-docs/roles/index.md` | +| **Process question** | `ai-docs/TEAM_PHILOSOPHY.md` | `PROCESS.md` | +| **Status transitions** | `ai-docs/statuses/index.md` | `ai-docs/workflows/sprint-process.md` | --- ## Technology Stack -**Primary Languages:** Go (OpenShift operators), Python (automation/skills) -**Testing:** Go testing, pytest, Jest -**CI/CD:** Prow, GitHub Actions -**Documentation:** MkDocs - -**See:** `ai-docs/architecture/tech-stack.md` for details +**Languages:** Python --- @@ -87,12 +61,10 @@ See `ai-docs/architecture/projects.md` for full project list and details. ### Roles -| Role | Description | When Active | -|------|-------------|-------------| -| **superman** | All-in-one member (PO, architect, dev, QE, SRE, content writer) | All tasks | -| **team-manager** | Process improvement and team coordination | Process tasks only | +- **superman**: All-in-one member — PO, architect, dev, QE, SRE, content writer +- **team-manager**: Process improvement and team coordination -See `ai-docs/roles/responsibilities.md` for detailed hat-switching guide. 
+See `ai-docs/roles/index.md` for detailed responsibilities and hat-switching guide. --- @@ -100,14 +72,14 @@ See `ai-docs/roles/responsibilities.md` for detailed hat-switching guide. **Epic Flow:** ``` -po:triage → po:backlog → arch:design → lead:design-review → -arch:plan → lead:plan-review → arch:breakdown → po:ready → +po:triage → po:backlog → arch:design → lead:design-review → +arch:plan → lead:plan-review → arch:breakdown → po:ready → arch:in-progress → po:accept → done ``` **Story Flow:** ``` -dev:ready → qe:test-design → dev:implement → dev:code-review → +dev:ready → qe:test-design → dev:implement → dev:code-review → qe:verify → arch:sign-off → po:merge → done ``` @@ -116,7 +88,7 @@ qe:verify → arch:sign-off → po:merge → done - `lead:plan-review` - Human must approve story breakdown - `dev:code-review` - Human must approve PR -See `ai-docs/statuses/transitions.md` for full state machine. +See `ai-docs/statuses/index.md` for full state machine and transitions. --- @@ -126,29 +98,9 @@ See `ai-docs/statuses/transitions.md` for full state machine. 
|-------|------|-------------| | **Team principles** | `ai-docs/TEAM_PHILOSOPHY.md` | Methodology and values | | **Sprint process** | `ai-docs/workflows/sprint-process.md` | Ceremonies and cadence | -| **Epic breakdown** | `ai-docs/workflows/epic-breakdown.md` | Design → Stories flow | -| **Review process** | `ai-docs/workflows/review-process.md` | PR and acceptance criteria | -| **Triage process** | `ai-docs/workflows/triage-process.md` | Issue triage and regression handling | -| **Projects overview** | `ai-docs/architecture/projects.md` | Project descriptions and context | -| **Coding standards** | `ai-docs/practices/coding-standards.md` | Code conventions | -| **Testing guide** | `ai-docs/practices/testing.md` | Test strategy | -| **Custom skills** | `ai-docs/architecture/skills.md` | Team automation tools | - ---- - -## Custom Skills - -The team has specialized skills for OpenShift CI/CD: - -| Skill | Purpose | Location | -|-------|---------|----------| -| **triage-regression** | Triage CI failures and regressions | `coding-agent/skills/triage-regression/` | -| **summarize-jiras** | Summarize related Jira issues | `coding-agent/skills/summarize-jiras/` | -| **suggest-reviewers** | Suggest PR reviewers via git blame | `coding-agent/skills/suggest-reviewers/` | -| **prow-job-analyze-resource** | Analyze Prow job resources | `coding-agent/skills/prow-job-analyze-resource/` | -| **prow-job-extract-must-gather** | Extract must-gather from Prow jobs | `coding-agent/skills/prow-job-extract-must-gather/` | - -See `ai-docs/architecture/skills.md` for usage details. +| **Role responsibilities** | `ai-docs/roles/index.md` | Role definitions and hat-switching | +| **Status transitions** | `ai-docs/statuses/index.md` | State machine and workflow | +| **Process overview** | `PROCESS.md` | High-level process guide | --- @@ -157,55 +109,29 @@ See `ai-docs/architecture/skills.md` for usage details. 
``` ai-docs/ ├── TEAM_PHILOSOPHY.md # Core principles -├── architecture/ # System structure -│ ├── index.md -│ ├── projects.md # Project descriptions -│ ├── tech-stack.md # Technologies used -│ └── skills.md # Custom automation tools ├── workflows/ # Process guides │ ├── index.md -│ ├── sprint-process.md # Sprint ceremonies -│ ├── epic-breakdown.md # Epic → Stories -│ ├── review-process.md # PR review -│ └── triage-process.md # Issue triage +│ └── sprint-process.md # Sprint ceremonies ├── roles/ # Role definitions -│ ├── index.md -│ ├── responsibilities.md # Hat-switching guide -│ └── team-manager.md # Process coordination -├── statuses/ # Status system -│ ├── index.md -│ └── transitions.md # State machine -├── practices/ # Engineering practices -│ ├── index.md -│ ├── coding-standards.md # Code conventions -│ ├── testing.md # Test strategy -│ └── ci-cd.md # Prow and GitHub Actions -├── decisions/ # Architectural Decision Records -│ ├── index.md -│ └── adr-template.md # ADR template -└── references/ # Quick reference - ├── index.md - ├── glossary.md # Team terminology - └── shortcuts.md # Common commands +│ └── index.md +└── statuses/ # Status system + └── index.md ``` --- -## Project-Specific Context +## Profile Methodology -Each forked project has its own context: - -``` -projects// -├── CLAUDE.md # Project-specific agent guidance -├── .context.md # Additional context (if present) -└── ...project files... -``` +This team uses the **scrum-compact** profile, which means: +- GitHub-centric issue tracking and project management +- Human-in-the-loop design and review gates +- Status-based workflow progression +- Solo operator with multiple hats (for compact teams) -**Always read `projects//CLAUDE.md` before working on that project.** +For full methodology details, see `ai-docs/TEAM_PHILOSOPHY.md`. --- **Navigation**: Start with `ai-docs/TEAM_PHILOSOPHY.md` for team context. -**Feedback**: Report issues via GitHub issues with `kind/docs` label. 
+**GitHub**: https://github.com/openshift-splat-team/splat-team diff --git a/ai-docs/TEAM_PHILOSOPHY.md b/ai-docs/TEAM_PHILOSOPHY.md index 1916b79..a15590c 100644 --- a/ai-docs/TEAM_PHILOSOPHY.md +++ b/ai-docs/TEAM_PHILOSOPHY.md @@ -1,13 +1,15 @@ -# Splat Team Philosophy +# splat Team Philosophy -**Profile:** scrum-compact +**Profile:** scrum-compact **Last Updated:** 2026-05-01 --- ## Mission -Build and maintain OpenShift's vSphere/VMware platform integration, ensuring reliable installation, operation, and troubleshooting capabilities for enterprise customers running OpenShift on VMware infrastructure. +Define your team's mission here. What problem does your team solve? Who are your stakeholders? + +*(Edit this section to reflect your team's specific mission and goals)* --- @@ -57,61 +59,17 @@ Build and maintain OpenShift's vSphere/VMware platform integration, ensuring rel --- -### 4. OpenShift CI/CD Native +### 4. Retrieval Over Training Data -**Principle:** Deeply integrated with OpenShift development workflows (Prow, must-gather, Jira). +**Principle:** Read team documentation before acting, don't rely on pre-training knowledge. -**Why:** We maintain OpenShift operators — must follow upstream conventions and tooling. +**Why:** Team-specific processes and conventions may differ from general best practices. **How to Apply:** -- Use Prow for CI/CD (not GitHub Actions for operator testing) -- Follow OpenShift enhancement process for design docs -- Use must-gather for debugging and diagnostics -- Link work to Jira issues when working on upstream bugs - ---- - -### 5. Forked Project Model - -**Principle:** Work happens in forked repos under `openshift-splat-team` org. - -**Why:** Isolate team work from upstream, enable independent testing, prepare PRs for upstream. - -**Workflow:** -1. Fork upstream OpenShift repo → `openshift-splat-team/` -2. Work in fork (issues, branches, PRs) -3. Test in fork's Prow environment -4. 
Submit PR to upstream when ready - -**See:** `architecture/projects.md` for active forks. - ---- - -### 6. Test-First vSphere Focus - -**Principle:** All changes must include vSphere-specific tests and validation. - -**Why:** vSphere platform has unique failure modes — can't rely solely on generic tests. - -**How to Apply:** -- Add vSphere e2e tests for new features -- Test against real vSphere environments (not just mocks) -- Include must-gather diagnostics for debugging -- Validate upgrade paths (N → N+1) - ---- - -### 7. Continuous Documentation - -**Principle:** Document as you build — design docs, ADRs, code comments, user guides. - -**Why:** Complex platform integration requires clear documentation for troubleshooting and knowledge transfer. - -**Artifacts:** -- Design docs (in epics, linked from `lead:design-review` status) -- ADRs for architectural decisions (`ai-docs/decisions/`) -- MkDocs for user-facing guides (`docs/`) -- Code comments for non-obvious vSphere behavior +- Check `ai-docs/` before making process assumptions +- Verify status transitions in `ai-docs/statuses/index.md` +- Read `PROCESS.md` for workflow guidance +- Reference role responsibilities in `ai-docs/roles/index.md` --- @@ -151,14 +109,11 @@ Build and maintain OpenShift's vSphere/VMware platform integration, ensuring rel ❌ **Merging without human PR review** - `dev:code-review` requires human approval (blocking) -❌ **Generic tests for vSphere features** -- Always include vSphere-specific validation +❌ **Ignoring team documentation** +- Always read relevant ai-docs before making assumptions -❌ **Working directly in upstream repos** -- Use forked repos under `openshift-splat-team/` - -❌ **Forgetting must-gather updates** -- New features need must-gather collection logic +❌ **Advancing status without completing work** +- Status transitions must reflect actual work state --- @@ -182,7 +137,7 @@ Build and maintain OpenShift's vSphere/VMware platform integration, ensuring rel - Architectural 
pivots - Breaking changes - Security implications -- Upstream policy changes +- Major process changes --- @@ -193,8 +148,7 @@ Build and maintain OpenShift's vSphere/VMware platform integration, ensuring rel - Sustainable pace, not maximum throughput **Quality:** -- Zero regressions in vSphere e2e tests -- All PRs pass upstream CI before submission +- All PRs pass CI before submission - Design docs reviewed within 1 business day **Responsiveness:** @@ -202,6 +156,8 @@ Build and maintain OpenShift's vSphere/VMware platform integration, ensuring rel - Blockers escalated within 1 business day - PR reviews completed within 2 business days +*(Adjust these metrics based on your team's context and capacity)* + --- ## Team Culture @@ -214,8 +170,24 @@ Build and maintain OpenShift's vSphere/VMware platform integration, ensuring rel --- +## Role Definitions + +- **superman**: All-in-one member — PO, architect, dev, QE, SRE, content writer +- **team-manager**: Process improvement and team coordination + +See `ai-docs/roles/index.md` for detailed hat-switching guide. + +--- + +## Status Workflow + +No statuses defined + +See `ai-docs/statuses/index.md` for full state machine and transition rules. + +--- + **See Also:** - [Sprint Process](workflows/sprint-process.md) - Ceremonies and cadence -- [Epic Breakdown](workflows/epic-breakdown.md) - Design → Stories details -- [Status Transitions](statuses/transitions.md) - Full state machine -- [Role Responsibilities](roles/responsibilities.md) - Hat-switching guide +- [Status Transitions](statuses/index.md) - Full state machine +- [Role Responsibilities](roles/index.md) - Hat-switching guide diff --git a/ai-docs/roles/index.md b/ai-docs/roles/index.md index d65d1ed..d2c9024 100644 --- a/ai-docs/roles/index.md +++ b/ai-docs/roles/index.md @@ -1,36 +1,219 @@ -# Roles +# splat Roles -Role definitions and responsibilities for the Splat Team. 
+**Profile:** scrum-compact --- -## Active Roles +## Team Roles -### Superman -All-in-one team member wearing multiple hats (PO, Architect, Dev, QE, SRE, Content Writer). +- **superman**: All-in-one member — PO, architect, dev, QE, SRE, content writer +- **team-manager**: Process improvement and team coordination -**See:** [responsibilities.md](responsibilities.md) for hat-switching guide. +--- + +## Role Descriptions + +### Superman Role + +**Purpose:** All-in-one team member wearing multiple hats -### Team Manager -Process improvement and team coordination role. +**Hats Worn:** +- **PO (Product Owner)**: Manages backlog, triages issues, accepts work +- **Architect**: Designs solutions, reviews technical approach +- **Developer**: Implements code, writes tests +- **QE (Quality Engineer)**: Designs tests, verifies implementations +- **SRE (Site Reliability Engineer)**: Infrastructure setup and monitoring (when applicable) +- **Content Writer**: Documentation and user guides (when applicable) -**See:** [team-manager.md](team-manager.md) (coming soon) +**When Active:** All work items (default role) + +**Status Prefixes:** +- `po:*` - Product Owner hat +- `arch:*` - Architect hat +- `dev:*` - Developer hat +- `qe:*` - QE hat +- `sre:*` - SRE hat +- `cw:*` - Content Writer hat --- -## Hat-Switching +### Team Manager Role + +**Purpose:** Process improvement and team coordination + +**Responsibilities:** +- Process improvement tasks +- Team coordination activities +- Methodology refinement +- Documentation maintenance + +**When Active:** Only for process-related work items + +**Status Prefixes:** +- `mgr:*` - Manager statuses + +**Label:** `role/team-manager` + +--- + +## Hat-Switching Guide + +### Why Explicit Hat Switching? + +The scrum-compact profile uses a solo operator model where one agent performs all roles. 
Explicit hat-switching provides clarity about: +- Current responsibility context +- Which perspective you're taking +- What work is being performed + +### How to Switch Hats + +**Announce the switch explicitly:** + +``` +Switching to QE hat: Designing test plan for story #123 +``` + +``` +Dev hat: Implementing feature X based on design +``` + +``` +PO hat: Triaging new epic, evaluating priority +``` + +**Status transitions often trigger hat changes:** +- `arch:design` → **Architect hat**: Writing design doc +- `qe:test-design` → **QE hat**: Planning tests +- `dev:implement` → **Developer hat**: Writing code +- `po:triage` → **PO hat**: Evaluating new work + +--- + +## Responsibilities by Hat + +### Product Owner (PO) Hat + +**Responsibilities:** +- Triage new issues (`po:triage`) +- Manage and prioritize backlog (`po:backlog`) +- Accept completed work (`po:accept`) +- Merge approved PRs (`po:merge`) + +**Mindset:** +- Customer value focus +- Priority and scope decisions +- Stakeholder communication + +--- + +### Architect Hat + +**Responsibilities:** +- Design technical solutions (`arch:design`) +- Plan story breakdowns (`arch:plan`) +- Create story issues (`arch:breakdown`) +- Sign off on implementations (`arch:sign-off`) +- Monitor in-progress work (`arch:in-progress`) + +**Mindset:** +- System architecture +- Technical approach +- Long-term maintainability +- Risk assessment + +--- + +### Developer Hat + +**Responsibilities:** +- Implement features (`dev:implement`) +- Write code and tests +- Create pull requests +- Address code review feedback (`dev:code-review`) + +**Mindset:** +- Code quality +- Test coverage +- Clear implementation +- Best practices + +--- + +### QE (Quality Engineer) Hat + +**Responsibilities:** +- Design test plans (`qe:test-design`) +- Define test cases and criteria +- Verify implementations (`qe:verify`) +- Check for regressions + +**Mindset:** +- Quality assurance +- Edge case thinking +- Test coverage +- Acceptance criteria validation 
+ +--- + +### Lead Hat (Human Gate) + +**Responsibilities:** +- Review and approve design docs (`lead:design-review`) +- Review and approve story breakdowns (`lead:plan-review`) +- Review and approve code changes (`dev:code-review`) + +**Note:** Lead gates are **human-performed** reviews, not AI-automated. + +--- + +## Status → Hat Mapping + +| Status | Hat | Responsibility | +|--------|-----|----------------| +| `po:triage` | PO | Evaluate new work | +| `po:backlog` | PO | Prioritize work | +| `arch:design` | Architect | Write design doc | +| `lead:design-review` | **Human** | Approve design | +| `arch:plan` | Architect | Propose story breakdown | +| `lead:plan-review` | **Human** | Approve breakdown | +| `arch:breakdown` | Architect | Create story issues | +| `po:ready` | PO | Ready for sprint | +| `dev:ready` | Developer | Ready to implement | +| `qe:test-design` | QE | Design test plan | +| `dev:implement` | Developer | Write code | +| `dev:code-review` | **Human** | Review PR | +| `qe:verify` | QE | Verify implementation | +| `arch:sign-off` | Architect | Technical approval | +| `po:merge` | PO | Merge and close | +| `po:accept` | PO | Accept epic | + +--- + +## Best Practices + +**1. Announce hat switches clearly** +- Don't switch silently mid-comment +- State which hat you're wearing + +**2. Maintain perspective consistency** +- Stay in role while wearing a hat +- Don't mix PO and Dev thinking in same comment + +**3. Respect human gates** +- Never advance past `lead:*` statuses without human approval +- Don't auto-approve your own design or code -The superman role switches between different "hats" throughout the workflow: +**4. 
Use status prefixes to guide hat choice** +- Status name indicates which hat to wear +- Follow the workflow naturally -| Hat | Primary Statuses | Key Responsibilities | -|-----|------------------|---------------------| -| **PO** | `po:triage`, `po:backlog`, `po:accept` | Evaluation, prioritization, acceptance | -| **Architect** | `arch:design`, `arch:plan`, `arch:breakdown` | Design, breakdown, coordination | -| **Dev** | `dev:implement`, `dev:code-review` | Implementation, PR maintenance | -| **QE** | `qe:test-design`, `qe:verify` | Test planning, verification | -| **SRE** | `sre:infra-setup` | Infrastructure provisioning | -| **Content Writer** | `cw:write`, `cw:review` | Documentation | +**5. Comment with hat context** +- Begin comments with hat indicator: "Dev hat: ..." +- Helps humans understand which perspective you're taking --- -**Start Here:** Read [responsibilities.md](responsibilities.md) to understand hat-switching patterns. +**See Also:** +- [Team Philosophy](../TEAM_PHILOSOPHY.md) - Core principles +- [Sprint Process](../workflows/sprint-process.md) - Workflow details +- [Status Transitions](../statuses/index.md) - Status workflow diff --git a/ai-docs/statuses/index.md b/ai-docs/statuses/index.md index 6d2b3fd..6b084af 100644 --- a/ai-docs/statuses/index.md +++ b/ai-docs/statuses/index.md @@ -1,36 +1,251 @@ -# Status System +# splat Status Workflow -Status tracking and workflow state machine for the Splat Team. +**Profile:** scrum-compact --- -## Contents +## Overview -- **[transitions.md](transitions.md)** - Full status workflow and state machine +The splat team uses GitHub Projects v2 Status field to track work progression through defined states. + +Statuses follow a prefix convention: `:`, indicating which role is responsible for the work. 
+ +--- + +## All Statuses + +No statuses defined --- -## Quick Reference +## Status Categories + +### Epic Statuses + +**Lifecycle for `kind/epic` issues:** -**Epic Flow:** ``` -po:triage → po:backlog → arch:design → lead:design-review → -arch:plan → lead:plan-review → arch:breakdown → po:ready → +po:triage → po:backlog → arch:design → lead:design-review → +arch:plan → lead:plan-review → arch:breakdown → po:ready → arch:in-progress → po:accept → done ``` -**Story Flow:** +**Key gates:** +- **Human gates:** `lead:design-review`, `lead:plan-review` +- **Auto-advance:** After human approval + +--- + +### Story Statuses + +**Lifecycle for `kind/story` issues:** + ``` -dev:ready → qe:test-design → dev:implement → dev:code-review → +dev:ready → qe:test-design → dev:implement → dev:code-review → qe:verify → arch:sign-off → po:merge → done ``` -**Human Gates (require human approval):** -- `lead:design-review` (< 1 business day SLA) -- `lead:plan-review` (< 1 business day SLA) -- `dev:code-review` (4-8 hours SLA) -- `po:accept` (< 2 business days SLA) +**Key gates:** +- **Human gate:** `dev:code-review` (PR approval) +- **Auto-advance:** `arch:sign-off`, `po:merge` + +--- + +### Specialist Statuses + +**SRE workflow:** +``` +sre:infra-setup → done +``` + +**Content Writer workflow:** +``` +cw:write → cw:review → cw:merge-ready → done +``` + +**Team Manager workflow:** +``` +mgr:todo → mgr:in-progress → mgr:done +``` + +--- + +## Status Transition Rules + +### Epic Transitions + +**po:triage** → New epic awaiting evaluation +- ✅ Can advance to: `po:backlog` (accepted), `done` (rejected) +- 👤 Who: PO hat +- 🎯 Action: Evaluate priority and scope + +**po:backlog** → Accepted, prioritized, awaiting activation +- ✅ Can advance to: `arch:design` +- 👤 Who: PO hat activates when ready +- 🎯 Action: Move to active work + +**arch:design** → Architect producing design doc +- ✅ Can advance to: `lead:design-review` +- 👤 Who: Architect hat +- 🎯 Action: Write design document in issue + 
+**lead:design-review** → **HUMAN GATE** - Design doc awaiting lead review +- ✅ Can advance to: `arch:plan` (approved), `arch:design` (changes requested) +- 👤 Who: **Human reviewer** +- 🎯 Action: Review and approve/reject design + +**arch:plan** → Architect proposing story breakdown +- ✅ Can advance to: `lead:plan-review` +- 👤 Who: Architect hat +- 🎯 Action: Propose list of stories with estimates + +**lead:plan-review** → **HUMAN GATE** - Story breakdown awaiting lead review +- ✅ Can advance to: `arch:breakdown` (approved), `arch:plan` (changes requested) +- 👤 Who: **Human reviewer** +- 🎯 Action: Review and approve/reject breakdown + +**arch:breakdown** → Architect creating story issues +- ✅ Can advance to: `po:ready` +- 👤 Who: Architect hat +- 🎯 Action: Create GitHub issues for each story + +**po:ready** → Stories created, epic in ready backlog +- ✅ Can advance to: `arch:in-progress` +- 👤 Who: PO hat (when stories start) +- 🎯 Action: Monitor story progress + +**arch:in-progress** → Architect monitoring story execution +- ✅ Can advance to: `po:accept` +- 👤 Who: Architect hat +- 🎯 Action: All stories completed + +**po:accept** → **HUMAN GATE** - Epic awaiting human acceptance +- ✅ Can advance to: `done` (accepted) +- 👤 Who: **Human stakeholder** +- 🎯 Action: Validate epic completion + +--- + +### Story Transitions + +**dev:ready** → Story ready for development +- ✅ Can advance to: `qe:test-design` +- 👤 Who: Developer hat picks up work +- 🎯 Action: Understand story requirements + +**qe:test-design** → QE designing tests +- ✅ Can advance to: `dev:implement` +- 👤 Who: QE hat +- 🎯 Action: Design test plan and criteria + +**dev:implement** → Developer implementing +- ✅ Can advance to: `dev:code-review` +- 👤 Who: Developer hat +- 🎯 Action: Write code, tests, open PR + +**dev:code-review** → **HUMAN GATE** - Code review +- ✅ Can advance to: `qe:verify` (approved), `dev:implement` (changes requested) +- 👤 Who: **Human reviewer** +- 🎯 Action: Review PR, approve or request 
changes + +**qe:verify** → QE verifying implementation +- ✅ Can advance to: `arch:sign-off` +- 👤 Who: QE hat +- 🎯 Action: Verify tests pass, acceptance criteria met + +**arch:sign-off** → Architect sign-off (auto-advance) +- ✅ Can advance to: `po:merge` +- 👤 Who: Architect hat +- 🎯 Action: Auto-advance if no concerns + +**po:merge** → Merge gate (auto-advance) +- ✅ Can advance to: `done` +- 👤 Who: PO hat +- 🎯 Action: Merge PR, close story + +--- + +## Special Statuses + +**done** → Complete +- Final state for all work items +- No further transitions + +**error** → Issue failed processing 3 times +- Requires human intervention +- Investigate processing failure + +--- + +## Human Gates + +**CRITICAL:** Never advance past these statuses without human approval: +- `lead:design-review` - Human must approve design +- `lead:plan-review` - Human must approve story breakdown +- `dev:code-review` - Human must approve PR + +**These are blocking gates.** Work cannot proceed until human reviews and approves. + +--- + +## Auto-Advance Statuses + +These statuses can auto-advance after validation: +- `arch:sign-off` - After tests pass and verification complete +- `po:merge` - After human PR approval + +--- + +## Status Workflow Best Practices + +**1. Never skip statuses** +- Follow the defined workflow +- Each status represents required work + +**2. Respect human gates** +- Wait for human approval at `lead:*` and `dev:code-review` +- Don't try to auto-advance + +**3. Status reflects current work** +- Advance status when entering new phase +- Don't advance prematurely + +**4. Use hat matching status prefix** +- `arch:*` → Architect hat +- `dev:*` → Developer hat +- `qe:*` → QE hat +- `po:*` → PO hat + +**5. 
Comment when advancing status** +- Explain what was completed +- Note any concerns or blockers + +--- + +## Common Status Questions + +**What if I'm blocked in a status?** +- Add `blocked` label +- Comment with blocker details +- Don't advance status +- Escalate if blocker > 1 day + +**What if human gate is delayed?** +- Escalate via direct message +- Work on other items +- Don't advance status without approval + +**Can I go backwards in status?** +- Yes, if changes are requested +- Example: `lead:design-review` → `arch:design` (changes needed) + +**What if tests fail?** +- Stay in `dev:implement` until fixed +- Don't advance to `dev:code-review` with failing tests --- -**Start Here:** Read [transitions.md](transitions.md) for full state machine and definitions. +**See Also:** +- [Sprint Process](../workflows/sprint-process.md) - How status fits into sprints +- [Role Responsibilities](../roles/index.md) - Hat-switching details +- [Team Philosophy](../TEAM_PHILOSOPHY.md) - Core principles diff --git a/ai-docs/workflows/index.md b/ai-docs/workflows/index.md index 20a8696..9c806c6 100644 --- a/ai-docs/workflows/index.md +++ b/ai-docs/workflows/index.md @@ -1,24 +1,79 @@ -# Workflows +# splat Workflows -Team processes and workflow guides for the Splat Team. +**Profile:** scrum-compact --- -## Core Workflows +## Overview -- **[sprint-process.md](sprint-process.md)** - Sprint ceremonies and cadence (coming soon) -- **[epic-breakdown.md](epic-breakdown.md)** - Epic → Stories workflow (coming soon) -- **[review-process.md](review-process.md)** - PR review and acceptance criteria (coming soon) -- **[triage-process.md](triage-process.md)** - Issue triage and regression handling (coming soon) +This section documents the team's workflows and processes for managing work from idea to deployment. + +The scrum-compact profile uses a GitHub-centric workflow with human-in-the-loop gates at critical decision points. 
+ +--- + +## Key Workflows + +### Sprint Process +See [sprint-process.md](sprint-process.md) for details on: +- Sprint planning and ceremonies +- Status transitions during a sprint +- Epic breakdown and story implementation +- Review and acceptance criteria --- -## Process References +## Workflow Principles + +**1. Everything in GitHub** +- Issues for all work items (epics and stories) +- GitHub Projects v2 for tracking +- Status field for workflow state +- PRs for all code changes + +**2. Human Gates at Critical Points** +- Design review (`lead:design-review`) +- Plan review (`lead:plan-review`) +- Code review (`dev:code-review`) + +**3. Status-Driven Progression** +- Issues advance through defined states +- Status reflects current work phase +- Transitions documented in [../statuses/index.md](../statuses/index.md) + +**4. Single Source of Truth** +- GitHub issues are authoritative +- Status field shows current state +- Comments capture decisions and feedback + +--- + +## Process Documentation + +- [Sprint Process](sprint-process.md) - Sprint ceremonies and cadence +- [Status Transitions](../statuses/index.md) - State machine and workflow +- [Role Responsibilities](../roles/index.md) - Who does what + +--- + +## Getting Started + +**For new epics:** +1. Create epic issue with `kind/epic` label +2. Triage (`po:triage`) +3. Write design doc (`arch:design`) +4. Submit for design review (`lead:design-review`) +5. Break down into stories (`arch:plan`, `arch:breakdown`) -For authoritative process documentation, see: -- **[PROCESS.md](../../PROCESS.md)** - Full team process specification -- **[Status Transitions](../statuses/transitions.md)** - Workflow state machine +**For new stories:** +1. Story enters backlog (`dev:ready`) +2. Design tests (`qe:test-design`) +3. Implement (`dev:implement`) +4. Open PR and request review (`dev:code-review`) +5. 
Verify and merge (`qe:verify`, `arch:sign-off`, `po:merge`) --- -**Start Here:** Read [../TEAM_PHILOSOPHY.md](../TEAM_PHILOSOPHY.md) for process principles, then consult specific workflow docs for details. +**See Also:** +- [Team Philosophy](../TEAM_PHILOSOPHY.md) - Core principles +- [Status Index](../statuses/index.md) - Status definitions diff --git a/ai-docs/workflows/sprint-process.md b/ai-docs/workflows/sprint-process.md new file mode 100644 index 0000000..3012a65 --- /dev/null +++ b/ai-docs/workflows/sprint-process.md @@ -0,0 +1,279 @@ +# splat Sprint Process + +**Profile:** scrum-compact + +--- + +## Overview + +The splat team follows a lightweight sprint process adapted for the scrum-compact profile. + +**Sprint Length:** 2 weeks +**Planning:** Start of sprint +**Review/Retro:** End of sprint + +--- + +## Sprint Ceremonies + +### Sprint Planning + +**When:** First day of sprint +**Duration:** 1-2 hours +**Goal:** Commit to work for the sprint + +**Process:** +1. Review sprint goal +2. Select stories from `po:ready` backlog +3. Move stories to `dev:ready` +4. Estimate story points +5. Commit to sprint scope (13-21 points target) + +**Status Transitions:** +- Stories: `po:ready` → `dev:ready` + +--- + +### Daily Standup (Async) + +**When:** Daily +**Format:** GitHub issue comments or team channel + +**Focus:** +- Progress since yesterday +- Plan for today +- Blockers or risks + +**For Solo Teams:** +- Brief status update to stakeholders +- Note any blockers needing escalation + +--- + +### Sprint Review + +**When:** Last day of sprint +**Duration:** 1 hour +**Goal:** Demonstrate completed work + +**Process:** +1. Demo completed stories +2. Show merged PRs +3. Highlight achievements +4. Collect feedback + +**Acceptance:** +- Stories must be in `done` status +- All PRs merged +- Tests passing + +--- + +### Sprint Retrospective + +**When:** After sprint review +**Duration:** 30-60 minutes +**Goal:** Process improvement + +**Format:** +- What went well? 
+- What could improve? +- Action items for next sprint + +**For Solo Teams:** +- Self-reflection on process +- Identify process improvements +- Update team documentation + +--- + +## Epic Breakdown Flow + +### Phase 1: Design + +**Status:** `arch:design` + +1. Write design document in epic issue +2. Include: + - Problem statement + - Proposed approach + - Architectural implications + - Risks and mitigations + - Success criteria + +**Deliverable:** Design doc in epic description or linked document + +--- + +### Phase 2: Design Review + +**Status:** `lead:design-review` + +**Human gate:** Awaiting human approval + +**Review criteria:** +- Approach aligns with architecture +- Risks identified and mitigated +- Success criteria clear +- Stakeholder sign-off + +**Outcomes:** +- ✅ Approved → `arch:plan` +- ⛔ Changes requested → back to `arch:design` + +--- + +### Phase 3: Story Breakdown + +**Status:** `arch:plan` → `lead:plan-review` → `arch:breakdown` + +1. **Propose breakdown** (`arch:plan`) + - List individual stories + - Estimate story points + - Define acceptance criteria per story + +2. **Human reviews breakdown** (`lead:plan-review`) + - Verify stories are appropriate size + - Check acceptance criteria + - Approve or request changes + +3. **Create story issues** (`arch:breakdown`) + - Create GitHub issues for each story + - Link to parent epic + - Add labels (`kind/story`, etc.) + - Set acceptance criteria + +**Deliverable:** Individual story issues in `po:ready` status + +--- + +## Story Implementation Flow + +### Phase 1: Test Design + +**Status:** `qe:test-design` + +**QE hat:** +1. Design test plan +2. Identify test cases (happy path, edge cases, errors) +3. Define test criteria +4. Document in issue comments + +--- + +### Phase 2: Implementation + +**Status:** `dev:implement` + +**Dev hat:** +1. Create feature branch +2. Write code +3. Write tests (based on QE test plan) +4. Ensure tests pass locally +5. 
Push branch + +--- + +### Phase 3: Code Review + +**Status:** `dev:code-review` + +**Human gate:** Awaiting PR approval + +1. Open PR against main branch +2. Request human review +3. CI must pass +4. Address review feedback + +**Outcomes:** +- ✅ Approved → `qe:verify` +- ⛔ Changes requested → back to `dev:implement` + +--- + +### Phase 4: Verification + +**Status:** `qe:verify` → `arch:sign-off` → `po:merge` + +1. **QE verification** (`qe:verify`) + - QE hat confirms tests pass + - Verify acceptance criteria met + - Check for regressions + +2. **Architect sign-off** (`arch:sign-off`) + - Auto-advance if tests pass + - Verify no architectural concerns + +3. **Merge** (`po:merge`) + - Auto-merge after approval + - Close story issue + +**Deliverable:** Merged PR, story in `done` status + +--- + +## Status Workflow Summary + +### Epic and Story States + +Defined in the [Status Index](../statuses/index.md). + +--- + +## Working with Multiple Hats + +**When you switch hats, explicitly state it:** + +> "Switching to QE hat: Designing test plan for story #123" + +> "Dev hat: Implementing feature X" + +> "PO hat: Triaging new epic" + +**Why:** Clarity about current role and responsibilities + +--- + +## Sprint Metrics + +**Velocity:** +- Track story points completed per sprint +- Target: 13-21 points per 2-week sprint + +**Quality:** +- PR approval time +- Test pass rate +- Regression rate + +**Cycle Time:** +- Time from `dev:ready` to `done` +- Target: < 3 days for small stories + +--- + +## Common Patterns + +**Epic too large?** +- Break into multiple smaller epics +- Create epic parent-child relationships + +**Story blocked?** +- Add `blocked` label +- Comment with blocker details +- Escalate if blocker > 1 day + +**Human gate delayed?** +- Escalate via direct message +- Continue with other work +- Don't advance status prematurely + +**Tests failing?** +- Stay in `dev:implement` until tests pass +- Don't open PR with failing tests + +--- + +**See Also:** +- [Team Philosophy](../TEAM_PHILOSOPHY.md) - Core 
principles +- [Status Index](../statuses/index.md) - Full state machine +- [Role Responsibilities](../roles/index.md) - Hat-switching details From c845bb2188ddcc31b6cfefa1e2812688c2ac78e5 Mon Sep 17 00:00:00 2001 From: Richard Vanderpool <49568690+rvanderp3@users.noreply.github.com> Date: Tue, 12 May 2026 14:17:28 -0400 Subject: [PATCH 3/4] fix: Prevent duplicate PR processing in monitor-active-prs The monitor-active-prs skill was re-processing the same PRs on every scan cycle because the idempotency check was too strict. Changes: - Replace phrase-specific comment checks with timestamp-based checks - Look for ANY bot comment after the review time, not just specific phrases - Apply same fix to both 'changes requested' and inline comment checks - Add documentation in superman-atlas knowledge base This prevents duplicate event emissions for PRs that have already been responded to. --- .../skills/monitor-active-prs/SKILL.md | 58 +++++++------- .../knowledge/pr-monitoring-fix.md | 80 +++++++++++++++++++ 2 files changed, 110 insertions(+), 28 deletions(-) create mode 100644 members/superman-atlas/knowledge/pr-monitoring-fix.md diff --git a/coding-agent/skills/monitor-active-prs/SKILL.md b/coding-agent/skills/monitor-active-prs/SKILL.md index a595eaa..dd32054 100644 --- a/coding-agent/skills/monitor-active-prs/SKILL.md +++ b/coding-agent/skills/monitor-active-prs/SKILL.md @@ -84,25 +84,26 @@ check_pr_feedback() { REVIEWER=$(echo "$CHANGES_REQUESTED" | jq -r '.author.login') REVIEW_TIME=$(echo "$CHANGES_REQUESTED" | jq -r '.submittedAt') - # Check if already responded - LAST_RESPONSE=$(echo "$PR_DATA" | jq -r ' + # Check if already responded - look for ANY bot comment after the review time + BOT_RESPONSE_COUNT=$(echo "$PR_DATA" | jq --arg review_time "$REVIEW_TIME" ' [.comments[] | select(.author.login == "splat-sdlc-agent[bot]") | - select(.body | contains("Feedback Addressed") or contains("Working on"))] | - sort_by(.createdAt) | - reverse | - .[0].createdAt // empty + 
select(.createdAt > $review_time)] | + length ') - if [ -z "$LAST_RESPONSE" ] || [ "$REVIEW_TIME" \> "$LAST_RESPONSE" ]; then - echo "PR #${pr_num} (story #${STORY_NUM}) has unaddressed feedback from @${REVIEWER}" - - # Emit event to trigger response - ralph tools pubsub publish dev.pr-feedback \ - "story=${STORY_NUM}, project=${project}, pr=${pr_num}, reviewer=${REVIEWER}" - - return 0 + if [ "$BOT_RESPONSE_COUNT" -gt 0 ]; then + echo "Already responded to review from @${REVIEWER} at ${REVIEW_TIME}" + return 1 fi + + echo "PR #${pr_num} (story #${STORY_NUM}) has unaddressed feedback from @${REVIEWER}" + + # Emit event to trigger response + ralph tools pubsub publish dev.pr-feedback \ + "story=${STORY_NUM}, project=${project}, pr=${pr_num}, reviewer=${REVIEWER}" + + return 0 fi # Check for inline review comments (from /pulls/:pull_number/comments) @@ -117,25 +118,26 @@ check_pr_feedback() { REVIEWER=$(echo "$LATEST_REVIEW_COMMENT" | jq -r '.user.login') COMMENT_TIME=$(echo "$LATEST_REVIEW_COMMENT" | jq -r '.created_at') - # Check if already responded to this review comment - LAST_RESPONSE=$(echo "$PR_DATA" | jq -r ' + # Check if already responded - look for ANY bot comment after the inline comment time + BOT_RESPONSE_COUNT=$(echo "$PR_DATA" | jq --arg comment_time "$COMMENT_TIME" ' [.comments[] | select(.author.login == "splat-sdlc-agent[bot]") | - select(.body | contains("@'"$REVIEWER"'"))] | - sort_by(.createdAt) | - reverse | - .[0].createdAt // empty + select(.createdAt > $comment_time)] | + length ') - if [ -z "$LAST_RESPONSE" ] || [ "$COMMENT_TIME" \> "$LAST_RESPONSE" ]; then - echo "PR #${pr_num} (story #${STORY_NUM}) has ${REVIEW_COMMENTS} inline review comment(s) from @${REVIEWER}" - - # Emit event to trigger response - ralph tools pubsub publish dev.pr-feedback \ - "story=${STORY_NUM}, project=${project}, pr=${pr_num}, reviewer=${REVIEWER}" - - return 0 + if [ "$BOT_RESPONSE_COUNT" -gt 0 ]; then + echo "Already responded to inline review comment from 
@${REVIEWER} at ${COMMENT_TIME}" + return 1 fi + + echo "PR #${pr_num} (story #${STORY_NUM}) has ${REVIEW_COMMENTS} inline review comment(s) from @${REVIEWER}" + + # Emit event to trigger response + ralph tools pubsub publish dev.pr-feedback \ + "story=${STORY_NUM}, project=${project}, pr=${pr_num}, reviewer=${REVIEWER}" + + return 0 fi # Check for PR-level comments (questions/discussions on the PR itself) diff --git a/members/superman-atlas/knowledge/pr-monitoring-fix.md b/members/superman-atlas/knowledge/pr-monitoring-fix.md new file mode 100644 index 0000000..cfd32d4 --- /dev/null +++ b/members/superman-atlas/knowledge/pr-monitoring-fix.md @@ -0,0 +1,80 @@ +# PR Monitoring Duplicate Processing Fix + +## Problem + +The `monitor-active-prs` skill processes the same PRs multiple times even when they have no new updates. + +**Root cause:** The idempotency check on lines 88-95 of `monitor-active-prs/SKILL.md` only looks for bot comments containing the exact phrases "Feedback Addressed" or "Working on": + +```bash +LAST_RESPONSE=$(echo "$PR_DATA" | jq -r ' + [.comments[] | + select(.author.login == "splat-sdlc-agent[bot]") | + select(.body | contains("Feedback Addressed") or contains("Working on"))] | + sort_by(.createdAt) | + reverse | + .[0].createdAt // empty +') +``` + +If the bot responds with different wording, this check finds no prior response and the PR gets re-processed on every board scan cycle. + +## Solution + +Replace the phrase-specific check with a more robust check that counts **ANY** bot comment posted after the review time: + +```bash +# Check if already responded - look for ANY bot comment after the review time +BOT_RESPONSE_COUNT=$(echo "$PR_DATA" | jq --arg review_time "$REVIEW_TIME" ' + [.comments[] | + select(.author.login == "splat-sdlc-agent[bot]") | + select(.createdAt > $review_time)] | + length +') + +if [ "$BOT_RESPONSE_COUNT" -gt 0 ]; then + echo "Already responded to review from @${REVIEWER} at ${REVIEW_TIME}" + return 1 +fi +``` + +This checks if **any** bot comment exists after the review timestamp. 
If yes, we've already responded and should skip this PR. + +## Alternative: Track Processing State + +For more robust deduplication, maintain a state file: + +```bash +# At start of scan_all_prs: create the state file with an empty list if it is missing or empty +PROCESSED_STATE_FILE="team/members/superman-atlas/.processed-prs.json" +[ -s "$PROCESSED_STATE_FILE" ] || echo '{"processed": []}' > "$PROCESSED_STATE_FILE" + +# Before emitting dev.pr-feedback +PR_KEY="${project}:${pr_num}:${REVIEW_TIME}" + +if jq -e --arg key "$PR_KEY" '.processed | index($key)' "$PROCESSED_STATE_FILE" > /dev/null; then + echo "Already processed PR #${pr_num} review at ${REVIEW_TIME}" + return 1 +fi + +# After emitting dev.pr-feedback +jq --arg key "$PR_KEY" '.processed += [$key]' "$PROCESSED_STATE_FILE" > "${PROCESSED_STATE_FILE}.tmp" && mv "${PROCESSED_STATE_FILE}.tmp" "$PROCESSED_STATE_FILE" +``` + +This tracks `project:pr_number:review_timestamp` combinations and prevents duplicate processing even if the bot's response comment doesn't get recognized. + +## Recommendation + +Use the **first solution** (check for any bot comment after review time) as it's simpler and leverages existing comment data. Add the state file approach only if issues persist. + +## Implementation + +Edit `/home/splat/.botminter/workspaces/splat/team/coding-agent/skills/monitor-active-prs/SKILL.md`: + +1. Find the `check_pr_feedback()` function +2. Replace lines 88-95 (the LAST_RESPONSE check) +3. Update the condition on line 97 to use the new check logic + +Apply similar fixes to: +- The inline review comments check (lines 120-128) +- The PR-level comments check (if needed) From b9326a99461c957bd01648b6a86267c6b69fae4b Mon Sep 17 00:00:00 2001 From: Richard Vanderpool <49568690+rvanderp3@users.noreply.github.com> Date: Tue, 12 May 2026 14:40:53 -0400 Subject: [PATCH 4/4] fix: Monitor recently merged PRs for post-merge feedback PRs can receive important feedback even after merge (security concerns, late reviews, follow-up questions). Story status should not prevent PR monitoring. 
Changes: - Check both open AND recently merged PRs (last 7 days) - Story 'done' status no longer prevents PR feedback monitoring - Post-merge comments are now caught and responded to - Updated documentation to clarify this behavior This ensures critical post-merge feedback is not missed. --- .../skills/monitor-active-prs/SKILL.md | 39 +++++++++++++++---- .../knowledge/pr-monitoring-fix.md | 38 +++++++++++++++++- 2 files changed, 67 insertions(+), 10 deletions(-) diff --git a/coding-agent/skills/monitor-active-prs/SKILL.md b/coding-agent/skills/monitor-active-prs/SKILL.md index dd32054..a3f49bc 100644 --- a/coding-agent/skills/monitor-active-prs/SKILL.md +++ b/coding-agent/skills/monitor-active-prs/SKILL.md @@ -1,12 +1,12 @@ --- name: Monitor Active PRs -description: Check all active staging PRs for review comments and trigger responses +description: Check all open and recently merged staging PRs for review comments and trigger responses auto_inject: true --- # Monitor Active PRs -Scan all active staging PRs in openshift-splat-team forks for review comments and trigger appropriate responses. +Scan all open and recently merged staging PRs in openshift-splat-team forks for review comments and trigger appropriate responses. ## Purpose @@ -20,12 +20,13 @@ This skill is called by the board scanner to check PRs in parallel with issue sc ## What It Does -1. **Find Active PRs** - Lists all open PRs in staging forks +1. **Find Active PRs** - Lists all open PRs AND recently merged PRs (last 7 days) in staging forks 2. **Check for Feedback** - Looks for: - Reviews with "CHANGES_REQUESTED" state - Inline review comments (code-level feedback) - PR-level comments (general discussions) -3. **Emit Events** - Triggers dev.pr-feedback for PRs needing response + - Post-merge feedback (security concerns, late reviews, follow-up questions) +3. 
**Emit Events** - Triggers dev.pr-feedback for PRs needing response, regardless of story status ## Implementation @@ -42,12 +43,20 @@ PROJECTS=( "vcf-migration-operator" ) -# Check each project for open PRs +# Check each project for open AND recently merged PRs for project in "${PROJECTS[@]}"; do + # Get open PRs gh pr list \ --repo "openshift-splat-team/${project}" \ --state open \ --json number,headRefName,updatedAt,reviewDecision + + # Get recently merged PRs (last 7 days) - may still have active discussion + gh pr list \ + --repo "openshift-splat-team/${project}" \ + --state merged \ + --search "merged:>$(date -d '7 days ago' +%Y-%m-%d)" \ + --json number,headRefName,updatedAt,reviewDecision done ``` @@ -168,17 +177,28 @@ scan_all_prs() { for project in "${PROJECTS[@]}"; do # Get open PRs - PRS=$(gh pr list \ + OPEN_PRS=$(gh pr list \ --repo "openshift-splat-team/${project}" \ --state open \ --json number \ --jq '.[].number') - if [ -z "$PRS" ]; then + # Get recently merged PRs (last 7 days) - may still have active discussion + MERGED_PRS=$(gh pr list \ + --repo "openshift-splat-team/${project}" \ + --state merged \ + --search "merged:>$(date -d '7 days ago' +%Y-%m-%d)" \ + --json number \ + --jq '.[].number' 2>/dev/null || echo "") + + # Combine both lists + ALL_PRS="$OPEN_PRS $MERGED_PRS" + + if [ -z "$ALL_PRS" ]; then continue fi - for pr in $PRS; do + for pr in $ALL_PRS; do if check_pr_feedback "$project" "$pr"; then ((feedback_count++)) fi @@ -283,9 +303,12 @@ check_pr_feedback "cloud-credential-operator" 3 ## Notes - Runs automatically during board scan (if auto_inject: true) +- Checks **both open and recently merged** PRs (merged within last 7 days) +- Monitors post-merge feedback - important for catching late reviews, security concerns, etc. 
- Only checks openshift-splat-team/* staging forks - Does not check upstream openshift/* PRs - Idempotent - won't trigger duplicate responses - Emits events that dev_implementer can handle +- **Story status doesn't matter** - PRs are monitored regardless of whether the story is "done" This skill ensures that PR feedback is never missed and responses are timely! 🚀 diff --git a/members/superman-atlas/knowledge/pr-monitoring-fix.md b/members/superman-atlas/knowledge/pr-monitoring-fix.md index cfd32d4..97658b5 100644 --- a/members/superman-atlas/knowledge/pr-monitoring-fix.md +++ b/members/superman-atlas/knowledge/pr-monitoring-fix.md @@ -1,6 +1,6 @@ -# PR Monitoring Duplicate Processing Fix +# PR Monitoring Fixes -## Problem +## Problem 1: Duplicate Processing The `monitor-active-prs` skill processes the same PRs multiple times even when they have no new updates. @@ -78,3 +78,37 @@ Edit `/home/splat/.botminter/workspaces/splat/team/coding-agent/skills/monitor-a Apply similar fixes to: - The inline review comments check (lines 120-128) - The PR-level comments check (if needed) + +## Problem 2: Missing Post-Merge Feedback + +The skill only checked `--state open` PRs, missing feedback on recently merged PRs. + +**Impact:** +- Post-merge comments were ignored (security concerns, late reviews, etc.) 
+- PRs associated with "done" stories were not monitored +- Follow-up discussions after merge were missed + +## Solution 2: Monitor Recently Merged PRs + +Updated `scan_all_prs()` to also check merged PRs from the last 7 days: + +```bash +# Get recently merged PRs (last 7 days) - may still have active discussion +MERGED_PRS=$(gh pr list \ + --repo "openshift-splat-team/${project}" \ + --state merged \ + --search "merged:>$(date -d '7 days ago' +%Y-%m-%d)" \ + --json number \ + --jq '.[].number' 2>/dev/null || echo "") + +# Combine both lists +ALL_PRS="$OPEN_PRS $MERGED_PRS" +``` + +**Why 7 days?** +- Balances completeness with performance +- Most post-merge feedback arrives within a week +- Prevents scanning thousands of old PRs +- Can be adjusted if needed + +**Key principle:** Story status is independent of PR monitoring. Even if a story is marked "done", its PR should still be monitored for important feedback.
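
The note that the 7-day window "can be adjusted if needed" could be made concrete with a small pair of helpers. The following is only a sketch under stated assumptions: `MERGED_LOOKBACK_DAYS`, `merged_cutoff_date`, and `combine_pr_lists` are hypothetical names that do not exist in the skill, and the BSD `date -v` fallback is included only in case the scan ever runs outside a GNU environment.

```bash
#!/usr/bin/env bash
# Sketch only: MERGED_LOOKBACK_DAYS, merged_cutoff_date and combine_pr_lists
# are hypothetical names, not part of the monitor-active-prs skill.

# Print the YYYY-MM-DD cutoff for the merged-PR search window.
# The window defaults to 7 days and can be overridden by argument or env var.
merged_cutoff_date() {
  local days="${1:-${MERGED_LOOKBACK_DAYS:-7}}"
  # GNU date accepts -d; fall back to the BSD/macOS -v form if that fails.
  date -d "${days} days ago" +%Y-%m-%d 2>/dev/null ||
    date -v "-${days}d" +%Y-%m-%d
}

# Merge two whitespace-separated PR number lists into one sorted list,
# dropping blanks and duplicates.
combine_pr_lists() {
  # $1 and $2 are intentionally unquoted so they split on whitespace.
  printf '%s\n' $1 $2 | grep -E '^[0-9]+$' | sort -un
}
```

Under these assumptions, the search qualifier would become `merged:>$(merged_cutoff_date)` and the combined list `ALL_PRS=$(combine_pr_lists "$OPEN_PRS" "$MERGED_PRS")`. One side effect of building the list this way is that it is truly empty when neither query returns a PR, whereas `"$OPEN_PRS $MERGED_PRS"` always contains at least a space, so a `[ -z "$ALL_PRS" ]` guard never fires.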