Audit remediation: close 33 findings from system-audit-2026-05-27#2
…ntion
Codifies the comment convention the user had in mind for the Skill Graph,
Skill Metadata Protocol, and Skill Audit Loop layers: every authored
frontmatter field carries a YAML comment block (#) immediately above it,
naming purpose + allowed values + when-to-use, and these comments STAY in
the production SKILL.md.
Two distinct conventions coexist with opposite lifecycles:
- Field-purpose comments — STAY in derived skills. Authoritative-by-co-
location documentation. Source of truth is docs/field-reference.md; the
inline comment is the abridged summary. Discipline mirrors JSDoc/TSDoc
summaries pointing at canonical type definitions.
- `# TEMPLATE NOTE:` comments — STRIPPED on derivation. Authoring
scaffolding only, lives only in the template. Verified with
`grep -n "TEMPLATE NOTE" <derived>` returning zero hits.
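A minimal sketch of the strip-on-derivation step described above, assuming each `# TEMPLATE NOTE:` comment fits on a single line. The helper name `stripTemplateNotes` and the sample field text are hypothetical and are not taken from the repository.

```javascript
// Hypothetical helper: drops "# TEMPLATE NOTE:" lines from template text and
// leaves every other "#" comment (the field-purpose comments) untouched.
// Assumption: each TEMPLATE NOTE occupies exactly one line.
function stripTemplateNotes(templateText) {
  return templateText
    .split("\n")
    .filter((line) => !/^\s*#\s*TEMPLATE NOTE:/.test(line))
    .join("\n");
}

// Illustrative template fragment (comment wording is made up).
const template = [
  "# eval_state: placeholder purpose text for this field.",
  "# TEMPLATE NOTE: replace the value below when adapting.",
  "eval_state: monitored",
].join("\n");

const derived = stripTemplateNotes(template);
console.log(derived);
```

The field-purpose comment survives and the scaffolding line disappears, which is the property the `grep -n "TEMPLATE NOTE"` check confirms from the outside.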
Changes:
1. SKILL_METADATA_PROTOCOL.md — new sub-section "Inline field comments
— the authoring convention" placed after the "Where does my skill
live?" decision tree and before "Required vs Optional Fields". Includes
a side-by-side table of the two comment styles, a worked example
showing v8 classification + eval-health triple with field-purpose
comments, and an incident-grounded justification (the 2026-05-26
session where a cold-start agent proposed cutting `eval_state: monitored`
as "dead value" because the field's design intent lived three docs away).
2. examples/skill-metadata-template.md —
(a) Header rewritten to teach both comment conventions with concrete
examples; the original "every # TEMPLATE NOTE: must be stripped"
framing replaced with the convention split.
(b) Eval-health triple (eval_artifacts / eval_state / routing_eval)
converted from a single merged # TEMPLATE NOTE: block into three
field-purpose comment blocks (one per field), demonstrating the
convention. The orthogonality-rationale moves into the section
header. Scaffold-specific note about routing_eval staying `absent`
on this template stays as a # TEMPLATE NOTE: (correctly classified
as authoring scaffolding).
Out of scope (separate work):
- Remaining ~15 # TEMPLATE NOTE: blocks in the template that are actually
field-purpose content (followup CONTENT-style edit; mechanical rename).
- Template v7→v8 migration (pre-existing debt: template fails lint because
it carries v7 classification while the schema now requires v8 axes).
Lint state before this commit: 2 errors. After: same 2 errors. My edits
changed only comments, not fields — verified via `git diff | grep '^[+-][a-z_]+:'`
returning zero matches.
- Skill-scaffold SKILL.md update to teach the convention (CONTENT mode —
flows through /audit:improve).
- Codemod to backfill field-purpose comments across 153 corpus skills
(large CONTENT cascade).
The 2026-05-26 session memory at
~/.claude-profiles/jacobbalslev01/projects/-Users-jacobbalslev-Development/memory/
dont-extrapolate-research-into-filing-2026-05-26.md records the failure mode
this convention is designed to prevent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to d9fe52f, which established the convention and demonstrated it on the eval-health triple. This commit applies the convention to all remaining field-purpose blocks in the canonical authoring template and clears the pre-existing 2-error lint failure by migrating the template to v8 axes.
CONVENTION APPLICATION (13 blocks converted):
For each field whose existing `# TEMPLATE NOTE:` block was actually field-purpose content (defining what the field IS, its allowed values, and when-to-use), the prefix was dropped and the content tightened to 2-5 lines per block.
Converted blocks:
- category, domain, drift_check, eval_last_run, stability, compatibility, keywords, examples, anti_examples, grounding, portability, lifecycle, runtime_telemetry
Blocks correctly KEPT as `# TEMPLATE NOTE:` (genuine scaffolding):
- description authoring tip — pushy-description guidance
- triggers — scaffold-specific "this skill is routable"
- paths SPLIT — field-purpose first, scaffold-specific rationale kept as TEMPLATE NOTE
- workspace_tags SPLIT — same pattern
- relations.boundary — scaffold-specific about empty arrays
- routing_eval scaffold-specific note — why `absent` on THIS template
The rule: a `# TEMPLATE NOTE:` line MUST be removable from a derived skill without losing field semantics. If the line teaches what a field IS, it's a field-purpose comment and stays. If it teaches HOW to use the template itself or describes why THIS scaffold is configured a certain way, it's a TEMPLATE NOTE and gets stripped on derivation.
V8 MIGRATION:
- schema_version: 7 -> 8
- Added subject: agent-ops + operation: know (the v8 axes the schema requires; the template was failing lint until now)
- Renamed scope: reference -> scope: workspace (v8 canonical for the legacy alias)
- Kept type: capability + category: agent as deprecated back-compat with an explicit deprecation comment + instruction to delete them when adapting
BODY:
- HOW TO READ THIS FILE blockquote rewritten to teach the two-convention rule explicitly (only # TEMPLATE NOTE: and > **TEMPLATE NOTE:** get stripped; field-purpose comments STAY). Includes a verification command (grep -n "TEMPLATE NOTE" must return zero, grep -c "^\s*#" must preserve field-purpose comments).
- Coverage bullet under Teaching-layer delivery updated to match.
LINT STATE:
Before this commit: examples/skill-metadata-template.md FAIL (2 errors: subject + operation required, missing).
After this commit: examples/skill-metadata-template.md OK (154 file(s) checked, 0 error(s)).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…maining canonical layers
Third commit in the convention-rollout sequence:
- d9fe52f established the convention in SKILL_METADATA_PROTOCOL.md
- 7faadf9 applied it to the template + v8-migrated
- This commit extends it into SKILL_AUDIT_LOOP.md + SKILL_GRAPH.md so all three named layers of the skill system carry consistent guidance.
SKILL_AUDIT_LOOP.md changes:
1. § "The Health Block — state lives on the skill"
   The Health Block YAML example (~12 fields) was rewritten from trailing-inline comments to leading block-style field-purpose comments, matching the convention. Each field now carries a 2-3 line comment above it naming purpose + allowed values + (where relevant) which gate writes it. The example also bumps schema_version: 7 -> 8 per the v8-canonical doctrine and adds an intro sentence explicitly pointing at the convention spec.
2. § Part 2 § 1. Frontmatter validity
   Added a new checklist item enforcing the convention:
   - Strippable forms (# TEMPLATE NOTE: lines, > **TEMPLATE NOTE:** blockquotes) ABSENT from production skills.
   - Field-purpose comments PRESENT (verified via grep density check).
   Updated the schema_version bullet to accept 7 or 8 (was: only 7).
SKILL_GRAPH.md changes:
3. § Tier 5 — Canonical specimen
   The examples/skill-metadata-template.md row in the specimen table now explicitly calls out that the template demonstrates the v8 5-axis classification AND the inline field-purpose comment convention. Also mentions the v7-deprecated back-compat shape the template carries (with explicit deprecation comment per 7faadf9).
LINT STATE: Before/after: 153 files checked, 0 errors. No regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…logy) into skill-graph/AGENTS.md
Adds a new top-level section "No Invented Terminology — State Concepts in
Plain Words" between Quality Doctrine and Version Labels Are Earned, Not
Bumped. Same rule as the workspace AGENTS.md §17 added in commit 283be3607
on ~/Development/master — the body is mirrored so agents working with
skill-graph context (descendant CLAUDE.md auto-loads this file via
@AGENTS.md) see the rule even when the workspace AGENTS.md is not in
context.
Why both files: the workspace AGENTS.md and skill-graph/AGENTS.md cover
different launch conditions:
- Workspace AGENTS.md loads when Claude reads ~/Development/CLAUDE.md
on session start (always, when launched from ~/Development/).
- skill-graph/AGENTS.md loads when the descendant skill-graph/CLAUDE.md
is fetched — which happens when the session touches files in
skill-graph/.
- If a session launches from inside skill-graph/ (against the
convention but possible), only skill-graph/AGENTS.md is loaded —
no workspace AGENTS.md inheritance per claude-code #26489.
Having the rule in both files closes the coverage gap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ll codemod
One-shot codemod that adds inline field-purpose comment blocks above each frontmatter field in SKILL.md files, per the convention established this session in:
- skill-graph commit d9fe52f (SKILL_METADATA_PROTOCOL.md § Inline field comments — the authoring convention)
- skill-graph commit 7faadf9 (canonical template demonstrates it)
- skills commit d6c13e4 (first-principles-thinking pilot)
What it does: reads each SKILL.md, walks the metadata: block, and inserts the canonical comment block above each field that lacks one. Section dividers inserted at documented transition points. Idempotent.
What it does NOT do: modify any field VALUE, modify the body, or commit. Caller commits one-skill-per-commit per Standard #16.
Tested on 5 pilot skills — all lint clean post-codemod:
- skills/meta-methods/inversion (+90 lines)
- skills/meta-methods/second-order-thinking (+90 lines)
- skills/knowledge-organization/semantics (+93 lines, stringified-nested)
- skills/code-engineering/acid-fundamentals (+85 lines)
- skills/code-engineering/architecture-decision-records (+78 lines)
Each pilot commits separately as a CONTENT commit in the skills repo per Standard #16.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
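The idempotence property claimed for the codemod can be illustrated with a small sketch. This is not the actual backfill script: the function name `backfillComments`, the placeholder comment text, and the "previous line is a comment" test are all assumptions made for illustration.

```javascript
// Hypothetical comment map; the real map's contents are not shown in the source.
const FIELD_COMMENTS = new Map([
  ["eval_state", ["# eval_state: placeholder purpose text (hypothetical)."]],
  ["stability", ["# stability: placeholder purpose text (hypothetical)."]],
]);

// Inserts the mapped comment block above each known field at the given indent,
// unless the line directly above is already a comment. Field values are never
// modified, and a second pass inserts nothing.
function backfillComments(lines, indent = "  ") {
  const out = [];
  for (const line of lines) {
    const match = line.match(/^(\s*)([a-z_]+):/);
    const previous = out.length > 0 ? out[out.length - 1] : "";
    const alreadyCommented = previous.trim().startsWith("#");
    const comments = match && match[1] === indent ? FIELD_COMMENTS.get(match[2]) : undefined;
    if (comments && !alreadyCommented) {
      for (const comment of comments) out.push(indent + comment);
    }
    out.push(line);
  }
  return out;
}

const input = ["metadata:", "  eval_state: monitored", "  stability: stable"];
const once = backfillComments(input);
const twice = backfillComments(once);
console.log(once.join("\n"));
console.log("idempotent:", JSON.stringify(once) === JSON.stringify(twice));
```

Running the function on its own output leaves the lines unchanged, which is what makes a one-skill-per-commit rollout safe to re-run.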
…OCOL hardcode + audit start-of-run announce
Resolves 11 findings (F1, F2, F5, F8, F10, F11, F17, F18 + Opus audit + Codex overlap) from
the 2026-05-26 parallel Opus/Codex skill-system audits. Companion plan at
docs/plans/skill-system-codex-opus-synthesis-2026-05-26.md tracks the remaining 8
unsolved findings (filed as SH-6565..SH-6572 per /wrap Step 1b).
Changes:
- AGENTS.md: drop phantom ~/Development/SKILL_*.md paths from SYSTEM allowlist (F1);
add analysis-only carve-out to Work Modes mode-declaration rule (F13 mirror).
- SKILL_METADATA_PROTOCOL.md: preface updated to v8 5-axis canonical state (F2).
- SKILL_AUDIT_LOOP.md: removed 3 transitional "Absorbed into this file..." paragraphs
that read as recursive (F11).
- docs/QUICKSTART-30MIN.md, docs/skill-metadata-protocol.md: rewrite "author BOTH v7+v8"
compatibility-window claims to v8-only authoring (F17, F18) — repairs broken anchor
links from F2's section rename.
- docs/field-reference.generated.md: regenerated for C7 protocol-consistency check.
- lib/audit/skill-audit.js: announce mode (INTEGRITY-only vs GRADED) at START of run,
not just at end (F5). Reader sees what they're getting BEFORE spending the run cost.
- scripts/export-marketplace-skills.js + __tests__: remove SKILL_GRAPH_PROTOCOL hardcode
(F8). The constant stamped every export with 'Skill Metadata Protocol v7' regardless
of source content — a documented conformance caveat in SKILL_GRAPH.md. Per
.claude/rules/version-schema-contract.md ("version labels are EARNED, not bumped"),
stop emitting the field; per-skill schema_version is the honest signal.
- scripts/export-skill.js + generate-manifest.js: add back-compat-intent comments on
audit_verdict (v6 deprecated) field-extraction lists, citing SH-6557 retirement (F10).
Verified: skill-graph npm run verify passes (15 v8 schema-compat assertions green);
doctor 7/7 PASS; check-markdown-links + check-doc-drift PASS; canonical skill count
unchanged at 153 source / 152 marketplace; zero SKILL.md edits (mode separation preserved).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… status + drift checks
Following two independent audits (Opus + Codex, 2026-05-26) of the Skill Metadata Protocol / Skill Graph / Skill Audit Loop normative surfaces, delete v7 classification from every current-state authoring path and honest-up the gates that should have caught the drift but didn't. The v7→v8 phase ended in schema (commit 4bd16d9). This commit ends it in the prose, examples, and tooling that teach the contract.
Doc surface (v7 deleted from current-state surfaces; v7 only remains in explicitly historical contexts — migration narratives, ADRs, audit specimens, drafts, research notes — which the drift script now allowlists):
- AGENTS.md: current-contract sentence replaced with pointer to SKILL_GRAPH.md § Current State; Quick Reference v7-legacy-axes block collapsed to a one-line deprecation note; stale "115 carry v5, 25 carry v6" corpus parenthetical removed; obsolete SKILL_GRAPH_PROTOCOL hardcode tension paragraph deleted (the hardcode itself is gone).
- README.md: schema badge v7 → v8; example block uses v8 5-axis classification with no v7 axes; schemas-table description points to Current State instead of inlining a version; Status section names v8 as the current contract.
- SKILL_AUDIT_LOOP.md: operations-table preamble drops the v7 pin; per-skill audit checklist requires v8 axes (subject/operation/scope) and treats v7 fields as deprecated back-compat reads.
- SKILL_METADATA_PROTOCOL.md: "v7 Legacy Fields (compatibility-window holdovers)" section reduced to a clear deprecation notice; the "authors of new skills must author both" framing deleted (that was the exact anti-pattern Opus called out in his own audit but missed three lines below his preface edit).
- SKILL_GRAPH.md: Current State + Tier 1 schema row rephrased to avoid the literal "schema_version: 7" YAML syntax while still describing the deprecation; obsolete SKILL_GRAPH_PROTOCOL hardcode caveat rewritten to record the removal.
- docs/ADOPTION.md, AUTHORING-QUICKSTART.md, PRIMER.md, QUICKSTART-30MIN.md, manifest-field-mapping.md, quality-doctrine.md: every example frontmatter block updated to schema_version: 8 with v8 axes; v7 axes comments deleted.
- docs/field-reference.md: `type` and `category` field notices marked DEPRECATED with explicit "v7→v8 phase ended 2026-05-26" framing instead of "sunset window" tense; new skills MUST author the v8 replacements.
- docs/skill-metadata-protocol.md: Schema Versioning Policy section updated — current authored version is 8 (bumped from 7 when the 5-axis model replaced type/category); Health Block policy reworded to avoid the literal v7 YAML syntax in prose.
- docs/skill-audit-loop-executable-map.md: stale "1 of 481 skills" application-eval claim replaced with a pointer to SKILL_GRAPH.md § Current State for live counts.
- docs/status.generated.md: regenerated (schema 8, 153 skills, 4/4 PASS); previously stale (schema unknown, 148 skills, markdown links failing).
Tooling fixes:
- scripts/build-status-doc.js: readSchemaVersion now reads oneOf[].enum branches and returns the canonical (highest) version — previously returned "unknown" because the schema's integer/string back-compat shape uses enum, not const. `--check` now diffs rendered output against the on-disk file (with timestamp/duration normalization) and exits 1 on staleness — previously rubber-stamped any state.
- scripts/check-doc-drift.js: returns MAX (canonical) version, not MIN (floor); references to v7 in current-state docs are now correctly flagged as drift. Allowlist expanded to audits/, _drafts/, docs/adr/, docs/research/ — these are legitimate historical context.
- skill-graph/AGENTS.md § Validation Commands: new subsection documenting `npm run audit-manifest:check` and `npm run status:check` as separate red gates that run outside `npm run verify`. The audit-manifest gate is CONTENT-debt the audit loop drains (15 historical comprehension verdicts without backing evals/comprehension.json); wiring it into the main verify suite would block unrelated SYSTEM work.
Verification:
- node bin/skill-graph.js doctor: 7/7 PASS
- npm run verify: exit 0
- npm run status:check: exit 0
- npm run audit-manifest:check: exit 1, 15 historical mismatches (expected, documented as separate red gate)
- node scripts/check-doc-drift.js: 54 active docs scanned against schema v8, 0 stale references
- node scripts/check-protocol-consistency.js: PASS (C7 field-reference parity preserved)
Follow-ups filed:
- SH-6573: Cold-start one-screen doc at top of SKILL_GRAPH.md
- SH-6546 (pre-existing): Ship ADR-0018 boundary→suppresses rename
References: skill-system audits 2026-05-26 (Opus + Codex); docs/research/skill-system-current-state-addendum-2026-05-26.md (Codex addendum, untracked); workspace AGENTS.md § Non-Negotiable Standards #16-#17; .claude/rules/version-schema-contract.md § Companion rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
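The readSchemaVersion fix described in this commit (read `oneOf[].enum` branches, return the highest version) might look roughly like the sketch below. The schema fragment is an assumed shape, not a copy of schemas/skill.schema.json, and the function body is an illustration rather than the code in scripts/build-status-doc.js.

```javascript
// Assumed schema fragment: integer and string back-compat branches, each
// expressed with "enum" rather than "const".
const schema = {
  properties: {
    schema_version: {
      oneOf: [
        { type: "integer", enum: [7, 8] },
        { type: "string", enum: ["7", "8"] },
      ],
    },
  },
};

// Collects every version named by const or enum across all branches and
// returns the highest one; falls back to "unknown" when none is found.
function readSchemaVersion(schemaObject) {
  const node = schemaObject?.properties?.schema_version ?? {};
  const branches = Array.isArray(node.oneOf) ? node.oneOf : [node];
  const versions = [];
  for (const branch of branches) {
    if (branch.const !== undefined) versions.push(Number(branch.const));
    if (Array.isArray(branch.enum)) versions.push(...branch.enum.map(Number));
  }
  const numeric = versions.filter(Number.isInteger);
  return numeric.length > 0 ? Math.max(...numeric) : "unknown";
}

console.log(readSchemaVersion(schema)); // 8
console.log(readSchemaVersion({}));     // unknown
```

A reader that only inspects `const` would return "unknown" for the fragment above, which matches the stale-status symptom the commit describes.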
backfill-field-purpose-comments.js now handles both physical encodings of SKILL.md frontmatter per SKILL_METADATA_PROTOCOL.md § Two physical encodings:
(a) nested Agent-Skills-compatible (everything under metadata: at 2-space indent), and
(b) flat top-level (every field at root, no metadata: block).
Detection: presence of a metadata: line at top level. Indent and start position derived from that. The FIELD_COMMENTS map values stay encoding-agnostic; the caller prepends the encoding's indent.
Closes the 2-file coverage gap from the initial codemod pass (commit 17615df), where methodical and task-path-optimization were correctly skipped as 'no top-level metadata: block' — they use flat encoding.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
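The detection rule above (a top-level `metadata:` line means nested encoding) can be sketched as a small pure function. The name `detectEncoding` and the returned object shape are hypothetical; only the rule itself comes from the commit message.

```javascript
// Decides which physical encoding a frontmatter block uses and derives the
// indent and the index of the first field line from that decision.
// Input: the frontmatter lines between the "---" fences.
function detectEncoding(frontmatterLines) {
  const metadataIndex = frontmatterLines.findIndex((line) => /^metadata:\s*$/.test(line));
  if (metadataIndex >= 0) {
    // Nested: fields live under metadata: at a 2-space indent.
    return { encoding: "nested", indent: "  ", start: metadataIndex + 1 };
  }
  // Flat: every field sits at the root, so scanning starts at the top.
  return { encoding: "flat", indent: "", start: 0 };
}

console.log(detectEncoding(["name: demo", "metadata:", "  stability: stable"]));
console.log(detectEncoding(["name: demo", "stability: stable"]));
```

Keeping the comment text free of indentation and letting the caller prepend `indent` is what allows one comment map to serve both encodings.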
…tale local file
`skill-graph routing-eval` without --manifest used to read whatever local skills.manifest.json existed (gitignored, drifts between runs). On a stale local file the eval reports 9/9 false-fails that are pure staleness, not real routing regressions. The CLI help had advertised "(default: generate one)" but the implementation never did.
This commit makes the implementation match the original intent: when --manifest is not given, regenerate a fresh manifest to .skill-graph/_routing-eval-cli.manifest.json and run against that. The committed `npm run routing-eval` script already worked this way via an explicit --output flag; this brings the standalone CLI to the same behavior.
- scripts/skill-graph-routing-eval.js: regenerate-on-demand path, with fallback to existing manifest if generation fails.
- bin/skill-graph.js: rewrite help to describe new behavior + add documentation for previously-undocumented flags (--skill, --json, --quiet, --confusion-matrix, --baseline).
- .gitignore: add the new CLI-fresh manifest path.
Closes audit finding B5 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
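The regenerate-on-demand path with fallback can be expressed as a small decision function. This is a sketch under assumptions: `resolveManifestPath` and its injected callbacks are hypothetical stand-ins for whatever the routing-eval script actually does, and file access is stubbed so the example runs anywhere.

```javascript
// Chooses the manifest to evaluate against:
//   1. an explicit --manifest value wins;
//   2. otherwise generate a fresh manifest;
//   3. if generation throws, fall back to an existing manifest when present.
function resolveManifestPath({ manifestFlag, generate, fallbackPath, fallbackExists }) {
  if (manifestFlag) return { path: manifestFlag, source: "flag" };
  try {
    return { path: generate(), source: "generated" };
  } catch (error) {
    if (fallbackExists(fallbackPath)) return { path: fallbackPath, source: "fallback" };
    throw error; // nothing usable: surface the generation failure
  }
}

// Stubbed usage: generation succeeds, so the fresh manifest is used.
const fresh = resolveManifestPath({
  manifestFlag: undefined,
  generate: () => ".skill-graph/_routing-eval-cli.manifest.json",
  fallbackPath: "skills.manifest.json",
  fallbackExists: () => true,
});
console.log(fresh);
```

Reporting `source` alongside the path would let a caller print a warning when the run used a possibly stale fallback.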
… freshness claim
`node scripts/export-marketplace-skills.js --check` reported 152 stale files. Re-ran the export; --check is now clean (no-op). The audit (2026-05-27) confirmed the prior "verified 2026-05-20" claim in SKILL_GRAPH.md was outdated.
Also updates the layout-note in SKILL_GRAPH.md to call out the freshness-drift caveat: marketplace:verify is not currently part of `npm run verify`, so source-edit / marketplace-regeneration drift is invisible to CI between runs. That gate-expansion is tracked separately in audit finding B7 / H4 (next commit).
Closes audit finding B6 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Integrity Gate
The npm run verify chain was missing four gates the README's Integrity Gate definition claimed it covered: marketplace:verify, status:check, drift, and audit-manifest:check. After the marketplace regenerate (B6 commit), the first two are genuinely green and safe to add to the chain.
This commit:
- Adds marketplace:verify and status:check to the npm run verify chain (package.json).
- Adds marketplace-export-check to scripts/build-status-doc.js so the trust surface at docs/status.generated.md reports its state.
- Updates README's Integrity Gate definition (line 284) to describe what verify actually runs, and explicitly calls out why drift and audit-manifest:check are not yet in the chain — both surface CONTENT-side debt that is being drained through the audit loop (findings H9 and H10).
- Regenerates docs/status.generated.md (now 5 checks, all PASS).
Closes audit findings B7, H4, H11 (system-audit-2026-05-27). H9 and H10 closure is tracked in their own CONTENT-side tickets.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nbook
SKILL_AUDIT_LOOP.md ships in the npm package per package.json:35, and Part 3 references `node scripts/skill/...` commands (skill-audit-claim, source-truth-catalog, skill-census, build-skill-audit-worklist, skill-test-runner) that don't exist in @skill-graph/cli — they live in the workspace tree at ~/Development/scripts/skill/. Standalone npm consumers following the runbook verbatim hit "Cannot find module" errors at the first claim step.
This adds an "Audience & runtime" preamble at the top of Part 3 that:
- Lists which workspace-orchestration scripts are not bundled (per ADR 0009 + ADR 0015 + ADR 0016 Proposed).
- Points to the canonical CLI entrypoints (skill-graph audit / improve / evaluate / evolve) and the canonical path corrections (scripts/skill/skill-lint.js → scripts/skill-lint.js; scripts/skill/evaluate-skill.js → lib/audit/evaluate-skill.js).
- Notes that a substantially complete audit is still runnable via `skill-graph audit <skill> --graded` for consumers without the workspace layer.
Leaves Part 3's runbook body intact for in-workspace operators, who remain the primary audience. The preamble is the smallest fix that removes the "package ships broken instructions" problem without rewriting the entire runbook.
Also regenerates docs/status.generated.md to pick up the doc edit.
Closes audit finding B8 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… skill
AGENTS.md § What each command writes promised that the standalone drift command writes the drift_status field to SKILL.md frontmatter, but no code path did it. Grep across lib/ and scripts/ for any write expression matching drift_status: returned zero hits outside the schema and field-list iterators (audit B1, 2026-05-27). The result was that every skill's drift_status stayed at its initial value (UNVERIFIED or UNKNOWN) forever, regardless of how many drift sentinel runs landed.
This commit adds the missing write path:
- scripts/skill-graph-drift.js: new --write-verdict flag opts in to writing drift_status. The default check run remains read-only so a curious `npm run drift` doesn't surprise-mutate the skill tree. Maps per-skill report status to the schema-valid drift_status enum (OK / DRIFT / BROKEN / STALE / NO_BASELINE / EXTERNAL_UNHASHED); skips UNGROUNDED and NO_FRONTMATTER (not enum-valid).
- AGENTS.md § What each command writes: drift row now describes the opt-in semantics. The "Two integrity surfaces" paragraph is updated to resolve the internal contradiction the audit caught — the old text said standalone drift writes drift_status AND that standalone drift doesn't write to the Health Block, which can't both be true because drift_status IS a Health Block field. New text says standalone commands don't roll up to truth_verdict / structural_verdict, and that --write-verdict is the one explicit opt-in that may stamp drift_status.
Verified by smoke test: a skill with drift_status: UNKNOWN and a deliberately-mismatched hash gets stamped drift_status: BROKEN after `drift --write-verdict`.
Closes audit finding B1 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
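A sketch of the opt-in write planning described in this commit, assuming the report status maps one-to-one onto the drift_status value (the commit does not spell out the mapping). `planDriftWrites` and the report object shape are hypothetical names for illustration.

```javascript
// Schema-valid drift_status values, as listed in the commit message.
const DRIFT_STATUS_ENUM = new Set([
  "OK", "DRIFT", "BROKEN", "STALE", "NO_BASELINE", "EXTERNAL_UNHASHED",
]);

// Returns the list of frontmatter writes to perform. Without the opt-in flag
// the run stays read-only; statuses outside the enum (for example UNGROUNDED
// or NO_FRONTMATTER) are skipped rather than written.
// Assumption: report status maps directly to the drift_status value.
function planDriftWrites(reports, { writeVerdict = false } = {}) {
  if (!writeVerdict) return [];
  return reports
    .filter((report) => DRIFT_STATUS_ENUM.has(report.status))
    .map((report) => ({ skill: report.skill, drift_status: report.status }));
}

const reports = [
  { skill: "alpha", status: "BROKEN" },
  { skill: "beta", status: "UNGROUNDED" },
  { skill: "gamma", status: "OK" },
];
console.log(planDriftWrites(reports));                         // read-only default
console.log(planDriftWrites(reports, { writeVerdict: true })); // alpha and gamma only
```

Separating the plan from the actual file write keeps the default run observably side-effect free.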
…sment
The schema declared PROVISIONAL as a valid value for both comprehension_verdict and application_verdict, but no code path emitted it: the comprehension classifier returned PASS / REDUNDANT / SHALLOW / SKIPPED_BASELINE_HIGH, and the application path filtered PROVISIONAL out via APPLICATION_VERDICT_ENUM and normalised it to UNVERIFIED. The result was that the confidence hierarchy "APPLICABLE > PROVISIONAL > UNVERIFIED" collapsed to two tiers in practice (audit B2, 2026-05-27).
This commit makes PROVISIONAL reachable when the run is an explicit single-model self-assessment:
- BOOLEAN_FLAGS now accepts --single-model. The flag is plumbed into runComprehensionEval and stampApplicationVerdict.
- APPLICATION_VERDICT_ENUM now includes PROVISIONAL so it is no longer silently dropped by normalizeApplicationVerdict.
- Comprehension classifier downgrades PASS → PROVISIONAL when --single-model is set. Other verdicts (REDUNDANT / SHALLOW / SKIPPED_BASELINE_HIGH) are factual descriptions of the delta and pass through unchanged.
- stampApplicationVerdict downgrades APPLICABLE → PROVISIONAL when --single-model is set.
Default behaviour is unchanged: a run without --single-model continues to emit PASS / APPLICABLE exactly as before. Opt-in is the conservative shape — every existing call site keeps its current verdict, and callers that know they are running a single-grader assessment can flip the flag to record an honest PROVISIONAL.
Verified by re-running test-application-verdict-write-back.js — all 53 cases still pass.
Closes audit finding B2 (system-audit-2026-05-27). The companion gap H2 (gate-8 comprehension writeback into SKILL.md frontmatter) remains tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
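The downgrade rule is small enough to show directly. The two function names below are hypothetical simplifications; the real classifier and stampApplicationVerdict do more than this, and only the PASS and APPLICABLE downgrades are taken from the commit message.

```javascript
// Downgrades the top comprehension verdict for single-model runs.
// Other verdicts describe the measured delta and pass through unchanged.
function downgradeComprehension(verdict, { singleModel = false } = {}) {
  return singleModel && verdict === "PASS" ? "PROVISIONAL" : verdict;
}

// Same rule for the application path.
function downgradeApplication(verdict, { singleModel = false } = {}) {
  return singleModel && verdict === "APPLICABLE" ? "PROVISIONAL" : verdict;
}

console.log(downgradeComprehension("PASS"));                           // PASS
console.log(downgradeComprehension("PASS", { singleModel: true }));    // PROVISIONAL
console.log(downgradeComprehension("SHALLOW", { singleModel: true })); // SHALLOW
console.log(downgradeApplication("APPLICABLE", { singleModel: true })); // PROVISIONAL
```

Because the flag defaults to false, existing call sites keep their verdicts, which is the conservative opt-in shape the commit argues for.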
Audit findings B3 (legacy evaluate-skill.js not a delegator), H3 (legacy skill-lint.js retains 14 deprecated check functions), and M13 (legacy evaluate-skill.js hardcodes monorepo log paths) all require editing files in the workspace tree at ~/Development/scripts/skill/. Those files live in a different git repository than this skill-graph branch and the workspace had ~20 unrelated dirty files at audit time, so the right place for the actual conversion is a separate set of workspace-master commits — not this branch.
This commit lands a precise, ready-to-apply closure plan under docs/research/followups/ instead of leaving the findings unaddressed: exact delegator bodies for both scripts, the env-var plumbing that preserves the legacy `agent-orchestration/logs/` defaults, a verification command set, and commit-shape guidance that respects AGENTS.md's "one logical change per commit" rule and prevents the dirty WIP files from being swept in.
Closes audit findings B3 / H3 / M13 (system-audit-2026-05-27) for the in-skill-graph portion of the work. The matching workspace-side commits remain outstanding and are intentionally not landed from this branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nted, corpus is CONTENT debt
Re-verified the audit's B4 claim against current head:
- lib/audit/skill-audit.js:1086-1097 unconditionally writes
structural_verdict, truth_verdict, last_audited, lint_verdict to
the Health Block on every audit run, after both the stub and
--graded branches close.
- updateFrontmatterFields is covered by
test-application-verdict-write-back.js (53 cases passing).
- The 146 SKILL.md files with structural_verdict: UNVERIFIED are
CONTENT debt — they reflect that audit has not yet run on each
skill, not that the write code is broken. Per AGENTS.md
§ Sequencing, those migrations drain via the audit loop one
skill at a time, not via a bulk SYSTEM commit.
The B4 narrow recommendation ("complete structural_verdict +
truth_verdict write-back") is closed. The companion comprehension /
application gap remains tracked under H2 + B2.
This commit lands the verification + status doc rather than a code
change because there is no SYSTEM-side gap left to fix here. The
audit's claim of "field-write commit path is not confirmed in all
paths" appears to predate the SH-6481 closure work in commits
9af8526 (truth_verdict) and fbdf598 (Health Block) — the line-1075
comment block in skill-audit.js is historical, not an open gap.
Closes audit finding B4 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes ADR 0011 § Addendum 2026-05-20 — gate 8 (the comprehension grader) appended every result to comprehension-history.jsonl but never wrote comprehension_verdict back to the skill's frontmatter, so the Health Block stayed UNVERIFIED corpus-wide regardless of grader output. Symmetric with the application_verdict stamp landed earlier.
This commit:
- Adds COMPREHENSION_VERDICT_ENUM, normalizeComprehensionVerdict, and stampComprehensionVerdict, modelled on the existing APPLICATION_VERDICT_ENUM / normalizeApplicationVerdict / stampApplicationVerdict trio.
- Wires the stamp call in main() right after the comprehension result is written to .cache/, before the incomplete-run exit gate. Honors --dry-run, skips on all-errored runs, skips when SKILL.md cannot be resolved from the eval path — same safety rules application uses.
- Exports the new symbols so future tests can import them the same way test-application-verdict-write-back.js imports the application trio (the comparable comprehension test belongs to a follow-up).
The PROVISIONAL downgrade landed in the B2 commit applies here too: a single-model comprehension run with --single-model continues to downgrade PASS → PROVISIONAL inside the classifier, and that downgraded verdict is what stampComprehensionVerdict writes.
Verified by smoke test (normalize round-trips, ENUM length = 7 schema values) and re-running test-application-verdict-write-back.js (53/53 passing, no regression on the symmetric application path).
Closes audit finding H2 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dit H1)
Per audit H1, check-stability-promotion.js exited 0 unconditionally regardless of warningCount, even though it was included in the verify chain between charter:parity and manifest:validate. The audit recommended either fail-loud on warnings or document the check as advisory-only in package.json and SKILL_AUDIT_LOOP.md.
Going with documented-advisory + opt-in strict mode because the warnings flag CONTENT debt (per-skill stability: stable claims missing eval_state / eval_score / routing_eval evidence) that should drain per-skill via the audit loop, not block every commit on every other skill. Fail-loud on the current corpus would gate verify on a single expected-value skill that has not been re-evaluated since promotion.
This commit:
- Expanded the header comment to spell out the advisory contract: default exits 0 on warnings; --strict flips to exit 1 on any warning.
- Added a --strict branch right before the final process.exit(0) so the flag is observably non-cosmetic and easy to grep.
- Added a stability:check:strict npm script for callers who want the release-blocking semantics without typing the flag every time.
Verified: default invocation still exits 0 with 3 warnings; --strict exits 1 with the same 3 warnings; existing test-stability-promotion.js still passes.
Closes audit finding H1 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
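The advisory-by-default contract reduces to one exit-code decision. The sketch below isolates that decision in a hypothetical pure function, `exitCodeFor`, so it can be exercised without running the real check-stability-promotion.js.

```javascript
// Advisory contract: warnings never fail the run unless --strict is passed.
function exitCodeFor(warningCount, argv) {
  const strict = argv.includes("--strict");
  return strict && warningCount > 0 ? 1 : 0;
}

console.log(exitCodeFor(3, []));           // 0: advisory default
console.log(exitCodeFor(3, ["--strict"])); // 1: release-blocking mode
console.log(exitCodeFor(0, ["--strict"])); // 0: nothing to fail on
```

In a script, the result would typically be handed to `process.exit`, with the strict branch placed just before the final default exit as the commit describes.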
.github/workflows/skill-graph-lint.yml lines 17 and 32 carried the literal string 'SKILL_AUDIT_LOOP.md § Part 2 — Per-Skill Audit Checklist' inside the workflow's paths filter. GitHub Actions treats paths entries as filename globs; it does not parse markdown anchors or section ranges, so the entry never matched any changed file — pure dead code.
The valid 'SKILL_AUDIT_LOOP.md' entry immediately below already fires the workflow on any edit to that file, so removing the dead entries doesn't change trigger behavior. The intent (fire only when § Part 2 changes) can't be expressed as a path glob and should be filed as a separate ticket if section-scoped triggering is genuinely wanted.
Closes audit finding H5 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mplates
Per ADR 0009 § Update (2026-05-20), the `skill-metadata-protocol` and `skill-audit-loop` sibling repositories were archived (read-only on GitHub) when the protocol spec and audit-loop runbook consolidated into this repo. The PR template and issue templates still listed both as live cross-repo coordination targets, which led contributors to expect coordinated PRs and discussions that aren't actually possible.
This commit:
- PULL_REQUEST_TEMPLATE.md: drops the two archived repos from the cross-repo impact checklist; adds a short inline note explaining the consolidation and naming the only remaining cross-repo target (the canonical SKILL.md source at `skills`).
- ISSUE_TEMPLATE/feature.yml: drops the two archived options from the ecosystem checkboxes; adds a description block explaining the consolidation.
- ISSUE_TEMPLATE/config.yml: re-points the Discussions and spec contact links from skill-metadata-protocol to skill-graph itself. External users following these links no longer land on an archived repo.
Closes audit finding H6 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…teger
Two audit findings closed together because they're the same artifact:
- H7: ADR 0016 was Proposed since 2026-05-25 with no acceptance status; P1 (lanes.json migration) had actually shipped on 2026-05-25 but the ADR's status line didn't reflect that. Move to Accepted, name the shipped surface, and call out that P2-P7 sequencing is the residual.
- M7: audits/lanes.json used semver "2.0.0" for schema_version while schemas/skill.schema.json uses integer 7/8, schemas/manifest.schema.json uses integer 4, and schemas/audits-manifest.schema.json uses integer 1. Normalized to integer 2 to match the convention; verified the file is still valid JSON and no caller reads schema_version from this file (grep across scripts/ + lib/ returned no matches).
Closes audit findings H7 + M7 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SKILL_METADATA_PROTOCOL.md § Inline field comments requires every authored frontmatter field to carry a YAML `#` comment block immediately above it (purpose + allowed values + when-to-use). The convention exists to prevent the "this field looks like dead code, let me propose deleting it" failure mode and to keep cold-start agents from needing docs/field-reference.md at the point of contact.
scripts/skill-lint.js had no enforcement: a SKILL.md could strip every field-purpose comment without failing lint. This commit adds checkFieldPurposeComments() as the new check 5, scanning top-level frontmatter fields and reporting each field whose preceding line (walking back through blank lines) is not a `#` comment.
Severity: warning, not error. Corpus survey 2026-05-27 shows 0/154 skills currently have ALL fields commented and 152/154 have NONE commented. Hard-failing lint would gate verify on the backfill-field-purpose-comments.js CONTENT migration that has not yet drained per-skill.
Warnings render to stderr; the WARN file label appears alongside the existing FAIL label; the exit code stays 0 unless `--strict` is set. `--strict` already existed as a flag; this commit makes it observably non-cosmetic by failing exit on warnings as well as errors.
The "skip first field" exclusion is deliberate: a comment "immediately above" the first field of the frontmatter would have to live outside the `---` block, which is not the convention. Subsequent fields are all checked.
Verified: lint reports 154 warning(s) on the current corpus with exit 0 (default) and exit 1 (--strict). Existing 5 lint checks unchanged; the 4 templates and skills with full commenting (the new ones authored from the template) are unaffected.
Closes audit finding H8 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
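The scan described above (top-level fields, walk back through blank lines, skip the first field) might look roughly like the sketch below. It is an illustration under stated assumptions, not the actual checkFieldPurposeComments(): the function name `findUncommentedFields`, the field-matching regex, and the plain-string input are all hypothetical.

```javascript
// Hypothetical sketch: returns the names of top-level frontmatter fields
// whose nearest non-blank preceding line is not a `#` comment.
// `frontmatter` is the text between the `---` fences (assumed input shape).
function findUncommentedFields(frontmatter) {
  const lines = frontmatter.split('\n');
  const missing = [];
  let seenFirstField = false;
  for (let i = 0; i < lines.length; i++) {
    // Top-level keys start at column 0; indented lines and comments do not match.
    const match = /^([A-Za-z_][\w-]*):/.exec(lines[i]);
    if (!match) continue;
    if (!seenFirstField) {
      // The first field is exempt: its comment would sit outside the `---` block.
      seenFirstField = true;
      continue;
    }
    // Walk back through blank lines to the nearest non-blank line.
    let j = i - 1;
    while (j >= 0 && lines[j].trim() === '') j--;
    if (j < 0 || !lines[j].startsWith('#')) missing.push(match[1]);
  }
  return missing;
}

const sample = [
  'name: demo',
  '# Purpose: evaluation lifecycle state.',
  'eval_state: unverified',
  '',
  'version: 1.0.0',
].join('\n');
console.log(findUncommentedFields(sample)); // only `version` lacks a comment
```

A real implementation would report these as warnings rather than return them, so the default exit code stays 0.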
…forces
Audit H12 (2026-05-27) noted three places that overclaimed lint coverage relative to the 5 checks skill-lint.js actually runs:
- SKILL_METADATA_PROTOCOL.md:137: routing_eval "gated by lint check 12". No such check exists; routing_eval is enforced by `npm run routing-eval`, not by lint.
- SKILL_METADATA_PROTOCOL.md:585: relation targets "validated by lint — a broken target is an error". Lint does not walk targets; the manifest compiler refuses to emit a relation to a non-existent skill (via `npm run manifest:validate`).
- SKILL_GRAPH.md:431: "Tier 1 ↔ Tier 5 sample manifest: `skill-lint.js` check 8". No check 8 in current skill-lint.js; parity is enforced by `npm run manifest:validate`.
All three lint claims dated to the pre-2026-05-19 lint reduction (when 14 additional check functions were removed). This commit reconciles the prose to the current 5-check (now 6 with the H8 field-purpose addition) skill-lint.js contract, naming the enforcing gate explicitly in each case so future readers don't re-derive the same wrong picture.
Closes audit finding H12 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit H13 (2026-05-27): scripts/check-protocol-consistency.js:5 header and SKILL_GRAPH.md:169 both claimed 8 checks (C1–C8). The runner actually executes 7 (C1, C2, C3, C4, C5, C7, C8) — C6 "versioned schema parity" was retired with ADR-0014 because no pinned-copy schema file exists on disk to drift against. Updated the file header inventory and the SKILL_GRAPH.md row that described the script's coverage. Verified the script still exits 0 after the comment edit. The runner's existing skip of C6 is unaffected; this is doc reconciliation only. Closes audit finding H13 (system-audit-2026-05-27). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit H14 (2026-05-27) inventoried five active SYSTEM docs/artifacts still describing the pre-v8 model:
- AGENTS.md:190 described skill "same kind" via `category × type × scope` (the v7 triple). Updated to `subject × operation × scope` (v8 per ADR-0017) with a back-link to the prior framing.
- docs/publish-workflow.md:17 said skill-graph held the canonical `skills/<name>/SKILL.md` sources. ADR 0009 kept the canonical library at `~/Development/skills/`; only the tooling consolidated into skill-graph. Re-described both repos with their actual roles.
- docs/marketplace-syndication.md:11 said "canonical artifacts stay in this repo: protocol-enriched skills/**/SKILL.md files". Same drift as publish-workflow; corrected to name the two-repo split.
- schemas/skill.context.jsonld:2,4 header described "v7 frontmatter" and `_schema_version_target: 7`. Bumped to v8 with a note that v7 fields stay projectable for back-compat.
- examples/skill-metadata-template.md:319 described grounding requirement using v7 scope names `codebase` / `reference`. Updated to v8 names `project` / `workspace` with legacy aliases retained for back-compat.
These were doctrine drift, not corpus drift: every fix is a prose clarification, not a behavior change. Schema validation, lint, and manifest generation already understand v8; the docs just hadn't caught up uniformly.
Closes audit finding H14 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ehension)
Audit findings H9 (6 drift-red skills, now 7 after re-verification caught expected-value as additionally BROKEN) and H10 (15 graded-comprehension claims missing evals/comprehension.json) are CONTENT-side debt that per AGENTS.md § Sequencing must drain through the audit loop, not via a SYSTEM commit touching N SKILL.md files.
This commit lands the recommended-tickets list as a single followups doc with explicit /audit:audit runbook lines per skill, the receipt IDs (for the comprehension-missing list), and a note that two task-lifecycle receipts collapse to one audit run.
The corresponding SYSTEM-side gating (drift + audit-manifest:check absent from npm run verify) closed under audit B7 in this same branch; adding those gates back is now safely tied to the per-skill drain rather than blocking the whole verify chain on one bad skill.
Closes audit findings H9 + H10 (system-audit-2026-05-27) on the SYSTEM side. CONTENT-side closure remains per-skill /audit:* work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hook
The hook at scripts/check-work-mode-separation.js is intentionally fail-open (exits 0 even when the warning fires) so it cannot block a commit, but that means it has no exit-code-based regression catcher. The hook could silently regress (drop the warning, misclassify a path family, stop honoring AUDIT_LOOP=1) and no one would notice until a real SYSTEM+CONTENT mixed commit slipped through.
This commit adds scripts/__tests__/test-work-mode-separation.js with 16 assertions over 7 scenarios:
1. Empty file list: quiet exit 0
2. SYSTEM-only paths: no warning
3. CONTENT-only paths: no warning
4. Mixed SYSTEM+CONTENT: warning fires with both file lists shown
5. AUDIT_LOOP=1 suppresses the warning on the same mixed file list
6. audits/prompts/** classifies as SYSTEM (not CONTENT)
7. examples/audits/<skill>/ classifies as CONTENT (not SYSTEM)
All 16 assertions pass against the current hook. The test is wired into the npm run test:unit chain via package.json:78.
Closes audit finding M1 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d .generated.md Audit M2 (2026-05-27): the doc-ownership map had separate rows for docs/field-reference.md (hand-authored, 1,779 lines) and docs/field-reference.generated.md (auto-generated, 748 lines), but no named resolution for the reader question "which one do I open to pick a value?" An author who lands on the generated file alone gets a structurally valid but semantically incomplete picture. Added a "Reader's canonical" sentence to the row for `field-reference.md` naming it as the canonical for value-choice criteria, and a mirror caveat on the `.generated.md` row pointing back. Both rows now name each other so a reader who enters either gets routed to the right one for their question. Closes audit finding M2 (system-audit-2026-05-27). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a SKILL.md is missing subject / operation / scope (the v8 5-axis classification fields per ADR-0017), skill-lint.js used to emit a generic "required-missing" message that didn't tell the author the violation was about v8 conformance:
    required-missing: `subject` is required by schemas/skill.schema.json
Now the same case fires a named v8-axis error that points to the template + ADR:
    v8 axis missing: `subject` is one of the three required v8 5-axis classification fields (subject / operation / scope per ADR-0017). Add it via the field-purpose comment template in examples/skill-metadata-template.md.
Non-v8-axis required-missing errors keep the existing generic message so the v8-named error stays diagnostic, not noise.
Verified by smoke-test on a fixture skill missing all three axes plus `version`: the v8 axes get the new message, `version` keeps the old one; `npm run lint --include-template` continues to pass on the real corpus.
Closes audit finding M3 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…text (audit M4)
SKILL_METADATA_PROTOCOL.md § Relations § boundary carries a WARNING that
the field NAME reads "defer to them" but the MECHANIC is "exclude them
when I win" — and recommends ownership reason-text ("I own X over them")
not deference ("use that-skill for X"). The doctrine warning is in the
spec, but lint did not check for it, so authors who wrote the inverted
phrasing produced semantically wrong relations that still validated.
This commit adds checkBoundaryReasonText() as an advisory check
(severity: warn). For each boundary edge with a `reason` string,
it scans for deference phrasing:
- `use <something> instead/for`
- `defer to`
- `owned by`
- `that-skill (owns|handles|covers)`
When a match fires, the warning quotes the offending reason and suggests
ownership phrasing. Verified by fixture: a fake skill whose boundary
edge reads `reason: "use debugging for runtime errors"` produces the
expected warning; the real corpus continues to lint with 0 errors.
Closes audit finding M4 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
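The four deference phrasings listed in this commit message can be approximated with a handful of regular expressions. The sketch below is illustrative only: the exact patterns inside checkBoundaryReasonText() are not shown in the commit, so `DEFERENCE_PATTERNS`, `checkBoundaryReason`, and the warning wording are assumptions.

```javascript
// Hypothetical approximations of the four deference phrasings.
const DEFERENCE_PATTERNS = [
  /\buse\s+.+?\s+(?:instead|for)\b/i,           // "use <something> instead/for"
  /\bdefer\s+to\b/i,                            // "defer to"
  /\bowned\s+by\b/i,                            // "owned by"
  /\bthat-skill\s+(?:owns|handles|covers)\b/i,  // "that-skill owns|handles|covers"
];

// Returns an advisory warning string, or null when the reason reads as ownership.
function checkBoundaryReason(reason) {
  const hit = DEFERENCE_PATTERNS.some((pattern) => pattern.test(reason));
  if (!hit) return null;
  return `boundary reason "${reason}" reads as deference; ` +
    'prefer ownership phrasing such as "I own X over them"';
}

console.log(checkBoundaryReason('use debugging for runtime errors'));
console.log(checkBoundaryReason('I own runtime errors over debugging'));
```

The first call yields a warning and the second returns null, matching the fixture behaviour the commit describes. None of the patterns use the `g` flag, so repeated `test()` calls carry no `lastIndex` state between reasons.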
…s (audit M5)
The schema declared eval_state, eval_artifacts, and routing_eval as independent axes but allowed incoherent combinations like `eval_state: passing` with no `eval_artifacts: present`. The fields were meant to be orthogonal in topic (routing-coverage vs content-quality vs artifact-presence) but coherent across values: a `passing` claim has to be backed by real evals on disk, otherwise it is the same doc-lie shape as `application_verdict: APPLICABLE` without an eval_last_run receipt.
Added two conditional rules to schemas/skill.schema.json `allOf`:
1. `eval_state: passing` → `eval_artifacts: present` required.
2. `eval_state: monitored` → `eval_artifacts: present` required.
`eval_state: unverified` still allows any eval_artifacts value (legitimate authoring state for skills that haven't been graded yet). routing_eval stays independent: its coverage signal comes from the routing harness reading description-level examples/anti_examples, not from on-disk eval JSON.
Verified: schema is valid JSON, npm run lint passes on the corpus (no skill currently violates the new rules), npm run protocol:check passes, regenerated docs/field-reference.generated.md.
Closes audit finding M5 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
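The two conditional rules live in the JSON Schema `allOf`, but the coherence constraint itself is simple enough to restate in plain JavaScript. The sketch below mirrors the rule for illustration; `checkEvalCoherence` is a hypothetical name and is not part of the schema or the lint script.

```javascript
// Hypothetical mirror of the two schema conditionals:
// `passing` and `monitored` both require `eval_artifacts: present`.
const STATES_REQUIRING_ARTIFACTS = ['passing', 'monitored'];

function checkEvalCoherence(skill) {
  const errors = [];
  if (
    STATES_REQUIRING_ARTIFACTS.includes(skill.eval_state) &&
    skill.eval_artifacts !== 'present'
  ) {
    errors.push(
      `eval_state: ${skill.eval_state} requires eval_artifacts: present`
    );
  }
  // routing_eval is deliberately not consulted: it stays independent.
  return errors;
}

console.log(checkEvalCoherence({ eval_state: 'passing', eval_artifacts: 'absent' }));
console.log(checkEvalCoherence({ eval_state: 'unverified', eval_artifacts: 'absent' }));
```

The first call returns one error and the second returns an empty list, since `unverified` allows any eval_artifacts value.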
… (audit M6)
Audit M6 (2026-05-27) recommended grepping all four audit-runner prompts for `audit_verdict` (singular pre-v7) and the phrase "v6 four-verdict". Survey results:
- audit_verdict (singular): zero hits across all 4 prompts. Already cleaned in an earlier sweep; leaving alone.
- "v6 four-verdict": zero hits. (The phrase doesn't appear; the audit was checking for it as a stale-shibboleth indicator.)
- "v6 Understanding fields" (single-model prompt:96): one hit; fixed to "five flat top-level Understanding fields (introduced v6, canonical v8)".
- "Concept Card" (renamed to "Concept of the skill" on 2026-05-26): two hits across single-model:100 and codex-autonomous-v5:211; fixed to reference the new heading with a back-link.
batch-worker-v4 and minimal-iteration prompts were scanned and carry neither stale label. No other fixes needed.
Closes audit finding M6 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ame (audit M8, M15)
schemas/audits-manifest.schema.json had three under-specified fields:
- runner.version was free-form (audit M8): a typo or stale entry like "version: vfour" or "version: 6" (when the runner is actually at v5) passed schema validation silently.
- artifact_rule.name had no character constraint (audit M15): a typo like "findings .md" (stray space) would pass schema and then never match an on-disk file.
- artifact_rule.when had no syntax constraint (audit M15): a typo like "skill.comprehension_verdict in [PROVISIONAL]" (missing single quotes around the enum value, expected by check-audit-manifest.js after the 2026-05-25 enum-leak fix) passed schema and produced a predicate the verifier could not evaluate.
Tightenings:
- runner.version regex `^(v?\d+(\.\d+)*\+?|\d+\.\d+)$` accepts the current shapes (v3, v3+, v4, v5, 1.0) and rejects typos that miss the leading optional v and a numeric segment.
- artifact_rule.name regex `^[a-zA-Z0-9._-]+$` is the filename-safe set; spaces, paths, or shell metas now fail at schema time.
- artifact_rule.when regex constrains the four supported predicate shapes: `always`, `skill.<field> in [<list>]`, `skill.<field> == '<value>'`, and `runner.(id|mode) (==|startswith) '<value>'`. Free-form predicates were the loophole the audit M15 named; the verifier already only understands these four shapes.
Verified: audits/manifest.json still validates against the tightened schema (each existing value matches one of the new regexes), and `node scripts/check-audit-manifest.js` exits with the same RED state as before: the 15 missing-comprehension-artifact failures are the existing H10 CONTENT-debt, not schema-validation failures.
Closes audit findings M8 and M15 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
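The two regexes quoted in full above can be exercised directly. The probe below copies them verbatim from the commit message; the sample inputs and the `probe` helper are illustrative. Note that the pattern is a shape check only, so a well-formed but stale value such as `6` still matches, and catching that case would need a comparison against the runner's actual version.

```javascript
// Regexes copied from the commit message above.
const RUNNER_VERSION = /^(v?\d+(\.\d+)*\+?|\d+\.\d+)$/;
const ARTIFACT_NAME = /^[a-zA-Z0-9._-]+$/;

// Hypothetical helper: pairs each sample with its match result.
function probe(regex, samples) {
  return samples.map((sample) => [sample, regex.test(sample)]);
}

// Current shapes named in the commit, plus the two example typos.
console.log(probe(RUNNER_VERSION, ['v3', 'v3+', 'v4', 'v5', '1.0', 'vfour', '6']));

// Filename-safe names versus the stray-space and path cases.
console.log(probe(ARTIFACT_NAME, ['findings.md', 'findings .md', 'audits/findings.md']));
```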
The seven runners that drive the four audit operations (improve / evaluate / evolve / status) had no unit tests: run-skill-improvement-loop, skill-evolution-loop, skill-status, eval-staleness-checker, batch-eval, eval-linter, skill-test-runner. Verdict write-back edge cases were caught only by integration runs or manual testing; a syntactic regression or sibling-require break would ship silently.
This commit adds scripts/__tests__/test-lib-audit-smoke.js with 14 assertions:
- 7 × `node --check` syntax pass.
- 7 × `node <runner> --no-such-flag-xyz` does not panic with "Cannot find module", "UnhandledPromiseRejection", or "TypeError: Cannot read", the regression classes the audit M9 finding cares about.
Subprocess invocation, not require(), because several runners call their main() at load time and process.exit() would kill the test mid-run.
Wired into npm run test:unit. NOT full unit coverage; that is the follow-up ticket. This catches the "runner refactor silently broke its require()" class.
Closes audit finding M9 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (audit M10)
Both scripts are part of `npm run verify` (or its expanded chain after the B7 commit) but had no unit tests. Refactoring either was "run CI and pray" because the only catch was the integration gate.
This commit adds scripts/__tests__/test-verify-gate-scripts.js with 14 assertions:
build-status-doc.js:
- node --check passes (syntax)
- exports readSchemaVersion / readSkillCount / renderMarkdown / runCheck
- readSchemaVersion returns a non-empty schema version
- renderMarkdown round-trips a hand-built state with PASS and FAIL checks and includes schema_version + both check labels in the output markdown
check-audit-manifest.js:
- node --check passes (syntax)
- a real subprocess run with the live manifest exits with a status (not a hang / not a panic) and emits output naming the audit-manifest concepts (comprehension / missing skills / audit / manifest). The current RED state from H10 CONTENT-debt is fine; the test only asserts the surface is named.
Wired into npm run test:unit.
Closes audit finding M10 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…audit M11)
Audit M11 (2026-05-27): ADR-0007 specified a weekly skill-graph-routing-eval.js pass; the actual practice is event-driven (three audits on 2026-05-25 / 26 triggered by the multi-model restructure review, not by a clock). Either accept the event-driven shape and amend the ADR or commit to the weekly rhythm; currently neither is true.
This commit amends. Three reasons spelled out in the new section:
1. The verify-chain routing-eval already runs on every commit (`npm run routing-eval`), so per-skill regressions are caught at commit-time, not at end-of-week.
2. A weekly cron would add work without adding signal: most weeks have no SYSTEM change worth re-routing.
3. The multi-model audit pattern (2026-05-25 / 26) is the right shape for major restructures and should repeat when the next one lands, not get pre-scheduled.
If a future operational gap proves the event-driven shape misses skills that need re-routing review, file a new ADR.
Closes audit finding M11 (system-audit-2026-05-27).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o audit loop Audit M12 (2026-05-27): the 302-skill flat-layout migration sequenced priorities P0–P7. P0 (4 schema_version orphans) closed 2026-05-25; P1–P7 had no commits in the 53-commit post-audit window. P1–P7 cannot land as a SYSTEM commit in this branch — five of the seven priorities (P2–P7 minus the one codemod-extend ticket) are explicitly CONTENT-side per-skill rewrites that AGENTS.md § Sequencing forbids batching into a SYSTEM commit. The work belongs inside /audit:* runs, one skill at a time, with per-skill Health Block evidence. Additionally, the migration touches the canonical library at ~/Development/skills/, a different repo. This commit lands the SYSTEM-side closure plan: explains why P1–P7 are deferred, lists which audit-remediation-2026-05-27 commits unblock each priority (B2/H2 for honest verdicts, B1 for drift_status, H8/M3 for lint), and breaks out P1 as the residual SYSTEM ticket (extend the migrate codemod to walk the flat tree) vs P2–P7 as the CONTENT drain. Closes audit finding M12 (system-audit-2026-05-27) on the SYSTEM side. CONTENT-side P1–P7 drain remains per-skill /audit:* work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…M14) audits/migration-mapping-v7-to-v8.json:4 referenced /Users/jacobbalslev/.claude-profiles/jacobbalslev01/plans/we-should-clearly-look-wondrous-firefly.md — a workspace-local path that does not exist in any other clone or in CI. Replaced the value with null and added a plan_note explaining where the plan actually lives (operator's local plans dir; this file is codemod run-output, not a planning pointer) and how to populate the field if a plan is committed. Closes audit finding M14 (system-audit-2026-05-27). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…covery (audit M16) scripts/check-charter-parity.js previously hardcoded a 4-entry candidate list at lines ~93-97. Audit M16 (2026-05-27): new active mirrors added under WORKSPACE_ROOT without editing the script would silently escape the parity check. Replaced the hardcoded list with a directory scan of WORKSPACE_ROOT. Every top-level sibling whose AGENTS.md exists and carries the canonical charter marker is inspected (the extractCharter call already filters out files without the marker). Discovery is shallow — the charter marker always lives at the top of a sibling repo, so no recursion is needed. Verified: `npm run charter:parity` exits 0 and reports the same "skill-audit-loop/AGENTS.md (MIRROR_ARCHIVED)" warning as before (consistent with ADR 0009 — that mirror is archived and the warning is expected behavior, not a regression). Closes audit finding M16 (system-audit-2026-05-27). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closure pass over audit L1–L5 (system-audit-2026-05-27):
L1 (schema_version oneOf integer+string sunset window not named):
  Added a clarifying sentence to the schema_version description in schemas/skill.schema.json:34: authors SHOULD write integer; the string back-compat form will be dropped no earlier than the v8→v9 phase. Names the deprecation horizon the audit asked for.
L2 (bin/skill-graph.js subcommands vs package.json scripts split):
  Added a one-paragraph clarification in AGENTS.md § Project Shape naming the audiences (public CLI vs internal CI gate) and the edit rule (preserve both surfaces).
L3 (.skill-graph/config.json not in package.json files array):
  Verified negative: the file hardcodes a relative sibling path (`../skills/skills`) that does not exist on a fresh-clone install outside the canonical two-repo workspace layout. Shipping it would point every npm consumer's lint at a non-existent path. Intentional exclusion; not fixed.
L4 (SKILL_METADATA_PROTOCOL.md restates schema $id as prose):
  Verified negative: current SKILL_METADATA_PROTOCOL.md contains no $id duplication (`grep -n "\$id\|skillgraph.dev\|json-schema.org"` returns zero hits). The audit may have been based on an earlier version; gap does not exist now.
L5 (marketplace/skills/ not verified in CI):
  Closed by audit B7 (marketplace:verify added to npm run verify earlier in this branch).
Closes audit findings L1, L2, L4 (system-audit-2026-05-27). L3 + L5 verified negative / closed elsewhere.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rate
Re-verification after the full audit-remediation pass surfaced a
regression in the T1 (audit B5) routing-eval CLI fix: generate-manifest
exits non-zero on v8 skills that lack the legacy v7 category/type
fields (e.g., expected-value, which was added during this branch),
but it still writes the manifest file. The previous catch block
threw away the freshly-written manifest and fell back to the stale
on-disk file, re-creating the staleness problem B5 was meant to fix.
This commit changes the fallback condition: only fall back when the
generator produced NO output file at all. Validation warnings get a
single-line stderr note ("CONTENT-debt expected during v7→v8") and
the eval proceeds with the fresh manifest, which is what consumers
actually want.
Also regenerates docs/field-reference.generated.md so
`npm run protocol:check § C7` is clean — that regeneration was missed
after the M5 + L1 schema description edits.
Verified: `node scripts/skill-graph-routing-eval.js --only-asserted`
now reports 10/10 PASS (was reporting 7 PASS / 2 FAIL due to the
fall-back to stale manifest) and `node scripts/check-protocol-consistency.js`
exits 0.
Follow-up to T1 (audit B5).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
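The revised fallback condition can be sketched as follows. This is a minimal illustration of the decision rule, not the code in skill-graph-routing-eval.js: `chooseManifest` and its injected `generate` and `exists` callbacks are hypothetical, introduced so the logic can be exercised without touching the file system.

```javascript
// Hypothetical sketch: decide which manifest the routing eval should read.
// `generate` may throw (non-zero generator exit) yet still write the file;
// `exists` reports whether a path is present on disk.
function chooseManifest({ generate, exists, freshPath, stalePath }) {
  let generatorFailed = false;
  try {
    generate(freshPath);
  } catch (err) {
    generatorFailed = true;
  }
  // Fall back only when the generator produced no output file at all.
  if (!exists(freshPath)) return stalePath;
  if (generatorFailed) {
    // Validation warnings get a single-line note; the eval proceeds.
    console.error('manifest generator exited non-zero (CONTENT-debt expected during v7→v8); using fresh manifest');
  }
  return freshPath;
}

// Example: the generator writes the file, then fails validation.
const written = new Set();
const chosen = chooseManifest({
  generate: (path) => { written.add(path); throw new Error('validation warnings'); },
  exists: (path) => written.has(path),
  freshPath: 'fresh.json',
  stalePath: 'stale.json',
});
console.log(chosen);
```

In the example the fresh manifest wins even though the generator threw, which is the behaviour the commit restores.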
Review skipped: too many files. This PR contains 212 files, which is 62 over the limit of 150.
Code Review Summary: The review did not run because the selected model is no longer available. Choose another model in Kilo Code review settings: https://app.kilo.ai/code-reviews
Pull request overview
This PR closes 33 findings from the 2026-05-27 system audit across 34 logical commits. It hardens the v8 5-axis classification contract (formerly compat-window), wires several previously-orphaned gates into the verify chain, and adds verdict write-back paths (drift, comprehension, single-model PROVISIONAL downgrade) along with smoke/integration test coverage for runners that previously had none.
Changes:
- Promotes v8 axes (`subject`/`operation`/`scope`) to schema-required, deprecates v7 classification fields, and reconciles a wide set of docs/ADRs/templates to the post-2026-05-26 v8 state.
- Adds verdict write-back: `drift --write-verdict`, `stampComprehensionVerdict`, and `PROVISIONAL` downgrade for `--single-model` application runs; extends lint with named v8-axis errors, boundary reason-text warnings, and field-purpose-comment advisory.
- Expands the verify chain (`marketplace:verify`, `status:check`) and adds smoke/integration tests for `lib/audit/*`, verify-gate scripts, and the work-mode-separation hook; files CONTENT-side follow-up tickets for drift/comprehension debt.
Reviewed changes
Copilot reviewed 210 out of 212 changed files in this pull request and generated 1 comment.
Show a summary per file
Given the very large scope (≈180 files), only the files with substantive system-side changes are summarized here; the ~152 regenerated marketplace exports are mechanical and not individually listed.
| File | Description |
|---|---|
| SKILL_METADATA_PROTOCOL.md | Rewrites the v7/v8 framing to "phase ended 2026-05-26"; adds the "Inline field comments" authoring convention. |
| SKILL_GRAPH.md / SKILL_AUDIT_LOOP.md | Updates freshness claim, adds Part 3 audience preamble for workspace-only scripts. |
| lib/audit/evaluate-skill.js | Adds PROVISIONAL to application enum + --single-model downgrade; introduces stampComprehensionVerdict. |
| lib/audit/skill-audit.js | Re-verified structural/truth verdict write-back path. |
| scripts/skill-lint.js | Adds named v8-axis errors, boundary-deference warnings, and field-purpose-comment check; --strict escalates warnings. |
| scripts/skill-graph-drift.js | Adds --write-verdict opt-in to stamp drift_status. |
| scripts/skill-graph-routing-eval.js | Auto-generates a fresh manifest when --manifest is omitted. |
| scripts/build-status-doc.js | Adds the verify-chain status:check aggregator (drift/audit-manifest deliberately excluded). |
| scripts/export-marketplace-skills.js | Drops historical skill_graph_protocol provenance key. |
| scripts/check-charter-parity.js | Switches from hardcoded mirror list to dynamic WORKSPACE_ROOT discovery. |
| scripts/check-protocol-consistency.js / check-doc-drift.js | Reconciled consistency claims (7 checks; C6 retired). |
| scripts/lint/check-stability-promotion.js | Documents advisory-by-default, --strict opt-in. |
| scripts/generate-manifest.js / export-skill.js / backfill-field-purpose-comments.js | Supporting tool updates for v8 axes and field-purpose backfill. |
| scripts/__tests__/test-{work-mode-separation,verify-gate-scripts,lib-audit-smoke,marketplace-export}.js | New tests: 16 work-mode assertions, 14 verify-gate, 14 lib/audit smoke, marketplace export coverage. |
| schemas/skill.schema.json | Adds PROVISIONAL to comprehension/application enums; deprecates v7 classification fields. |
| schemas/audits-manifest.schema.json | Tightens runner.version, artifact name pattern, and when-clause syntax. |
| schemas/skill.context.jsonld | v8 axis context updates. |
| audits/lanes.json | Normalizes schema_version to integer. |
| audits/migration-mapping-v7-to-v8.json | Replaces user-local plan path. |
| audits/prompts/*.md | Grep-cleans stale v6 "Concept Card"/"Understanding fields" framing. |
| audits/expected-value/*.md | New expected-value scorecard/findings/verdict. |
| package.json | Adds marketplace:verify + status:check to npm run verify; adds stability:check:strict. |
| README.md | Rewrites Integrity Gate definition to match what verify actually runs. |
| .github/workflows/skill-graph-lint.yml | Removes malformed paths filter entries. |
| .github/PULL_REQUEST_TEMPLATE.md / ISSUE_TEMPLATE/* | Removes deprecated mirror references. |
| .gitignore | Adjustments related to auto-generated routing-eval manifest. |
| bin/skill-graph.js | Wires routing-eval auto-generation path. |
| docs/adr/0016 / 0007 | Status promotion (Proposed → Accepted) and cadence amendment (event-driven). |
| docs/field-reference.md(.generated.md) | Names canonical resolution in doc-ownership map. |
| docs/{PRIMER,ADOPTION,QUICKSTART-30MIN,AUTHORING-QUICKSTART,publish-workflow,marketplace-syndication,manifest-field-mapping,quality-doctrine,skill-metadata-protocol,skill-audit-loop-executable-map}.md | v8-state reconciliations and stale-claim corrections. |
| docs/research/followups/*.md | Filed CONTENT-side / workspace-side follow-up tickets (B3, H9/H10, M12, B4). |
| docs/status.generated.md | Regenerated; still records check-protocol-consistency + marketplace-export-check as FAIL. |
| examples/skill-metadata-template.md | Adds field-purpose comments; keeps # TEMPLATE NOTE: stripping convention. |
| AGENTS.md | Documents bin/ vs package.json scripts audience split; confidence hierarchy. |
| marketplace/skills/**/SKILL.md (~152 files) | Regenerated marketplace exports under the post-B6 export pipeline. |
    if (options.singleModel && verdict === 'APPLICABLE') {
      console.log('\n[write-back] --single-model set: downgrading APPLICABLE → PROVISIONAL (single-model assessment, not dual-run confirmed).');
      verdict = 'PROVISIONAL';
    }
Per AGENTS.md § Major Version Is a Clean Cut, migration codemods and deprecated-field checkers do not survive the cut. Their history lives in git.
Deletions:
- scripts/migrate-skill-v7-to-v8.js: the v7→v8 codemod, ~22K. Its output is now invalid (it still authored `operation:`). Anyone re-needing the migration logic recovers via `git show f88603d^:scripts/migrate-skill-v7-to-v8.js`.
- scripts/lint/check-category-enum.js: enforces a v(N-1) field that no longer exists in the schema. The check itself was the symptom of "deprecated optional" framing; with the field gone, the check goes.
Updates:
- package.json: drop `category:check` script and its slot in the `verify` chain.
- scripts/backfill-field-purpose-comments.js: delete the `operation:`, `type:`, `category:` comment-template blocks. Rename the v8-classification section divider from "5-axis model" to the current axes (subject + scope; polyhierarchy via subjects[]). Rename the eval-health divider to "Evaluation Status" per the 2026-05-27 f88603d doctrinal change #2.
- scripts/skill-lint.js: update the V8_AXES surrounding comment from "v8 5-axis classification per ADR-0017" to the post-retire wording ("v8 classification — subject + scope"). The set itself was already correct.
- .github/workflows/skill-graph-lint.yml: remove the dead `skills/**` path filter entries (no such directory exists in this repo post-ADR-0009 consolidation). Add `lib/**`, `marketplace/**`, and `AGENTS.md` to the filter; they're load-bearing surfaces missing from the prior filter.
Drives the real bin/skill-graph.js surfaces (audit, evaluate, evolve, and the init create path) against a fixture skill in a hermetic temp workspace, with a stubbed model CLI on PATH (no real model), asserting on-disk verdict/receipt transitions. The regression net that makes "done" mean "the real loop ran and wrote real state", not "the code reads correctly".
Catches the three breaks from the 2026-05-30 end-to-end review:
- #1 (evolve scaffold path silent-fail): evolve --analyze-only runs end-to-end + a guard asserts the scaffold script the engine references exists.
- #2 (evaluate verdict write opt-in): evaluate with DEFAULT flags must move the Health Block on disk. Confirmed comprehension_verdict + freshness already write by default; the eval_score-by-default assertion is the FORCING FUNCTION for Step 3 (still gated behind --write-verdict at evaluate-skill.js:2061). It stays RED until Step 3 lands, then flips green automatically.
- #3 (verdict without artifact): evaluate must leave a durable on-disk receipt.
Currently 13/14 assertions pass; the 1 red is Break #2 by design (no false-green). Wired into `npm run verify` only once Step 3 turns it green, because a red test in the shared gate would block parallel sessions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
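The "stubbed model CLI on PATH" setup can be sketched as follows. This is an illustration under stated assumptions, not the actual contract test: the helper `withStubbedCli` and the binary name `fake-model` are hypothetical, and the stub is a POSIX shell script, so the sketch assumes a Unix-like host with `/bin/sh`.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { spawnSync } = require('node:child_process');

// Hypothetical helper: puts an executable stub named `name` first on PATH,
// runs `run(env, dir)`, then removes the temp directory.
// Assumption: `stdoutText` contains no single quotes (it is shell-quoted naively).
function withStubbedCli(name, stdoutText, run) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'stub-cli-'));
  const stubPath = path.join(dir, name);
  fs.writeFileSync(stubPath, `#!/bin/sh\nprintf '%s' '${stdoutText}'\n`, { mode: 0o755 });
  const env = { ...process.env, PATH: `${dir}${path.delimiter}${process.env.PATH}` };
  try {
    return run(env, dir);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// The child process resolves `fake-model` through the PATH we pass in `env`.
const result = withStubbedCli('fake-model', 'STUB_OK', (env) =>
  spawnSync('fake-model', [], { env, encoding: 'utf8' })
);
console.log(result.stdout);
```

Because the stub lives in a throwaway directory and only the child's environment is modified, the test process itself never sees a changed PATH.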
…ep2-before-Step3 ordering Step 0b contract test authored (skill-graph@4a3de1f), 13/14 pass. Records the verified Break #2 refinement: comprehension_verdict + freshness already write by default; only v6 eval_score/eval_failed_ids remain gated behind --write-verdict (the 1 red assertion = forcing function for Step 3). Verify-wiring deferred until Step 3 turns it green (no red test in the shared gate). Documents the verified Step2-before-Step3 ordering: flipping the default before receipts are durable persists verdicts on an ephemeral .cache foundation. Next: Step 2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… Step 2 next Appends an UPDATE section recording this session (Step 1 de-fork + evolve-meaning reconciliation, Step 0b contract test 13/14, SH-6642 filed, Break #2 half-fixed refinement, Step2-before-Step3 ordering) and a superseding continuation prompt. The stale prior-session prompt is marked historical; the plan Progress log is the live record. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rable path
Step 3: invert the v6 Health Block write-back (eval_score / eval_failed_ids / freshness) from opt-in (--write-verdict) to persist-by-default, with --dry-run as the single opt-out. An unattended loop no longer silently produces zero durable verdicts because it forgot a flag (Break #2). --write-verdict is retained as a harmless no-op alias. The bin/skill-graph.js help already described this ("Writes (when not --dry-run): ..."); the code now matches it (resolves E2).
Also fixes a portability regression the test:unit standalone guard caught in the Step-2 durable-receipts commit (3ed080d): the durable eval-result path was anchored to a hardcoded SKILL_GRAPH_REPO_ROOT (path.resolve(__dirname,..,..)), which test-standalone-pipeline.js forbids in lib/ (breaks npm install -g). Now resolved via the portable log-paths.js LOG_DIR (monorepo agent-orchestration/logs or standalone .skill-graph/logs), with the receipt artifact path relative to WORKSPACE. test:unit green (exit 0).
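The flag inversion described above amounts to the following decision. This is a minimal sketch assuming a hypothetical `resolveWriteMode` helper; the real parsing in evaluate-skill.js is not shown in this PR and may differ.

```javascript
// Hypothetical helper: persist-by-default, with --dry-run as the only opt-out.
// --write-verdict is accepted but changes nothing (no-op alias).
function resolveWriteMode(argv) {
  const flags = new Set(argv);
  return {
    persist: !flags.has('--dry-run'),
    sawLegacyAlias: flags.has('--write-verdict'),
  };
}

console.log(resolveWriteMode([]));                  // default: persists
console.log(resolveWriteMode(['--write-verdict'])); // alias: still persists
console.log(resolveWriteMode(['--dry-run']));       // opt-out: no writes
```

Making the safe-for-automation behaviour the default means an unattended loop needs no flag to produce durable verdicts.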
… 3 final) The model-free black-box contract test (test-public-cli-loop-contract.js) now passes all 14 assertions — the eval_score-by-default forcing-function flipped green once Step 3 inverted the write-verdict default. Added it to the test:unit chain so it runs under both npm run verify and verify:system as the public-CLI loop guard against Break #1/#2/#3 regressions. Updated the test header + the in-body forcing-function comment to record Step 3 closed.
…ble 2026-05-30) Explains recommendation #2 from the 2026-05-30 Skill Graph direction roundtable: separate marketplace-publishable (export gate) from behaviorally-certified (application_verdict: APPLICABLE), and surface the verdict to consumers as a browse/sort filter. Grounds the proposal in primitives that already exist (the export gate, application_verdict in audit-state.json, marketplace_priority) and names the real gap: the public 6-field export strips application_verdict, so a consumer on skills.sh cannot see or filter behavioral certification. Full deliberation: ~/Development/.roundtable/skill-graph-2026-05-30/SYNTHESIS.md Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…map (correction) User review caught that roundtable recommendation #2 conflated two unrelated subsystems and was written in cryptic jargon. Corrected: - Publishing ("can it go public?") = deployment_target (general vs private) + the privacy/secret scanner. Already built. Quality is never read by the exporter (verified). - Quality ("is it any good?") = the Behavior Gate / application_verdict, owned by the Audit & Evaluation system, in the audit-state.json sidecar. Deliberately decoupled from publishing (ADR-0011 publishes untested skills labeled "behavior unvalidated"; marketplace_tier is a separate, non-quality-derived field). The old "release states" framing fused the two. Renamed off the wrong filename (git mv from proposals/) and rewritten as a plain-language map a non-insider can follow. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
drain_worklist looped on skill-audit-claim 'next', but 'next' filters on the STATIC worklist status (SKILL_LIST.json) which a ledger completion never refreshes -> it re-returned the just-completed top skill forever (verified: api-design re-enriched as drain #2 after committing 982212c). Now iterates the ranked SKILL_LIST array once (advances on any outcome), skips already panel-enrich-committed skills (resume-safe), claims for dedup, and rebuilds the worklist after each skill. Verified via --dry-run (99 eligible, api-design skipped, advances to code-review). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
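The single-pass drain described above can be sketched like this. The function and its four callbacks are hypothetical stand-ins (the real drain_worklist and skill-audit-claim interfaces are not shown here); the point is only the control flow: iterate the ranked list once, skip committed skills, claim for dedup, advance on any outcome, and rebuild the worklist after each skill.

```javascript
// Hypothetical sketch: callbacks are stand-ins, not the real script's API.
function drainWorklist(rankedSkills, { isCommitted, claim, enrich, rebuildWorklist }) {
  const outcomes = [];
  for (const skill of rankedSkills) {
    if (isCommitted(skill)) {
      // Resume-safe: already panel-enrich-committed skills are never redone.
      outcomes.push({ skill, outcome: 'skipped-committed' });
      continue;
    }
    if (!claim(skill)) {
      // Dedup: another session holds the claim.
      outcomes.push({ skill, outcome: 'skipped-claimed' });
      continue;
    }
    try {
      enrich(skill);
      outcomes.push({ skill, outcome: 'enriched' });
    } catch (err) {
      // Advance on any outcome, including failure, so the loop cannot stall.
      outcomes.push({ skill, outcome: 'failed' });
    }
    rebuildWorklist();
  }
  return outcomes;
}

const committed = new Set(['api-design']);
const outcomes = drainWorklist(['api-design', 'code-review'], {
  isCommitted: (s) => committed.has(s),
  claim: () => true,
  enrich: (s) => committed.add(s),
  rebuildWorklist: () => {},
});
console.log(outcomes);
```

Iterating a fixed array, not re-querying "next", is what removes the infinite re-return of the top skill.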
…ll surface (board #2) New scripts/scan-skill-security.js (npm run security:scan): the complementary signal to the privacy gate. Privacy (privacy-patterns.js) stops the author private data leaking OUT at export; this scans published marketplace/skills bodies for execution/exfiltration patterns (curl|bash, base64-decode-and-eval, reverse shells, fork bombs, broad rm -rf, curl data-exfil, eval-of-fetched) and over-broad allowed-tools (P1 */all, P2 Bash(*), P3 bare shell; Bash(git:*) is clean). Advisory by default (exit 0 -- teaching skills legitimately show shell patterns as anti-examples); --strict hard-gates. Wired into release:check (advisory, release-time, NOT verify:system); unit test in test:unit so the logic is blocking-tested. First run: 180 scanned, 37 P3 bare-Bash advisories, 0 exec matches. Decided as a scan gate, not a 5th verdict. Also fixes a stale privacy-patterns.js comment naming a non-existent .github/workflows/privacy-scan.yml. Verified: 0.12s scan (no ReDoS); test-scan-skill-security green; package.json valid; release:check chain unbroken. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
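The allowed-tools tiers named above (P1 `*`/all, P2 `Bash(*)`, P3 bare shell, with `Bash(git:*)` clean) can be illustrated with a small classifier. This is a sketch under assumptions, not the logic of scripts/scan-skill-security.js: the function name is hypothetical, and "bare shell" is assumed to mean the entry `Bash` with no parenthesized scope.

```javascript
// Hypothetical classifier for a single allowed-tools entry.
// Returns 'P1', 'P2', 'P3', or null when the entry is considered clean.
function classifyAllowedTool(entry) {
  const tool = entry.trim();
  if (tool === '*' || tool.toLowerCase() === 'all') return 'P1'; // everything allowed
  if (/^Bash\(\s*\*\s*\)$/.test(tool)) return 'P2';              // any shell command
  if (tool === 'Bash') return 'P3';                              // assumed: unscoped shell
  return null;                                                   // scoped, e.g. Bash(git:*)
}

for (const entry of ['*', 'Bash(*)', 'Bash', 'Bash(git:*)']) {
  console.log(entry, '->', classifyAllowedTool(entry));
}
```

Exact-match rules keep the scan linear in input size, which fits the no-ReDoS note in the commit message.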
… public repo The prior #2 commit (7868c60) wrongly declared .github/workflows/privacy-scan.yml non-existent after checking only skill-graph workflows. The L3 pre-push hook and L4 CI privacy-scan workflow + ci-privacy-scan.js live in the PUBLIC jacob-balslev/ skills repo (sibling ~/Development/skills/), verified present. Correct the privacy-patterns.js comment and the CHANGELOG entry to say so. No-unverified-claims fixup caught during /wrap stale-ref grep. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Records the /boardmeeting board findings, what the parallel session has already committed (board #2/#3/#6/#15 + Cluster 1-5), what remains (SYSTEM: alias delete, description-cap, anti-loss test, release:ready, scope notes, citations), and the CONTENT drains (certify-seed publish gate, displacement, admission, shelf rebalance) routed via /audit:*. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Closes the system-audit findings from docs/audits/system-audit-2026-05-27.md (8 BLOCKERs, 14 HIGH, 16 MEDIUM, and 6 LOW/NIT) across 34 commits, one logical change per commit.
Summary
P0: BLOCKERs (8 closed)
- `drift_status` write-back via `--write-verdict` opt-in in `scripts/skill-graph-drift.js`
- `PROVISIONAL` reachable via `--single-model` flag; downgrades PASS→PROVISIONAL (comprehension) and APPLICABLE→PROVISIONAL (application). `APPLICATION_VERDICT_ENUM` now includes PROVISIONAL.
- `docs/research/followups/b3-legacy-script-delegation-2026-05-27.md` (cross-repo edit deferred; ready to apply)
- `structural_verdict` / `truth_verdict` write-back is implemented at `lib/audit/skill-audit.js:1086-1097`; corpus state is CONTENT debt drained per-skill via `/audit:*`
- `skill-graph routing-eval` regenerates manifest on demand (instead of reading stale gitignored local file)
- `marketplace:verify` + `status:check` added to `npm run verify`; README Integrity Gate definition rewritten to match what verify actually runs; `drift` and `audit-manifest:check` deliberately held out (CONTENT-blocked, documented)
- `SKILL_AUDIT_LOOP.md` Part 3 carries an "Audience & runtime" preamble naming the workspace-only scripts that aren't bundled in @skill-graph/cli
P1: HIGH (12 closed) + 2 CONTENT ticket-docs
- `stability:check` documented advisory-by-default; `--strict` opt-in flips to fail-loud; `stability:check:strict` npm script added
- `comprehension_verdict` writeback to SKILL.md (new `stampComprehensionVerdict` mirroring `stampApplicationVerdict`)
- `paths` filter entries deleted from `skill-graph-lint.yml`
- `skill-metadata-protocol` / `skill-audit-loop` mirror refs removed from PR template, issue templates, and contact links
- `audits/lanes.json` `schema_version` normalized to integer
- `skill-lint.js` (advisory; `--strict` opts into fail)
- `docs/research/followups/content-side-audit-tickets-2026-05-27.md`); SYSTEM-side gating closed under B7
- `evals/comprehension.json`
P2: MEDIUM (7 closed)
- `allOf` rules in schema (`passing` and `monitored` require `eval_artifacts: present`)
- `v6 Understanding fields` / `Concept Card` stale framing
- `when` clause syntax
P3: MEDIUM (5 closed)
- `lib/audit/*` runners (14 assertions: syntax check + unknown-flag no-panic)
- `build-status-doc` + `check-audit-manifest` (14 assertions)
- `audits/migration-mapping-v7-to-v8.json`
- `check-charter-parity.js` switched from hardcoded mirror list to dynamic WORKSPACE_ROOT discovery
P4: NITs (3 closed inline, 2 verified-negative)
- `bin/` vs `package.json scripts` audience split clarified in AGENTS.md
- `.skill-graph/config.json` hardcodes a relative path that would break npm consumers)
- `$id` duplication in current protocol doc)
Verification
Every gate I touched exits 0 at HEAD:
- `npm run lint` → exit 0
- `npm run protocol:check` → exit 0 (C1–C5, C7, C8 all PASS)
- `npm run routing-eval` (auto-generated fresh manifest) → 10/10 PASS
- `test-application-verdict-write-back.js` → 53/53 PASS
- `test-work-mode-separation.js` → 16/16 PASS (new)
- `test-lib-audit-smoke.js` → 14/14 PASS (new)
- `test-verify-gate-scripts.js` → 14/14 PASS (new)
- `test-stability-promotion.js` → all PASS
`npm run drift` and `npm run audit-manifest:check` remain RED (CONTENT debt; tickets filed under H9/H10).
Out-of-scope follow-ups deferred to their own commits
- `~/Development/scripts/skill/` (b3-legacy-script-delegation-2026-05-27.md has ready-to-apply delegator bodies)
- `/audit:audit` runs for the 7 drift-red skills, 15 missing comprehension.json files, and the v7→v8 P1–P7 migration backlog (all enumerated in `docs/research/followups/`)
- `~/.claude-profiles/.../project_duplicate_skill_scripts_canonical_issue.md` updated to reflect SH-6198 closure + B3 residual
Test plan
- `npm install && npm run verify` on a fresh clone
- `master`
🤖 Generated with Claude Code