Audit remediation: close 33 findings from system-audit-2026-05-27#2

Merged
jacob-balslev merged 42 commits into main from audit-remediation-2026-05-27
May 31, 2026

@jacob-balslev

Owner

Closes the system-audit findings from docs/audits/system-audit-2026-05-27.md — 8 BLOCKERs, 14 HIGH, 16 MEDIUM, and 6 LOW/NIT — across 34 commits, one logical change per commit.

Summary

P0 — BLOCKERs (8 closed)

  • B1 — drift_status write-back via --write-verdict opt-in in scripts/skill-graph-drift.js
  • B2 — PROVISIONAL reachable via --single-model flag; downgrades PASS→PROVISIONAL (comprehension) and APPLICABLE→PROVISIONAL (application). APPLICATION_VERDICT_ENUM now includes PROVISIONAL.
  • B3/H3/M13 — workspace-side delegator plan filed at docs/research/followups/b3-legacy-script-delegation-2026-05-27.md (cross-repo edit deferred; ready to apply)
  • B4 — re-verified structural_verdict/truth_verdict write-back is implemented at lib/audit/skill-audit.js:1086-1097; corpus state is CONTENT debt drained per-skill via /audit:*
  • B5 — skill-graph routing-eval regenerates manifest on demand (instead of reading stale gitignored local file)
  • B6 — 152 marketplace files regenerated; SKILL_GRAPH.md freshness claim updated
  • B7/H4 — marketplace:verify + status:check added to npm run verify; README Integrity Gate definition rewritten to match what verify actually runs; drift and audit-manifest:check deliberately held out (CONTENT-blocked, documented)
  • B8 — SKILL_AUDIT_LOOP.md Part 3 carries an "Audience & runtime" preamble naming the workspace-only scripts that aren't bundled in @skill-graph/cli

P1 — HIGH (12 closed) + 2 CONTENT ticket-docs

  • H1 — stability:check documented advisory-by-default; --strict opt-in flips to fail-loud; stability:check:strict npm script added
  • H2 — gate-8 comprehension_verdict writeback to SKILL.md (new stampComprehensionVerdict mirroring stampApplicationVerdict)
  • H5 — malformed paths filter entries deleted from skill-graph-lint.yml
  • H6 — deprecated skill-metadata-protocol / skill-audit-loop mirror refs removed from PR template, issue templates, and contact links
  • H7/M7 — ADR 0016 status: Proposed → Accepted; audits/lanes.json schema_version normalized to integer
  • H8 — field-purpose-comment presence check added to skill-lint.js (advisory; --strict opts into fail)
  • H9 — CONTENT-side ticket-doc filed for 7 drift-red skills (docs/research/followups/content-side-audit-tickets-2026-05-27.md); SYSTEM-side gating closed under B7
  • H10 — same ticket-doc enumerates the 15 graded-comprehension claims missing evals/comprehension.json
  • H11 — closed by B7 (marketplace check now surfaced on the trust surface)
  • H12 — stale lint-coverage claims corrected in SKILL_METADATA_PROTOCOL.md and SKILL_GRAPH.md
  • H13 — protocol-consistency claim reconciled to 7 checks (C6 retired per ADR-0014)
  • H14 — 5 v7-state contradictions reconciled across AGENTS.md, publish-workflow, marketplace-syndication, skill.context.jsonld, template

P2 — MEDIUM (7 closed)

  • M1 — integration test for work-mode-separation hook (16 assertions, 7 scenarios)
  • M2 — field-reference.md vs .generated.md canonical resolution named in doc-ownership map
  • M3 — v8-axis violations surface as named errors in lint output instead of raw JSON-pointer messages
  • M4 — relations.boundary reason-text lint catches "use X instead" / "defer to" deference phrasing
  • M5 — eval_state coherence allOf rules in schema (passing and monitored require eval_artifacts: present)
  • M6 — audit prompts grepped and cleaned of v6 Understanding fields / Concept Card stale framing
  • M8/M15 — audits-manifest schema tightened: runner.version regex, artifact name pattern, artifact when clause syntax

P3 — MEDIUM (5 closed)

  • M9 — smoke tests for 7 untested lib/audit/* runners (14 assertions: syntax check + unknown-flag no-panic)
  • M10 — smoke tests for build-status-doc + check-audit-manifest (14 assertions)
  • M11 — ADR-0007 cadence amended to event-driven (matches actual practice; verify-chain handles per-commit catch)
  • M12 — flat-layout P1–P7 status doc filed; CONTENT-side drain
  • M14 — user-local plan path replaced in audits/migration-mapping-v7-to-v8.json
  • M16 — check-charter-parity.js switched from hardcoded mirror list to dynamic WORKSPACE_ROOT discovery

P4 — NITs (3 closed inline, 2 verified-negative)

  • L1 — schema_version oneOf integer+string sunset window named (deprecation horizon: v8→v9)
  • L2 — bin/ vs package.json scripts audience split clarified in AGENTS.md
  • L3 — verified negative (.skill-graph/config.json hardcodes a relative path that would break npm consumers)
  • L4 — verified negative (no $id duplication in current protocol doc)
  • L5 — closed by B7

Verification

Every gate I touched exits 0 at HEAD:

  • npm run lint → exit 0
  • npm run protocol:check → exit 0 (C1–C5, C7, C8 all PASS)
  • npm run routing-eval (auto-generated fresh manifest) → 10/10 PASS
  • test-application-verdict-write-back.js → 53/53 PASS
  • test-work-mode-separation.js → 16/16 PASS (new)
  • test-lib-audit-smoke.js → 14/14 PASS (new)
  • test-verify-gate-scripts.js → 14/14 PASS (new)
  • test-stability-promotion.js → all PASS

npm run drift and npm run audit-manifest:check remain RED (CONTENT debt; tickets filed under H9/H10).

Out-of-scope follow-ups deferred to their own commits

  • Workspace-side: legacy script delegation under ~/Development/scripts/skill/ (b3-legacy-script-delegation-2026-05-27.md has ready-to-apply delegator bodies)
  • CONTENT-side: per-skill /audit:audit runs for the 7 drift-red skills, 15 missing comprehension.json files, and the v7→v8 P1–P7 migration backlog (all enumerated in docs/research/followups/)
  • Memory: ~/.claude-profiles/.../project_duplicate_skill_scripts_canonical_issue.md updated to reflect SH-6198 closure + B3 residual

Test plan

  • Review the 34 commits — each is one logical audit finding
  • Run npm install && npm run verify on a fresh clone
  • Confirm the CONTENT-debt scope of H9 / H10 / M12 matches expectations
  • Decide whether to land the workspace-side B3 delegators in a follow-up commit on workspace master

🤖 Generated with Claude Code

jacob-balslev and others added 30 commits May 26, 2026 18:42
…ntion

Codifies the comment convention the user had in mind for the Skill Graph,
Skill Metadata Protocol, and Skill Audit Loop layers: every authored
frontmatter field carries a YAML comment block (#) immediately above it,
naming purpose + allowed values + when-to-use, and these comments STAY in
the production SKILL.md.

Two distinct conventions coexist with opposite lifecycles:

- Field-purpose comments — STAY in derived skills. Authoritative-by-co-
  location documentation. Source of truth is docs/field-reference.md; the
  inline comment is the abridged summary. Discipline mirrors JSDoc/TSDoc
  summaries pointing at canonical type definitions.

- `# TEMPLATE NOTE:` comments — STRIPPED on derivation. Authoring
  scaffolding only, lives only in the template. Verified with
  `grep -n "TEMPLATE NOTE" <derived>` returning zero hits.

Changes:

1. SKILL_METADATA_PROTOCOL.md — new sub-section "Inline field comments
   — the authoring convention" placed after the "Where does my skill
   live?" decision tree and before "Required vs Optional Fields". Includes
   a side-by-side table of the two comment styles, a worked example
   showing v8 classification + eval-health triple with field-purpose
   comments, and an incident-grounded justification (the 2026-05-26
   session where a cold-start agent proposed cutting `eval_state: monitored`
   as "dead value" because the field's design intent lived three docs away).

2. examples/skill-metadata-template.md —
   (a) Header rewritten to teach both comment conventions with concrete
       examples; the original "every # TEMPLATE NOTE: must be stripped"
       framing replaced with the convention split.
   (b) Eval-health triple (eval_artifacts / eval_state / routing_eval)
       converted from a single merged # TEMPLATE NOTE: block into three
       field-purpose comment blocks (one per field), demonstrating the
       convention. The orthogonality-rationale moves into the section
       header. Scaffold-specific note about routing_eval staying `absent`
       on this template stays as a # TEMPLATE NOTE: (correctly classified
       as authoring scaffolding).

Out of scope (separate work):
- Remaining ~15 # TEMPLATE NOTE: blocks in the template that are actually
  field-purpose content (followup CONTENT-style edit; mechanical rename).
- Template v7→v8 migration (pre-existing debt: template fails lint because
  it carries v7 classification while the schema now requires v8 axes).
  Lint state before this commit: 2 errors. After: same 2 errors. My edits
  changed only comments, not fields — verified via `git diff | grep '^[+-][a-z_]+:'`
  returning zero matches.
- Skill-scaffold SKILL.md update to teach the convention (CONTENT mode —
  flows through /audit:improve).
- Codemod to backfill field-purpose comments across 153 corpus skills
  (large CONTENT cascade).

The 2026-05-26 session memory at
~/.claude-profiles/jacobbalslev01/projects/-Users-jacobbalslev-Development/memory/
dont-extrapolate-research-into-filing-2026-05-26.md records the failure mode
this convention is designed to prevent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow-up to d9fe52f, which established the convention and demonstrated it
on the eval-health triple. This commit applies the convention to all
remaining field-purpose blocks in the canonical authoring template and
clears the pre-existing 2-error lint failure by migrating the template
to v8 axes.

CONVENTION APPLICATION (13 blocks converted):

For each field whose existing `# TEMPLATE NOTE:` block was actually
field-purpose content (defining what the field IS, its allowed values,
and when-to-use), the prefix was dropped and the content tightened to
2-5 lines per block. Converted blocks:

  - category, domain, drift_check, eval_last_run, stability,
    compatibility, keywords, examples, anti_examples, grounding,
    portability, lifecycle, runtime_telemetry

Blocks correctly KEPT as `# TEMPLATE NOTE:` (genuine scaffolding):

  - description authoring tip — pushy-description guidance
  - triggers — scaffold-specific "this skill is routable"
  - paths SPLIT — field-purpose first, scaffold-specific rationale kept
    as TEMPLATE NOTE
  - workspace_tags SPLIT — same pattern
  - relations.boundary — scaffold-specific about empty arrays
  - routing_eval scaffold-specific note — why `absent` on THIS template

The rule: a `# TEMPLATE NOTE:` line MUST be removable from a derived skill
without losing field semantics. If the line teaches what a field IS, it's
a field-purpose comment and stays. If it teaches HOW to use the template
itself or describes why THIS scaffold is configured a certain way, it's a
TEMPLATE NOTE and gets stripped on derivation.
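
The strip-on-derivation rule above can be sketched as a line filter. This is a minimal illustration, not the repository's derivation code: `stripTemplateNotes` is a hypothetical helper, and it assumes every scaffold line carries the `# TEMPLATE NOTE:` or `> **TEMPLATE NOTE:**` prefix itself (unprefixed continuation lines are not handled).

```javascript
// Hypothetical helper, assuming each scaffolding line carries its own prefix.
// Matches both strippable forms named in the commit message:
//   "# TEMPLATE NOTE:" (YAML comment) and "> **TEMPLATE NOTE:**" (blockquote).
const TEMPLATE_NOTE = /^\s*(#\s*TEMPLATE NOTE:|>\s*\*\*TEMPLATE NOTE:\*\*)/;

function stripTemplateNotes(text) {
  return text
    .split('\n')
    .filter((line) => !TEMPLATE_NOTE.test(line)) // field-purpose comments survive
    .join('\n');
}

// Illustrative frontmatter fragment; the comment wording is made up.
const template = [
  '# TEMPLATE NOTE: routing_eval stays absent on this scaffold',
  '# eval_state: health of the eval suite for this skill',
  'eval_state: monitored',
].join('\n');

const derived = stripTemplateNotes(template);
console.log(derived);
```

After the filter runs, a `grep -n "TEMPLATE NOTE"` over the derived text would find nothing, while the field-purpose comment above `eval_state` is still present.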

V8 MIGRATION:

  - schema_version: 7 -> 8
  - Added subject: agent-ops + operation: know (the v8 axes the schema
    requires; the template was failing lint until now)
  - Renamed scope: reference -> scope: workspace (v8 canonical for the
    legacy alias)
  - Kept type: capability + category: agent as deprecated back-compat
    with an explicit deprecation comment + instruction to delete them
    when adapting

BODY:

  - HOW TO READ THIS FILE blockquote rewritten to teach the two-convention
    rule explicitly (only # TEMPLATE NOTE: and > **TEMPLATE NOTE:** get
    stripped; field-purpose comments STAY). Includes a verification
    command (grep -n "TEMPLATE NOTE" must return zero, grep -c "^\s*#"
    must preserve field-purpose comments).
  - Coverage bullet under Teaching-layer delivery updated to match.

LINT STATE:

  Before this commit: examples/skill-metadata-template.md FAIL (2 errors:
    subject + operation required, missing).
  After this commit: examples/skill-metadata-template.md OK
    (154 file(s) checked, 0 error(s)).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…maining canonical layers

Third commit in the convention-rollout sequence:
- d9fe52f established the convention in SKILL_METADATA_PROTOCOL.md
- 7faadf9 applied it to the template + v8-migrated
- This commit extends it into SKILL_AUDIT_LOOP.md + SKILL_GRAPH.md so all
  three named layers of the skill system carry consistent guidance.

SKILL_AUDIT_LOOP.md changes:

1. § "The Health Block — state lives on the skill"
   The Health Block YAML example (~12 fields) was rewritten from trailing-
   inline comments to leading block-style field-purpose comments, matching
   the convention. Each field now carries a 2-3 line comment above it
   naming purpose + allowed values + (where relevant) which gate writes it.
   The example also bumps schema_version: 7 -> 8 per the v8-canonical
   doctrine and adds an intro sentence explicitly pointing at the
   convention spec.

2. § Part 2 § 1. Frontmatter validity
   Added a new checklist item enforcing the convention:
     - Strippable forms (# TEMPLATE NOTE: lines, > **TEMPLATE NOTE:**
       blockquotes) ABSENT from production skills.
     - Field-purpose comments PRESENT (verified via grep density check).
   Updated the schema_version bullet to accept 7 or 8 (was: only 7).

SKILL_GRAPH.md changes:

3. § Tier 5 — Canonical specimen
   The examples/skill-metadata-template.md row in the specimen table now
   explicitly calls out that the template demonstrates the v8 5-axis
   classification AND the inline field-purpose comment convention. Also
   mentions the v7-deprecated back-compat shape the template carries
   (with explicit deprecation comment per 7faadf9).

LINT STATE:
  Before/after: 153 files checked, 0 errors. No regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…logy) into skill-graph/AGENTS.md

Adds a new top-level section "No Invented Terminology — State Concepts in
Plain Words" between Quality Doctrine and Version Labels Are Earned, Not
Bumped. Same rule as the workspace AGENTS.md §17 added in commit 283be3607
on ~/Development/master — the body is mirrored so agents working with
skill-graph context (descendant CLAUDE.md auto-loads this file via
@AGENTS.md) see the rule even when the workspace AGENTS.md is not in
context.

Why both files: the workspace AGENTS.md and skill-graph/AGENTS.md cover
different launch conditions:
  - Workspace AGENTS.md loads when Claude reads ~/Development/CLAUDE.md
    on session start (always, when launched from ~/Development/).
  - skill-graph/AGENTS.md loads when the descendant skill-graph/CLAUDE.md
    is fetched — which happens when the session touches files in
    skill-graph/.
  - If a session launches from inside skill-graph/ (against the
    convention but possible), only skill-graph/AGENTS.md is loaded —
    no workspace AGENTS.md inheritance per claude-code #26489.

Having the rule in both files closes the coverage gap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ll codemod

One-shot codemod that adds inline field-purpose comment blocks above each
frontmatter field in SKILL.md files, per the convention established this
session in:
  - skill-graph commit d9fe52f (SKILL_METADATA_PROTOCOL.md § Inline field
    comments — the authoring convention)
  - skill-graph commit 7faadf9 (canonical template demonstrates it)
  - skills commit d6c13e4 (first-principles-thinking pilot)

What it does: reads each SKILL.md, walks the metadata: block, and inserts
the canonical comment block above each field that lacks one. Section
dividers inserted at documented transition points. Idempotent.

What it does NOT do: modify any field VALUE, modify the body, or commit.
Caller commits one-skill-per-commit per Standard #16.
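
A rough sketch of the insert-if-missing behaviour described above, to show why a second pass is a no-op. The names `FIELD_COMMENTS` and `backfillComments` and the comment text are assumptions for illustration; the actual codemod's data and structure are not shown in this message.

```javascript
// Hypothetical comment map: field name -> comment lines (without indent).
const FIELD_COMMENTS = new Map([
  ['eval_state', ['# eval_state: health of the eval suite for this skill']],
]);

// Inserts the canonical comment above each known field at the given indent,
// unless the line directly above is already a comment (this is what makes
// the pass idempotent). Field values are never touched.
function backfillComments(lines, indent = '  ') {
  const out = [];
  for (const line of lines) {
    const match = line.match(/^(\s*)([a-z_]+):/);
    const comment =
      match && match[1] === indent ? FIELD_COMMENTS.get(match[2]) : undefined;
    const prev = out.length ? out[out.length - 1].trim() : '';
    if (comment && !prev.startsWith('#')) {
      for (const c of comment) out.push(indent + c);
    }
    out.push(line);
  }
  return out;
}

const once = backfillComments(['metadata:', '  eval_state: monitored']);
const twice = backfillComments(once);
console.log(once.join('\n'));
```

Running the function on its own output returns the same lines, which is the property the commit message calls idempotent.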

Tested on 5 pilot skills — all lint clean post-codemod:
  - skills/meta-methods/inversion (+90 lines)
  - skills/meta-methods/second-order-thinking (+90 lines)
  - skills/knowledge-organization/semantics (+93 lines, stringified-nested)
  - skills/code-engineering/acid-fundamentals (+85 lines)
  - skills/code-engineering/architecture-decision-records (+78 lines)

Each pilot commits separately as a CONTENT commit in the skills repo
per Standard #16.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…OCOL hardcode + audit start-of-run announce

Resolves 11 findings (F1, F2, F5, F8, F10, F11, F17, F18 + Opus audit + Codex overlap) from
the 2026-05-26 parallel Opus/Codex skill-system audits. Companion plan at
docs/plans/skill-system-codex-opus-synthesis-2026-05-26.md tracks the remaining 8
unsolved findings (filed as SH-6565..SH-6572 per /wrap Step 1b).

Changes:
- AGENTS.md: drop phantom ~/Development/SKILL_*.md paths from SYSTEM allowlist (F1);
  add analysis-only carve-out to Work Modes mode-declaration rule (F13 mirror).
- SKILL_METADATA_PROTOCOL.md: preface updated to v8 5-axis canonical state (F2).
- SKILL_AUDIT_LOOP.md: removed 3 transitional "Absorbed into this file..." paragraphs
  that read as recursive (F11).
- docs/QUICKSTART-30MIN.md, docs/skill-metadata-protocol.md: rewrite "author BOTH v7+v8"
  compatibility-window claims to v8-only authoring (F17, F18) — repairs broken anchor
  links from F2's section rename.
- docs/field-reference.generated.md: regenerated for C7 protocol-consistency check.
- lib/audit/skill-audit.js: announce mode (INTEGRITY-only vs GRADED) at START of run,
  not just at end (F5). Reader sees what they're getting BEFORE spending the run cost.
- scripts/export-marketplace-skills.js + __tests__: remove SKILL_GRAPH_PROTOCOL hardcode
  (F8). The constant stamped every export with 'Skill Metadata Protocol v7' regardless
  of source content — a documented conformance caveat in SKILL_GRAPH.md. Per
  .claude/rules/version-schema-contract.md ("version labels are EARNED, not bumped"),
  stop emitting the field; per-skill schema_version is the honest signal.
- scripts/export-skill.js + generate-manifest.js: add back-compat-intent comments on
  audit_verdict (v6 deprecated) field-extraction lists, citing SH-6557 retirement (F10).

Verified: skill-graph npm run verify passes (15 v8 schema-compat assertions green);
doctor 7/7 PASS; check-markdown-links + check-doc-drift PASS; canonical skill count
unchanged at 153 source / 152 marketplace; zero SKILL.md edits (mode separation preserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… status + drift checks

Following two independent audits (Opus + Codex, 2026-05-26) of the Skill
Metadata Protocol / Skill Graph / Skill Audit Loop normative surfaces,
delete v7 classification from every current-state authoring path and
honest-up the gates that should have caught the drift but didn't.

The v7→v8 phase ended in schema (commit 4bd16d9). This commit ends it
in the prose, examples, and tooling that teach the contract.

Doc surface (v7 deleted from current-state surfaces; v7 only remains in
explicitly historical contexts — migration narratives, ADRs, audit
specimens, drafts, research notes — which the drift script now
allowlists):

- AGENTS.md: current-contract sentence replaced with pointer to
  SKILL_GRAPH.md § Current State; Quick Reference v7-legacy-axes block
  collapsed to a one-line deprecation note; stale "115 carry v5, 25
  carry v6" corpus parenthetical removed; obsolete SKILL_GRAPH_PROTOCOL
  hardcode tension paragraph deleted (the hardcode itself is gone).
- README.md: schema badge v7 → v8; example block uses v8 5-axis
  classification with no v7 axes; schemas-table description points to
  Current State instead of inlining a version; Status section names
  v8 as the current contract.
- SKILL_AUDIT_LOOP.md: operations-table preamble drops the v7 pin;
  per-skill audit checklist requires v8 axes (subject/operation/scope)
  and treats v7 fields as deprecated back-compat reads.
- SKILL_METADATA_PROTOCOL.md: "v7 Legacy Fields (compatibility-window
  holdovers)" section reduced to a clear deprecation notice; the
  "authors of new skills must author both" framing deleted (that was
  the exact anti-pattern Opus called out in his own audit but missed
  three lines below his preface edit).
- SKILL_GRAPH.md: Current State + Tier 1 schema row rephrased to avoid
  the literal "schema_version: 7" YAML syntax while still describing
  the deprecation; obsolete SKILL_GRAPH_PROTOCOL hardcode caveat
  rewritten to record the removal.
- docs/ADOPTION.md, AUTHORING-QUICKSTART.md, PRIMER.md, QUICKSTART-30MIN.md,
  manifest-field-mapping.md, quality-doctrine.md: every example
  frontmatter block updated to schema_version: 8 with v8 axes; v7
  axes comments deleted.
- docs/field-reference.md: `type` and `category` field notices marked
  DEPRECATED with explicit "v7→v8 phase ended 2026-05-26" framing
  instead of "sunset window" tense; new skills MUST author the v8
  replacements.
- docs/skill-metadata-protocol.md: Schema Versioning Policy section
  updated — current authored version is 8 (bumped from 7 when the
  5-axis model replaced type/category); Health Block policy reworded
  to avoid the literal v7 YAML syntax in prose.
- docs/skill-audit-loop-executable-map.md: stale "1 of 481 skills"
  application-eval claim replaced with a pointer to SKILL_GRAPH.md §
  Current State for live counts.
- docs/status.generated.md: regenerated (schema 8, 153 skills, 4/4
  PASS); previously stale (schema unknown, 148 skills, markdown links
  failing).

Tooling fixes:

- scripts/build-status-doc.js: readSchemaVersion now reads oneOf[].enum
  branches and returns the canonical (highest) version — previously
  returned "unknown" because the schema's integer/string back-compat
  shape uses enum, not const. `--check` now diffs rendered output
  against the on-disk file (with timestamp/duration normalization)
  and exits 1 on staleness — previously rubber-stamped any state.
- scripts/check-doc-drift.js: returns MAX (canonical) version, not MIN
  (floor); references to v7 in current-state docs are now correctly
  flagged as drift. Allowlist expanded to audits/, _drafts/, docs/adr/,
  docs/research/ — these are legitimate historical context.
- skill-graph/AGENTS.md § Validation Commands: new subsection
  documenting `npm run audit-manifest:check` and `npm run status:check`
  as separate red gates that run outside `npm run verify`. The audit-
  manifest gate is CONTENT-debt the audit loop drains (15 historical
  comprehension verdicts without backing evals/comprehension.json);
  wiring it into the main verify suite would block unrelated SYSTEM
  work.
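
The readSchemaVersion fix can be pictured as follows. This is a sketch under assumptions: the shape of the `schema_version` property (an integer branch and a string branch, each with an `enum`) is inferred from the description above, and the function body is illustrative rather than the code in scripts/build-status-doc.js.

```javascript
// Collects every version listed under oneOf[].enum (or a lone const),
// coerces string forms such as "8" to numbers, and returns the highest.
// Falls back to "unknown" only when no version can be found.
function readSchemaVersion(schemaVersionProp) {
  const values = (schemaVersionProp.oneOf || [])
    .flatMap((branch) =>
      branch.enum || (branch.const !== undefined ? [branch.const] : []))
    .map((v) => Number(v))
    .filter(Number.isInteger);
  return values.length ? Math.max(...values) : 'unknown';
}

// Assumed schema fragment: integer/string back-compat expressed with enum.
const prop = {
  oneOf: [
    { type: 'integer', enum: [7, 8] },
    { type: 'string', enum: ['7', '8'] },
  ],
};

console.log(readSchemaVersion(prop));
```

Returning the maximum rather than the minimum is the same choice check-doc-drift.js makes in this commit: the canonical version is the highest one the schema accepts, so references to the lower one count as drift.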

Verification:

- node bin/skill-graph.js doctor: 7/7 PASS
- npm run verify: exit 0
- npm run status:check: exit 0
- npm run audit-manifest:check: exit 1, 15 historical mismatches
  (expected, documented as separate red gate)
- node scripts/check-doc-drift.js: 54 active docs scanned against
  schema v8, 0 stale references
- node scripts/check-protocol-consistency.js: PASS (C7 field-reference
  parity preserved)

Follow-ups filed:

- SH-6573: Cold-start one-screen doc at top of SKILL_GRAPH.md
- SH-6546 (pre-existing): Ship ADR-0018 boundary→suppresses rename

References: skill-system audits 2026-05-26 (Opus + Codex);
docs/research/skill-system-current-state-addendum-2026-05-26.md (Codex
addendum, untracked); workspace AGENTS.md § Non-Negotiable Standards
#16-#17; .claude/rules/version-schema-contract.md § Companion rule.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

backfill-field-purpose-comments.js now handles both physical encodings
of SKILL.md frontmatter per SKILL_METADATA_PROTOCOL.md § Two physical
encodings: (a) nested Agent-Skills-compatible (everything under metadata:
at 2-space indent) and (b) flat top-level (every field at root, no
metadata: block).

Detection: presence of a metadata: line at top level. Indent and start
position derived from that. The FIELD_COMMENTS map values stay encoding-
agnostic; the caller prepends the encoding's indent.
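
The detection step might look like the sketch below. `detectEncoding` and its return shape are hypothetical; only the rule itself (a top-level `metadata:` line means nested encoding at 2-space indent, otherwise flat) comes from the description above.

```javascript
// Given the frontmatter lines (between the --- fences), decide which
// physical encoding is in use and where field scanning should start.
function detectEncoding(frontmatterLines) {
  const at = frontmatterLines.findIndex((line) => /^metadata:\s*$/.test(line));
  return at === -1
    ? { encoding: 'flat', indent: '', start: 0 }
    : { encoding: 'nested', indent: '  ', start: at + 1 };
}

const nested = detectEncoding(['name: demo', 'metadata:', '  eval_state: monitored']);
const flat = detectEncoding(['name: demo', 'eval_state: monitored']);
console.log(nested, flat);
```

Because the comment map stays indent-free, the caller only needs `indent` from this result to place each comment block correctly in either encoding.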

Closes the 2-file coverage gap from the initial codemod pass (commit
17615df), where methodical and task-path-optimization were correctly
skipped as 'no top-level metadata: block' — they use flat encoding.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tale local file

`skill-graph routing-eval` without --manifest used to read whatever
local skills.manifest.json existed (gitignored, drifts between runs).
On a stale local file the eval reports 9/9 false-fails that are pure
staleness, not real routing regressions.

The CLI help had advertised "(default: generate one)" but the
implementation never did. This commit makes the implementation match
the original intent: when --manifest is not given, regenerate a fresh
manifest to .skill-graph/_routing-eval-cli.manifest.json and run
against that. The committed `npm run routing-eval` script already
worked this way via an explicit --output flag; this brings the
standalone CLI to the same behavior.

- scripts/skill-graph-routing-eval.js: regenerate-on-demand path,
  with fallback to existing manifest if generation fails.
- bin/skill-graph.js: rewrite help to describe new behavior + add
  documentation for previously-undocumented flags (--skill, --json,
  --quiet, --confusion-matrix, --baseline).
- .gitignore: add the new CLI-fresh manifest path.
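
The regenerate-on-demand path with its fallback could be sketched like this. `resolveManifestPath` and the injected `generate` callback are assumptions made so the logic can be shown in isolation; the two file paths are the ones named in this message.

```javascript
const fs = require('node:fs');

const FRESH_MANIFEST = '.skill-graph/_routing-eval-cli.manifest.json';
const LOCAL_MANIFEST = 'skills.manifest.json';

// Order of preference:
//   1. an explicit --manifest path,
//   2. a freshly generated manifest,
//   3. the existing local manifest, only if generation failed.
function resolveManifestPath(args, generate, exists = fs.existsSync) {
  if (args.manifest) return args.manifest;
  try {
    generate(FRESH_MANIFEST); // hypothetical generator; throws on failure
    return FRESH_MANIFEST;
  } catch (err) {
    if (exists(LOCAL_MANIFEST)) return LOCAL_MANIFEST;
    throw err;
  }
}

const fresh = resolveManifestPath({}, () => {});
console.log(fresh);
```

Passing the generator in as a parameter is only a convenience for the example; it keeps the fallback branch easy to exercise without touching the file system.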

Closes audit finding B5 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… freshness claim

`node scripts/export-marketplace-skills.js --check` reported 152 stale
files. Re-ran the export; --check is now clean (no-op). The audit
(2026-05-27) confirmed the prior "verified 2026-05-20" claim in
SKILL_GRAPH.md was outdated.

Also updates the layout-note in SKILL_GRAPH.md to call out the
freshness-drift caveat: marketplace:verify is not currently part of
`npm run verify`, so source-edit / marketplace-regeneration drift is
invisible to CI between runs. That gate-expansion is tracked
separately in audit finding B7 / H4 (next commit).

Closes audit finding B6 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Integrity Gate

The npm run verify chain was missing four gates the README's Integrity
Gate definition claimed it covered: marketplace:verify, status:check,
drift, and audit-manifest:check. After the marketplace regenerate (B6
commit), the first two are genuinely green and safe to add to the
chain.

This commit:
- Adds marketplace:verify and status:check to the npm run verify chain
  (package.json).
- Adds marketplace-export-check to scripts/build-status-doc.js so the
  trust surface at docs/status.generated.md reports its state.
- Updates README's Integrity Gate definition (line 284) to describe
  what verify actually runs, and explicitly calls out why drift and
  audit-manifest:check are not yet in the chain — both surface
  CONTENT-side debt that is being drained through the audit loop
  (findings H9 and H10).
- Regenerates docs/status.generated.md (now 5 checks, all PASS).

Closes audit findings B7, H4, H11 (system-audit-2026-05-27).
H9 and H10 closure is tracked in their own CONTENT-side tickets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nbook

SKILL_AUDIT_LOOP.md ships in the npm package per package.json:35, and
Part 3 references `node scripts/skill/...` commands (skill-audit-claim,
source-truth-catalog, skill-census, build-skill-audit-worklist,
skill-test-runner) that don't exist in @skill-graph/cli — they live in
the workspace tree at ~/Development/scripts/skill/. Standalone npm
consumers following the runbook verbatim hit "Cannot find module"
errors at the first claim step.

This adds an "Audience & runtime" preamble at the top of Part 3 that:
- Lists which workspace-orchestration scripts are not bundled (per
  ADR 0009 + ADR 0015 + ADR 0016 Proposed).
- Points to the canonical CLI entrypoints (skill-graph audit /
  improve / evaluate / evolve) and the canonical path corrections
  (scripts/skill/skill-lint.js → scripts/skill-lint.js;
  scripts/skill/evaluate-skill.js → lib/audit/evaluate-skill.js).
- Notes that a substantially complete audit is still runnable via
  `skill-graph audit <skill> --graded` for consumers without the
  workspace layer.

Leaves Part 3's runbook body intact for in-workspace operators, who
remain the primary audience. The preamble is the smallest fix that
removes the "package ships broken instructions" problem without
rewriting the entire runbook.

Also regenerates docs/status.generated.md to pick up the doc edit.

Closes audit finding B8 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… skill

AGENTS.md § What each command writes promised that the standalone
drift command writes the drift_status field to SKILL.md frontmatter,
but no code path did it. Grep across lib/ and scripts/ for any
write expression matching drift_status: returned zero hits outside the
schema and field-list iterators (audit B1, 2026-05-27). The result was
that every skill's drift_status stayed at its initial value (UNVERIFIED
or UNKNOWN) forever, regardless of how many drift sentinel runs landed.

This commit adds the missing write path:
- scripts/skill-graph-drift.js: new --write-verdict flag opts in to
  writing drift_status. The default check run remains read-only so a
  curious `npm run drift` doesn't surprise-mutate the skill tree. Maps
  per-skill report status to the schema-valid drift_status enum
  (OK / DRIFT / BROKEN / STALE / NO_BASELINE / EXTERNAL_UNHASHED);
  skips UNGROUNDED and NO_FRONTMATTER (not enum-valid).
- AGENTS.md § What each command writes: drift row now describes the
  opt-in semantics. The "Two integrity surfaces" paragraph is updated
  to resolve the internal contradiction the audit caught — the old
  text said standalone drift writes drift_status AND that standalone
  drift doesn't write to the Health Block, which can't both be true
  because drift_status IS a Health Block field. New text says
  standalone commands don't roll up to truth_verdict /
  structural_verdict, and that --write-verdict is the one explicit
  opt-in that may stamp drift_status.
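
A minimal sketch of the opt-in write path, assuming report statuses map one-to-one onto the enum names listed above (the real mapping in scripts/skill-graph-drift.js may differ). `planDriftWrites` and the report entry shape are hypothetical.

```javascript
// Enum values as listed in this commit message.
const DRIFT_STATUS_ENUM = new Set([
  'OK', 'DRIFT', 'BROKEN', 'STALE', 'NO_BASELINE', 'EXTERNAL_UNHASHED',
]);

// Returns the frontmatter writes a run would perform. Without
// writeVerdict the run stays read-only; statuses outside the enum
// (for example UNGROUNDED, NO_FRONTMATTER) are skipped.
function planDriftWrites(report, { writeVerdict = false } = {}) {
  if (!writeVerdict) return [];
  return report
    .filter((entry) => DRIFT_STATUS_ENUM.has(entry.status))
    .map((entry) => ({ skill: entry.skill, drift_status: entry.status }));
}

// Illustrative report; skill names are placeholders.
const report = [
  { skill: 'skill-a', status: 'BROKEN' },
  { skill: 'skill-b', status: 'UNGROUNDED' },
];

console.log(planDriftWrites(report));
console.log(planDriftWrites(report, { writeVerdict: true }));
```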

Verified by smoke test: a skill with drift_status: UNKNOWN and a
deliberately-mismatched hash gets stamped drift_status: BROKEN after
`drift --write-verdict`.

Closes audit finding B1 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sment

The schema declared PROVISIONAL as a valid value for both
comprehension_verdict and application_verdict, but no code path
emitted it: the comprehension classifier returned PASS / REDUNDANT /
SHALLOW / SKIPPED_BASELINE_HIGH, and the application path filtered
PROVISIONAL out via APPLICATION_VERDICT_ENUM and normalised it to
UNVERIFIED. The result was that the confidence hierarchy
"APPLICABLE > PROVISIONAL > UNVERIFIED" collapsed to two tiers in
practice (audit B2, 2026-05-27).

This commit makes PROVISIONAL reachable when the run is an explicit
single-model self-assessment:
- BOOLEAN_FLAGS now accepts --single-model. The flag is plumbed into
  runComprehensionEval and stampApplicationVerdict.
- APPLICATION_VERDICT_ENUM now includes PROVISIONAL so it is no longer
  silently dropped by normalizeApplicationVerdict.
- Comprehension classifier downgrades PASS → PROVISIONAL when
  --single-model is set. Other verdicts (REDUNDANT / SHALLOW /
  SKIPPED_BASELINE_HIGH) are factual descriptions of the delta and
  pass through unchanged.
- stampApplicationVerdict downgrades APPLICABLE → PROVISIONAL when
  --single-model is set.
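
The two downgrades can be summarised in a short sketch. The function names below are hypothetical stand-ins for the classifier and stampApplicationVerdict logic; only the verdict names and the downgrade rules are taken from the description above.

```javascript
// Comprehension: only PASS is downgraded under --single-model.
// REDUNDANT / SHALLOW / SKIPPED_BASELINE_HIGH pass through unchanged.
function classifyComprehension(verdict, { singleModel = false } = {}) {
  return singleModel && verdict === 'PASS' ? 'PROVISIONAL' : verdict;
}

// Application: only APPLICABLE is downgraded under --single-model.
function classifyApplication(verdict, { singleModel = false } = {}) {
  return singleModel && verdict === 'APPLICABLE' ? 'PROVISIONAL' : verdict;
}

console.log(classifyComprehension('PASS', { singleModel: true }));
console.log(classifyComprehension('SHALLOW', { singleModel: true }));
console.log(classifyApplication('APPLICABLE'));
```

With the flag unset both functions are the identity, which matches the statement that default behaviour is unchanged.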

Default behaviour is unchanged: a run without --single-model continues
to emit PASS / APPLICABLE exactly as before. Opt-in is the conservative
shape — every existing call site keeps its current verdict, and
callers that know they are running a single-grader assessment can flip
the flag to record an honest PROVISIONAL.

Verified by re-running test-application-verdict-write-back.js — all 53
cases still pass.

Closes audit finding B2 (system-audit-2026-05-27). The companion gap
H2 (gate-8 comprehension writeback into SKILL.md frontmatter) remains
tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit findings B3 (legacy evaluate-skill.js not a delegator),
H3 (legacy skill-lint.js retains 14 deprecated check functions), and
M13 (legacy evaluate-skill.js hardcodes monorepo log paths) all require
editing files in the workspace tree at ~/Development/scripts/skill/.
Those files live in a different git repository than this skill-graph
branch and the workspace had ~20 unrelated dirty files at audit time,
so the right place for the actual conversion is a separate set of
workspace-master commits — not this branch.

This commit lands a precise, ready-to-apply closure plan under
docs/research/followups/ instead of leaving the findings unaddressed:
exact delegator bodies for both scripts, the env-var plumbing that
preserves the legacy `agent-orchestration/logs/` defaults, a
verification command set, and commit-shape guidance that respects
AGENTS.md's "one logical change per commit" rule and prevents the
dirty WIP files from being swept in.

Closes audit findings B3 / H3 / M13 (system-audit-2026-05-27) for the
in-skill-graph portion of the work. The matching workspace-side commits
remain outstanding and are intentionally not landed from this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nted, corpus is CONTENT debt

Re-verified the audit's B4 claim against current head:
- lib/audit/skill-audit.js:1086-1097 unconditionally writes
  structural_verdict, truth_verdict, last_audited, lint_verdict to
  the Health Block on every audit run, after both the stub and
  --graded branches close.
- updateFrontmatterFields is covered by
  test-application-verdict-write-back.js (53 cases passing).
- The 146 SKILL.md files with structural_verdict: UNVERIFIED are
  CONTENT debt — they reflect that audit has not yet run on each
  skill, not that the write code is broken. Per AGENTS.md
  § Sequencing, those migrations drain via the audit loop one
  skill at a time, not via a bulk SYSTEM commit.

The B4 narrow recommendation ("complete structural_verdict +
truth_verdict write-back") is closed. The companion comprehension /
application gap remains tracked under H2 + B2.

This commit lands the verification + status doc rather than a code
change because there is no SYSTEM-side gap left to fix here. The
audit's claim of "field-write commit path is not confirmed in all
paths" appears to predate the SH-6481 closure work in commits
9af8526 (truth_verdict) and fbdf598 (Health Block) — the line-1075
comment block in skill-audit.js is historical, not an open gap.

Closes audit finding B4 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes ADR 0011 § Addendum 2026-05-20 — gate 8 (the comprehension
grader) appended every result to comprehension-history.jsonl but never
wrote comprehension_verdict back to the skill's frontmatter, so the
Health Block stayed UNVERIFIED corpus-wide regardless of grader
output. Symmetric with the application_verdict stamp landed earlier.

This commit:
- Adds COMPREHENSION_VERDICT_ENUM, normalizeComprehensionVerdict, and
  stampComprehensionVerdict, modelled on the existing
  APPLICATION_VERDICT_ENUM / normalizeApplicationVerdict /
  stampApplicationVerdict trio.
- Wires the stamp call in main() right after the comprehension result
  is written to .cache/, before the incomplete-run exit gate. Honors
  --dry-run, skips on all-errored runs, skips when SKILL.md cannot be
  resolved from the eval path — same safety rules application uses.
- Exports the new symbols so future tests can import them the same way
  test-application-verdict-write-back.js imports the application
  trio (the comparable comprehension test belongs to a follow-up).

The PROVISIONAL downgrade landed in the B2 commit applies here too:
a single-model comprehension run with --single-model continues to
downgrade PASS → PROVISIONAL inside the classifier, and that
downgraded verdict is what stampComprehensionVerdict writes.

Verified by smoke test (normalize round-trips, ENUM length = 7
schema values) and re-running test-application-verdict-write-back.js
(53/53 passing, no regression on the symmetric application path).

Closes audit finding H2 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…dit H1)

Per audit H1, check-stability-promotion.js exited 0 unconditionally
regardless of warningCount, even though it was included in the verify
chain between charter:parity and manifest:validate. The audit
recommended either fail-loud on warnings or document the check as
advisory-only in package.json and SKILL_AUDIT_LOOP.md.

Going with documented-advisory + opt-in strict mode because the
warnings flag CONTENT debt (per-skill stability: stable claims missing
eval_state / eval_score / routing_eval evidence) that should drain
per-skill via the audit loop, not block every commit on every other
skill. Fail-loud on the current corpus would gate verify on a single
expected-value skill that has not been re-evaluated since promotion.

This commit:
- Expands the header comment to spell out the advisory contract:
  default exits 0 on warnings; --strict flips to exit 1 on any warning.
- Adds a --strict branch right before the final process.exit(0) so the
  flag is observably non-cosmetic and easy to grep.
- Adds a stability:check:strict npm script for callers who want the
  release-blocking semantics without typing the flag every time.

Verified: default invocation still exits 0 with 3 warnings; --strict
exits 1 with the same 3 warnings; existing test-stability-promotion.js
still passes.
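
The advisory contract can be summarised as a small exit-code mapping.
This is only a sketch under the assumption that the script has a
warning count in hand before exiting; `stabilityExitCode` is a
hypothetical name and not taken from check-stability-promotion.js.

```javascript
// Hypothetical sketch: advisory by default, fail-loud only with --strict.
function stabilityExitCode(warningCount, argv) {
  const strict = argv.includes("--strict");
  // Warnings alone never fail the run unless the caller opted in.
  return strict && warningCount > 0 ? 1 : 0;
}
```

A caller would pass the result to process.exit(); the function itself
stays side-effect free so it can be unit-tested.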

Closes audit finding H1 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
.github/workflows/skill-graph-lint.yml lines 17 and 32 carried the
literal string 'SKILL_AUDIT_LOOP.md § Part 2 — Per-Skill Audit
Checklist' inside the workflow's paths filter. GitHub Actions treats
paths entries as filename globs; it does not parse markdown anchors
or section ranges, so the entry never matched any changed file — pure
dead code.

The valid 'SKILL_AUDIT_LOOP.md' entry immediately below already fires
the workflow on any edit to that file, so removing the dead entries
doesn't change trigger behavior. The intent (fire only when § Part 2
changes) can't be expressed as a path glob and should be filed as a
separate ticket if section-scoped triggering is genuinely wanted.

Closes audit finding H5 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mplates

Per ADR 0009 § Update (2026-05-20), the `skill-metadata-protocol` and
`skill-audit-loop` sibling repositories were archived (read-only on
GitHub) when the protocol spec and audit-loop runbook consolidated into
this repo. The PR template and issue templates still listed both as
live cross-repo coordination targets, which led contributors to expect
coordinated PRs and discussions that aren't actually possible.

This commit:
- PULL_REQUEST_TEMPLATE.md: drops the two archived repos from the
  cross-repo impact checklist; adds a short inline note explaining
  the consolidation and naming the only remaining cross-repo target
  (the canonical SKILL.md source at `skills`).
- ISSUE_TEMPLATE/feature.yml: drops the two archived options from the
  ecosystem checkboxes; adds a description block explaining the
  consolidation.
- ISSUE_TEMPLATE/config.yml: re-points the Discussions and spec
  contact links from skill-metadata-protocol to skill-graph itself.
  External users following these links no longer land on an archived
  repo.

Closes audit finding H6 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…teger

Two audit findings closed together because they're the same artifact:
- H7: ADR 0016 was Proposed since 2026-05-25 with no acceptance status;
  P1 (lanes.json migration) had actually shipped on 2026-05-25 but the
  ADR's status line didn't reflect that. Move to Accepted, name the
  shipped surface, and call out that P2-P7 sequencing is the residual.
- M7: audits/lanes.json used semver "2.0.0" for schema_version while
  schemas/skill.schema.json uses integer 7/8, schemas/manifest.schema.json
  uses integer 4, and schemas/audits-manifest.schema.json uses integer 1.
  Normalized to integer 2 to match the convention; verified the file is
  still valid JSON and no caller reads schema_version from this file
  (grep across scripts/ + lib/ returned no matches).

Closes audit findings H7 + M7 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SKILL_METADATA_PROTOCOL.md § Inline field comments requires every
authored frontmatter field to carry a YAML `#` comment block
immediately above it (purpose + allowed values + when-to-use). The
convention exists to prevent the "this field looks like dead code,
let me propose deleting it" failure mode and to keep cold-start
agents from needing docs/field-reference.md at the point of contact.

scripts/skill-lint.js had no enforcement — a SKILL.md could strip
every field-purpose comment without failing lint. This commit adds
checkFieldPurposeComments() as the new check 5, scanning top-level
frontmatter fields and reporting each field whose preceding line
(walking back through blank lines) is not a `#` comment.

Severity: warning, not error. Corpus survey 2026-05-27 shows 0/154
skills currently have ALL fields commented and 152/154 have NONE
commented. Hard-failing lint would gate verify on the
backfill-field-purpose-comments.js CONTENT migration that has not
yet drained per-skill. Warnings render to stderr; the WARN file
label appears alongside the existing FAIL label; the exit code stays
0 unless `--strict` is set. `--strict` already existed as a flag —
this commit makes it observably non-cosmetic by failing exit on
warnings as well as errors.

The "skip first field" exclusion is deliberate: a comment "immediately
above" the first field of the frontmatter would have to live outside
the `---` block, which is not the convention. Subsequent fields are
all checked.
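
A rough sketch of the scan described above, assuming the frontmatter is
available as the raw text between the `---` delimiters and that
top-level fields start in column 0. `findUncommentedFields` is a
hypothetical name; the real checkFieldPurposeComments() may differ in
detail.

```javascript
// Hypothetical sketch of the field-purpose comment scan.
// Returns the names of top-level fields whose preceding non-blank
// line is not a `#` comment. The first field is skipped by design.
function findUncommentedFields(frontmatter) {
  const lines = frontmatter.split("\n");
  const missing = [];
  let seenFirstField = false;
  for (let i = 0; i < lines.length; i++) {
    // Top-level keys only: no leading indentation.
    const match = /^([A-Za-z_][\w-]*):/.exec(lines[i]);
    if (!match) continue;
    if (!seenFirstField) {
      seenFirstField = true;
      continue;
    }
    // Walk back through blank lines to the nearest non-blank line.
    let j = i - 1;
    while (j >= 0 && lines[j].trim() === "") j--;
    if (j < 0 || !lines[j].startsWith("#")) missing.push(match[1]);
  }
  return missing;
}
```

Each returned name would map to one warning, which is consistent with
the warning-level severity chosen above.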

Verified: lint reports 154 warning(s) on the current corpus with
exit 0 (default) and exit 1 (--strict). Existing 5 lint checks
unchanged; the 4 templates and skills with full commenting (the new
ones authored from the template) are unaffected.

Closes audit finding H8 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…forces

Audit H12 (2026-05-27) noted three places that overclaimed lint
coverage relative to the 5 checks skill-lint.js actually runs:
- SKILL_METADATA_PROTOCOL.md:137 — routing_eval "gated by lint check
  12". No such check exists; routing_eval is enforced by `npm run
  routing-eval`, not by lint.
- SKILL_METADATA_PROTOCOL.md:585 — relation targets "validated by lint
  — a broken target is an error". Lint does not walk targets; the
  manifest compiler refuses to emit a relation to a non-existent skill
  (via `npm run manifest:validate`).
- SKILL_GRAPH.md:431 — "Tier 1 ↔ Tier 5 sample manifest: `skill-lint.js`
  check 8". No check 8 in current skill-lint.js; parity is enforced by
  `npm run manifest:validate`.

All three lint claims predate the 2026-05-19 lint reduction (when
14 additional check functions were removed). This commit reconciles
the prose to the current 5-check (now 6 with the H8 field-purpose
addition) skill-lint.js contract, naming the enforcing gate explicitly
in each case so future readers don't re-derive the same wrong picture.

Closes audit finding H12 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit H13 (2026-05-27): scripts/check-protocol-consistency.js:5
header and SKILL_GRAPH.md:169 both claimed 8 checks (C1–C8). The
runner actually executes 7 (C1, C2, C3, C4, C5, C7, C8) — C6
"versioned schema parity" was retired with ADR-0014 because no
pinned-copy schema file exists on disk to drift against.

Updated the file header inventory and the SKILL_GRAPH.md row that
described the script's coverage. Verified the script still exits 0
after the comment edit. The runner's existing skip of C6 is
unaffected; this is doc reconciliation only.

Closes audit finding H13 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit H14 (2026-05-27) inventoried five active SYSTEM docs/artifacts
still describing the pre-v8 model:

- AGENTS.md:190 — described skill "same kind" via `category × type ×
  scope` (the v7 triple). Updated to `subject × operation × scope`
  (v8 per ADR-0017) with a back-link to the prior framing.
- docs/publish-workflow.md:17 — said skill-graph held the canonical
  `skills/<name>/SKILL.md` sources. ADR 0009 kept the canonical
  library at `~/Development/skills/`; only the tooling consolidated
  into skill-graph. Re-described both repos with their actual roles.
- docs/marketplace-syndication.md:11 — said "canonical artifacts stay
  in this repo: protocol-enriched skills/**/SKILL.md files". Same
  drift as publish-workflow — corrected to name the two-repo split.
- schemas/skill.context.jsonld:2,4 — header described "v7 frontmatter"
  and `_schema_version_target: 7`. Bumped to v8 with a note that v7
  fields stay projectable for back-compat.
- examples/skill-metadata-template.md:319 — described grounding
  requirement using v7 scope names `codebase` / `reference`. Updated
  to v8 names `project` / `workspace` with legacy aliases retained
  for back-compat.

These were doctrine drift, not corpus drift — every fix is a prose
clarification, not a behavior change. Schema validation, lint, and
manifest generation already understand v8; the docs just hadn't
caught up uniformly.

Closes audit finding H14 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ehension)

Audit findings H9 (6 drift-red skills — now 7 after re-verification
caught expected-value as additionally BROKEN) and H10 (15 graded-
comprehension claims missing evals/comprehension.json) are CONTENT-
side debt that per AGENTS.md § Sequencing must drain through the
audit loop, not via a SYSTEM commit touching N SKILL.md files.

This commit lands the recommended-tickets list as a single
followups doc with explicit /audit:audit runbook lines per skill,
the receipt IDs (for the comprehension-missing list), and a note
that two task-lifecycle receipts collapse to one audit run.

The corresponding SYSTEM-side gating (drift + audit-manifest:check
absent from npm run verify) closed under audit B7 in this same
branch — adding those gates back is now safely tied to the per-skill
drain rather than blocking the whole verify chain on one bad skill.

Closes audit findings H9 + H10 (system-audit-2026-05-27) on the
SYSTEM side. CONTENT-side closure remains per-skill /audit:* work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hook

The hook at scripts/check-work-mode-separation.js is intentionally
fail-open (exits 0 even when the warning fires) so it cannot block
a commit, but that means it has no exit-code-based regression catcher.
The hook could silently regress — drop the warning, misclassify a
path family, stop honoring AUDIT_LOOP=1 — and no one would notice
until a real SYSTEM+CONTENT mixed commit slipped through.

This commit adds scripts/__tests__/test-work-mode-separation.js with
16 assertions over 7 scenarios:
1. Empty file list — quiet exit 0
2. SYSTEM-only paths — no warning
3. CONTENT-only paths — no warning
4. Mixed SYSTEM+CONTENT — warning fires with both file lists shown
5. AUDIT_LOOP=1 suppresses the warning on the same mixed file list
6. audits/prompts/** classifies as SYSTEM (not CONTENT)
7. examples/audits/<skill>/ classifies as CONTENT (not SYSTEM)

All 16 cases pass against the current hook. The test is wired into
the npm run test:unit chain via package.json:78.

Closes audit finding M1 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d .generated.md

Audit M2 (2026-05-27): the doc-ownership map had separate rows for
docs/field-reference.md (hand-authored, 1,779 lines) and
docs/field-reference.generated.md (auto-generated, 748 lines), but no
named resolution for the reader question "which one do I open to pick
a value?" An author who lands on the generated file alone gets a
structurally valid but semantically incomplete picture.

Added a "Reader's canonical" sentence to the row for
`field-reference.md` naming it as the canonical for value-choice
criteria, and a mirror caveat on the `.generated.md` row pointing back.
Both rows now name each other so a reader who enters either gets
routed to the right one for their question.

Closes audit finding M2 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a SKILL.md is missing subject / operation / scope (the v8
5-axis classification fields per ADR-0017), skill-lint.js used to
emit a generic "required-missing" message that didn't tell the
author the violation was about v8 conformance:

  required-missing: `subject` is required by schemas/skill.schema.json

Now the same case fires a named v8-axis error that points to the
template + ADR:

  v8 axis missing: `subject` is one of the three required v8 5-axis
  classification fields (subject / operation / scope per ADR-0017).
  Add it via the field-purpose comment template in
  examples/skill-metadata-template.md.

Non-v8-axis required-missing errors keep the existing generic
message so the v8-named error stays diagnostic, not noise. Verified
by smoke-test on a fixture skill missing all three axes plus
`version` — the v8 axes get the new message, `version` keeps the
old one; `npm run lint --include-template` continues to pass on the
real corpus.
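
The message selection can be sketched as below. The two message texts
are copied from this commit message; `requiredMissingMessage` and
`V8_AXES` are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch: pick the named v8-axis message for the three
// classification fields and keep the generic message for everything else.
const V8_AXES = new Set(["subject", "operation", "scope"]);

function requiredMissingMessage(field) {
  if (V8_AXES.has(field)) {
    return (
      `v8 axis missing: \`${field}\` is one of the three required v8 5-axis ` +
      "classification fields (subject / operation / scope per ADR-0017). " +
      "Add it via the field-purpose comment template in " +
      "examples/skill-metadata-template.md."
    );
  }
  return `required-missing: \`${field}\` is required by schemas/skill.schema.json`;
}
```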

Closes audit finding M3 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…text (audit M4)

SKILL_METADATA_PROTOCOL.md § Relations § boundary carries a WARNING that
the field NAME reads "defer to them" but the MECHANIC is "exclude them
when I win" — and recommends ownership reason-text ("I own X over them")
not deference ("use that-skill for X"). The doctrine warning is in the
spec, but lint did not check for it, so authors who wrote the inverted
phrasing produced semantically wrong relations that still validated.

This commit adds checkBoundaryReasonText() as an advisory check
(severity: warn). For each boundary edge with a `reason` string,
it scans for deference phrasing:
- `use <something> instead/for`
- `defer to`
- `owned by`
- `that-skill (owns|handles|covers)`

When a match fires, the warning quotes the offending reason and suggests
ownership phrasing. Verified by fixture: a fake skill whose boundary
edge reads `reason: "use debugging for runtime errors"` produces the
expected warning; the real corpus continues to lint with 0 errors.
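
For illustration, the four deference shapes listed above could be
expressed as regular expressions along these lines. The exact patterns
in checkBoundaryReasonText() are not shown in this message, so the
regexes and the `checkBoundaryReason` name here are assumptions.

```javascript
// Hypothetical patterns for the four deference phrasings named above.
const DEFERENCE_PATTERNS = [
  /\buse\s+.+?\s+(instead|for)\b/i,
  /\bdefer to\b/i,
  /\bowned by\b/i,
  /\bthat-skill\s+(owns|handles|covers)\b/i,
];

// Returns an advisory warning string, or null when the reason reads
// as ownership phrasing.
function checkBoundaryReason(reason) {
  const hit = DEFERENCE_PATTERNS.some((pattern) => pattern.test(reason));
  if (!hit) return null;
  return (
    `boundary reason reads as deference: "${reason}"; ` +
    'prefer ownership phrasing such as "I own X over them"'
  );
}
```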

Closes audit finding M4 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jacob-balslev and others added 12 commits May 27, 2026 02:26
…s (audit M5)

The schema declared eval_state, eval_artifacts, and routing_eval as
independent axes but allowed incoherent combinations like
`eval_state: passing` with no `eval_artifacts: present`. The fields
were meant to be orthogonal in topic (routing-coverage vs content-
quality vs artifact-presence) but coherent across values — a `passing`
claim has to be backed by real evals on disk, otherwise it is the
same doc-lie shape as `application_verdict: APPLICABLE` without an
eval_last_run receipt.

Added two conditional rules to schemas/skill.schema.json `allOf`:
1. `eval_state: passing` → `eval_artifacts: present` required.
2. `eval_state: monitored` → `eval_artifacts: present` required.

`eval_state: unverified` still allows any eval_artifacts value
(legitimate authoring state for skills that haven't been graded yet).
routing_eval stays independent — its coverage signal comes from the
routing harness reading description-level examples/anti_examples,
not from on-disk eval JSON.
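
The two conditional rules live in the JSON Schema, but their effect can
be mirrored in a few lines of JavaScript. This is a sketch of the rule
semantics only; `evalStateViolations` is a hypothetical helper, and the
frontmatter is assumed to be an already-parsed object.

```javascript
// Hypothetical mirror of the two new allOf rules:
// passing / monitored both require eval_artifacts: present.
const STATES_REQUIRING_ARTIFACTS = ["passing", "monitored"];

function evalStateViolations(frontmatter) {
  const { eval_state: state, eval_artifacts: artifacts } = frontmatter;
  if (STATES_REQUIRING_ARTIFACTS.includes(state) && artifacts !== "present") {
    return [`eval_state: ${state} requires eval_artifacts: present`];
  }
  // unverified (or any other state) places no constraint on eval_artifacts.
  return [];
}
```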

Verified: schema is valid JSON, npm run lint passes on the corpus
(no skill currently violates the new rules), npm run protocol:check
passes, regenerated docs/field-reference.generated.md.

Closes audit finding M5 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (audit M6)

Audit M6 (2026-05-27) recommended grepping all four audit-runner
prompts for `audit_verdict` (singular pre-v7) and the phrase
"v6 four-verdict". Survey results:

- audit_verdict (singular): zero hits across all 4 prompts. Already
  cleaned in an earlier sweep — leaving alone.
- "v6 four-verdict": zero hits. (The phrase doesn't appear; the audit
  was checking for it as a stale-shibboleth indicator.)
- "v6 Understanding fields" (single-model prompt:96): one hit; fixed
  to "five flat top-level Understanding fields (introduced v6,
  canonical v8)".
- "Concept Card" (renamed to "Concept of the skill" on 2026-05-26):
  two hits across single-model:100 and codex-autonomous-v5:211;
  fixed to reference the new heading with a back-link.

batch-worker-v4 and minimal-iteration prompts were scanned and carry
neither stale label. No other fixes needed.

Closes audit finding M6 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ame (audit M8, M15)

schemas/audits-manifest.schema.json had three under-specified fields:
- runner.version was free-form (audit M8): a typo or stale entry like
  "version: vfour" or "version: 6" (when the runner is actually at v5)
  passed schema validation silently.
- artifact_rule.name had no character constraint (audit M15): a typo
  like "findings .md" (stray space) would pass schema and then never
  match an on-disk file.
- artifact_rule.when had no syntax constraint (audit M15): a typo like
  "skill.comprehension_verdict in [PROVISIONAL]" (missing single
  quotes around the enum value, expected by check-audit-manifest.js
  after the 2026-05-25 enum-leak fix) passed schema and produced a
  predicate the verifier could not evaluate.

Tightenings:
- runner.version regex `^(v?\d+(\.\d+)*\+?|\d+\.\d+)$` accepts the
  current shapes (v3, v3+, v4, v5, 1.0) and rejects values with no
  numeric segment, such as "vfour".
- artifact_rule.name regex `^[a-zA-Z0-9._-]+$` is the filename-safe
  set; spaces, paths, or shell metas now fail at schema time.
- artifact_rule.when regex constrains the four supported predicate
  shapes: `always`, `skill.<field> in [<list>]`,
  `skill.<field> == '<value>'`, and
  `runner.(id|mode) (==|startswith) '<value>'`. Free-form predicates
  were the loophole the audit M15 named; the verifier already only
  understands these four shapes.
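
The two patterns quoted above can be exercised directly in JavaScript.
The regexes are copied from this message; the constant and function
names are illustrative only. The `when` pattern is not reproduced here
because its full text is not given.

```javascript
// Patterns as quoted in the commit message.
const RUNNER_VERSION = /^(v?\d+(\.\d+)*\+?|\d+\.\d+)$/;
const ARTIFACT_NAME = /^[a-zA-Z0-9._-]+$/;

// Illustrative helpers (hypothetical names).
function isValidRunnerVersion(value) {
  return RUNNER_VERSION.test(value);
}

function isValidArtifactName(value) {
  return ARTIFACT_NAME.test(value);
}
```

Note that a well-formed but stale value such as "6" still matches the
version pattern, so the regex guards the shape of the value rather than
whether it names the runner's current version.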

Verified: audits/manifest.json still validates against the tightened
schema (each existing value matches one of the new regexes), and
`node scripts/check-audit-manifest.js` exits with the same RED state
as before — the 15 missing-comprehension-artifact failures are the
existing H10 CONTENT-debt, not schema-validation failures.

Closes audit findings M8 and M15 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The seven runners that drive the four audit operations
(improve / evaluate / evolve / status) had no unit tests:
run-skill-improvement-loop, skill-evolution-loop, skill-status,
eval-staleness-checker, batch-eval, eval-linter, skill-test-runner.
Verdict write-back edge cases were caught only by integration runs
or manual testing; a syntactic regression or sibling-require break
would ship silently.

This commit adds scripts/__tests__/test-lib-audit-smoke.js with 14
assertions:
- 7 × `node --check` syntax pass.
- 7 × `node <runner> --no-such-flag-xyz` does not panic with
  "Cannot find module", "UnhandledPromiseRejection", or
  "TypeError: Cannot read" — the regression classes the audit M9
  finding cares about.

Subprocess invocation, not require(), because several runners call
their main() at load time and process.exit() would kill the test
mid-run. Wired into npm run test:unit.
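
A minimal sketch of the subprocess-based smoke check, assuming Node's
built-in child_process module. `smokeCheck` and `findPanicMarkers` are
hypothetical names; the real test file may structure its assertions
differently.

```javascript
const { spawnSync } = require("node:child_process");

// The three regression classes named above.
const PANIC_MARKERS = [
  "Cannot find module",
  "UnhandledPromiseRejection",
  "TypeError: Cannot read",
];

// Pure helper: which panic markers appear in the combined output?
function findPanicMarkers(output) {
  return PANIC_MARKERS.filter((marker) => output.includes(marker));
}

// Runs both checks in a child process so a runner that calls
// process.exit() at load time cannot kill the test itself.
function smokeCheck(runnerPath) {
  const syntax = spawnSync(process.execPath, ["--check", runnerPath], {
    encoding: "utf8",
  });
  const run = spawnSync(process.execPath, [runnerPath, "--no-such-flag-xyz"], {
    encoding: "utf8",
    timeout: 30000, // assumed ceiling so a hung runner fails the test
  });
  const output = `${run.stdout || ""}${run.stderr || ""}`;
  return {
    syntaxOk: syntax.status === 0,
    panicMarkers: findPanicMarkers(output),
  };
}
```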

NOT full unit coverage — that is the follow-up ticket. This catches
the "runner refactor silently broke its require()" class.

Closes audit finding M9 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… (audit M10)

Both scripts are part of `npm run verify` (or its expanded chain
after the B7 commit) but had no unit tests. Refactoring either was
"run CI and pray" because the only catch was the integration gate.

This commit adds scripts/__tests__/test-verify-gate-scripts.js
with 14 assertions:

build-status-doc.js:
- node --check passes (syntax)
- exports readSchemaVersion / readSkillCount / renderMarkdown / runCheck
- readSchemaVersion returns a non-empty schema version
- renderMarkdown round-trips a hand-built state with PASS and FAIL
  checks and includes schema_version + both check labels in the
  output markdown

check-audit-manifest.js:
- node --check passes (syntax)
- a real subprocess run with the live manifest exits with a status
  (not a hang / not a panic) and emits output naming the audit-
  manifest concepts (comprehension / missing skills / audit /
  manifest). The current RED state from H10 CONTENT-debt is fine —
  the test only asserts the surface is named.

Wired into npm run test:unit.

Closes audit finding M10 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…audit M11)

Audit M11 (2026-05-27): ADR-0007 specified a weekly
skill-graph-routing-eval.js pass; the actual practice is event-driven
(three audits on 2026-05-25 / 26 triggered by the multi-model
restructure review, not by a clock). Either accept the event-driven
shape and amend the ADR or commit to the weekly rhythm; currently
neither is true.

This commit amends the ADR to accept the event-driven shape. The new
section spells out three reasons:
1. The verify-chain routing-eval already runs on every commit
   (`npm run routing-eval`), so per-skill regressions are caught at
   commit-time, not at end-of-week.
2. A weekly cron would add work without adding signal — most weeks
   have no SYSTEM change worth re-routing.
3. The multi-model audit pattern (2026-05-25 / 26) is the right
   shape for major restructures and should repeat when the next
   one lands, not get pre-scheduled.

If a future operational gap proves the event-driven shape misses
skills that need re-routing review, file a new ADR.

Closes audit finding M11 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…o audit loop

Audit M12 (2026-05-27): the 302-skill flat-layout migration sequenced
priorities P0–P7. P0 (4 schema_version orphans) closed 2026-05-25;
P1–P7 had no commits in the 53-commit post-audit window.

P1–P7 cannot land as a SYSTEM commit in this branch — five of the
seven priorities (P2–P7 minus the one codemod-extend ticket) are
explicitly CONTENT-side per-skill rewrites that AGENTS.md § Sequencing
forbids batching into a SYSTEM commit. The work belongs inside
/audit:* runs, one skill at a time, with per-skill Health Block
evidence. Additionally, the migration touches the canonical library
at ~/Development/skills/, a different repo.

This commit lands the SYSTEM-side closure plan: explains why P1–P7
are deferred, lists which audit-remediation-2026-05-27 commits unblock
each priority (B2/H2 for honest verdicts, B1 for drift_status,
H8/M3 for lint), and breaks out P1 as the residual SYSTEM ticket
(extend the migrate codemod to walk the flat tree) vs P2–P7 as
the CONTENT drain.

Closes audit finding M12 (system-audit-2026-05-27) on the SYSTEM
side. CONTENT-side P1–P7 drain remains per-skill /audit:* work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…M14)

audits/migration-mapping-v7-to-v8.json:4 referenced
/Users/jacobbalslev/.claude-profiles/jacobbalslev01/plans/we-should-clearly-look-wondrous-firefly.md
— a workspace-local path that does not exist in any other clone or
in CI. Replaced the value with null and added a plan_note explaining
where the plan actually lives (operator's local plans dir; this file
is codemod run-output, not a planning pointer) and how to populate
the field if a plan is committed.

Closes audit finding M14 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…covery (audit M16)

scripts/check-charter-parity.js previously hardcoded a 4-entry
candidate list at lines ~93-97. Audit M16 (2026-05-27): new active
mirrors added under WORKSPACE_ROOT without editing the script would
silently escape the parity check.

Replaced the hardcoded list with a directory scan of WORKSPACE_ROOT.
Every top-level sibling whose AGENTS.md exists and carries the
canonical charter marker is inspected (the extractCharter call
already filters out files without the marker). Discovery is shallow —
the charter marker always lives at the top of a sibling repo, so no
recursion is needed.
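
The shallow discovery can be sketched as follows, assuming Node's fs and
path modules. `discoverCharterCandidates` is a hypothetical name, and
the `hasCharterMarker` callback stands in for the marker filtering that
extractCharter performs in the real script.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical sketch: scan top-level siblings of workspaceRoot and
// return every AGENTS.md that exists and passes the marker check.
function discoverCharterCandidates(workspaceRoot, hasCharterMarker = () => true) {
  return fs
    .readdirSync(workspaceRoot, { withFileTypes: true })
    .filter((entry) => entry.isDirectory()) // shallow: no recursion
    .map((entry) => path.join(workspaceRoot, entry.name, "AGENTS.md"))
    .filter((candidate) => fs.existsSync(candidate))
    .filter((candidate) => hasCharterMarker(fs.readFileSync(candidate, "utf8")))
    .sort();
}
```

Because the list is derived from the directory contents, a newly added
mirror is picked up without editing the script.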

Verified: `npm run charter:parity` exits 0 and reports the same
"skill-audit-loop/AGENTS.md (MIRROR_ARCHIVED)" warning as before
(consistent with ADR 0009 — that mirror is archived and the warning
is expected behavior, not a regression).

Closes audit finding M16 (system-audit-2026-05-27).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closure pass over audit L1–L5 (system-audit-2026-05-27):

L1 — schema_version oneOf integer+string sunset window not named:
  Added a clarifying sentence to the schema_version description in
  schemas/skill.schema.json:34 — authors SHOULD write integer; the
  string back-compat form will be dropped no earlier than the v8→v9
  phase. Names the deprecation horizon the audit asked for.

L2 — bin/skill-graph.js subcommands vs package.json scripts split:
  Added a one-paragraph clarification in AGENTS.md § Project Shape
  naming the audiences (public CLI vs internal CI gate) and the
  edit rule (preserve both surfaces).

L3 — .skill-graph/config.json not in package.json files array:
  Verified negative — the file hardcodes a relative sibling path
  (`../skills/skills`) that does not exist on a fresh-clone install
  outside the canonical two-repo workspace layout. Shipping it
  would point every npm consumer's lint at a non-existent path.
  Intentional exclusion; not fixed.

L4 — SKILL_METADATA_PROTOCOL.md restates schema $id as prose:
  Verified negative — current SKILL_METADATA_PROTOCOL.md contains no
  $id duplication (`grep -n "\$id\|skillgraph.dev\|json-schema.org"`
  returns zero hits). The audit may have been based on an earlier
  version; gap does not exist now.

L5 — marketplace/skills/ not verified in CI:
  Closed by audit B7 (marketplace:verify added to npm run verify
  earlier in this branch).

Closes audit findings L1, L2, L4 (system-audit-2026-05-27).
L3 + L5 verified negative / closed elsewhere.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rate

Re-verification after the full audit-remediation pass surfaced a
regression in the T1 (audit B5) routing-eval CLI fix: generate-manifest
exits non-zero on v8 skills that lack the legacy v7 category/type
fields (e.g., expected-value, which was added during this branch),
but it still writes the manifest file. The previous catch block
threw away the freshly-written manifest and fell back to the stale
on-disk file, re-creating the staleness problem B5 was meant to fix.

This commit changes the fallback condition: only fall back when the
generator produced NO output file at all. Validation warnings get a
single-line stderr note ("CONTENT-debt expected during v7→v8") and
the eval proceeds with the fresh manifest, which is what consumers
actually want.
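
The revised fallback condition can be sketched as a pure decision
function. This assumes the caller already knows the generator's exit
code and whether it wrote an output file; `chooseManifest` and its
return shape are hypothetical.

```javascript
// Hypothetical sketch of the new fallback rule: a non-zero exit alone
// no longer discards a freshly written manifest.
function chooseManifest({ generatorExitCode, freshManifestExists }) {
  if (freshManifestExists) {
    return {
      source: "fresh",
      // Validation warnings become a one-line note, not a fallback.
      note:
        generatorExitCode === 0
          ? null
          : "CONTENT-debt expected during v7→v8",
    };
  }
  // Only fall back when the generator produced no output file at all.
  return { source: "stale-fallback", note: "generator produced no output file" };
}
```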

Also regenerates docs/field-reference.generated.md so
`npm run protocol:check` (§ C7) is clean; that regeneration was missed
after the M5 + L1 schema description edits.

Verified: `node scripts/skill-graph-routing-eval.js --only-asserted`
now reports 10/10 PASS (was reporting 7 PASS / 2 FAIL due to the
fall-back to stale manifest) and `node scripts/check-protocol-consistency.js`
exits 0.

Follow-up to T1 (audit B5).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 27, 2026 00:47
@coderabbitai

coderabbitai Bot commented May 27, 2026

Important

Review skipped

Too many files!

This PR contains 212 files, which is 62 over the limit of 150.

To get a review, narrow the scope:
• coderabbit review --type committed # exclude uncommitted changes
• coderabbit review --dir # limit to a subdirectory
• coderabbit review --base # compare against a closer base

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b1a04d71-f9c2-46c0-beba-6677f2396701

📥 Commits

Reviewing files that changed from the base of the PR and between cff57d7 and c26ee1e.

📒 Files selected for processing (212)
  • .github/ISSUE_TEMPLATE/config.yml
  • .github/ISSUE_TEMPLATE/feature.yml
  • .github/PULL_REQUEST_TEMPLATE.md
  • .github/workflows/skill-graph-lint.yml
  • .gitignore
  • AGENTS.md
  • README.md
  • SKILL_AUDIT_LOOP.md
  • SKILL_GRAPH.md
  • SKILL_METADATA_PROTOCOL.md
  • audits/expected-value/findings.md
  • audits/expected-value/scorecard.md
  • audits/expected-value/verdict.md
  • audits/lanes.json
  • audits/migration-mapping-v7-to-v8.json
  • audits/prompts/skill-audit-loop-codex-autonomous-v5.md
  • audits/prompts/skill-audit-loop-single-model.md
  • bin/skill-graph.js
  • docs/ADOPTION.md
  • docs/AUTHORING-QUICKSTART.md
  • docs/PRIMER.md
  • docs/QUICKSTART-30MIN.md
  • docs/adr/0007-audit-loop-cadence.md
  • docs/adr/0016-operational-data-ownership.md
  • docs/field-reference.generated.md
  • docs/field-reference.md
  • docs/manifest-field-mapping.md
  • docs/marketplace-syndication.md
  • docs/publish-workflow.md
  • docs/quality-doctrine.md
  • docs/research/followups/b3-legacy-script-delegation-2026-05-27.md
  • docs/research/followups/b4-verdict-writeback-status-2026-05-27.md
  • docs/research/followups/content-side-audit-tickets-2026-05-27.md
  • docs/research/followups/m12-flat-layout-migration-status-2026-05-27.md
  • docs/skill-audit-loop-executable-map.md
  • docs/skill-metadata-protocol.md
  • docs/status.generated.md
  • examples/skill-metadata-template.md
  • lib/audit/evaluate-skill.js
  • lib/audit/skill-audit.js
  • marketplace/skills/a11y/SKILL.md
  • marketplace/skills/acid-fundamentals/SKILL.md
  • marketplace/skills/agent-engineering/SKILL.md
  • marketplace/skills/agent-eval-design/SKILL.md
  • marketplace/skills/ai-native-development/SKILL.md
  • marketplace/skills/api-design/SKILL.md
  • marketplace/skills/architecture-decision-records/SKILL.md
  • marketplace/skills/autonomous-loop-patterns/SKILL.md
  • marketplace/skills/background-jobs/SKILL.md
  • marketplace/skills/bayesian-reasoning/SKILL.md
  • marketplace/skills/best-practice/SKILL.md
  • marketplace/skills/bounded-context-mapping/SKILL.md
  • marketplace/skills/cap-theorem-tradeoffs/SKILL.md
  • marketplace/skills/client-server-boundary/SKILL.md
  • marketplace/skills/code-review/SKILL.md
  • marketplace/skills/cognitive-load-theory/SKILL.md
  • marketplace/skills/color-system-design/SKILL.md
  • marketplace/skills/component-architecture/SKILL.md
  • marketplace/skills/compression/SKILL.md
  • marketplace/skills/conceptual-modeling/SKILL.md
  • marketplace/skills/connection-pooling/SKILL.md
  • marketplace/skills/constraint-awareness/SKILL.md
  • marketplace/skills/content-monitor/SKILL.md
  • marketplace/skills/context-engineering/SKILL.md
  • marketplace/skills/context-graph/SKILL.md
  • marketplace/skills/context-management/SKILL.md
  • marketplace/skills/context-window/SKILL.md
  • marketplace/skills/contract-testing/SKILL.md
  • marketplace/skills/cron-scheduling/SKILL.md
  • marketplace/skills/dark-mode-implementation/SKILL.md
  • marketplace/skills/data-modeling-fundamentals/SKILL.md
  • marketplace/skills/data-modeling/SKILL.md
  • marketplace/skills/database-migration/SKILL.md
  • marketplace/skills/debugging/SKILL.md
  • marketplace/skills/dependency-architecture/SKILL.md
  • marketplace/skills/design-module-composition/SKILL.md
  • marketplace/skills/design-system-architecture/SKILL.md
  • marketplace/skills/design-thinking/SKILL.md
  • marketplace/skills/diagnosis/SKILL.md
  • marketplace/skills/diff-analysis/SKILL.md
  • marketplace/skills/e2e-test-design/SKILL.md
  • marketplace/skills/entity-relationship-modeling/SKILL.md
  • marketplace/skills/epistemic-grounding/SKILL.md
  • marketplace/skills/error-boundary/SKILL.md
  • marketplace/skills/error-tracking/SKILL.md
  • marketplace/skills/eval-driven-development/SKILL.md
  • marketplace/skills/evaluation/SKILL.md
  • marketplace/skills/event-contract-design/SKILL.md
  • marketplace/skills/event-storming/SKILL.md
  • marketplace/skills/first-principles-thinking/SKILL.md
  • marketplace/skills/form-ux-architecture/SKILL.md
  • marketplace/skills/framework-fit-analysis/SKILL.md
  • marketplace/skills/frontend-architecture/SKILL.md
  • marketplace/skills/generative-ui/SKILL.md
  • marketplace/skills/guardrails/SKILL.md
  • marketplace/skills/hooks-patterns/SKILL.md
  • marketplace/skills/http-semantics/SKILL.md
  • marketplace/skills/ideation/SKILL.md
  • marketplace/skills/indexing-strategy/SKILL.md
  • marketplace/skills/information-architecture/SKILL.md
  • marketplace/skills/integration-test-design/SKILL.md
  • marketplace/skills/intent-recognition/SKILL.md
  • marketplace/skills/interaction-feedback/SKILL.md
  • marketplace/skills/interaction-patterns/SKILL.md
  • marketplace/skills/inversion/SKILL.md
  • marketplace/skills/journey-mapping/SKILL.md
  • marketplace/skills/keywords/SKILL.md
  • marketplace/skills/knowledge-modeling/SKILL.md
  • marketplace/skills/layout-composition/SKILL.md
  • marketplace/skills/linguistics/SKILL.md
  • marketplace/skills/lint-overlay/SKILL.md
  • marketplace/skills/mental-models/SKILL.md
  • marketplace/skills/merge-queue/SKILL.md
  • marketplace/skills/methodical/SKILL.md
  • marketplace/skills/methodology/SKILL.md
  • marketplace/skills/microcopy/SKILL.md
  • marketplace/skills/middleware-patterns/SKILL.md
  • marketplace/skills/mobile-responsive-ux/SKILL.md
  • marketplace/skills/mutation-testing/SKILL.md
  • marketplace/skills/naming-conventions/SKILL.md
  • marketplace/skills/observability-modeling/SKILL.md
  • marketplace/skills/ontology-modeling/SKILL.md
  • marketplace/skills/owasp-security/SKILL.md
  • marketplace/skills/pattern-recognition/SKILL.md
  • marketplace/skills/performance-budgets/SKILL.md
  • marketplace/skills/performance-engineering/SKILL.md
  • marketplace/skills/performance-testing/SKILL.md
  • marketplace/skills/playing-to-win/SKILL.md
  • marketplace/skills/porters-five-forces/SKILL.md
  • marketplace/skills/printify/SKILL.md
  • marketplace/skills/prioritization/SKILL.md
  • marketplace/skills/problem-approach-router/SKILL.md
  • marketplace/skills/problem-framing/SKILL.md
  • marketplace/skills/problem-locating-solving/SKILL.md
  • marketplace/skills/project-knowledge-extraction/SKILL.md
  • marketplace/skills/prompt-craft/SKILL.md
  • marketplace/skills/prompt-injection-defense/SKILL.md
  • marketplace/skills/property-based-testing/SKILL.md
  • marketplace/skills/prototyping/SKILL.md
  • marketplace/skills/query-optimization/SKILL.md
  • marketplace/skills/real-time-updates/SKILL.md
  • marketplace/skills/ref-patterns/SKILL.md
  • marketplace/skills/refactor/SKILL.md
  • marketplace/skills/rendering-models/SKILL.md
  • marketplace/skills/replication-patterns/SKILL.md
  • marketplace/skills/research-synthesis/SKILL.md
  • marketplace/skills/route-handler-design/SKILL.md
  • marketplace/skills/schema-evolution/SKILL.md
  • marketplace/skills/second-order-thinking/SKILL.md
  • marketplace/skills/security-fundamentals/SKILL.md
  • marketplace/skills/semantic-center/SKILL.md
  • marketplace/skills/semantic-relations/SKILL.md
  • marketplace/skills/semantics/SKILL.md
  • marketplace/skills/semiotics/SKILL.md
  • marketplace/skills/seo-strategy/SKILL.md
  • marketplace/skills/server-actions-design/SKILL.md
  • marketplace/skills/server-components-design/SKILL.md
  • marketplace/skills/seven-powers/SKILL.md
  • marketplace/skills/sharding-strategy/SKILL.md
  • marketplace/skills/shopify/SKILL.md
  • marketplace/skills/skill-infrastructure/SKILL.md
  • marketplace/skills/skill-router/SKILL.md
  • marketplace/skills/skill-scaffold/SKILL.md
  • marketplace/skills/snapshot-testing/SKILL.md
  • marketplace/skills/spec-driven-development/SKILL.md
  • marketplace/skills/state-machine-modeling/SKILL.md
  • marketplace/skills/state-management/SKILL.md
  • marketplace/skills/streaming-architecture/SKILL.md
  • marketplace/skills/summarization/SKILL.md
  • marketplace/skills/suspense-patterns/SKILL.md
  • marketplace/skills/system-interface-contracts/SKILL.md
  • marketplace/skills/task-analysis/SKILL.md
  • marketplace/skills/task-path-optimization/SKILL.md
  • marketplace/skills/taxonomy-design/SKILL.md
  • marketplace/skills/test-coverage-strategy/SKILL.md
  • marketplace/skills/test-doubles-design/SKILL.md
  • marketplace/skills/test-driven-development/SKILL.md
  • marketplace/skills/testing-strategy/SKILL.md
  • marketplace/skills/theme-system-design/SKILL.md
  • marketplace/skills/tool-call-flow/SKILL.md
  • marketplace/skills/tool-call-strategy/SKILL.md
  • marketplace/skills/transaction-isolation/SKILL.md
  • marketplace/skills/type-safety/SKILL.md
  • marketplace/skills/typography-system/SKILL.md
  • marketplace/skills/usability-testing/SKILL.md
  • marketplace/skills/user-research/SKILL.md
  • marketplace/skills/vercel-composition-patterns/SKILL.md
  • marketplace/skills/version-control/SKILL.md
  • marketplace/skills/visual-design-foundations/SKILL.md
  • marketplace/skills/visual-hierarchy/SKILL.md
  • marketplace/skills/webhook-integration/SKILL.md
  • marketplace/skills/writing-humanizer/SKILL.md
  • package.json
  • schemas/audits-manifest.schema.json
  • schemas/skill.context.jsonld
  • schemas/skill.schema.json
  • scripts/__tests__/test-lib-audit-smoke.js
  • scripts/__tests__/test-marketplace-export.js
  • scripts/__tests__/test-verify-gate-scripts.js
  • scripts/__tests__/test-work-mode-separation.js
  • scripts/backfill-field-purpose-comments.js
  • scripts/build-status-doc.js
  • scripts/check-charter-parity.js
  • scripts/check-doc-drift.js
  • scripts/check-protocol-consistency.js
  • scripts/export-marketplace-skills.js
  • scripts/export-skill.js
  • scripts/generate-manifest.js
  • scripts/lint/check-stability-promotion.js
  • scripts/skill-graph-drift.js
  • scripts/skill-graph-routing-eval.js
  • scripts/skill-lint.js

@kilo-code-bot

kilo-code-bot Bot commented May 27, 2026

Code Review Summary

The review did not run because the selected model is no longer available.

Choose another model in Kilo Code review settings: https://app.kilo.ai/code-reviews

Copilot AI left a comment

Pull request overview

This PR closes 33 findings from the 2026-05-27 system audit across 34 logical commits. It hardens the v8 5-axis classification contract (formerly compat-window), wires several previously-orphaned gates into the verify chain, and adds verdict write-back paths (drift, comprehension, single-model PROVISIONAL downgrade) along with smoke/integration test coverage for runners that previously had none.

Changes:

  • Promotes v8 axes (subject/operation/scope) to schema-required, deprecates v7 classification fields, and reconciles a wide set of docs/ADRs/templates to the post-2026-05-26 v8 state.
  • Adds verdict write-back: drift --write-verdict, stampComprehensionVerdict, and PROVISIONAL downgrade for --single-model application runs; extends lint with named v8-axis errors, boundary reason-text warnings, and field-purpose-comment advisory.
  • Expands the verify chain (marketplace:verify, status:check) and adds smoke/integration tests for lib/audit/*, verify-gate scripts, and the work-mode-separation hook; files CONTENT-side follow-up tickets for drift/comprehension debt.

Reviewed changes

Copilot reviewed 210 out of 212 changed files in this pull request and generated 1 comment.

Given the very large scope (≈180 files), only the files with substantive system-side changes are summarized here; the ~152 regenerated marketplace exports are mechanical and not individually listed.

| File | Description |
| --- | --- |
| `SKILL_METADATA_PROTOCOL.md` | Rewrites the v7/v8 framing to "phase ended 2026-05-26"; adds the "Inline field comments" authoring convention. |
| `SKILL_GRAPH.md` / `SKILL_AUDIT_LOOP.md` | Updates freshness claim, adds Part 3 audience preamble for workspace-only scripts. |
| `lib/audit/evaluate-skill.js` | Adds PROVISIONAL to application enum + --single-model downgrade; introduces stampComprehensionVerdict. |
| `lib/audit/skill-audit.js` | Re-verified structural/truth verdict write-back path. |
| `scripts/skill-lint.js` | Adds named v8-axis errors, boundary-deference warnings, and field-purpose-comment check; --strict escalates warnings. |
| `scripts/skill-graph-drift.js` | Adds --write-verdict opt-in to stamp drift_status. |
| `scripts/skill-graph-routing-eval.js` | Auto-generates a fresh manifest when --manifest is omitted. |
| `scripts/build-status-doc.js` | Adds the verify-chain status:check aggregator (drift/audit-manifest deliberately excluded). |
| `scripts/export-marketplace-skills.js` | Drops historical skill_graph_protocol provenance key. |
| `scripts/check-charter-parity.js` | Switches from hardcoded mirror list to dynamic WORKSPACE_ROOT discovery. |
| `scripts/check-protocol-consistency.js` / `check-doc-drift.js` | Reconciled consistency claims (7 checks; C6 retired). |
| `scripts/lint/check-stability-promotion.js` | Documents advisory-by-default, --strict opt-in. |
| `scripts/generate-manifest.js` / `export-skill.js` / `backfill-field-purpose-comments.js` | Supporting tool updates for v8 axes and field-purpose backfill. |
| `scripts/__tests__/test-{work-mode-separation,verify-gate-scripts,lib-audit-smoke,marketplace-export}.js` | New tests: 16 work-mode assertions, 14 verify-gate, 14 lib/audit smoke, marketplace export coverage. |
| `schemas/skill.schema.json` | Adds PROVISIONAL to comprehension/application enums; deprecates v7 classification fields. |
| `schemas/audits-manifest.schema.json` | Tightens runner.version, artifact name pattern, and when-clause syntax. |
| `schemas/skill.context.jsonld` | v8 axis context updates. |
| `audits/lanes.json` | Normalizes schema_version to integer. |
| `audits/migration-mapping-v7-to-v8.json` | Replaces user-local plan path. |
| `audits/prompts/*.md` | Grep-cleans stale v6 "Concept Card"/"Understanding fields" framing. |
| `audits/expected-value/*.md` | New expected-value scorecard/findings/verdict. |
| `package.json` | Adds marketplace:verify + status:check to npm run verify; adds stability:check:strict. |
| `README.md` | Rewrites Integrity Gate definition to match what verify actually runs. |
| `.github/workflows/skill-graph-lint.yml` | Removes malformed paths filter entries. |
| `.github/PULL_REQUEST_TEMPLATE.md` / `ISSUE_TEMPLATE/*` | Removes deprecated mirror references. |
| `.gitignore` | Adjustments related to auto-generated routing-eval manifest. |
| `bin/skill-graph.js` | Wires routing-eval auto-generation path. |
| `docs/adr/0016` / `0007` | Status promotion (Proposed → Accepted) and cadence amendment (event-driven). |
| `docs/field-reference.md(.generated.md)` | Names canonical resolution in doc-ownership map. |
| `docs/{PRIMER,ADOPTION,QUICKSTART-30MIN,AUTHORING-QUICKSTART,publish-workflow,marketplace-syndication,manifest-field-mapping,quality-doctrine,skill-metadata-protocol,skill-audit-loop-executable-map}.md` | v8-state reconciliations and stale-claim corrections. |
| `docs/research/followups/*.md` | Filed CONTENT-side / workspace-side follow-up tickets (B3, H9/H10, M12, B4). |
| `docs/status.generated.md` | Regenerated; still records check-protocol-consistency + marketplace-export-check as FAIL. |
| `examples/skill-metadata-template.md` | Adds field-purpose comments; keeps # TEMPLATE NOTE: stripping convention. |
| `AGENTS.md` | Documents bin/ vs package.json scripts audience split; confidence hierarchy. |
| `marketplace/skills/**/SKILL.md` (~152 files) | Regenerated marketplace exports under the post-B6 export pipeline. |

Comment on lines +1517 to +1520
if (options.singleModel && verdict === 'APPLICABLE') {
  console.log('\n[write-back] --single-model set: downgrading APPLICABLE → PROVISIONAL (single-model assessment, not dual-run confirmed).');
  verdict = 'PROVISIONAL';
}
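
For context, the downgrade rule quoted above, together with the PASS→PROVISIONAL comprehension downgrade named in the PR summary, could be expressed as one table-driven helper. This is only a sketch under assumptions: `applySingleModelDowngrade` and `SINGLE_MODEL_DOWNGRADES` are hypothetical names, not identifiers from lib/audit/evaluate-skill.js, and verdicts other than PASS and APPLICABLE are assumed to pass through unchanged.

```javascript
// Hypothetical table of single-model downgrades, keyed by verdict kind.
// Only the two mappings named in the PR summary are included.
const SINGLE_MODEL_DOWNGRADES = {
  comprehension: { PASS: 'PROVISIONAL' },
  application: { APPLICABLE: 'PROVISIONAL' },
};

// Returns the verdict to persist. Without --single-model nothing changes;
// with it, only verdicts listed in the table are downgraded.
function applySingleModelDowngrade(kind, verdict, options = {}) {
  if (!options.singleModel) return verdict;
  const table = SINGLE_MODEL_DOWNGRADES[kind] || {};
  return Object.prototype.hasOwnProperty.call(table, verdict) ? table[verdict] : verdict;
}
```

Keeping the mapping in data rather than in two separate `if` branches makes it harder for the comprehension and application paths to drift apart.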
jacob-balslev added a commit that referenced this pull request May 31, 2026
Per AGENTS.md § Major Version Is a Clean Cut, migration codemods and
deprecated-field checkers do not survive the cut. Their history lives
in git.

Deletions:
- scripts/migrate-skill-v7-to-v8.js — the v7→v8 codemod, ~22K. Its
  output is now invalid (it still authored `operation:`). Anyone
  re-needing the migration logic recovers via
  `git show f88603d^:scripts/migrate-skill-v7-to-v8.js`.
- scripts/lint/check-category-enum.js — enforces a v(N-1) field that
  no longer exists in the schema. The check itself was the symptom of
  "deprecated optional" framing; with the field gone, the check goes.

Updates:
- package.json: drop `category:check` script and its slot in the
  `verify` chain.
- scripts/backfill-field-purpose-comments.js: delete the `operation:`,
  `type:`, `category:` comment-template blocks. Rename the
  v8-classification section divider from "5-axis model" to the
  current axes (subject + scope; polyhierarchy via subjects[]).
  Rename the eval-health divider to "Evaluation Status" per the
  2026-05-27 f88603d doctrinal change #2.
- scripts/skill-lint.js: update the V8_AXES surrounding comment from
  "v8 5-axis classification per ADR-0017" to the post-retire wording
  ("v8 classification — subject + scope"). The set itself was already
  correct.
- .github/workflows/skill-graph-lint.yml: remove the dead `skills/**`
  path filter entries (no such directory exists in this repo
  post-ADR-0009 consolidation). Add `lib/**`, `marketplace/**`, and
  `AGENTS.md` to the filter — they're load-bearing surfaces missing
  from the prior filter.
jacob-balslev added a commit that referenced this pull request May 31, 2026
Drives the real bin/skill-graph.js surfaces — audit, evaluate, evolve, and the
init create path — against a fixture skill in a hermetic temp workspace, with a
stubbed model CLI on PATH (no real model), asserting on-disk verdict/receipt
transitions. The regression net that makes "done" mean "the real loop ran and
wrote real state", not "the code reads correctly".

Catches the three breaks from the 2026-05-30 end-to-end review:
- #1 (evolve scaffold path silent-fail): evolve --analyze-only runs end-to-end +
  a guard asserts the scaffold script the engine references exists.
- #2 (evaluate verdict write opt-in): evaluate with DEFAULT flags must move the
  Health Block on disk. Confirmed comprehension_verdict + freshness already write
  by default; the eval_score-by-default assertion is the FORCING FUNCTION for
  Step 3 (still gated behind --write-verdict at evaluate-skill.js:2061) — RED
  until Step 3 lands, flips green automatically.
- #3 (verdict without artifact): evaluate must leave a durable on-disk receipt.

Currently 13/14 assertions pass; the 1 red is Break #2 by design (no false-green).
Wired into `npm run verify` only once Step 3 turns it green — a red test in the
shared gate would block parallel sessions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jacob-balslev jacob-balslev merged commit c26ee1e into main May 31, 2026
2 of 4 checks passed
jacob-balslev added a commit that referenced this pull request May 31, 2026
…ep2-before-Step3 ordering

Step 0b contract test authored (skill-graph@4a3de1f), 13/14 pass. Records the
verified Break #2 refinement: comprehension_verdict + freshness already write by
default; only v6 eval_score/eval_failed_ids remain gated behind --write-verdict
(the 1 red assertion = forcing function for Step 3). Verify-wiring deferred until
Step 3 turns it green (no red test in the shared gate). Documents the verified
Step2-before-Step3 ordering: flipping the default before receipts are durable
persists verdicts on an ephemeral .cache foundation. Next: Step 2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jacob-balslev added a commit that referenced this pull request May 31, 2026
… Step 2 next

Appends an UPDATE section recording this session (Step 1 de-fork + evolve-meaning
reconciliation, Step 0b contract test 13/14, SH-6642 filed, Break #2 half-fixed
refinement, Step2-before-Step3 ordering) and a superseding continuation prompt.
The stale prior-session prompt is marked historical; the plan Progress log is the
live record.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jacob-balslev added a commit that referenced this pull request May 31, 2026
…rable path

Step 3: invert the v6 Health Block write-back (eval_score / eval_failed_ids /
freshness) from opt-in (--write-verdict) to persist-by-default, with --dry-run as
the single opt-out. An unattended loop no longer silently produces zero durable
verdicts because it forgot a flag (Break #2). --write-verdict is retained as a
harmless no-op alias. The bin/skill-graph.js help already described this
("Writes (when not --dry-run): ...") — the code now matches it (resolves E2).

Also fixes a portability regression the test:unit standalone guard caught in the
Step-2 durable-receipts commit (3ed080d): the durable eval-result path was
anchored to a hardcoded SKILL_GRAPH_REPO_ROOT (path.resolve(__dirname,..,..)),
which test-standalone-pipeline.js forbids in lib/ (breaks npm install -g). Now
resolved via the portable log-paths.js LOG_DIR (monorepo agent-orchestration/logs
or standalone .skill-graph/logs), with the receipt artifact path relative to
WORKSPACE. test:unit green (exit 0).
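
The inverted default can be illustrated with a small flag resolver. This is a sketch of the policy only; `resolveWriteBack` is a hypothetical name and the real argument parsing in evaluate-skill.js may look quite different.

```javascript
// Hypothetical resolver for the write-back policy described above:
// persist by default, --dry-run is the single opt-out, and --write-verdict
// is still accepted but has no effect on the outcome.
function resolveWriteBack(argv) {
  const flags = new Set(argv);
  const dryRun = flags.has('--dry-run');
  // Recorded only so callers can note that the legacy flag was passed.
  const legacyAlias = flags.has('--write-verdict');
  return { persist: !dryRun, legacyAlias };
}
```

Under this sketch, passing both flags still results in no write, since --dry-run is the only input that affects `persist`.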
jacob-balslev added a commit that referenced this pull request May 31, 2026
… 3 final)

The model-free black-box contract test (test-public-cli-loop-contract.js) now
passes all 14 assertions — the eval_score-by-default forcing-function flipped
green once Step 3 inverted the write-verdict default. Added it to the test:unit
chain so it runs under both npm run verify and verify:system as the public-CLI
loop guard against Break #1/#2/#3 regressions. Updated the test header + the
in-body forcing-function comment to record Step 3 closed.
jacob-balslev added a commit that referenced this pull request Jun 3, 2026
…ble 2026-05-30)

Explains recommendation #2 from the 2026-05-30 Skill Graph direction
roundtable: separate marketplace-publishable (export gate) from
behaviorally-certified (application_verdict: APPLICABLE), and surface the
verdict to consumers as a browse/sort filter.

Grounds the proposal in primitives that already exist (the export gate,
application_verdict in audit-state.json, marketplace_priority) and names
the real gap: the public 6-field export strips application_verdict, so a
consumer on skills.sh cannot see or filter behavioral certification.

Full deliberation: ~/Development/.roundtable/skill-graph-2026-05-30/SYNTHESIS.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jacob-balslev added a commit that referenced this pull request Jun 3, 2026
…map (correction)

User review caught that roundtable recommendation #2 conflated two
unrelated subsystems and was written in cryptic jargon. Corrected:

- Publishing ("can it go public?") = deployment_target (general vs
  private) + the privacy/secret scanner. Already built. Quality is
  never read by the exporter (verified).
- Quality ("is it any good?") = the Behavior Gate / application_verdict,
  owned by the Audit & Evaluation system, in the audit-state.json
  sidecar. Deliberately decoupled from publishing (ADR-0011 publishes
  untested skills labeled "behavior unvalidated"; marketplace_tier is a
  separate, non-quality-derived field).

The old "release states" framing fused the two. Renamed off the wrong
filename (git mv from proposals/) and rewritten as a plain-language map
a non-insider can follow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jacob-balslev added a commit that referenced this pull request Jun 10, 2026
drain_worklist looped on skill-audit-claim 'next', but 'next' filters on the
STATIC worklist status (SKILL_LIST.json) which a ledger completion never
refreshes -> it re-returned the just-completed top skill forever (verified:
api-design re-enriched as drain #2 after committing 982212c). Now iterates the
ranked SKILL_LIST array once (advances on any outcome), skips already
panel-enrich-committed skills (resume-safe), claims for dedup, and rebuilds the
worklist after each skill. Verified via --dry-run (99 eligible, api-design
skipped, advances to code-review).
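
The single-pass drain described above might look roughly like this. It is an illustrative JavaScript sketch, not the actual drain_worklist implementation (whose language and helpers are not shown here); `drainOnce` and the injected `isCommitted`, `claim`, `enrich`, and `rebuildWorklist` callbacks are hypothetical.

```javascript
// Hypothetical single-pass drain: walk the ranked list exactly once and
// advance on every outcome, so a completed skill can never be returned again.
function drainOnce(rankedSkills, { isCommitted, claim, enrich, rebuildWorklist }) {
  const outcomes = [];
  for (const skill of rankedSkills) {
    // Resume-safe: skills already committed in a previous run are skipped.
    if (isCommitted(skill)) {
      outcomes.push({ skill, outcome: 'skipped' });
      continue;
    }
    // Claim for dedup; if another session holds the claim, move on.
    if (!claim(skill)) {
      outcomes.push({ skill, outcome: 'claimed-elsewhere' });
      continue;
    }
    let outcome;
    try {
      outcome = enrich(skill) ? 'done' : 'failed';
    } catch {
      outcome = 'error';
    }
    outcomes.push({ skill, outcome });
    // Refresh the static worklist after each processed skill.
    rebuildWorklist();
  }
  return outcomes;
}
```

Because the loop iterates the array rather than repeatedly asking for "next", termination no longer depends on the worklist status being refreshed.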

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jacob-balslev added a commit that referenced this pull request Jun 11, 2026
…ll surface (board #2)

New scripts/scan-skill-security.js (npm run security:scan): the complementary
signal to the privacy gate. Privacy (privacy-patterns.js) stops the author's
private data leaking OUT at export; this scans published marketplace/skills
bodies for execution/exfiltration patterns (curl|bash, base64-decode-and-eval,
reverse shells, fork bombs, broad rm -rf, curl data-exfil, eval-of-fetched)
and over-broad allowed-tools (P1 */all, P2 Bash(*), P3 bare shell;
Bash(git:*) is clean).

Advisory by default (exit 0 -- teaching skills legitimately show shell patterns
as anti-examples); --strict hard-gates. Wired into release:check (advisory,
release-time, NOT verify:system); unit test in test:unit so the logic is
blocking-tested. First run: 180 scanned, 37 P3 bare-Bash advisories, 0 exec
matches. Decided as a scan gate, not a 5th verdict.
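
The allowed-tools tiers and the advisory/strict split could be sketched as below. This is an assumption-laden illustration, not the code in scripts/scan-skill-security.js: `classifyAllowedTool` and `exitCodeFor` are hypothetical names, "bare shell" is assumed to mean a bare `Bash` entry, and exit code 1 for strict failures is assumed.

```javascript
// Hypothetical classifier for a single allowed-tools entry.
// Returns 'P1', 'P2', 'P3', or null when the entry is considered clean.
function classifyAllowedTool(entry) {
  const tool = entry.trim();
  if (tool === '*' || tool.toLowerCase() === 'all') return 'P1'; // everything allowed
  if (/^Bash\(\s*\*\s*\)$/.test(tool)) return 'P2'; // unrestricted shell pattern
  if (tool === 'Bash') return 'P3'; // bare shell, no scoping at all
  return null; // scoped grants such as Bash(git:*) are treated as clean
}

// Advisory by default: findings are reported but the exit code stays 0.
// With strict set, any finding fails the run (exit code 1 is an assumption).
function exitCodeFor(findings, { strict = false } = {}) {
  return strict && findings.length > 0 ? 1 : 0;
}
```

A usage pattern would be to map every entry through `classifyAllowedTool`, drop the `null` results, and pass the remainder to `exitCodeFor`.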

Also fixes a stale privacy-patterns.js comment naming a non-existent
.github/workflows/privacy-scan.yml.

Verified: 0.12s scan (no ReDoS); test-scan-skill-security green; package.json
valid; release:check chain unbroken.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jacob-balslev added a commit that referenced this pull request Jun 11, 2026
… public repo

The prior #2 commit (7868c60) wrongly declared .github/workflows/privacy-scan.yml
non-existent after checking only skill-graph workflows. The L3 pre-push hook and
L4 CI privacy-scan workflow + ci-privacy-scan.js live in the PUBLIC jacob-balslev/
skills repo (sibling ~/Development/skills/), verified present. Correct the
privacy-patterns.js comment and the CHANGELOG entry to say so. No-unverified-claims
fixup caught during /wrap stale-ref grep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jacob-balslev added a commit that referenced this pull request Jun 11, 2026
Records the /boardmeeting board findings, what the parallel session has
already committed (board #2/#3/#6/#15 + Cluster 1-5), what remains
(SYSTEM: alias delete, description-cap, anti-loss test, release:ready,
scope notes, citations), and the CONTENT drains (certify-seed publish
gate, displacement, admission, shelf rebalance) routed via /audit:*.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>