Skip to content

chore: reconcile skills + scenario-traces + casus leak-guard#785

Open
humanintheloop25 wants to merge 15 commits into
mainfrom
chore/reconcile-skills
Open

chore: reconcile skills + scenario-traces + casus leak-guard#785
humanintheloop25 wants to merge 15 commits into
mainfrom
chore/reconcile-skills

Conversation

@humanintheloop25

@humanintheloop25 humanintheloop25 commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Wat

Voegt de uiteengelopen user-level en engine-repo kopieën van de regelrecht-skills samen tot één canonieke set, en borgt dat er geen trajectinformatie naar deze publieke repo lekt.

Waarom

De skills bestonden in twee kopieën (~/.claude/skills/ en .claude/skills/ hier) die uit elkaar waren gelopen. Per-bestand gereconcilieerd:

  • stelselanalyse / dossier / audit-products: de engine-versies waren inhoudelijk vooruit (Step 0 drift-check, source-refs integriteitsscan, striktere private-repo-guard, "Bewijs-poort"). De scenario-traces-kruisverwijzingen die alléén user-level bestonden zijn erin geport.
  • Onderweg bleek elk user-level bestand een stray </content>-artefact te bevatten; de engine-versies zijn schoon.

Wijzigingen

  • regelrecht-scenario-traces als nieuwe, dossier-agnostische skill (9 bestanden).
  • ✏️ scenario-traces-verwijzingen geport in stelselanalyse (SKILL.md, engine-tests.md), dossier (SKILL.md, routing.md), audit-products (testcase-scenarios.md).
  • 🔒 script/check-skills-no-casus.sh + pre-commit hook skills-no-casus: een leak-guard die casus-specifieke inhoud in .claude/skills/ blokkeert (de repo is publiek). Hybride opzet: het publieke script bevat alleen domein-loze structuur-vormen en draait altijd (fail-closed); de casus-/sector-specifieke patronen staan in een git-ignored aanvulbestand en worden in CI via een secret aangeleverd — nooit in de publieke repo. Zo publiceert de guard niet zelf wat hij moet weren.

Verificatie

  • Leak-guard groen; skills dossier-agnostisch (onafhankelijke review: geen organisatie-/casus-namen, geen persoonsdata, geen private paden).
  • /security-review op de diff: geen bevindingen.

claude[bot]
claude Bot previously requested changes Jun 10, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross-law integrity gate + leak-guard — review

Security & correctness

🟠 Significant — check-skills-no-casus.sh: fixed /tmp path is a race condition

if grep -rEnI "$PATTERN" "$SKILLS" > /tmp/skills_casus_leak 2>/dev/null; then
    sed "s#${ROOT}/##" /tmp/skills_casus_leak >&2
    ...
    rm -f /tmp/skills_casus_leak

/tmp/skills_casus_leak is a shared path. Two concurrent jobs on the same GitHub-hosted runner pool can race: job B's grep overwrites job A's file mid-flight, causing either a garbled error message or a false-pass (rm between write and read). Use $(mktemp):

LEAKFILE=$(mktemp)
if grep -rEnI "$PATTERN" "$SKILLS" > "$LEAKFILE" 2>/dev/null; then
    ...
    rm -f "$LEAKFILE"
    exit 1
fi
rm -f "$LEAKFILE"

🟠 Significant — PLAIN_MARKERS in cross-law-integriteit.py contradicts law-generate/SKILL.md

law-generate/SKILL.md (new section, this PR) lists three forbidden plain-param markers:

"conceptueel", "forward naar", or "tijdelijk als directe parameter"

cross-law-integriteit.py explicitly excludes the middle one:

# Note: do NOT include generic words like "forward naar" — those legitimately
# describe leaf parameters that FEED a binding's parameters-mapping
PLAIN_MARKERS = ('conceptueel', 'tijdelijk als directe parameter')

A description containing "forward naar [andere regeling] output Y" without a source: block silently passes the CI gate but violates the skill's stated rule. The Python comment's rationale is reasonable — "forward naar" can appear in legitimate parameter descriptions — but the skill text doesn't reflect this distinction. One of the two must be updated: either remove "forward naar" from the SKILL.md's forbidden list, or add a note that it's deliberately excluded from the automated gate with the reason.


Minor

🟡 cross-law-integriteit.py — unclosed file handle and implicit encoding

doc = yaml.safe_load(open(path))

open() is never closed; on a large corpus this can exhaust file descriptors under concurrent processes. The broad except Exception: continue silently swallows encoding errors too — a law file with a platform-encoding mismatch is silently skipped rather than flagged. Use:

with open(path, encoding='utf-8') as f:
    doc = yaml.safe_load(f)

🟡 CI: pip install pyyaml unpinned

An uncontrolled pyyaml upgrade (including a future major release) could silently change YAML parsing behaviour or break the script. Pin to a tested range, e.g. pip install "pyyaml>=6.0,<7", or use a requirements-ci.txt file.


🟡 check-skills-no-casus.sh: comment says "Woordgrenzen" but pattern has no \b

# Casus-markers (uitbreidbaar). Woordgrenzen waar een losse term anders zou matchen.
PATTERN='HHNK|Hollands Noorderkwartier|Noorderkwartier|regelrecht-corpus-(HHNK|BES)|WS[0-9]{4}'

The comment implies word-boundary anchors, but the grep pattern has none. WS[0-9]{4} would match inside longer tokens (e.g. AWS1234). Either add \b anchors around the relevant terms or drop the misleading comment. Low practical risk in this directory, but the comment is incorrect.


Not flagged

The changes.outputs.ci guard on the new cross-law-integrity job is fine — corpus/regulation/** is in the ci filter, so corpus-only PRs will trigger the gate.

The doc comment added to coverage_score in models.rs is clear and accurate.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown

Preview Deployment — harvester-admin

Your changes have been deployed to a preview environment:

URL: https://harvester-admin-pr785-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown

Preview Deployment — harvester-worker

Your changes have been deployed to a preview environment:

URL: https://harvester-worker-pr785-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown

Preview Deployment — enrichworker

Your changes have been deployed to a preview environment:

URL: https://enrichworker-pr785-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl

This deployment will be automatically cleaned up when the PR is closed.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review

This PR adds skill documentation (reviewed as guidance prose, not executable code), a Python corpus-integrity CI gate, two shell scripts (a push-guard + its smoke-test), a pre-commit hook, and a Rust doc comment. Only the executable artefacts are reviewed below.


Shell scripts

script/check-skills-no-casus.sh

🟡 Minor — Layer 1 regex only catches uppercase corpus-repo names

PUBLIC_PATTERNS=(
  'regelrecht-corpus-[A-Z][A-Za-z0-9-]*'
)

[A-Z] requires the first character after regelrecht-corpus- to be uppercase. Standard GitHub repo names are all-lowercase (e.g. regelrecht-corpus-waterschap-hhnk). If the real private corpus repos follow that convention, layer 1 matches nothing and the entire guard falls back to layer 2 (which is absent in fork PR runs and on machines without the supplement file). The inline comment claims the pattern "catches every current and future corpus dossier without naming any of them" — that claim is only true when the repos happen to start with [A-Z]. If uppercase is a guaranteed naming convention for all corpus repos, document it; otherwise widen to [A-Za-z0-9].

.claude/skills/regelrecht-stelselanalyse/scripts/assert-private-repo.sh and test-assert-private-repo.sh

Logic and test coverage look correct. The explicit github.com-only host check, the case "$slug" in */*) slug validation, the fail-closed retry loop, and the 11 smoke-test cases cover the stated invariants.


Python (cross-law-integriteit.py)

🟡 Minor — str(None) produces spurious IMPL-DANGLING

tlaw, tart, term = im.get('law'), str(im.get('article')), im.get('open_term')
# …
elif term not in open_terms_idx[tlaw].get(tart, set()):
    impl_dangling.append(…)

If an implements dict is missing the article key (malformed YAML that passes yaml.safe_load), str(im.get('article')) evaluates to the string 'None'. The index stores article numbers as '1', '2', …, so open_terms_idx[tlaw].get('None', set()) always returns the empty set → false IMPL-DANGLING for every such entry. Schema validation should prevent this in practice, but the guard is fragile. Prefer an explicit None-check and skip or report a parse error:

tart_raw = im.get('article')
if tart_raw is None:
    # malformed implements entry — skip or raise
    continue
tart = str(tart_raw)

🟡 Minor — unclosed file handles in YAML-loading loop

doc = yaml.safe_load(open(path))

open(path) is never closed. On large corpora this exhausts the per-process file-descriptor limit. Use a context manager:

with open(path) as f:
    doc = yaml.safe_load(f)

CI (.github/workflows/ci.yml)

The new cross-law-integrity job and the secret-provisioning step in pre-commit look correct. Secret value is written with printf '%s\n' (not echo -e, no format-string risk), is never logged, and the fork-PR warning path is explicit. Order of steps (provision → pre-commit/action) is correct.


packages/pipeline/src/models.rs

Documentation-only addition. No issues.

humanintheloop25 and others added 13 commits June 16, 2026 14:58
Voegt drie samenhangende Claude-skills toe onder .claude/skills/:

- regelrecht-dossier: front-door router die een binnenkomende casus triageert
  en routeert (feitelijke defecten -> desk, oordeels-/praktijkvragen -> workshop),
  met routing.md (canonieke flow) en zakkaart.md (onboarding-A4 voor analisten).
- regelrecht-stelselanalyse: desk-review & corpus-completion-cyclus met 4-weg-
  classificatie, multi-agent review, verificatie, /loop- en /schedule-ondersteuning
  en een fail-closed private-only push-guard.
- regelrecht-audit-products: producten voor live expert-validatie (scope-analyse,
  per-artikel audit-doc, facilitatiekit, testcases, verslagen).

Dossier-agnostisch: geen casus-specifieke referenties.
Adds the law-version-drift-check skill alongside the existing desk-layer skills.
It runs as Step 0 of every regelrecht-stelselanalyse micro-cycle and detects drift
between a YAML's text blocks (plus competent_authority names and url anchors) and
the binding geldende wettekst at the YAML's valid_from.

Core principle: the geldende wettekst is leading and is mirrored verbatim,
including errata; one allowed normalization (whitespace within a single
paragraph). Two diff axes per article: structural first (lid- and onderdeel-
counts), then textual at word level. Verdicts: CLEAN / DRIFT-tekst /
DRIFT-structureel / NIET-VERIFIEERBAAR. Strict by design — no bypass; partial
failure produces scope-restrictie (frozen articles), not a green report.

Calibration ijkpunten live in the consuming corpus, not in the skill, keeping
the skill dossier-agnostisch. Findings flow into resolutie-tracker as 4-weg
klasse 1 (modellering-fout).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- assert-private-repo.sh: expliciete host-check vóór slug-extractie;
  niet-github.com remotes (GitLab, GHE, Bitbucket) blokkeren fail-closed.
  Voorheen strippeerde de sed-keten elke host blind en queryde github.com,
  waardoor een toevallig matchende private repo daar GUARD OK gaf voor
  een push naar een ander forge.
- assert-private-repo.sh: sleep 1 tussen retry-pogingen, zodat de retry
  zin heeft bij transiënte gh-haperingen.
- scheduled-routine.md: heading 'geïnstalleerd' → 'lokaal instellen' +
  expliciete noot dat de hook machine-lokaal is en niet met de repo
  meereist.
- routing.md: ASCII-diagram opgeschoond (hangende rand opgeruimd,
  terugkoppel-pijl recht).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ram placeholders, no dangling bindings)

- law-generate: hard rule that every cross-law input needs a real source:
  block, never just a description; add target output first if missing;
  multiple source: bindings per article are supported (schema v0.5.2);
  name-mismatch rule for source.output vs local name.
- regelrecht-stelselanalyse: mandatory non-skippable source-refs integrity
  scan algorithm; dangling + plain-param are always modelling errors, never
  engine limitations; add reusable references/cross-law-integriteit.py.
- law-reverse-validate: treat a source binding to a non-existent target
  output as a hallucination (dangling-binding check).
- engine-limitaties template: a claimed engine limitation may only be
  recorded after a reproducible failing engine run; untested assumptions are
  open questions, not limitations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r parameters)

The most common hidden cross-law defect: a source: block placed under
parameters: instead of input:. The engine's Parameter struct has no source
field, so the binding is silently dropped at parse time and degrades to a
plain caller-supplied parameter; scenarios that inject the value directly
mask it.

- cross-law-integriteit.py: rewritten YAML-based (robust vs regex). Adds
  MISPLACED detection (source under parameters:), keeps DANGLING (target
  output missing) and PLAIN-PARAM (conceptueel/tijdelijk marker without
  source). Drops the over-eager 'forward naar' marker (false-positives on
  legitimate leaf parameters that feed a binding mapping). Exit!=0 as CI gate.
- law-generate: hard 'Section placement' rule — a source: MUST live under
  input:, never parameters:; leaf inputs that feed source.parameters stay
  under parameters:; verify with a BDD scenario that loads the target law
  and sets leaf inputs so the dependent output flips.
- regelrecht-stelselanalyse: source-refs integrity scan now reports
  clean/misplaced/dangling/plain-param; MISPLACED listed as the most common
  hidden form; all three are always modelling errors, never engine limits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s Step 0

- Remove trailing </content> (40 files) and </invoke> (audit-products SKILL)
  leaked from the authoring tool-call format. All files keep their final newline.
- Integrate law-version-drift-check into regelrecht-stelselanalyse: new Step 0
  section + Werkstroom item 0, a Zustersskills cross-reference block, and a
  Step 0 gate in references/cycle-workflow.md (section + volgorde-diagram).
  Closes the orphan where the drift-check called itself a hard prerequisite
  while the owning skill never referenced it.
…truth, ASCII

- Add scripts/test-assert-private-repo.sh: stubs git/gh and asserts the guard
  invariants (gitlab/enterprise/bitbucket/public/internal/no-remote/unreachable
  → block, github private → pass). 11 cases, all green. Referenced from SKILL.md.
- Point the two 4-weg-classificatie copies that lacked it (dossier SKILL,
  zakkaart) at classification.md as the canonical definition source, containing
  definition-drift while keeping each skill standalone-readable.
- Redraw the routing.md flow diagram: the return-loop now forms one continuous
  vertical (┐│┘ aligned), annotations moved to an omlaag/omhoog legend below.
- Extend cross-law-integriteit.py with IoC checks: IMPL-DANGLING (implements
  pointing at an undeclared open_term) and IMPL-NO-DATE (implementing regulation
  without valid_from, which RFC-003's temporal filter then matches for every date).
- Add a `cross-law-integrity` CI job running the script over corpus/regulation as
  a fail-closed gate (gap 1); add the script path to the `ci` paths filter.
- Clarify coverage_score is a COMPLETENESS metric, not correctness (doc-comment
  on LawEntry.coverage_score) (gap 4).
- Add references/check-keten.md: full inventory of zuiverheid/cross-law checks,
  each with its oracle (waartegen) and whether it is a CI-gate or methodological.
  Documents why text-fidelity (gap 3) needs a separate golden-text mechanism.
… van de wet

Nieuwe casus-agnostische skill: auditeert of een machine_readable-model trouw is aan de LETTER
van de wettekst en scheidt wettekst (leidend) strikt van toelichting (uitleg). Detecteert vier
afwijkingsklassen (toelichting-bleed, ontbrekend verbatim-bestanddeel, verkeerde legal_basis-
verankering, verbatim-drift), checkt positief-lid vs uitsluitings-lid + de "in afwijking van"-
precedence in de chapeau, en classificeert elke bevinding als modelfout / wettekst-gevolg /
letter-vs-toelichting-vraag (jurist beslist wat leidend is + herzieningssignaal). Fideliteit
vóór outcome-toetsing, om outcome-bias te vermijden. Verwijst naar law-version-drift-check,
regelrecht-stelselanalyse en law-generate.
…-ontologie

Beslist en audit het referentietype van een waarde (cross-law / intra-law source-binding /
open-norm leaf / extern-feit leaf) en vangt begrip-conflatie tussen wetten. Kern: het type volgt
uit WIE het afleidt; een binding is sound alleen als producer en consumer hetzelfde BEGRIP
aanduiden — het woord is niet het begrip. Detecteert false friends (één term, verschillende
begrippen: register-gebaseerd vs feitelijke "naar de omstandigheden"-weging), orphan plain-params,
negatie/complement-re-entry (¬X als losse leaf naast X) en duplicatie (één begrip als meerdere
onafhankelijke leafs). Levert een law-level glossary + stelsel-level concept-map met sound
same-concept-edges en gemarkeerde lexical-match-but-concept-mismatch-edges. Cardinale regel: een
register mag een feitelijke open norm voeden (indicator), nooit stilzwijgend vervangen; symmetrie
over een stelsel. Orthogonaal aan law-letter-fidelity-audit; bovenop regelrecht-stelselanalyse.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Voegt de uiteengelopen user-level en engine-repo skill-kopieën samen tot één
canonieke set. Engine-versies waren inhoudelijk vooruit (Step 0 drift-check,
source-refs integriteitsscan, striktere private-repo-guard); de scenario-traces-
kruisverwijzingen die alleen user-level bestonden zijn erin geport.

- regelrecht-scenario-traces toegevoegd als nieuwe (dossier-agnostische) skill
- script/check-skills-no-casus.sh + pre-commit hook 'skills-no-casus' die
  casus-specifieke inhoud (organisatie-/sector-namen, codes, private repo-namen)
  blokkeert zodat trajectinformatie niet in de publieke skills lekt
…atronen lokaal

Het publieke script bevatte letterlijke casus-/sector-namen (zelf-onthullend in
een publieke repo). Nu twee lagen:
- laag 1 (publiek): domein-loze structuur-vorm (corpus-repo-naamvorm), draait
  altijd, fail-closed — geen vals-groen bij ontbrekend aanvulbestand;
- laag 2 (git-ignored / CI-secret): casus-/sector-specifieke patronen, nooit in
  de publieke repo.

Geverifieerd: detectie in beide lagen, geen false positives, laag 1 vangt ook
zonder aanvulbestand.
Voorziet het git-ignored aanvulbestand in de pre-commit-job vanuit het secret
SKILLS_CASUS_DENYLIST, zodat de skills-no-casus hook in CI op volle sterkte
draait zonder casus-/sector-patronen in de publieke repo te zetten. De waarde
wordt nooit gelogd; ontbreekt het secret (forks) dan draait alleen laag 1 met
een expliciete waarschuwing (geen vals 'volledig gescand').
Comment thread .github/workflows/ci.yml Outdated
# are all modelling errors. The script exits non-zero on any finding.
- name: Check cross-law binding integrity
run: |
python3 .claude/skills/regelrecht-stelselanalyse/references/cross-law-integriteit.py corpus/regulation

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Productie-CI hangt nu rechtstreeks aan een script binnen .claude/skills/. Dat koppelt de corpus-gate aan skill-content: wie deze skill verplaatst/hernoemt/opschoont breekt stil de corpus-CI. Andere corpus-tooling staat in script/ — overweeg dit script daarheen te verplaatsen (en de CI + paths-filter daarnaar te laten wijzen), zodat de gate niet afhankelijk is van de skills-map.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goed punt — cross-law-integriteit.py is verplaatst naar script/; de CI-run en de paths-filter wijzen nu daarheen, en de skill-docs ook. De expliciete skills-pad-regel in de filter was redundant naast script/** en is verwijderd. Komt mee in de merge.

Comment thread .github/workflows/ci.yml
# De waarde wordt nooit gelogd. Ontbreekt het secret (bv. PR's vanuit forks,
# die geen toegang tot secrets hebben), dan draait alleen laag 1 — expliciet
# gelogd, zodat een groene run nooit "volledig gescand" suggereert.
- name: Provision leak-guard supplement (laag 2)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let op: het secret SKILLS_CASUS_DENYLIST bestaat nog niet in de repo (alleen ADMIN_TOKEN, CLAUDE_CODE_OAUTH_TOKEN, GEMINI_API_KEY, RIG_API_KEY). Zolang dat secret leeg is, draait deze stap altijd de else-tak en valt de guard terug op laag 1 — dat ene regelrecht-corpus-... patroon dat vandaag niets matcht. Met andere woorden: de leak-guard doet effectief niets totdat deze denylist is aangemaakt. Graag het secret vullen (en evt. een lokaal script/.skills-casus-denylist.local voor pre-commit) voordat dit als bescherming geldt.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Terecht. De casus-/sector-patronen staan bewust niet in de publieke repo (hybride opzet); de bescherming werkt pas zodra de externe opslag gevuld is. Actie: we maken het secret SKILLS_CASUS_DENYLIST aan en houden script/.skills-casus-denylist.local lokaal voor pre-commit. Laag 1 (regelrecht-corpus-…) is de generieke baseline-tripwire die ook zonder secret afgaat bij een nieuwe corpus-repo-naam, en de stap logt een waarschuwing als het secret leeg is, zodat een groene run geen volledige scan suggereert.

…-gate van skills)

De cross-law-integrity CI-gate draaide een script binnen .claude/skills/, waardoor
het hernoemen/opschonen van die skill stil de productie-corpus-CI zou breken (review
@tdjager). Verplaatst naar script/ bij de overige corpus-tooling; CI-run en
paths-filter wijzen nu daarheen (de expliciete skills-pad-filterregel was redundant
naast script/** en is verwijderd). Skill-docs verwijzen naar het nieuwe pad.
@github-actions github-actions Bot deleted a comment from claude Bot Jun 16, 2026
@humanintheloop25 humanintheloop25 enabled auto-merge (squash) June 16, 2026 15:04

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #785

Reviewed dimensions: script correctness, CI gate integrity, shell guard logic, and the pipeline model comment. The skill/template Markdown content (the bulk of the diff) falls outside REVIEW.md scope — it carries no executable or schema-parseable logic.


Script correctness

script/cross-law-integriteit.py

🟡 Minor — unclosed file handle (cross-law-integriteit.py line 8 in the new file, open(path) without a context manager):

doc = yaml.safe_load(open(path))

The handle is never explicitly closed. This is a resource leak; on a large corpus it risks hitting OS file-descriptor limits before GC collects them. Replace with:

with open(path) as f:
    doc = yaml.safe_load(f)

🟡 Minor — silent ok for reg ≠ None, out = None cross-law binding (cross-law-integriteit.py, the else-branch of the input loop):

if reg not in law_outputs or (out is not None and out not in law_outputs[reg]):
    dangling.append(...)
else:
    ok += 1

When reg is set but out is None, the out is not None short-circuit means only the existence of the target law is checked, not any output. If a source binding with regulation but no output is schema-invalid (the law-generate skill always shows both fields), this case should be flagged as DANGLING rather than counted as clean. At minimum it should continue instead of ok += 1, so it is not silently reported as a verified binding.


script/check-skills-no-casus.sh

🟡 Minor — pattern requires uppercase-initial suffix (line 7 of the new file in the PUBLIC_PATTERNS array):

'regelrecht-corpus-[A-Z][A-Za-z0-9-]*'

The grep call does not use -i (the -I flag is for binary-file exclusion, not case-folding), so this pattern only catches names where the character immediately after regelrecht-corpus- is uppercase. A future private corpus repo named regelrecht-corpus-amsterdam (all-lowercase) would slip through. The PR description claims the pattern catches "elk huidig én toekomstig corpus-dossier" — that claim only holds while the naming convention is enforced out-of-band. Consider regelrecht-corpus-[A-Za-z][A-Za-z0-9-]* or add a note in the script that the convention (uppercase initial) is load-bearing.


CI gate integrity — OK

The cross-law-integrity job's if: needs.changes.outputs.ci == 'true' is correct: the changes job's ci filter covers corpus/regulation/** and script/**, so the gate runs on every corpus YAML change. The sys.exit(1 if (...) else 0) at the end of the Python script is present and correct — the gate is properly fail-closed.

Leak-guard / pre-commit hook — OK

pass_filenames: false + language: system is the right combination for a whole-directory scan. The laag 1 / laag 2 design (public domein-loze patterns + git-ignored supplement provisioned from a CI secret) is sound. Fork PRs that lack the secret run laag 1 only, which is explicitly logged — no silent degradation.

packages/pipeline/src/models.rs — OK

The added doc comment on coverage_score is accurate and the COVERAGE ≠ CORRECTNESS distinction is worth stating.

@claude

claude Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Code review — PR #785


Harvester / pipeline / corpus

packages/pipeline/src/models.rs

Only change is a doc comment on coverage_score. The comment correctly separates coverage (completeness) from correctness and points to the right verification layers. No functional change; nothing to flag.


Script / CI

script/cross-law-integriteit.py

🟡 Minor — unclosed file handles

doc = yaml.safe_load(open(path))   # line ~12 of the new file

open() without a context manager leaks the file descriptor for every YAML file loaded. CPython's reference-counting GC will close them soon enough on a small corpus, but as the corpus grows this could approach the OS default of ~1 024 open FDs per process. Change to:

with open(path) as fh:
    doc = yaml.safe_load(fh)

🟡 Minor — PLAIN-PARAM detection gap in input: block

The PLAIN_MARKERS check (conceptueel, tijdelijk als directe parameter) only iterates over parameters: items:

for p in ex.get('parameters', []) or []:
    ...
    if any(mk in d for mk in PLAIN_MARKERS):
        plain.append(...)

The docstring says a PLAIN-PARAM is "an input whose description references another regulation… but has no source: block at all". That definition covers input: entries too. An input: item with a conceptual marker and no source: block passes through undetected. The current coverage is still better than nothing; just worth noting as a future-detection gap.

🟡 Minor — misleading clean=N counter in output line

ok is incremented only for clean input: source-bindings and clean implements: bindings. Plain parameters: entries (which have no binding to validate) are never counted. The summary line clean=N misplaced=… dangling=… therefore understates "clean" relative to the total items scanned. Not a correctness issue for the gate logic, but could confuse consumers of the output.

.github/workflows/ci.ymlcross-law-integrity job

The path filter (ci output, which includes corpus/regulation/** and script/**) is correct. The job will fire on corpus changes and on changes to the script itself. No skip-echo step is added for the ci != 'true' case; all other jobs in the file do have one, but this is a cosmetic inconsistency, not a functional issue — a job-level if: skip is equivalent.

No pip caching for pyyaml was added, but pyyaml is small enough that the cold install on each run is negligible.

script/check-skills-no-casus.sh + .pre-commit-config.yaml

The two-layer design (public structural pattern + git-ignored supplement) is sound. The pass_filenames: false + files: filter combination is the right pre-commit pattern here — the hook is triggered by .claude/skills/ changes but scans the whole directory. Fail-closed via set -euo pipefail and the explicit exit 1 on any hit. No issues.

.claude/skills/regelrecht-stelselanalyse/scripts/assert-private-repo.sh

The host-check (case "$url" allowing only git@github.com:*, https://github.com/*, ssh://git@github.com/*) combined with the slug extraction and retry loop is robust. The sleep 1 between retries is intentional and present. The test harness (test-assert-private-repo.sh) covers gitlab, GHE, bitbucket, no-remote, and unreachable-gh scenarios — all 11 cases.


No findings in

  • Skill documentation files (.claude/skills/**) — dossier-agnostisch by design; no organisation names, private paths, or BSN-format strings present
  • .gitignore addition
  • All new skill templates and reference files

Summary: three minor findings in cross-law-integriteit.py (unclosed FDs, PLAIN-PARAM gap in input:, misleading clean counter). No critical or significant issues. The shell scripts, CI configuration, and Rust change are clean.

@claude

claude Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Inline — script/cross-law-integriteit.py, file handle leak

# Current (leaks FDs on large corpus):
doc = yaml.safe_load(open(path))

# Fix:
with open(path) as fh:
    doc = yaml.safe_load(fh)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants