chore: reconcile skills + scenario-traces + casus leak-guard#785
chore: reconcile skills + scenario-traces + casus leak-guard#785humanintheloop25 wants to merge 15 commits into
Conversation
There was a problem hiding this comment.
Cross-law integrity gate + leak-guard — review
Security & correctness
🟠 Significant — check-skills-no-casus.sh: fixed /tmp path is a race condition
if grep -rEnI "$PATTERN" "$SKILLS" > /tmp/skills_casus_leak 2>/dev/null; then
sed "s#${ROOT}/##" /tmp/skills_casus_leak >&2
...
rm -f /tmp/skills_casus_leak/tmp/skills_casus_leak is a shared path. Two concurrent jobs on the same GitHub-hosted runner pool can race: job B's grep overwrites job A's file mid-flight, causing either a garbled error message or a false-pass (rm between write and read). Use $(mktemp):
LEAKFILE=$(mktemp)
if grep -rEnI "$PATTERN" "$SKILLS" > "$LEAKFILE" 2>/dev/null; then
...
rm -f "$LEAKFILE"
exit 1
fi
rm -f "$LEAKFILE"🟠 Significant — PLAIN_MARKERS in cross-law-integriteit.py contradicts law-generate/SKILL.md
law-generate/SKILL.md (new section, this PR) lists three forbidden plain-param markers:
"conceptueel", "forward naar", or "tijdelijk als directe parameter"
cross-law-integriteit.py explicitly excludes the middle one:
# Note: do NOT include generic words like "forward naar" — those legitimately
# describe leaf parameters that FEED a binding's parameters-mapping
PLAIN_MARKERS = ('conceptueel', 'tijdelijk als directe parameter')A description containing "forward naar [andere regeling] output Y" without a source: block silently passes the CI gate but violates the skill's stated rule. The Python comment's rationale is reasonable — "forward naar" can appear in legitimate parameter descriptions — but the skill text doesn't reflect this distinction. One of the two must be updated: either remove "forward naar" from the SKILL.md's forbidden list, or add a note that it's deliberately excluded from the automated gate with the reason.
Minor
🟡 cross-law-integriteit.py — unclosed file handle and implicit encoding
doc = yaml.safe_load(open(path))open() is never closed; on a large corpus this can exhaust file descriptors under concurrent processes. The broad except Exception: continue silently swallows encoding errors too — a law file with a platform-encoding mismatch is silently skipped rather than flagged. Use:
with open(path, encoding='utf-8') as f:
doc = yaml.safe_load(f)🟡 CI: pip install pyyaml unpinned
An uncontrolled pyyaml upgrade (including a future major release) could silently change YAML parsing behaviour or break the script. Pin to a tested range, e.g. pip install "pyyaml>=6.0,<7", or use a requirements-ci.txt file.
🟡 check-skills-no-casus.sh: comment says "Woordgrenzen" but pattern has no \b
# Casus-markers (uitbreidbaar). Woordgrenzen waar een losse term anders zou matchen.
PATTERN='HHNK|Hollands Noorderkwartier|Noorderkwartier|regelrecht-corpus-(HHNK|BES)|WS[0-9]{4}'The comment implies word-boundary anchors, but the grep pattern has none. WS[0-9]{4} would match inside longer tokens (e.g. AWS1234). Either add \b anchors around the relevant terms or drop the misleading comment. Low practical risk in this directory, but the comment is incorrect.
Not flagged
The changes.outputs.ci guard on the new cross-law-integrity job is fine — corpus/regulation/** is in the ci filter, so corpus-only PRs will trigger the gate.
The doc comment added to coverage_score in models.rs is clear and accurate.
Preview Deployment — harvester-adminYour changes have been deployed to a preview environment: URL: https://harvester-admin-pr785-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl This deployment will be automatically cleaned up when the PR is closed. |
Preview Deployment — harvester-workerYour changes have been deployed to a preview environment: URL: https://harvester-worker-pr785-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl This deployment will be automatically cleaned up when the PR is closed. |
Preview Deployment — enrichworkerYour changes have been deployed to a preview environment: URL: https://enrichworker-pr785-regel-k4c.rig.prd1.gn2.quattro.rijksapps.nl This deployment will be automatically cleaned up when the PR is closed. |
f02ac1e to
0e331dd
Compare
Superseded by new review
There was a problem hiding this comment.
Code review
This PR adds skill documentation (reviewed as guidance prose, not executable code), a Python corpus-integrity CI gate, two shell scripts (a push-guard + its smoke-test), a pre-commit hook, and a Rust doc comment. Only the executable artefacts are reviewed below.
Shell scripts
script/check-skills-no-casus.sh
🟡 Minor — Layer 1 regex only catches uppercase corpus-repo names
PUBLIC_PATTERNS=(
'regelrecht-corpus-[A-Z][A-Za-z0-9-]*'
)[A-Z] requires the first character after regelrecht-corpus- to be uppercase. Standard GitHub repo names are all-lowercase (e.g. regelrecht-corpus-waterschap-hhnk). If the real private corpus repos follow that convention, layer 1 matches nothing and the entire guard falls back to layer 2 (which is absent in fork PR runs and on machines without the supplement file). The inline comment claims the pattern "catches every current and future corpus dossier without naming any of them" — that claim is only true when the repos happen to start with [A-Z]. If uppercase is a guaranteed naming convention for all corpus repos, document it; otherwise widen to [A-Za-z0-9].
.claude/skills/regelrecht-stelselanalyse/scripts/assert-private-repo.sh and test-assert-private-repo.sh
Logic and test coverage look correct. The explicit github.com-only host check, the case "$slug" in */*) slug validation, the fail-closed retry loop, and the 11 smoke-test cases cover the stated invariants.
Python (cross-law-integriteit.py)
🟡 Minor — str(None) produces spurious IMPL-DANGLING
tlaw, tart, term = im.get('law'), str(im.get('article')), im.get('open_term')
# …
elif term not in open_terms_idx[tlaw].get(tart, set()):
impl_dangling.append(…)If an implements dict is missing the article key (malformed YAML that passes yaml.safe_load), str(im.get('article')) evaluates to the string 'None'. The index stores article numbers as '1', '2', …, so open_terms_idx[tlaw].get('None', set()) always returns the empty set → false IMPL-DANGLING for every such entry. Schema validation should prevent this in practice, but the guard is fragile. Prefer an explicit None-check and skip or report a parse error:
tart_raw = im.get('article')
if tart_raw is None:
# malformed implements entry — skip or raise
continue
tart = str(tart_raw)🟡 Minor — unclosed file handles in YAML-loading loop
doc = yaml.safe_load(open(path))open(path) is never closed. On large corpora this exhausts the per-process file-descriptor limit. Use a context manager:
with open(path) as f:
doc = yaml.safe_load(f)CI (.github/workflows/ci.yml)
The new cross-law-integrity job and the secret-provisioning step in pre-commit look correct. Secret value is written with printf '%s\n' (not echo -e, no format-string risk), is never logged, and the fork-PR warning path is explicit. Order of steps (provision → pre-commit/action) is correct.
packages/pipeline/src/models.rs
Documentation-only addition. No issues.
Voegt drie samenhangende Claude-skills toe onder .claude/skills/: - regelrecht-dossier: front-door router die een binnenkomende casus triageert en routeert (feitelijke defecten -> desk, oordeels-/praktijkvragen -> workshop), met routing.md (canonieke flow) en zakkaart.md (onboarding-A4 voor analisten). - regelrecht-stelselanalyse: desk-review & corpus-completion-cyclus met 4-weg- classificatie, multi-agent review, verificatie, /loop- en /schedule-ondersteuning en een fail-closed private-only push-guard. - regelrecht-audit-products: producten voor live expert-validatie (scope-analyse, per-artikel audit-doc, facilitatiekit, testcases, verslagen). Dossier-agnostisch: geen casus-specifieke referenties.
Adds the law-version-drift-check skill alongside the existing desk-layer skills. It runs as Step 0 of every regelrecht-stelselanalyse micro-cycle and detects drift between a YAML's text blocks (plus competent_authority names and url anchors) and the binding geldende wettekst at the YAML's valid_from. Core principle: the geldende wettekst is leading and is mirrored verbatim, including errata; one allowed normalization (whitespace within a single paragraph). Two diff axes per article: structural first (lid- and onderdeel- counts), then textual at word level. Verdicts: CLEAN / DRIFT-tekst / DRIFT-structureel / NIET-VERIFIEERBAAR. Strict by design — no bypass; partial failure produces scope-restrictie (frozen articles), not a green report. Calibration ijkpunten live in the consuming corpus, not in the skill, keeping the skill dossier-agnostisch. Findings flow into resolutie-tracker as 4-weg klasse 1 (modellering-fout). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- assert-private-repo.sh: expliciete host-check vóór slug-extractie; niet-github.com remotes (GitLab, GHE, Bitbucket) blokkeren fail-closed. Voorheen strippeerde de sed-keten elke host blind en queryde github.com, waardoor een toevallig matchende private repo daar GUARD OK gaf voor een push naar een ander forge. - assert-private-repo.sh: sleep 1 tussen retry-pogingen, zodat de retry zin heeft bij transiënte gh-haperingen. - scheduled-routine.md: heading 'geïnstalleerd' → 'lokaal instellen' + expliciete noot dat de hook machine-lokaal is en niet met de repo meereist. - routing.md: ASCII-diagram opgeschoond (hangende rand opgeruimd, terugkoppel-pijl recht). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ram placeholders, no dangling bindings) - law-generate: hard rule that every cross-law input needs a real source: block, never just a description; add target output first if missing; multiple source: bindings per article are supported (schema v0.5.2); name-mismatch rule for source.output vs local name. - regelrecht-stelselanalyse: mandatory non-skippable source-refs integrity scan algorithm; dangling + plain-param are always modelling errors, never engine limitations; add reusable references/cross-law-integriteit.py. - law-reverse-validate: treat a source binding to a non-existent target output as a hallucination (dangling-binding check). - engine-limitaties template: a claimed engine limitation may only be recorded after a reproducible failing engine run; untested assumptions are open questions, not limitations. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r parameters) The most common hidden cross-law defect: a source: block placed under parameters: instead of input:. The engine's Parameter struct has no source field, so the binding is silently dropped at parse time and degrades to a plain caller-supplied parameter; scenarios that inject the value directly mask it. - cross-law-integriteit.py: rewritten YAML-based (robust vs regex). Adds MISPLACED detection (source under parameters:), keeps DANGLING (target output missing) and PLAIN-PARAM (conceptueel/tijdelijk marker without source). Drops the over-eager 'forward naar' marker (false-positives on legitimate leaf parameters that feed a binding mapping). Exit!=0 as CI gate. - law-generate: hard 'Section placement' rule — a source: MUST live under input:, never parameters:; leaf inputs that feed source.parameters stay under parameters:; verify with a BDD scenario that loads the target law and sets leaf inputs so the dependent output flips. - regelrecht-stelselanalyse: source-refs integrity scan now reports clean/misplaced/dangling/plain-param; MISPLACED listed as the most common hidden form; all three are always modelling errors, never engine limits. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s Step 0 - Remove trailing </content> (40 files) and </invoke> (audit-products SKILL) leaked from the authoring tool-call format. All files keep their final newline. - Integrate law-version-drift-check into regelrecht-stelselanalyse: new Step 0 section + Werkstroom item 0, a Zustersskills cross-reference block, and a Step 0 gate in references/cycle-workflow.md (section + volgorde-diagram). Closes the orphan where the drift-check called itself a hard prerequisite while the owning skill never referenced it.
…truth, ASCII - Add scripts/test-assert-private-repo.sh: stubs git/gh and asserts the guard invariants (gitlab/enterprise/bitbucket/public/internal/no-remote/unreachable → block, github private → pass). 11 cases, all green. Referenced from SKILL.md. - Point the two 4-weg-classificatie copies that lacked it (dossier SKILL, zakkaart) at classification.md as the canonical definition source, containing definition-drift while keeping each skill standalone-readable. - Redraw the routing.md flow diagram: the return-loop now forms one continuous vertical (┐│┘ aligned), annotations moved to an omlaag/omhoog legend below.
- Extend cross-law-integriteit.py with IoC checks: IMPL-DANGLING (implements pointing at an undeclared open_term) and IMPL-NO-DATE (implementing regulation without valid_from, which RFC-003's temporal filter then matches for every date). - Add a `cross-law-integrity` CI job running the script over corpus/regulation as a fail-closed gate (gap 1); add the script path to the `ci` paths filter. - Clarify coverage_score is a COMPLETENESS metric, not correctness (doc-comment on LawEntry.coverage_score) (gap 4). - Add references/check-keten.md: full inventory of zuiverheid/cross-law checks, each with its oracle (waartegen) and whether it is a CI-gate or methodological. Documents why text-fidelity (gap 3) needs a separate golden-text mechanism.
… van de wet Nieuwe casus-agnostische skill: auditeert of een machine_readable-model trouw is aan de LETTER van de wettekst en scheidt wettekst (leidend) strikt van toelichting (uitleg). Detecteert vier afwijkingsklassen (toelichting-bleed, ontbrekend verbatim-bestanddeel, verkeerde legal_basis- verankering, verbatim-drift), checkt positief-lid vs uitsluitings-lid + de "in afwijking van"- precedence in de chapeau, en classificeert elke bevinding als modelfout / wettekst-gevolg / letter-vs-toelichting-vraag (jurist beslist wat leidend is + herzieningssignaal). Fideliteit vóór outcome-toetsing, om outcome-bias te vermijden. Verwijst naar law-version-drift-check, regelrecht-stelselanalyse en law-generate.
…-ontologie Beslist en audit het referentietype van een waarde (cross-law / intra-law source-binding / open-norm leaf / extern-feit leaf) en vangt begrip-conflatie tussen wetten. Kern: het type volgt uit WIE het afleidt; een binding is sound alleen als producer en consumer hetzelfde BEGRIP aanduiden — het woord is niet het begrip. Detecteert false friends (één term, verschillende begrippen: register-gebaseerd vs feitelijke "naar de omstandigheden"-weging), orphan plain-params, negatie/complement-re-entry (¬X als losse leaf naast X) en duplicatie (één begrip als meerdere onafhankelijke leafs). Levert een law-level glossary + stelsel-level concept-map met sound same-concept-edges en gemarkeerde lexical-match-but-concept-mismatch-edges. Cardinale regel: een register mag een feitelijke open norm voeden (indicator), nooit stilzwijgend vervangen; symmetrie over een stelsel. Orthogonaal aan law-letter-fidelity-audit; bovenop regelrecht-stelselanalyse. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Voegt de uiteengelopen user-level en engine-repo skill-kopieën samen tot één canonieke set. Engine-versies waren inhoudelijk vooruit (Step 0 drift-check, source-refs integriteitsscan, striktere private-repo-guard); de scenario-traces- kruisverwijzingen die alleen user-level bestonden zijn erin geport. - regelrecht-scenario-traces toegevoegd als nieuwe (dossier-agnostische) skill - script/check-skills-no-casus.sh + pre-commit hook 'skills-no-casus' die casus-specifieke inhoud (organisatie-/sector-namen, codes, private repo-namen) blokkeert zodat trajectinformatie niet in de publieke skills lekt
…atronen lokaal Het publieke script bevatte letterlijke casus-/sector-namen (zelf-onthullend in een publieke repo). Nu twee lagen: - laag 1 (publiek): domein-loze structuur-vorm (corpus-repo-naamvorm), draait altijd, fail-closed — geen vals-groen bij ontbrekend aanvulbestand; - laag 2 (git-ignored / CI-secret): casus-/sector-specifieke patronen, nooit in de publieke repo. Geverifieerd: detectie in beide lagen, geen false positives, laag 1 vangt ook zonder aanvulbestand.
Voorziet het git-ignored aanvulbestand in de pre-commit-job vanuit het secret SKILLS_CASUS_DENYLIST, zodat de skills-no-casus hook in CI op volle sterkte draait zonder casus-/sector-patronen in de publieke repo te zetten. De waarde wordt nooit gelogd; ontbreekt het secret (forks) dan draait alleen laag 1 met een expliciete waarschuwing (geen vals 'volledig gescand').
0e331dd to
c380086
Compare
| # are all modelling errors. The script exits non-zero on any finding. | ||
| - name: Check cross-law binding integrity | ||
| run: | | ||
| python3 .claude/skills/regelrecht-stelselanalyse/references/cross-law-integriteit.py corpus/regulation |
There was a problem hiding this comment.
Productie-CI hangt nu rechtstreeks aan een script binnen .claude/skills/. Dat koppelt de corpus-gate aan skill-content: wie deze skill verplaatst/hernoemt/opschoont breekt stil de corpus-CI. Andere corpus-tooling staat in script/ — overweeg dit script daarheen te verplaatsen (en de CI + paths-filter daarnaar te laten wijzen), zodat de gate niet afhankelijk is van de skills-map.
There was a problem hiding this comment.
Goed punt — cross-law-integriteit.py is verplaatst naar script/; de CI-run en de paths-filter wijzen nu daarheen, en de skill-docs ook. De expliciete skills-pad-regel in de filter was redundant naast script/** en is verwijderd. Komt mee in de merge.
| # De waarde wordt nooit gelogd. Ontbreekt het secret (bv. PR's vanuit forks, | ||
| # die geen toegang tot secrets hebben), dan draait alleen laag 1 — expliciet | ||
| # gelogd, zodat een groene run nooit "volledig gescand" suggereert. | ||
| - name: Provision leak-guard supplement (laag 2) |
There was a problem hiding this comment.
Let op: het secret SKILLS_CASUS_DENYLIST bestaat nog niet in de repo (alleen ADMIN_TOKEN, CLAUDE_CODE_OAUTH_TOKEN, GEMINI_API_KEY, RIG_API_KEY). Zolang dat secret leeg is, draait deze stap altijd de else-tak en valt de guard terug op laag 1 — dat ene regelrecht-corpus-... patroon dat vandaag niets matcht. Met andere woorden: de leak-guard doet effectief niets totdat deze denylist is aangemaakt. Graag het secret vullen (en evt. een lokaal script/.skills-casus-denylist.local voor pre-commit) voordat dit als bescherming geldt.
There was a problem hiding this comment.
Terecht. De casus-/sector-patronen staan bewust niet in de publieke repo (hybride opzet); de bescherming werkt pas zodra de externe opslag gevuld is. Actie: we maken het secret SKILLS_CASUS_DENYLIST aan en houden script/.skills-casus-denylist.local lokaal voor pre-commit. Laag 1 (regelrecht-corpus-…) is de generieke baseline-tripwire die ook zonder secret afgaat bij een nieuwe corpus-repo-naam, en de stap logt een waarschuwing als het secret leeg is, zodat een groene run geen volledige scan suggereert.
…-gate van skills) De cross-law-integrity CI-gate draaide een script binnen .claude/skills/, waardoor het hernoemen/opschonen van die skill stil de productie-corpus-CI zou breken (review @tdjager). Verplaatst naar script/ bij de overige corpus-tooling; CI-run en paths-filter wijzen nu daarheen (de expliciete skills-pad-filterregel was redundant naast script/** en is verwijderd). Skill-docs verwijzen naar het nieuwe pad.
There was a problem hiding this comment.
Code Review — PR #785
Reviewed dimensions: script correctness, CI gate integrity, shell guard logic, and the pipeline model comment. The skill/template Markdown content (the bulk of the diff) falls outside REVIEW.md scope — it carries no executable or schema-parseable logic.
Script correctness
script/cross-law-integriteit.py
🟡 Minor — unclosed file handle (cross-law-integriteit.py line 8 in the new file, open(path) without a context manager):
doc = yaml.safe_load(open(path))The handle is never explicitly closed. This is a resource leak; on a large corpus it risks hitting OS file-descriptor limits before GC collects them. Replace with:
with open(path) as f:
doc = yaml.safe_load(f)🟡 Minor — silent ok for reg ≠ None, out = None cross-law binding (cross-law-integriteit.py, the else-branch of the input loop):
if reg not in law_outputs or (out is not None and out not in law_outputs[reg]):
dangling.append(...)
else:
ok += 1When reg is set but out is None, the out is not None short-circuit means only the existence of the target law is checked, not any output. If a source binding with regulation but no output is schema-invalid (the law-generate skill always shows both fields), this case should be flagged as DANGLING rather than counted as clean. At minimum it should continue instead of ok += 1, so it is not silently reported as a verified binding.
script/check-skills-no-casus.sh
🟡 Minor — pattern requires uppercase-initial suffix (line 7 of the new file in the PUBLIC_PATTERNS array):
'regelrecht-corpus-[A-Z][A-Za-z0-9-]*'The grep call does not use -i (the -I flag is for binary-file exclusion, not case-folding), so this pattern only catches names where the character immediately after regelrecht-corpus- is uppercase. A future private corpus repo named regelrecht-corpus-amsterdam (all-lowercase) would slip through. The PR description claims the pattern catches "elk huidig én toekomstig corpus-dossier" — that claim only holds while the naming convention is enforced out-of-band. Consider regelrecht-corpus-[A-Za-z][A-Za-z0-9-]* or add a note in the script that the convention (uppercase initial) is load-bearing.
CI gate integrity — OK
The cross-law-integrity job's if: needs.changes.outputs.ci == 'true' is correct: the changes job's ci filter covers corpus/regulation/** and script/**, so the gate runs on every corpus YAML change. The sys.exit(1 if (...) else 0) at the end of the Python script is present and correct — the gate is properly fail-closed.
Leak-guard / pre-commit hook — OK
pass_filenames: false + language: system is the right combination for a whole-directory scan. The laag 1 / laag 2 design (public domein-loze patterns + git-ignored supplement provisioned from a CI secret) is sound. Fork PRs that lack the secret run laag 1 only, which is explicitly logged — no silent degradation.
packages/pipeline/src/models.rs — OK
The added doc comment on coverage_score is accurate and the COVERAGE ≠ CORRECTNESS distinction is worth stating.
Code review — PR #785Harvester / pipeline / corpus
|
|
Inline — # Current (leaks FDs on large corpus):
doc = yaml.safe_load(open(path))
# Fix:
with open(path) as fh:
doc = yaml.safe_load(fh) |
Wat
Voegt de uiteengelopen user-level en engine-repo kopieën van de regelrecht-skills samen tot één canonieke set, en borgt dat er geen trajectinformatie naar deze publieke repo lekt.
Waarom
De skills bestonden in twee kopieën (
~/.claude/skills/en.claude/skills/hier) die uit elkaar waren gelopen. Per-bestand gereconcilieerd:scenario-traces-kruisverwijzingen die alléén user-level bestonden zijn erin geport.</content>-artefact te bevatten; de engine-versies zijn schoon.Wijzigingen
regelrecht-scenario-tracesals nieuwe, dossier-agnostische skill (9 bestanden).script/check-skills-no-casus.sh+ pre-commit hookskills-no-casus: een leak-guard die casus-specifieke inhoud in.claude/skills/blokkeert (de repo is publiek). Hybride opzet: het publieke script bevat alleen domein-loze structuur-vormen en draait altijd (fail-closed); de casus-/sector-specifieke patronen staan in een git-ignored aanvulbestand en worden in CI via een secret aangeleverd — nooit in de publieke repo. Zo publiceert de guard niet zelf wat hij moet weren.Verificatie
/security-reviewop de diff: geen bevindingen.