Context / Problem
The Area taxonomy is a controlled vocabulary of 6 buckets, single-sourced in .github/issue-areas.yml.
We have two CI gates that keep this vocabulary internally consistent:
.github/workflows/issue-templates-drift.yml: fails CI if any issue template's Area dropdown options diverge from the canonical list (option sync).
Neither watches the one thing that actually tells us the taxonomy has stopped being useful: distribution skew. A controlled vocabulary that is never re-audited rots, and tooling already proves it: it has reached 133/247 ≈ 54% of open issues with no rename, split, or addition since the taxonomy was born (#240, 2026-03-28). When a single bucket holds half the tracker it stops discriminating, and filtering by it is no better than not filtering at all.
The recent consolidation (#915 single-sourced the Area list but deliberately kept all 6 buckets) did not address this; it makes the list easier to edit, not self-monitoring. Even after a future split of tooling, the largest successor bucket (likely infra) will still be the biggest and will silently regrow without a tripwire. This is a textbook concept-drift guard: it is cheap and periodic, and it converts taxonomy maintenance from reactive to scheduled.
Proposal / Recommendation
Add a lightweight scheduled check that, for each Area label, computes its share of open issues and opens/updates a single tracking issue when any Area exceeds a threshold (~30%), prompting a re-split review.
Two acceptable implementations — pick the lighter one the maintainers prefer:
Scheduled workflow (.github/workflows/taxonomy-drift.yml), on: schedule (e.g. weekly cron) plus workflow_dispatch. For each Area in .github/issue-areas.yml, run the per-label totalCount GraphQL query (validated during this review) with $q = "repo:genealogix/glx is:issue is:open label:<area>", plus one unfiltered is:open count for the denominator. If count/total > 0.30 for any Area, open (or update, to avoid dupes) an issue tagged tooling listing the offending buckets and their percentages.
Checklist item in the existing triage cadence (lighter, no new workflow surface): a documented step that runs the same query manually each triage cycle and files a re-split issue when the threshold trips. Pair this with the documented audit rule.
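The share computation and threshold test described above could be sketched roughly as follows. This is a minimal illustration, not the validated query or the eventual workflow code: the constant and function names are hypothetical, the query text assumes the `issueCount` field of GitHub's GraphQL `search` connection, and every count except tooling's 133 of 247 is made up for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical constant name; 0.30 is the threshold proposed in this issue.
AREA_SHARE_THRESHOLD = 0.30

# Assumed query shape (not the query validated in the review): the search
# connection in GitHub's GraphQL API reports a match count as issueCount.
COUNT_QUERY = """
query($q: String!) {
  search(query: $q, type: ISSUE, first: 1) {
    issueCount
  }
}
"""


def build_search(repo: str, area: Optional[str] = None) -> str:
    """Build the $q search string; omit the label for the denominator."""
    base = f"repo:{repo} is:issue is:open"
    return base if area is None else f"{base} label:{area}"


def area_shares(counts: Dict[str, int], total: int) -> Dict[str, float]:
    """Return each Area's fraction of all open issues."""
    if total <= 0:
        return {}
    return {area: n / total for area, n in counts.items()}


def offenders(
    counts: Dict[str, int],
    total: int,
    threshold: float = AREA_SHARE_THRESHOLD,
) -> List[Tuple[str, float]]:
    """List (area, share) pairs strictly above the threshold, largest first."""
    flagged = [(a, s) for a, s in area_shares(counts, total).items() if s > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Only the tooling figure comes from this issue; the others are illustrative.
    sample_counts = {"tooling": 133, "infra": 40, "docs": 20}
    for area, share in offenders(sample_counts, total=247):
        print(f"{area}: {share:.1%}")
```

Keeping the arithmetic in pure functions means the same code could back either implementation: the workflow would feed it counts fetched with the query, and the manual checklist step could run it against numbers read off the tracker.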
Implementation notes:
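Read the Area list from .github/issue-areas.yml so the check stays in lock-step with the canonical source (same pattern the labeler and templates-drift check already use: Python + stdlib + PyYAML, no yq).
SHA-pin / patch-pin all uses: actions per .github/CLAUDE.md (@vX.Y.Z, never @vN); permissions: minimal (issues: write only if the workflow opens the tracking issue, else contents: read).
If creating the tracking issue from CI, never interpolate untrusted issue fields into run: (stage values via env: instead, per .github/CLAUDE.md).
Make the threshold a single named constant so it's easy to tune.
De-dupe: search for an existing open tracking issue (by a fixed title prefix or a dedicated marker) and update it rather than opening a new one every run.
Acceptance criteria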
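A scheduled mechanism (workflow on: schedule + workflow_dispatch, OR a documented triage-checklist step) computes each Area's share of open issues using the per-label totalCount/issueCount GraphQL query.
Area list is read from .github/issue-areas.yml, not hard-coded.
When any Area exceeds ~30% of open issues, the mechanism opens or updates a single tracking issue (typed Infrastructure, label tooling) naming the offending bucket(s) and their percentages, prompting a re-split review.
Threshold is a single tunable constant.
If implemented as a workflow: all uses: actions are patch-/SHA-pinned per .github/CLAUDE.md; permissions: are least-privilege; no untrusted input is interpolated into run:.
The audit rule (re-audit the taxonomy when a bucket dominates) is documented alongside the check so the intent survives.
Notes / scope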
This is preventive tooling, not a bug, hence Infrastructure and P3. It does not block the tooling re-split itself; it ensures the next catch-all can't silently regrow.
If the team decides a dedicated taxonomy/meta label is warranted for these tracking issues, that's a separate request; only the existing tooling label is applied here, since inventing labels is out of scope.
Relates to .github/workflows/issue-templates-drift.yml, #949 (issue-templates-drift.yml: drift check hard-codes the 3 template paths while its trigger globs ISSUE_TEMPLATE/*.yml, so a 4th template's Area dropdown drifts unchecked), and #950 (issue-templates-drift.yml: add a concurrency group, the lone PR-triggered workflow without one).
Part of a focused review of .github/issue-areas.yml and its labeling machinery; taxonomy hub: #1062.
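The de-dupe note under Implementation notes (update an existing tracking issue rather than opening a new one each run) might look something like the sketch below. The title prefix, function names, and the dict shape for issues are all assumptions made for illustration; a real implementation could equally key on a dedicated marker in the issue body.

```python
from typing import Iterable, List, Mapping, Optional, Tuple

# Hypothetical fixed title prefix used to recognise the tracking issue.
TRACKING_TITLE_PREFIX = "[taxonomy-drift]"


def find_tracking_issue(
    open_issues: Iterable[Mapping[str, object]],
    prefix: str = TRACKING_TITLE_PREFIX,
) -> Optional[int]:
    """Return the number of the first open issue whose title has the prefix."""
    for issue in open_issues:
        if str(issue["title"]).startswith(prefix):
            return int(issue["number"])
    return None


def render_body(offenders: List[Tuple[str, float]], threshold: float) -> str:
    """Format the offending Areas and their percentages for the issue body."""
    lines = [f"Areas above {threshold:.0%} of open issues:"]
    lines += [f"- {area}: {share:.1%}" for area, share in offenders]
    return "\n".join(lines)


def plan_action(
    open_issues: Iterable[Mapping[str, object]],
    offenders: List[Tuple[str, float]],
) -> Tuple[str, Optional[int]]:
    """Decide whether to do nothing, update the tracker, or create one."""
    if not offenders:
        return ("noop", None)
    existing = find_tracking_issue(open_issues)
    if existing is not None:
        return ("update", existing)
    return ("create", None)
```

Separating the decision (`plan_action`) from the API call that acts on it keeps the de-dupe rule testable without network access, and leaves the choice between title prefix and body marker to a single function.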