ci: scheduled taxonomy-drift tripwire — alert when any one Area exceeds ~30% of open issues

## Context / Problem

The Area taxonomy is a controlled vocabulary of 6 buckets, single-sourced in `.github/issue-areas.yml`:

```
cli, spec, go-glx, import-export, ui, tooling
```

We have two CI gates that keep this vocabulary *internally consistent*:

- `.github/workflows/issue-templates-drift.yml` — fails CI if any issue template's Area dropdown options diverge from the canonical list (option **sync**).
- #946 — proposes guarding that every canonical Area has a corresponding repo label (label **existence**).

Neither watches the one thing that actually tells us the taxonomy has stopped being useful: **distribution skew**. A controlled vocabulary that is never re-audited rots, and `tooling` already proves it: it has reached **133/247 ≈ 54%** of open issues with no rename/split/addition since the taxonomy was born (#240, 2026-03-28). When a single bucket holds half the tracker it stops discriminating, and filtering by it is little better than not filtering at all.

The recent consolidation (#915 single-sourced the Area list but deliberately kept all 6 buckets) did not address this: it makes the list easier to *edit*, not self-monitoring. Even after a future split of `tooling`, the largest successor bucket (likely `infra`) will become the new catch-all and can silently regrow without a tripwire. This is a textbook concept-drift guard: it is cheap and periodic, and it converts taxonomy maintenance from reactive to scheduled.

## Proposal / Recommendation

Add a lightweight **scheduled** check that, for each Area label, computes its share of open issues and opens/updates a single tracking issue when any Area exceeds a threshold (~30%), prompting a re-split review.

Two acceptable implementations; pick whichever the maintainers prefer:

1. **Scheduled workflow** (`.github/workflows/taxonomy-drift.yml`), `on: schedule` (e.g. weekly cron) plus `workflow_dispatch`. For each Area in `.github/issue-areas.yml`, run the per-label `issueCount` GraphQL search query (validated during this review):

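   ```graphql
   # Only $q is declared: GraphQL validation rejects declared-but-unused variables,
   # and the repository is already scoped by the `repo:` qualifier inside $q.
   query($q: String!) {
     search(type: ISSUE, query: $q) { issueCount }
   }
   ```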

   with `$q = "repo:genealogix/glx is:issue is:open label:<area>"`, plus one unfiltered `repo:genealogix/glx is:issue is:open` count for the denominator (keeping `is:issue` so pull requests are excluded from both sides of the ratio). If `count/total > 0.30` for any Area, open an issue tagged `tooling` (or update the existing one, to avoid duplicates) listing the offending buckets and their percentages.

2. **Checklist item** in the existing triage cadence (lighter, no new workflow surface): a documented step that runs the same query manually each triage cycle and files a re-split issue when the threshold trips. Pair this with the documented audit rule.
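The share computation behind both options can be sketched in a few lines of stdlib Python. This is a minimal illustration, not the proposed implementation: the names `AREA_SHARE_THRESHOLD`, `area_query`, `total_query`, and `offending_areas` are hypothetical, the counts are assumed to have already been fetched with the search query above, and only the `tooling` figure (133 of 247) comes from this issue.

```python
"""Sketch: per-Area share of open issues and the threshold check."""
from __future__ import annotations

AREA_SHARE_THRESHOLD = 0.30  # hypothetical name for the single tunable constant


def area_query(repo: str, area: str) -> str:
    """Search string for open issues carrying one Area label."""
    return f"repo:{repo} is:issue is:open label:{area}"


def total_query(repo: str) -> str:
    """Search string for the denominator: all open issues, no label filter."""
    return f"repo:{repo} is:issue is:open"


def offending_areas(
    counts: dict[str, int],
    total: int,
    threshold: float = AREA_SHARE_THRESHOLD,
) -> dict[str, float]:
    """Return {area: share} for every Area whose share strictly exceeds threshold."""
    if total <= 0:
        return {}  # empty tracker: nothing can dominate
    shares = {area: n / total for area, n in counts.items()}
    return {area: share for area, share in shares.items() if share > threshold}


if __name__ == "__main__":
    # The cli count is a placeholder; only tooling = 133 of 247 is from the issue.
    counts = {"tooling": 133, "cli": 20}
    for area, share in offending_areas(counts, total=247).items():
        print(f"{area}: {share:.1%}")
```

Keeping the comparison strict (`>`) mirrors the `count/total > 0.30` wording above; whether a bucket sitting exactly on the threshold should trip is a choice for the maintainers.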

Implementation notes:

- Read the Area list from `.github/issue-areas.yml` so the check stays in lock-step with the canonical source (same pattern the labeler and templates-drift check already use — Python + stdlib + PyYAML, no `yq`).
- **SHA-pin / patch-pin all `uses:` actions** per `.github/CLAUDE.md` (`@vX.Y.Z`, never `@vN`); keep `permissions:` minimal (`contents: read` to read `.github/issue-areas.yml`, plus `issues: write` only if the workflow opens the tracking issue).
- If creating the tracking issue from CI, **never interpolate untrusted issue fields into `run:`** — stage values via `env:` (per `.github/CLAUDE.md`).
- Make the threshold a single named constant so it's easy to tune.
- De-dupe: search for an existing open tracking issue (by a fixed title prefix or a dedicated marker) and update it rather than opening a new one every run.
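The de-dupe note can be illustrated with a small decision function. This is a sketch under stated assumptions: the hidden-comment `MARKER` string and the names `render_body` and `plan_action` are hypothetical, and `open_issues` stands in for whatever list of open issues (each with `number` and `body`) the workflow has already fetched. It only decides what to do; the actual create or update API call is left out.

```python
"""Sketch: decide whether to create, update, or skip the tracking issue."""
from __future__ import annotations

# Hypothetical marker embedded in the tracking issue body so later runs can find it.
MARKER = "<!-- taxonomy-drift-tripwire -->"


def render_body(offenders: dict[str, float], threshold: float) -> str:
    """Build the tracking-issue body, largest offender first."""
    lines = [MARKER, f"Areas above {threshold:.0%} of open issues:", ""]
    for area, share in sorted(offenders.items(), key=lambda kv: -kv[1]):
        lines.append(f"- `{area}`: {share:.1%}")
    return "\n".join(lines)


def plan_action(
    open_issues: list[dict],
    offenders: dict[str, float],
) -> tuple[str, int | None]:
    """Return ("noop", None), ("update", issue_number), or ("create", None)."""
    if not offenders:
        return ("noop", None)  # threshold not tripped: nothing to file
    for issue in open_issues:
        # Issue bodies can be null, so fall back to an empty string.
        if MARKER in (issue.get("body") or ""):
            return ("update", issue["number"])
    return ("create", None)
```

Matching on a marker in the body rather than on the title means the issue can be retitled by hand without causing a duplicate on the next run; a fixed title prefix, as the note also allows, would work the same way with a different predicate.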

## Acceptance criteria

- [ ] A scheduled mechanism (workflow `on: schedule` + `workflow_dispatch`, OR a documented triage-checklist step) computes each Area's share of open issues using the per-label `issueCount` GraphQL search query.
- [ ] Area list is read from `.github/issue-areas.yml`, not hard-coded.
- [ ] When any Area exceeds ~30% of open issues, the mechanism opens **or updates** a single tracking issue (typed `Infrastructure`, label `tooling`) naming the offending bucket(s) and their percentages, prompting a re-split review.
- [ ] Threshold is a single tunable constant.
- [ ] If implemented as a workflow: all `uses:` actions are patch-/SHA-pinned per `.github/CLAUDE.md`; `permissions:` are least-privilege; no untrusted input is interpolated into `run:`.
- [ ] The audit rule (re-audit the taxonomy when a bucket dominates) is documented alongside the check so the intent survives.

## Notes / scope

- This is preventive tooling, not a bug — hence `Infrastructure` and **P3**. It does not block the `tooling` re-split itself; it ensures the *next* catch-all can't silently regrow.
- If the team decides a dedicated `taxonomy`/`meta` label is warranted for these tracking issues, that's a separate request — only the existing `tooling` label is applied here, since inventing labels is out of scope.

## Relates to

- Born from / proves the need: #240 (taxonomy origin), #915 (single-sourced the Area list, kept all 6 buckets).
- Complementary CI gates (sync & existence, not distribution): #946, `.github/workflows/issue-templates-drift.yml`, #949, #950.
- Adjacent intake/labeler hardening: #884, #886, #947, #948, #1048.

---
*Part of a focused review of `.github/issue-areas.yml` and its labeling machinery; taxonomy hub: #1062.*

