feat(scanner): derive skill/plugin source from install manifests#90
Conversation
61c6665 to
d6d3b8d
Compare
HarnessKit inferred an extension's source by walking up to the nearest enclosing .git, mis-attributing everything under a dotfiles-managed agent home (e.g. ~/.claude) to that backup repo. Read each tool's own install manifest instead, falling back to .git detection only when absent. - Skills: override detect_source with <root>/.skill-lock.json (the skills CLI lockfile), matched by on-disk folder name, cached per lockfile. - Plugins: PluginEntry gains source_url; the Claude adapter fills it from plugins/known_marketplaces.json (github marketplaces -> owner/repo); scan_plugins prefers it over the .git walk. Other adapters unchanged. Regression tests: lockfile beats a populated enclosing repo (and a sibling skill absent from the lock falls back); a plugin is attributed to its marketplace repo. Refs RealZST#89. Builds on RealZST#88 canonicalize in scan_skill_dir.
The scanner now resolves the real source from install manifests, but the git-source backfill only writes install_meta when install_type IS NULL, so a row stamped in an earlier sync (e.g. a plugin first attributed to the enclosing dotfiles repo) keeps its stale install_url. deriveExtensionUrl prefers install_url, so the corrected source_json.url stayed shadowed and the extension lingered in the wrong group. Add refresh_stale_git_install_meta: for install_type='git' skill/plugin rows whose install_url owner/repo differs from the authoritative source_json.url owner/repo, realign install_url/revision and clear the now-stale branch/subpath + pack (re-derived by backfill_packs). Compare by pack, not raw URL string, so a legitimate install recorded as ".../repo" is not churned against the scanner's ".../repo.git" remote every sync (which would wipe its pinned revision and check state). Runs in both sync paths after the symlink heal, before backfill_packs. This is the store half of the manifest source-resolution fix (parallel to the RealZST#88 self-heal).
….md skills Two review findings on the manifest source-resolution change: - refresh_stale_git_install_meta compared pack owner/repo case-sensitively, so a stored `Owner/Repo` vs a scanned `owner/repo` (GitHub is case-insensitive) would churn the row every sync, wiping its pinned revision/check state. Compare lowercased. Add a same-repo case-only regression row that must stay intact. - scan_skill_dir's lockfile key used path.file_name() unconditionally, so a standalone `<name>.md` skill would key on `name.md` and miss its lock entry. Key dir skills by folder name (file_name, dot-safe) and standalone .md by stem.
The web and desktop `update_extension` endpoints validated install_meta and install_type but not kind. The update path clones a repo and deploys it as a skill, so passing a non-skill id ran the skill-update logic on it. This became reachable once manifest source-resolution stamps plugins with install_type='git'. Add the same `is_update_eligible` gate the bulk update path already uses (rejects non-skill kinds), so a plugin id can no longer enter the skill clone+deploy flow.
refresh_stale_git_install_meta realigned a row's install_url to its scanned source whenever the owner/repo packs differed. But a `.git`-inferred source (e.g. an HK-git-installed skill that merely sits under a personal dotfiles repo) is not authoritative — trusting it clobbers the real install_url and wipes the pinned revision, re-attributing the skill to the dotfiles repo. Add `Source::from_manifest`, set only where the source is read from an authoritative install manifest (the skills CLI `.skill-lock.json`, or a plugin marketplace map), and gate the realignment on it so only a manifest-derived source may correct an install record. Inferred sources are left alone.
… CLI Skills installed by the external `skills` CLI live in ~/.agents/skills/<name> and are tracked in ~/.agents/.skill-lock.json (detected via Source::from_manifest). Updating them with HK's own clone+deploy overwrites the CLI-managed copy and leaves the CLI's lockfile hash stale, drifting the two tools apart. Route such a skill's update to `skills update <name> -g -y` (preferring a `skills` binary, falling back to `npx --yes skills`, and falling back to HK's own update when neither is available), so the CLI updates the files AND its own lockfile in one step. Delete stays native — HK's delete is agent/path-granular whereas `skills remove` removes from every agent, and removing a per-agent symlink doesn't touch the canonical copy (no lockfile drift). The CLI syncs the skill to its upstream HEAD, but the deployed files aren't a git checkout, so a rescan would leave install_revision NULL and the row stuck on "update available". After a delegated update, record the upstream HEAD across the skill's same-scope copies so the git-based update check clears the indicator — mirroring how the native path records its captured revision.
0b14bca to
099ad62
Compare
|
Rebased onto main (resolves the #88-squash conflict). Two commits:
Tested web + desktop. Known limitationMonorepo sources still update-check by repo HEAD, so every skill in a repo flags "update available" when any one of them changes. The CLI checks per-skill folder hash but has no read-only check command, so delegating the check would couple HK to its registry. The repo-level check over-reports but never hides a real update. |
Closes #89.
Problem
scanner::detect_sourceinfers an extension's source by walking up to the nearest enclosing.git. When an agent home (e.g.~/.claude) is kept under the user's own dotfiles repo, everything beneath it — plugins cached in~/.claude/plugins/cache/<marketplace>/..., skills, etc. — is mis-attributed to that backup remote, collapsing many unrelated sources into one bogus group. PR #88 fixed only the symlinked-skill slice; this addresses the root cause for skills and plugins.Fix
Read each tool's own install manifest as the authoritative source; fall back to
detect_sourceonly when there is no manifest entry.<root>/.skill-lock.json(the skills CLI lockfile), matched by the on-disk folder name (how the CLI keys it), parsed once per lockfile and cached. A real git checkout's commit hash is preserved.PluginEntrygainssource_url; the Claude adapter fills it fromplugins/known_marketplaces.json(github marketplaces →owner/repo);scan_pluginsprefers it over the.gitwalk. Non-github marketplaces are skipped so we never fabricate a wrong URL. Other adapters passNone(behaviour unchanged).Testing
cargo test -p hk-core— 531 passed, incl. two new regression tests:test_skill_lock_overrides_enclosing_git_source— the lockfile beats a populated enclosing repo; a sibling skill absent from the lock still falls back; the override matches by folder name even when frontmatternamediffers.test_scan_plugins_attributes_to_marketplace_repo— a plugin is attributed to its marketplace's upstream repo.cargo clippy -p hk-core— clean for the touched files.Notes
canonicalizeinscan_skill_dir). Until fix(skills): attribute symlinked skills to real source, not enclosing repo #88 merges this PR's range includes fix(skills): attribute symlinked skills to real source, not enclosing repo #88's commit; I'll rebase ontomainonce fix(skills): attribute symlinked skills to real source, not enclosing repo #88 lands.source_urlresolution later; once manifest resolution is in, the store's git-source backfill can be narrowed.Store self-heal (existing DBs)
The scanner change fixes attribution going forward, but on an existing DB the git-source backfill had already stamped
install_url/packfrom the old (wrong) source, and the backfill only runs oninstall_type IS NULLso it never refreshes them — andderiveExtensionUrlprefersinstall_url, so the correctedsource_json.urlstays shadowed.refresh_stale_git_install_meta(parallel to #88's self-heal) realignsinstall_type='git'skill/plugin rows whose recorded owner/repo differs from the now-authoritativesource_json.urlowner/repo, clearing the stale branch/subpath + pack (re-derived bybackfill_packs). It compares by pack, not raw URL string, so a legitimate install recorded as…/repois not churned against the scanner's…/repo.gitremote every sync (which would wipe its pinned revision/check state). Verified on a real polluted DB: 13 plugins re-attributed to their marketplaces, same-repo and self-authored skills left untouched.Known properties / limitations
refresh_stale_git_install_metapersists the realignment, so reverting the code does not roll the data back (the old backfill only fires oninstall_type IS NULL). In practice the only values it overwrites are ones whose owner/repo genuinely changed — i.e. stale pollution — and pack-gating leaves same-repo rows (incl..git-suffix and case-only variants) untouched, so a legitimately pinned revision is not lost.owner/repo(e.g. gitlab→github) is treated as the same repo and not realigned; the stored URL keeps the old host. This is intentional (it's what avoids churning.git-suffix/case/scheme variants) and harmless for grouping, which keys on owner/repo.detect_source(no fabricated URL); plugins from such markets under a git-managed home stay mis-attributed. Tracked for follow-up in Derive extension source from the tool's install manifest, not the enclosing .git #89, alongside Codex's per-plugin manifest.Known interactions
skills-CLI-installed skill resolves to its real upstream, it becomes update-eligible, and the update path clones that repo anddeploy_skills it over every same-named installed copy — including the canonical~/.agents/skills/<name>that the externalskillsCLI manages (and any agent home symlinked to it). The files get updated but the CLI's.skill-lock.jsonhash/commit is not, so the two can drift. This is pre-existing behaviour (the marketplace-attributed sibling copies already triggered the same clone+deploy), widened here because the symlinked copy is now also an entry point. Flagged for maintainer awareness; not changed in this PR.Decision for maintainer
This PR's blast radius is intentionally bounded: it changes attribution (and a one-time, idempotent, pack-gated DB realignment), and the one potentially destructive interaction — updating an externally-managed skill — is user-triggered (nothing happens until someone clicks Update) and pre-existing (marketplace-attributed copies already cloned + deployed over the same files). So it is safe to ship as-is.
What's left is a policy choice that should be yours, not silently encoded here: should HarnessKit offer Update/Delete on skills that an external tool (the
skillsCLI, tracked in~/.agents/.skill-lock.json) manages? Now that those skills resolve to a real upstream, they become update-eligible, anddeploy_skillwrites through the agent-home symlink onto the CLI-managed~/.agents/skills/<name>, leaving the CLI's lockfile hash stale.Two directions, for you to pick before any further change:
skillsCLI" notice. Minimal, safe, hands-off. (My recommendation.).skill-lock.json(or shells out to theskillsCLI) after an update. Complete, but couples HarnessKit to the external tool's format and lifecycle.This PR deliberately stays at attribution only. If you favour a direction, I'll do it as a separate follow-up (alongside a small, safe hardening: matching
find_skill_in_repoby folder as well as name, so a skill whose frontmatter name differs from its repo folder still updates). Tracked with the manifest follow-ups in #89.