improve hardhat-deploy-migration skill description for better agent matching #591

Open
fernandezbaptiste wants to merge 1 commit into wighawag:main from fernandezbaptiste:improve-hardhat-deploy-migration-skill



fernandezbaptiste commented Apr 3, 2026

hey @wighawag, thanks for building hardhat-deploy. really like the deployment plugin approach for Ethereum projects. kudos on passing 1.2k stars! just starred it.

ran your hardhat-deploy-migration skill through some evals and noticed a few things that were pretty quick to improve (moving up to ~71% agent performance):

  • restructured the body from 2033 lines to 341 by removing redundant sections (intro, architecture diagrams, duplicated patterns)
  • extracted troubleshooting and advanced topics into references/ files for progressive disclosure
  • added a quick API reference table at the top + expanded description with trigger terms like migrate to hardhat-deploy v2, upgrade hardhat-deploy

these were easy changes to bring the skill in line with what performs well against Anthropic's best practices. honest disclosure, I work at tessl.io where we build tooling around this. not a pitch, just fixes that were straightforward to make.

you've got 1 skill here, if you want to do it yourself, spin up Claude Code and run tessl skill review. alternatively, let me know if you'd like an automatic review in your repo via GitHub Actions. it doesn't require signup, and this means you and your contributors get an instant quality signal before you have to review yourself.

Repository owner deleted a comment from kilo-code-bot bot Apr 3, 2026
Owner

wighawag commented Apr 3, 2026

Hi @fernandezbaptiste thanks for the PR

is that the only improvement the tool could find?

fernandezbaptiste force-pushed the improve-hardhat-deploy-migration-skill branch from b96f280 to a54fc34 on April 3, 2026 11:48
Contributor

kilo-code-bot bot commented Apr 3, 2026

Kilo Code Review could not run — your account is out of credits.

Add credits or switch to a free model to enable reviews on this change.

Author

fernandezbaptiste commented Apr 3, 2026

hey @wighawag, just pushed a much more substantial update - the skill went from ~71% to ~94%. the main changes:

  • restructured the body from 2033 lines to 341 by removing redundant sections (intro, architecture diagrams, duplicated patterns)
  • extracted troubleshooting and advanced topics into references/ files for progressive disclosure
  • added a quick API reference table at the top for faster navigation
  • kept all the executable migration code examples intact

if you want to squeeze out even more, you can run npx tessl skill review --optimize in your repo with Claude Code - it'll suggest targeted improvements and let you accept them interactively. alternatively, I can help set up a GitHub Action that does this automatically on PRs touching skill files. let me know if either sounds useful.

Owner

wighawag commented Apr 3, 2026

How do you test the skill's accuracy? What do you mean by 94%?

Having evals for skills would be great, but their accuracy depends on the input being tested

Author

fernandezbaptiste commented Apr 8, 2026

thanks for the question @wighawag - the 94% score comes from a review-score eval against Anthropic's best practices - it measures how well a skill.md is structured and formatted for agent consumption (trigger terms, workflow clarity, progressive disclosure, etc). think of it as linting for skill files: useful for catching low-hanging fruit, which is what this PR addresses.
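to make the "linting for skill files" idea concrete, here is a minimal sketch of what a structural check could look like. everything in it is an assumption for illustration: the `lintSkill` function, the rule names, the `---`-delimited frontmatter with a `description:` line, and the 500-line default are hypothetical, and this is not how tessl or any other tool actually computes its review score.

```typescript
// Hypothetical structural checks for a skill file.
// NOT the scoring used by tessl or any real tool; names and rules are illustrative.
interface LintFinding {
  rule: string;
  message: string;
}

function lintSkill(
  source: string,
  triggerTerms: string[],
  maxBodyLines = 500, // assumed threshold for this sketch
): LintFinding[] {
  const findings: LintFinding[] = [];

  // Assumption: frontmatter sits at the top of the file between two `---` lines.
  const match = source.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) {
    return [{ rule: "frontmatter", message: "missing frontmatter block" }];
  }
  const frontmatter = match[1] ?? "";
  const body = match[2] ?? "";

  // Assumption: the description is a single `description:` line.
  const descLine = frontmatter
    .split("\n")
    .find((line) => line.startsWith("description:"));
  const description = descLine
    ? descLine.slice("description:".length).trim().toLowerCase()
    : "";

  if (!description) {
    findings.push({ rule: "description", message: "missing or empty description" });
  } else {
    // Flag every expected trigger term that the description does not mention.
    for (const term of triggerTerms) {
      if (!description.includes(term.toLowerCase())) {
        findings.push({
          rule: "trigger-term",
          message: `description does not mention "${term}"`,
        });
      }
    }
  }

  // Long bodies are a hint that material could move into references/ files.
  const bodyLines = body.split("\n").length;
  if (bodyLines > maxBodyLines) {
    findings.push({
      rule: "body-length",
      message: `body has ${bodyLines} lines (limit ${maxBodyLines})`,
    });
  }
  return findings;
}
```

a check like this only looks at structure, so it says nothing about whether an agent actually completes a migration correctly with the skill loaded.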

you're right that real accuracy depends on the input being tested. that's exactly why we also built scenario-level evals - they pressure test a skill in realistic agent scenarios, measuring how an agent performs on a task with vs without the skill loaded.
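as a rough sketch of the with-vs-without comparison described above: the `ScenarioResult` shape and the `passRate` and `skillUplift` helpers below are hypothetical names chosen for illustration, and a simple pass/fail outcome per run is an assumption. the actual evals may score scenarios differently.

```typescript
// Hypothetical record of one agent run on one scenario.
interface ScenarioResult {
  scenario: string;   // scenario identifier
  withSkill: boolean; // whether the skill was loaded for this run
  passed: boolean;    // assumed pass/fail outcome
}

// Fraction of runs that passed within one arm (with or without the skill).
// Returns NaN when that arm has no runs, so a missing arm is not mistaken for 0%.
function passRate(results: ScenarioResult[], withSkill: boolean): number {
  const arm = results.filter((r) => r.withSkill === withSkill);
  if (arm.length === 0) return NaN;
  return arm.filter((r) => r.passed).length / arm.length;
}

// Difference in pass rate attributable to loading the skill.
// Positive means the agent did better with the skill than without it.
function skillUplift(results: ScenarioResult[]): number {
  return passRate(results, true) - passRate(results, false);
}
```

the number is only as meaningful as the scenarios behind it, which is the point made above about inputs: the same skill can show a large uplift on one scenario set and none on another.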

we ran some for your skill here: hardhat-deploy-migration evals - your skill performs super well on the scenarios we tested! kudos on the quality of the content.

the real differentiator is building your own scenarios - you know your repo's edge cases best. spin up claude code and point it to this doc - it'll walk you through creating scenarios specific to hardhat-deploy. takes a few minutes and you'll know exactly where the skill holds up and where it doesn't.

happy to help if you hit any snags

Owner

wighawag commented Apr 12, 2026


2 participants