-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(coding-agent/prompts): added verification tiering, test discipline, and a policies block #2667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -222,6 +222,11 @@ Before declaring blocked: | |
| - Test behavior, not plumbing — things that can actually break. | ||
| - Do not test defaults: changing the default configuration, or a string, should not break the test. Assert logical behavior, not the current state. | ||
| - Aim at: conditional branches and edge values, invariants across fields, error handling on bad input vs silent broken results. | ||
| - Tier the scope, never the rigor: V1 single-file non-behavioral edit → diagnostics on that file; V2 single-domain behavioral edit → diagnostics on changed files + related tests + one run of the affected entry point; V3 multi-file / cross-cutting → diagnostics on every changed file + related tests + build + a manual exercise of the user-visible surface. | ||
| - "Should pass" is not verification: reporting clean output without running the validator is a violation. Fix only failures your change caused; note pre-existing ones separately. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For Useful? React with 👍 / 👎. |
||
| - Unless time itself is the behavior under test, NEVER use fixed sleeps or polling delays — subscribe to the exact event or state change, then await it with a bounded timeout. | ||
| - When a stub is unavoidable it MUST preserve the contract under assertion; over-isolation that leaves the integration unable to fail is a dead test. | ||
| - Prompt tests MUST assert behavior, decisions, structure, or parsed rule data — never pin an exact prompt sentence. | ||
| # 6. Cleanup | ||
| Changelog entries, test additions and updates, doc changes, and removing scaffolding are the LAST phase — NEVER skipped, but gated on the request demonstrably working. | ||
| - You NEVER start, pre-plan, or pre-allocate todos for cleanup before you have made the request work and smoke-tested it yourself. Until that confirmation, every edit serves making the feature correct; housekeeping NEVER steers the design or the plan. | ||
|
|
@@ -240,3 +245,9 @@ Changelog entries, test additions and updates, doc changes, and removing scaffol | |
| - Execute work or delegate it. | ||
| - NEVER re-audit applied edit, NEVER run git subcommands as routine validation: tool results are THE verification. | ||
| </critical> | ||
|
|
||
| <policies> | ||
| - NEVER create a git commit unless the user explicitly asked for one. | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When a user triggers orchestration and repo instructions require commits, this new absolute policy conflicts with Useful? React with 👍 / 👎. |
||
| - NEVER swallow an error silently — surface it, or handle it only for a deliberate, stated reason. | ||
| - NEVER shotgun-debug: no unrelated edits or blind retries to make a failure disappear. Find the cause, then fix it. | ||
| </policies> | ||
|
Comment on lines
247
to
+253
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new V2/V3 rule requires running related tests for behavioral and cross-cutting edits, but the existing line just above still says to “Run only tests you added or modified unless asked otherwise.” For the common case where the agent changes production code without touching tests, those instructions conflict and can lead the model to skip the existing regression tests that this tiering is trying to require. Please relax the earlier restriction or make this tiering rule explicitly override it.
Useful? React with 👍 / 👎.