feat(coding-agent/prompts): added verification tiering, test discipline, and a policies block #2667
metaphorics wants to merge 1 commit into
…ne, and a policies block
- Extended the system-prompt verification step with V1-V3 scope tiering (tier the scope, never the rigor) and the rule that 'should pass' is not verification.
- Added test-discipline lines the prompt lacked: no fixed sleeps (subscribe to the exact event/state change with a bounded timeout), stub-contract integrity (never isolate so heavily the integration cannot fail), and prompt tests must assert behavior rather than pin an exact sentence.
- Added a policies block (never commit unless explicitly asked, never silently swallow errors, no shotgun debugging), deduplicated against the existing CONTRACT and intent gate.
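As an illustration of the "no fixed sleeps" rule, here is a minimal Python sketch of waiting on the exact state change with a bounded timeout. The helper `wait_for` and the test name are hypothetical and are not taken from this repository; only `threading.Event` from the standard library is assumed.

```python
import threading


def wait_for(event: threading.Event, timeout_s: float) -> None:
    """Block until `event` is set; fail loudly if the bound is exceeded."""
    if not event.wait(timeout=timeout_s):
        raise TimeoutError(f"event not observed within {timeout_s}s")


def test_worker_signals_completion() -> None:
    done = threading.Event()
    result: list[int] = []

    def worker() -> None:
        result.append(42)
        done.set()  # the exact state change the test subscribes to

    thread = threading.Thread(target=worker)
    thread.start()
    # Bounded wait on the signal, rather than time.sleep(0.5) and hoping.
    wait_for(done, timeout_s=2.0)
    thread.join(timeout=2.0)
    assert result == [42]


test_worker_signals_completion()
```

The timeout here is only an upper bound: the test continues as soon as the worker signals, and a missing signal surfaces as a `TimeoutError` instead of a flaky assertion.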
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Extends the coding-agent system prompt with stricter verification/test-discipline guidance and adds an explicit policies section to constrain agent behavior.
Changes:
- Added V1–V3 scope tiering guidance and additional test/verification discipline rules to the system prompt.
- Introduced a new `<policies>` block (no commits unless asked, no silent error swallowing, no shotgun debugging).
- Documented the changes in the package changelog.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/coding-agent/src/prompts/system/system-prompt.md | Adds new verification rules and a new <policies> block to the system prompt. |
| packages/coding-agent/CHANGELOG.md | Records the system-prompt updates under Unreleased. |
Comments suppressed due to low confidence (2)
packages/coding-agent/CHANGELOG.md:1
- Typo: `omp lacked` should be corrected (unclear meaning as written).
packages/coding-agent/CHANGELOG.md:1
- This changelog bullet is extremely long and hard to scan. Consider splitting it into multiple bullets (e.g., one for scope tiering/verification wording, one for test-discipline rules, one for the new `<policies>` block) to improve readability and make future changelog diffs easier to review.
</critical>

<policies>
- NEVER create a git commit unless the user explicitly asked for one.
- NEVER swallow an error silently — surface it, or handle it only for a deliberate, stated reason.
- NEVER shotgun-debug: no unrelated edits or blind retries to make a failure disappear. Find the cause, then fix it.
</policies>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2f0c7efddf
- Test behavior, not plumbing — things that can actually break.
- Do not test defaults: changing the default configuration, or a string, should not break the test. Assert logical behavior, not the current state.
- Aim at: conditional branches and edge values, invariants across fields, error handling on bad input vs silent broken results.
- Tier the scope, never the rigor: V1 single-file non-behavioral edit → diagnostics on that file; V2 single-domain behavioral edit → diagnostics on changed files + related tests + one run of the affected entry point; V3 multi-file / cross-cutting → diagnostics on every changed file + related tests + build + a manual exercise of the user-visible surface.
Resolve conflicting test-scope instructions
This new V2/V3 rule requires running related tests for behavioral and cross-cutting edits, but the existing line just above still says to “Run only tests you added or modified unless asked otherwise.” For the common case where the agent changes production code without touching tests, those instructions conflict and can lead the model to skip the existing regression tests that this tiering is trying to require. Please relax the earlier restriction or make this tiering rule explicitly override it.
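To make the interaction concrete, the tiering rule could be modeled roughly as below. This is a sketch under stated assumptions, not the project's implementation: `Tier`, `Change`, `classify`, `plan`, and the step labels are hypothetical names, and the handling of a multi-file, single-domain edit is one possible reading of the rule.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    V1 = "V1"
    V2 = "V2"
    V3 = "V3"


@dataclass(frozen=True)
class Change:
    files: tuple[str, ...]   # paths touched by the edit
    domains: frozenset[str]  # hypothetical grouping of those paths
    behavioral: bool         # does the edit change runtime behavior?


def classify(change: Change) -> Tier:
    """Pick the scope tier; anything not clearly V1 or V2 escalates to V3."""
    if len(change.files) == 1 and not change.behavioral:
        return Tier.V1
    if len(change.domains) == 1 and change.behavioral:
        return Tier.V2
    return Tier.V3


# Step labels paraphrase the rule text; they are not real commands.
STEPS: dict[Tier, tuple[str, ...]] = {
    Tier.V1: ("diagnostics on that file",),
    Tier.V2: ("diagnostics on changed files", "related tests",
              "one run of the affected entry point"),
    Tier.V3: ("diagnostics on every changed file", "related tests",
              "build", "manual exercise of the user-visible surface"),
}


def plan(change: Change) -> tuple[str, ...]:
    return STEPS[classify(change)]


# A behavioral edit to production code that touches no test file.
edit = Change(files=("src/parser.py",), domains=frozenset({"parser"}),
              behavioral=True)
print(plan(edit))
```

Under this reading, a production-only behavioral edit still lands in V2 and schedules related tests, which is exactly the case the "only tests you added or modified" line would skip.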
</critical>

<policies>
- NEVER create a git commit unless the user explicitly asked for one.
Reconcile commit policy with orchestration mode
When a user triggers orchestration and repo instructions require commits, this new absolute policy conflicts with orchestrate-notice.md rule 6, which tells the agent to commit when “the repo workflow expects them”; that notice is injected for orchestrate magic-keyword turns via #createMagicKeywordNotices. The result is ambiguous guidance in exactly the large multi-phase workflow where commits are currently prescribed, so either this policy needs to allow repo-workflow-required commits or the orchestrator prompt needs to be updated consistently.
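The first option (letting the policy admit repo-workflow-required commits) amounts to a small decision rule. The sketch below is purely illustrative; `may_commit` and its parameters are hypothetical and do not correspond to any code in this repository.

```python
def may_commit(user_asked: bool, orchestrating: bool,
               repo_workflow_requires: bool) -> bool:
    """Allow a commit on an explicit request, or when an orchestrated
    turn runs in a repo whose workflow expects commits."""
    return user_asked or (orchestrating and repo_workflow_requires)


# Ordinary turn, nobody asked for a commit: still forbidden.
print(may_commit(user_asked=False, orchestrating=False,
                 repo_workflow_requires=True))
# Orchestrated turn in a repo whose workflow expects commits: allowed.
print(may_commit(user_asked=False, orchestrating=True,
                 repo_workflow_requires=True))
```

The alternative is to leave the policy absolute and change the orchestrator notice instead, in which case only the `user_asked` branch would remain.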
- Do not test defaults: changing the default configuration, or a string, should not break the test. Assert logical behavior, not the current state.
- Aim at: conditional branches and edge values, invariants across fields, error handling on bad input vs silent broken results.
- Tier the scope, never the rigor: V1 single-file non-behavioral edit → diagnostics on that file; V2 single-domain behavioral edit → diagnostics on changed files + related tests + one run of the affected entry point; V3 multi-file / cross-cutting → diagnostics on every changed file + related tests + build + a manual exercise of the user-visible surface.
- "Should pass" is not verification: reporting clean output without running the validator is a violation. Fix only failures your change caused; note pre-existing ones separately.
Allow explicit CI-fix tasks to repair existing failures
For /green and similar CI-repair prompts, the failing workflow on the current HEAD usually predates the agent’s current turn; ci-green-request.md still instructs the agent to inspect failed jobs and make the minimal fix until the branch is green. This new blanket “Fix only failures your change caused” rule can make the agent treat those same failures as pre-existing and only report them instead of fixing them, so it should be scoped to validators run while verifying the agent’s own changes or explicitly exempt user-requested failure-repair tasks.
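One way to express the suggested scoping is to compare validator failures before and after the agent's change, with an explicit exemption for repair tasks. This is a sketch only: `triage_failures`, its parameters, and the test names in the example are hypothetical.

```python
def triage_failures(baseline: set[str], after: set[str],
                    repair_requested: bool) -> tuple[set[str], set[str]]:
    """Split current failures into (to_fix, report_only).

    baseline: failures observed before the agent's change
    after: failures observed after the agent's change
    repair_requested: True when the user asked to fix failures (e.g. /green)
    """
    if repair_requested:
        # Every current failure is in scope for an explicit repair task.
        return set(after), set()
    introduced = after - baseline    # caused by this change: fix
    pre_existing = after & baseline  # already failing: note separately
    return introduced, pre_existing


to_fix, report_only = triage_failures(
    baseline={"test_auth"},
    after={"test_auth", "test_parser"},
    repair_requested=False,
)
print(sorted(to_fix), sorted(report_only))
```

With `repair_requested=True`, the same inputs would put both failures in `to_fix`, matching what a CI-repair prompt asks for.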
Summary
Extends the system-prompt verification step with V1-V3 scope tiering and test-discipline rules (no fixed sleeps; stub-contract integrity; prompt tests assert behavior, not exact strings), and adds a policies block (never commit unless asked, never silently swallow errors, no shotgun debugging), deduplicated against the existing contract.
Note: touches `prompts/system/system-prompt.md` alongside #2657 and #2658 (distinct sections); rebase if flagged.
Closes #2659