
feat(coding-agent/prompts): added verification tiering, test discipline, and a policies block #2667

Open: metaphorics wants to merge 1 commit into can1357:main from metaphorics:ultra/verify-policies


@metaphorics (Contributor)

Summary

Extends the system-prompt verification step with V1-V3 scope tiering and test-discipline rules (no fixed sleeps; stub-contract integrity; prompt tests assert behavior, not exact strings), and adds a policies block (never commit unless asked, never silently swallow errors, no shotgun debugging), deduplicated against the existing contract.
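The V1–V3 rule (V1 single-file non-behavioral edit → diagnostics on that file; V2 single-domain behavioral edit → diagnostics, related tests, one run of the affected entry point; V3 multi-file / cross-cutting → diagnostics, related tests, build, a manual exercise of the user-visible surface) can be modeled as a small function. This is a minimal TypeScript sketch, not code from the repo: `ChangeSet`, `tierFor`, and `requiredChecks` are hypothetical names, and where the wording overlaps (a multi-file edit inside one domain matches both V2 and V3) the sketch assumes the stricter tier applies.

```typescript
// Hypothetical model of the V1–V3 verification tiers; not code from the repo.
type Tier = "V1" | "V2" | "V3";

interface ChangeSet {
  files: string[];       // paths of changed files
  behavioral: boolean;   // does the edit change runtime behavior?
  crossCutting: boolean; // does it span more than one domain/module?
}

// Assumption: when the wording overlaps (e.g. a multi-file edit within one
// domain matches both V2 and V3), pick the stricter tier.
function tierFor(change: ChangeSet): Tier {
  if (change.files.length > 1 || change.crossCutting) return "V3";
  return change.behavioral ? "V2" : "V1";
}

// The scope of checks grows with the tier; each check is still run in full.
function requiredChecks(change: ChangeSet): string[] {
  const diagnostics = change.files.map((file) => `diagnostics:${file}`);
  const tier = tierFor(change);
  switch (tier) {
    case "V1":
      return diagnostics;
    case "V2":
      return [...diagnostics, "related-tests", "run-affected-entry-point"];
    case "V3":
      return [...diagnostics, "related-tests", "build", "manual-exercise-of-user-visible-surface"];
  }
}
```

Only the list of checks changes between tiers; none of them is downgraded to "should pass", which is the intent of "tier the scope, never the rigor".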

Note: this PR touches prompts/system/system-prompt.md, as do #2657 and #2658 (in distinct sections); rebase if a conflict is flagged.

Closes #2659

…ne, and a policies block

- Extended the system-prompt verification step with V1-V3 scope tiering (tier the scope, never the rigor) and the rule that 'should pass' is not verification.
- Added test-discipline lines the prompt lacked: no fixed sleeps (subscribe to the exact event/state change with a bounded timeout), stub-contract integrity (never isolate so heavily the integration cannot fail), and prompt tests must assert behavior rather than pin an exact sentence.
- Added a policies block (never commit unless explicitly asked, never silently swallow errors, no shotgun debugging), deduplicated against the existing CONTRACT and intent gate.
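The "no fixed sleeps" rule replaces a guessed delay such as `await sleep(500)` with waiting on the exact event under a bounded timeout. A minimal TypeScript sketch, assuming the state change is signalled through a Node `EventEmitter`; `waitForEvent` is a hypothetical helper, not part of the repo:

```typescript
import { EventEmitter } from "node:events";

// Hypothetical test helper: resolve when `emitter` emits `event`, or reject
// after `timeoutMs`. Replaces a fixed `await sleep(n)` in a test.
function waitForEvent<T = unknown>(
  emitter: EventEmitter,
  event: string,
  timeoutMs: number,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const onEvent = (value: T) => {
      clearTimeout(timer); // the event arrived: cancel the deadline
      resolve(value);
    };
    const timer = setTimeout(() => {
      emitter.off(event, onEvent); // stop listening once we give up
      reject(new Error(`timed out after ${timeoutMs}ms waiting for "${event}"`));
    }, timeoutMs);
    emitter.once(event, onEvent);
  });
}
```

A fixed sleep is either too short (flaky) or too long (slow); this version continues as soon as the event fires and fails with a named error if it never does, instead of hanging.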
Copilot AI review requested due to automatic review settings June 15, 2026 13:02

Copilot AI left a comment


Pull request overview

Note: Copilot was unable to run its full agentic suite in this review.

Extends the coding-agent system prompt with stricter verification/test-discipline guidance and adds an explicit policies section to constrain agent behavior.

Changes:

  • Added V1–V3 scope tiering guidance and additional test/verification discipline rules to the system prompt.
  • Introduced a new <policies> block (no commits unless asked, no silent error swallowing, no shotgun debugging).
  • Documented the changes in the package changelog.
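To illustrate the "no silent error swallowing" policy from the list above, here is a hedged TypeScript sketch contrasting a parser that swallows errors with one that handles absence for a stated reason and surfaces everything else. `loadSettings` and `loadSettingsSilently` are hypothetical names, not code from this PR.

```typescript
// Anti-pattern the policy forbids: malformed input silently becomes "no settings".
function loadSettingsSilently(raw: string): Record<string, unknown> {
  try {
    return JSON.parse(raw);
  } catch {
    return {}; // the parse error is lost; the caller cannot tell anything went wrong
  }
}

// Policy-compliant version (hypothetical): absence is handled for a stated
// reason, and any other failure is surfaced with context.
function loadSettings(raw: string | undefined, source: string): Record<string, unknown> {
  // Deliberate, stated reason: no settings file means "use defaults".
  if (raw === undefined) return {};
  try {
    return JSON.parse(raw) as Record<string, unknown>;
  } catch (err) {
    // Surface the failure instead of returning a silent default.
    throw new Error(`invalid settings in ${source}: ${(err as Error).message}`);
  }
}
```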

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| packages/coding-agent/src/prompts/system/system-prompt.md | Adds new verification rules and a new <policies> block to the system prompt. |
| packages/coding-agent/CHANGELOG.md | Records the system-prompt updates under Unreleased. |
Comments suppressed due to low confidence (2)

  • packages/coding-agent/CHANGELOG.md:1: Typo: "omp lacked" should be corrected (its meaning is unclear as written).
  • packages/coding-agent/CHANGELOG.md:1: This changelog bullet is extremely long and hard to scan. Consider splitting it into multiple bullets (e.g., one for scope tiering/verification wording, one for test-discipline rules, one for the new <policies> block) to improve readability and make future changelog diffs easier to review.


Comment on lines 247 to +253
</critical>

<policies>
- NEVER create a git commit unless the user explicitly asked for one.
- NEVER swallow an error silently — surface it, or handle it only for a deliberate, stated reason.
- NEVER shotgun-debug: no unrelated edits or blind retries to make a failure disappear. Find the cause, then fix it.
</policies>
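The "prompt tests assert behavior, not exact strings" rule could be applied to this block roughly as follows. A TypeScript sketch, assuming the test has the rendered prompt as a string (here an excerpt of the block above is inlined); `policyRules` and `forbidsUnrequestedCommits` are hypothetical helpers. Instead of pinning the sentence with an exact-match assertion, it checks that some rule still forbids unrequested commits, so a rewording passes and deleting the rule fails.

```typescript
// Excerpt of the <policies> block added in this PR, used as test input.
// In a real test this string would come from rendering the system prompt.
const prompt = [
  "<policies>",
  "- NEVER create a git commit unless the user explicitly asked for one.",
  "- NEVER shotgun-debug: no unrelated edits or blind retries to make a failure disappear. Find the cause, then fix it.",
  "</policies>",
].join("\n");

// Extract the rule lines inside <policies>...</policies>; [] if the block is missing.
function policyRules(text: string): string[] {
  const match = text.match(/<policies>([\s\S]*?)<\/policies>/);
  if (!match) return [];
  return (match[1] ?? "")
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.startsWith("- "))
    .map((line) => line.slice(2));
}

// Behavioral assertion: some rule forbids committing unless asked/requested.
function forbidsUnrequestedCommits(rules: string[]): boolean {
  return rules.some(
    (r) => /\bnever\b/i.test(r) && /\bcommit/i.test(r) && /\b(ask|request)/i.test(r),
  );
}
```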

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f0c7efddf


- Test behavior, not plumbing — things that can actually break.
- Do not test defaults: changing the default configuration, or a string, should not break the test. Assert logical behavior, not the current state.
- Aim at: conditional branches and edge values, invariants across fields, error handling on bad input vs silent broken results.
- Tier the scope, never the rigor: V1 single-file non-behavioral edit → diagnostics on that file; V2 single-domain behavioral edit → diagnostics on changed files + related tests + one run of the affected entry point; V3 multi-file / cross-cutting → diagnostics on every changed file + related tests + build + a manual exercise of the user-visible surface.

P2: Resolve conflicting test-scope instructions

This new V2/V3 rule requires running related tests for behavioral and cross-cutting edits, but the existing line just above still says to “Run only tests you added or modified unless asked otherwise.” For the common case where the agent changes production code without touching tests, those instructions conflict and can lead the model to skip the existing regression tests that this tiering is trying to require. Please relax the earlier restriction or make this tiering rule explicitly override it.


</critical>

<policies>
- NEVER create a git commit unless the user explicitly asked for one.

P2: Reconcile commit policy with orchestration mode

When a user triggers orchestration and repo instructions require commits, this new absolute policy conflicts with orchestrate-notice.md rule 6, which tells the agent to commit when “the repo workflow expects them”; that notice is injected for orchestrate magic-keyword turns via #createMagicKeywordNotices. The result is ambiguous guidance in exactly the large multi-phase workflow where commits are currently prescribed, so either this policy needs to allow repo-workflow-required commits or the orchestrator prompt needs to be updated consistently.


- Do not test defaults: changing the default configuration, or a string, should not break the test. Assert logical behavior, not the current state.
- Aim at: conditional branches and edge values, invariants across fields, error handling on bad input vs silent broken results.
- Tier the scope, never the rigor: V1 single-file non-behavioral edit → diagnostics on that file; V2 single-domain behavioral edit → diagnostics on changed files + related tests + one run of the affected entry point; V3 multi-file / cross-cutting → diagnostics on every changed file + related tests + build + a manual exercise of the user-visible surface.
- "Should pass" is not verification: reporting clean output without running the validator is a violation. Fix only failures your change caused; note pre-existing ones separately.

P2: Allow explicit CI-fix tasks to repair existing failures

For /green and similar CI-repair prompts, the failing workflow on the current HEAD usually predates the agent’s current turn; ci-green-request.md still instructs the agent to inspect failed jobs and make the minimal fix until the branch is green. This new blanket “Fix only failures your change caused” rule can make the agent treat those same failures as pre-existing and only report them instead of fixing them, so it should be scoped to validators run while verifying the agent’s own changes or explicitly exempt user-requested failure-repair tasks.



Development: successfully merging this pull request may close the issue "prompts: verification tiering + test discipline + policies block".