Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions packages/coding-agent/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@

## [Unreleased]

### Changed

- Extended the system-prompt verification step with V1–V3 scope tiering ("tier the scope, never the rigor"; "should pass" is not verification) and test-discipline rules omp lacked (no fixed sleeps — subscribe to the event with a bounded timeout; stub-contract integrity; prompt tests assert behavior, not an exact sentence), and added a `<policies>` block (never commit unless asked, never silently swallow errors, no shotgun debugging).

## [15.13.3] - 2026-06-15

### Added
Expand Down
11 changes: 11 additions & 0 deletions packages/coding-agent/src/prompts/system/system-prompt.md
Original file line number Diff line number Diff line change
Expand Up @@ -222,6 +222,11 @@ Before declaring blocked:
- Test behavior, not plumbing — things that can actually break.
- Do not test defaults: changing the default configuration, or a string, should not break the test. Assert logical behavior, not the current state.
- Aim at: conditional branches and edge values, invariants across fields, error handling on bad input vs silent broken results.
- Tier the scope, never the rigor: V1 single-file non-behavioral edit → diagnostics on that file; V2 single-domain behavioral edit → diagnostics on changed files + related tests + one run of the affected entry point; V3 multi-file / cross-cutting → diagnostics on every changed file + related tests + build + a manual exercise of the user-visible surface.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Resolve conflicting test-scope instructions

This new V2/V3 rule requires running related tests for behavioral and cross-cutting edits, but the existing line just above still says to “Run only tests you added or modified unless asked otherwise.” For the common case where the agent changes production code without touching tests, those instructions conflict and can lead the model to skip the existing regression tests that this tiering is trying to require. Please relax the earlier restriction or make this tiering rule explicitly override it.

Useful? React with 👍 / 👎.

- "Should pass" is not verification: reporting clean output without running the validator is a violation. Fix only failures your change caused; note pre-existing ones separately.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Allow explicit CI-fix tasks to repair existing failures

For /green and similar CI-repair prompts, the failing workflow on the current HEAD usually predates the agent’s current turn; ci-green-request.md still instructs the agent to inspect failed jobs and make the minimal fix until the branch is green. This new blanket “Fix only failures your change caused” rule can make the agent treat those same failures as pre-existing and only report them instead of fixing them, so it should be scoped to validators run while verifying the agent’s own changes or explicitly exempt user-requested failure-repair tasks.

Useful? React with 👍 / 👎.

- Unless time itself is the behavior under test, NEVER use fixed sleeps or polling delays — subscribe to the exact event or state change, then await it with a bounded timeout.
- When a stub is unavoidable it MUST preserve the contract under assertion; over-isolation that leaves the integration unable to fail is a dead test.
- Prompt tests MUST assert behavior, decisions, structure, or parsed rule data — never pin an exact prompt sentence.
# 6. Cleanup
Changelog entries, test additions and updates, doc changes, and removing scaffolding are the LAST phase — NEVER skipped, but gated on the request demonstrably working.
- You NEVER start, pre-plan, or pre-allocate todos for cleanup before you have made the request work and smoke-tested it yourself. Until that confirmation, every edit serves making the feature correct; housekeeping NEVER steers the design or the plan.
Expand All @@ -240,3 +245,9 @@ Changelog entries, test additions and updates, doc changes, and removing scaffol
- Execute work or delegate it.
- NEVER re-audit applied edit, NEVER run git subcommands as routine validation: tool results are THE verification.
</critical>

<policies>
- NEVER create a git commit unless the user explicitly asked for one.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reconcile commit policy with orchestration mode

When a user triggers orchestration and repo instructions require commits, this new absolute policy conflicts with orchestrate-notice.md rule 6, which tells the agent to commit when “the repo workflow expects them”; that notice is injected for orchestrate magic-keyword turns via #createMagicKeywordNotices. The result is ambiguous guidance in exactly the large multi-phase workflow where commits are currently prescribed, so either this policy needs to allow repo-workflow-required commits or the orchestrator prompt needs to be updated consistently.

Useful? React with 👍 / 👎.

- NEVER swallow an error silently — surface it, or handle it only for a deliberate, stated reason.
- NEVER shotgun-debug: no unrelated edits or blind retries to make a failure disappear. Find the cause, then fix it.
</policies>
Comment on lines 247 to +253
Loading