fix: content-anchored fixture matching + relaxed-turnIndex divergence warn/opt-out #276
Merged
…ivergence detection in router

Anchor fixture matcher selection on request content rather than positional turnIndex, and add relaxed-turnIndex divergence detection: warn when a match is found by content but diverges from the recorded turnIndex, with an AIMOCK_STRICT_TURN_INDEX opt-out, per-fixture-identity throttle, and accurate matchedBy reporting for predicate/regex fixtures.
Pass recordMatchOptions and the logger through the server record path into each provider handler (messages, responses, gemini, bedrock, bedrock-converse, cohere, ollama) so recorded fixtures carry the content-anchored match metadata.
Cover content-anchored fixture matching, record-path wiring, and relaxed turnIndex detect/warn/opt-out behavior (red-green).
Document content-anchored relaxed-turnIndex replay matching, the divergence warning direction, and the AIMOCK_STRICT_TURN_INDEX opt-out.
jpr5 added a commit that referenced this pull request on Jun 24, 2026

Release **v1.34.0**: bundles the two grouped PRs now on `main`.

## Included

- **#276** (`Changed`): content-anchored fixture matching; `turnIndex` is a non-fatal disambiguator, not a hard reject gate. New `turnIndexRelaxed` diagnostic + one-shot warn; `AIMOCK_STRICT_TURN_INDEX=1` restores strict replay. Record path stays strict.
- **#279** (`Fixed`): Gemini Interactions mock now emits the SDK 2.x event protocol on both streamed SSE and non-streaming paths; legacy 1.x recorded fixtures still parse.

## Mechanics

- `package.json` → `1.34.0`; CHANGELOG `[Unreleased]` finalized under `## [1.34.0] - 2026-06-24`.
- Merging to `main` triggers `publish-release.yml` → `npm publish` + tag `v1.34.0`.
- Local gate green: prettier/eslint/build clean, 4055 tests pass.

Minor bump (not v2): both changes ship with back-compat/opt-out.
TL;DR — what this is
What aimock does: records real LLM API calls once, then replays the recorded responses in tests — no live API, deterministic, fast.
The change in one sentence: matching a replay to a recorded fixture is now anchored on the content of the request (the messages/tools being sent), not on the turn number it happened to be recorded at.
Why: the old logic treated "this is turn #3" as a hard requirement. If your test's conversation drifted by even one turn from when it was recorded, replay would fail to match and error out — brittle. Real conversations reorder, retry, and branch. Content is the stable identity; turn position is not.
What actually changed
- `turnIndex` is no longer a hard reject gate. A fixture matches if its content matches; turn position becomes a tiebreaker, not a veto.
- … `matchedBy` reason), so drift is never silent.
- `AIMOCK_STRICT_TURN_INDEX=1` restores the old strict behavior.

Risk surfaces (public pkg, ~500k downloads/wk) → mitigations

- `matchedBy`/`describeMatch` explain why a fixture matched. We never hide the drift; we surface it.
- `AIMOCK_STRICT_TURN_INDEX=1` opt-out preserves old behavior → ships as minor, not major.
- `matchedBy` field + a throttled, human-readable warning explain the match decision.

Bottom line: it trades a brittle, position-coupled gate for a content-anchored match that mirrors how conversations really behave, while keeping strict mode one env var away, recording untouched, and every divergence loud rather than silent. The empirical blast-radius check on our corpus showed zero wrong-fixture matches.
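To make the "content decides, turn position only breaks ties" idea concrete, here is a minimal sketch. It is not aimock's matcher: the `Fixture`, `MatchResult`, and `selectFixture` names and shapes are hypothetical, content is reduced to a plain string, and strict mode is passed in as an option (the PR toggles it with `AIMOCK_STRICT_TURN_INDEX=1`). The "all-ahead → unpositioned fallback" rule is left out because the PR text does not spell it out.

```typescript
// Hypothetical shapes for illustration only; aimock's real types may differ.
interface Fixture {
  id: string;
  content: string; // stand-in for the recorded messages/tools
  turnIndex: number; // scripted turn position
}

interface MatchOptions {
  // Named after MatchOptions.strictTurnIndex in the PR text; semantics assumed.
  strictTurnIndex?: boolean;
}

interface MatchResult {
  fixture: Fixture;
  // true when the served fixture's turnIndex differs from the request turn,
  // i.e. the strict gate would have rejected it
  turnIndexRelaxed: boolean;
}

function selectFixture(
  fixtures: Fixture[],
  requestContent: string,
  requestTurn: number,
  options: MatchOptions = {},
): MatchResult | undefined {
  // Content gating stays strict: no content match means no match at all.
  const byContent = fixtures.filter((f) => f.content === requestContent);
  if (byContent.length === 0) return undefined;

  // An exact positional hit is always preferred and is never "relaxed".
  const exact = byContent.find((f) => f.turnIndex === requestTurn);
  if (exact) return { fixture: exact, turnIndexRelaxed: false };

  // Strict mode: turn position is a veto, as in the old behavior.
  if (options.strictTurnIndex) return undefined;

  // Relaxed mode: turn position is only a tiebreaker; take the closest
  // scripted turnIndex (first candidate wins on equal distance).
  const closest = byContent.reduce((best, f) =>
    Math.abs(f.turnIndex - requestTurn) < Math.abs(best.turnIndex - requestTurn) ? f : best,
  );
  return { fixture: closest, turnIndexRelaxed: true };
}
```

In this sketch a caller would treat `turnIndexRelaxed === true` as the signal to emit the divergence warning, and `undefined` as a miss.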
Summary
Makes aimock's fixture replay matching content-anchored: a fixture's recorded content/shape is the primary match key, and `turnIndex` is no longer a hard reject gate. This kills a class of false-misses where a multi-bubble / multi-turn run that drifts past its scripted `turnIndex` was spuriously rejected (empty assistant response), even though the content matched. The record path is unchanged: strict per-turn capture is preserved via `MatchOptions.strictTurnIndex`.

A fixture that previously force-missed on `turnIndex` will now match if its content predicates match. Empirically measured by replaying the real fixture corpora (showcase d6 + aimock examples) through the old vs new matcher, 786 fixture sets / 9769 requests:

- … `assistantCount == turnIndex`); divergence happens only when a conversation runs off-by-N from its scripted turn.

So this is a strict improvement on real corpora; it's a minor bump. (Content gating stays strict; multiple content matches → closest scripted `turnIndex`; all-ahead → unpositioned fallback.)

Not surprising anyone (detection + reversible opt-out)
For the divergence to be discoverable + reversible by the ~500k-downloads/wk user base:
- `AIMOCK_STRICT_TURN_INDEX=1`: process opt-out that restores the legacy strict-turnIndex hard gate for replay (default = new content-anchored behavior). This reversible escape hatch is what keeps the change a minor bump.
- `warn` fires only when a served fixture's `turnIndex` was relaxed (i.e. the strict gate would have rejected it). Silent in programmatic use (logger default), so it never spams a test suite; for the typical record→replay user (who sits at the canonical position) divergence is rare, so the warn is a genuine, useful signal.
- `turnIndexRelaxed`/`matchedBy` on the match diagnostic for inspection.

Review
Converged via two full `cr-loop` passes (content-anchored matching, then the warn package): 7 unbiased agents/round + confirmation rounds + Procedure-3 audits (0 promotions). Notable fixes caught in review: WeakSet-identity throttle (predicate/RegExp fixtures no longer collide), accurate `matchedBy` attribution, and the divergence-direction wording.

Test plan
- `pnpm test`: 4041 passed / 0 failed; `tsc --noEmit` clean; lint 0/0; build ok

Known follow-ups (pre-existing, not this PR)
- `getSystemText`/`getTextContent` present-but-empty `""` stray-newline edge under `useExactMatch`.
- `parts`/`content` derefs (gemini/cohere/responses/bedrock/bedrock-converse) → TypeError on malformed JSON.

Not auto-merging; flagging the replay-semantics change for review.
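For readers wondering why the Review section singles out a WeakSet-identity throttle, the sketch below shows one plausible shape for it. It is an assumption-laden illustration rather than the code in this PR: `createRelaxedWarner`, `Logger`, and the message wording are hypothetical. The idea is that keying the throttle on object identity, rather than on a serialized form of the fixture, keeps two fixtures that look alike (for example two RegExp or predicate matchers) from sharing one throttle slot.

```typescript
// Hypothetical logger shape; any object with a warn(string) method works here.
type Logger = { warn: (msg: string) => void };

// Returns a function that warns at most once per fixture object.
function createRelaxedWarner(logger: Logger) {
  // WeakSet keys on object identity and does not keep fixtures alive,
  // so look-alike fixtures (same regex source, same predicate text) stay distinct.
  const warned = new WeakSet<object>();

  return function warnOnce(
    fixture: object,
    recordedTurn: number,
    actualTurn: number,
  ): boolean {
    if (warned.has(fixture)) return false; // already reported for this fixture
    warned.add(fixture);
    logger.warn(
      `fixture matched by content but turnIndex diverged: recorded ${recordedTurn}, served at ${actualTurn}`,
    );
    return true;
  };
}
```

A string-keyed throttle would have to turn each fixture into a key first, and predicate or RegExp matchers do not serialize to reliably unique keys; identity sidesteps that.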