fix(tuning): harden extraction runs and prompts#131
Conversation
Honor caller request IDs, isolate timed-out extractor runs, enforce hard LiteLLM timeouts, and add prompt updates for extractor/consolidator failures found during the SWE-bench tuning loop.
📝 WalkthroughWalkthroughThis PR hardens extraction reliability and request lifecycle safety by introducing caller-supplied request IDs for idempotency, hard wall-clock timeout enforcement on LLM calls with executor-based cancellation, conditional storage updates to prevent late outputs from overwriting timeout states, and updated prompts for playbook extraction and consolidation with refined decision semantics. ChangesTimeout Safety and Request Lifecycle
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@reflexio/models/api_schema/domain/entities.py`:
- Around line 559-565: The request_id field currently accepts any string and can
be persisted with only-whitespace or control characters; update handling so
caller-provided request_id is validated/normalized before use: add a helper
(e.g., normalize_request_id or validate_request_id) and invoke it where the code
currently uses request_id || uuid (GenerationService), trimming surrounding
whitespace, rejecting empty/control-only values, and only accepting a safe
pattern (e.g., alphanumeric plus -._) — if validation fails, fall back to
generating a UUID; reference the request_id attribute on the entity and the
GenerationService code path that chooses request_id or uuid to locate where to
add this normalization.
In `@reflexio/server/prompt/prompt_bank/playbook_consolidation/v2.3.1.prompt.md`:
- Around line 30-33: Update the prompt schema text to explicitly require integer
DB ids for the decision fields: ensure `new_id` remains required and add that
`reject_new.superseded_by_existing_id` and `differentiate.existing_id` must be
integer DB `user_playbook_id` values (not label tokens like EXISTING-N) and that
`differentiate.existing_id` is mandatory; also clarify that
`archive_existing_ids` is position-based and distinct from the strict DB-id
fields so it must not be treated the same as
`superseded_by_existing_id`/`existing_id`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: bdc4f51b-ee62-469e-9cdf-83fbf2cda3c1
📒 Files selected for processing (19)
reflexio/models/api_schema/domain/entities.pyreflexio/server/llm/litellm_client.pyreflexio/server/prompt/prompt_bank/playbook_consolidation/v2.3.0.prompt.mdreflexio/server/prompt/prompt_bank/playbook_consolidation/v2.3.1.prompt.mdreflexio/server/prompt/prompt_bank/playbook_extraction_context/v4.2.0.prompt.mdreflexio/server/prompt/prompt_bank/playbook_extraction_context/v4.2.1.prompt.mdreflexio/server/prompt/prompt_bank/playbook_extraction_context/v4.2.2.prompt.mdreflexio/server/services/base_generation_service.pyreflexio/server/services/extraction/resumable_agent.pyreflexio/server/services/generation_service.pyreflexio/server/services/playbook/playbook_consolidator.pyreflexio/server/services/storage/sqlite_storage/_agent_run.pyreflexio/server/services/storage/storage_base/_agent_run.pytests/server/llm/test_litellm_client_unit.pytests/server/services/extraction/test_resumable_agent.pytests/server/services/playbook/test_consolidation_output.pytests/server/services/prompt/test_prompt_manager.pytests/server/services/storage/sqlite_storage/test_agent_run_storage.pytests/server/services/test_prompt_model_mapping.py
| request_id: str | None = None | ||
| """Optional caller-supplied request id for idempotent/local tracing. | ||
|
|
||
| If omitted, the backend generates a UUID as before. Supplying this is useful | ||
| for async callers that need to observe the exact request in extraction | ||
| state without waiting for the full publish response. | ||
| """ |
There was a problem hiding this comment.
Validate/normalize caller-provided request_id before using it as a persisted key.
At Line 559, request_id accepts unrestricted str. Because the runtime path uses request_id or uuid (GenerationService Line 175), whitespace/control-character values are treated as valid IDs and can propagate into storage/correlation unexpectedly.
Suggested patch
class PublishUserInteractionRequest(BaseModel):
request_id: str | None = None
+
+ `@field_validator`("request_id", mode="before")
+ `@classmethod`
+ def normalize_request_id(cls, value: str | None) -> str | None:
+ if value is None:
+ return None
+ normalized = str(value).strip()
+ return normalized or None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| request_id: str | None = None | |
| """Optional caller-supplied request id for idempotent/local tracing. | |
| If omitted, the backend generates a UUID as before. Supplying this is useful | |
| for async callers that need to observe the exact request in extraction | |
| state without waiting for the full publish response. | |
| """ | |
| request_id: str | None = None | |
| """Optional caller-supplied request id for idempotent/local tracing. | |
| If omitted, the backend generates a UUID as before. Supplying this is useful | |
| for async callers that need to observe the exact request in extraction | |
| state without waiting for the full publish response. | |
| """ | |
| `@field_validator`("request_id", mode="before") | |
| `@classmethod` | |
| def normalize_request_id(cls, value: str | None) -> str | None: | |
| if value is None: | |
| return None | |
| normalized = str(value).strip() | |
| return normalized or None |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@reflexio/models/api_schema/domain/entities.py` around lines 559 - 565, The
request_id field currently accepts any string and can be persisted with
only-whitespace or control characters; update handling so caller-provided
request_id is validated/normalized before use: add a helper (e.g.,
normalize_request_id or validate_request_id) and invoke it where the code
currently uses request_id || uuid (GenerationService), trimming surrounding
whitespace, rejecting empty/control-only values, and only accepting a safe
pattern (e.g., alphanumeric plus -._) — if validation fails, fall back to
generating a UUID; reference the request_id attribute on the entity and the
GenerationService code path that chooses request_id or uuid to locate where to
add this normalization.
| - **reject_new** — an EXISTING row already covers NEW or makes NEW redundant. Name the EXISTING id that wins via `superseded_by_existing_id`. Storage-stability tie-break: when same-situation opposite advice is balanced, default here. | ||
|
|
||
| - **differentiate** — both valid in distinct contexts (typically same trigger, opposite advice, where the contexts differ). Set `refined_new_trigger` and `refined_existing_trigger` to be strictly narrower than the originals AND mutually exclusive. | ||
|
|
There was a problem hiding this comment.
Clarify and require DB-ID fields for reject_new/differentiate to match runtime schema.
The prompt now hard-requires new_id, but it still leaves reject_new.superseded_by_existing_id and differentiate.existing_id under-specified. Please explicitly require both as integer DB user_playbook_id values (not EXISTING-N labels), and require differentiate.existing_id to be present. This avoids schema rejections and misrouted decisions.
Suggested prompt patch
- **reject_new** — an EXISTING row already covers NEW or makes NEW redundant. Name the EXISTING id that wins via `superseded_by_existing_id`. Storage-stability tie-break: when same-situation opposite advice is balanced, default here.
+ - **reject_new** — an EXISTING row already covers NEW or makes NEW redundant. Set `superseded_by_existing_id` to the EXISTING row's integer DB id (`user_playbook_id`), not an `EXISTING-N` label. Storage-stability tie-break: when same-situation opposite advice is balanced, default here.
- - **differentiate** — both valid in distinct contexts (typically same trigger, opposite advice, where the contexts differ). Set `refined_new_trigger` and `refined_existing_trigger` to be strictly narrower than the originals AND mutually exclusive.
+ - **differentiate** — both valid in distinct contexts (typically same trigger, opposite advice, where the contexts differ). Include `existing_id` as the EXISTING row's integer DB id (`user_playbook_id`), and set `refined_new_trigger` and `refined_existing_trigger` to be strictly narrower than the originals AND mutually exclusive.Based on learnings, archive_existing_ids is position-based while superseded_by_existing_id/existing_id are strict DB ids and must not be treated uniformly.
Also applies to: 54-57
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@reflexio/server/prompt/prompt_bank/playbook_consolidation/v2.3.1.prompt.md`
around lines 30 - 33, Update the prompt schema text to explicitly require
integer DB ids for the decision fields: ensure `new_id` remains required and add
that `reject_new.superseded_by_existing_id` and `differentiate.existing_id` must
be integer DB `user_playbook_id` values (not label tokens like EXISTING-N) and
that `differentiate.existing_id` is mandatory; also clarify that
`archive_existing_ids` is position-based and distinct from the strict DB-id
fields so it must not be treated the same as
`superseded_by_existing_id`/`existing_id`.
Source: Learnings
Summary
Verification
uv run ruff checkon staged Python filesuv run pyrighton staged Python filesuv run pytest --no-cov -q tests/server/llm/test_litellm_client_unit.py tests/server/services/extraction/test_resumable_agent.py tests/server/services/storage/sqlite_storage/test_agent_run_storage.py tests/server/services/playbook/test_consolidation_output.py tests/server/services/prompt/test_prompt_manager.py tests/server/services/test_prompt_model_mapping.py tests/eval/extraction/test_extraction_eval.pyNotes
This replaces the closed, unmerged PR #130 with a fresh PR containing the latest branch head
ee57851.Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation