This repository was archived by the owner on May 19, 2026. It is now read-only.
fix(classify): LLM fallback bypasses cascade short-circuit (closes #99) #100
Merged
bobmatnyc merged 2 commits intoMay 19, 2026
…bmatnyc#99)

The pipeline-level LLM fallback called `Engine::classify`, which re-runs `classify_sync` first and returns the same low-confidence tier-1-3 verdict that triggered the fallback. The HTTP call to the LLM was therefore never made: every catch-all `maintenance / 0.3` commit kept its original verdict, and the overwrite-guard logged "did not improve confidence" once per commit.

- Add `Engine::llm_classify_only` for direct LLM dispatch, bypassing tiers 0–3.5.
- Add `LlmClassifier::has_api_key` and `Engine::llm_has_api_key` so the pipeline can warn once at startup when `use_llm: true` is set but no `OPENAI_API_KEY` / `OPENROUTER_API_KEY` is resolved (previously this was silent).
- Switch the pipeline's fan-out to `llm_classify_only`. The overwrite-guard (`r.confidence > original_conf`) already handles empty/failed LLM verdicts correctly.
- Add a wiremock test proving the LLM endpoint is actually hit, plus unit tests for the new accessors.

Co-Authored-By: RuFlo <ruv@ruv.net>
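The tri-state reported by `Engine::llm_has_api_key` and the one-shot startup warning could look roughly like the sketch below. It is an illustration under stated assumptions, not the crate's code: `LlmTier`, `resolve_api_key`, and `startup_warning` are hypothetical names, the warning text is invented, and the precedence of `OPENAI_API_KEY` over `OPENROUTER_API_KEY` is assumed. The `Option<bool>` return shape is taken from the PR description.

```rust
/// Hypothetical stand-in for the LLM tier's configuration state.
struct LlmTier {
    api_key: Option<String>,
}

impl LlmTier {
    /// True only when a non-blank key was resolved.
    fn has_api_key(&self) -> bool {
        self.api_key.as_deref().map_or(false, |k| !k.trim().is_empty())
    }
}

/// Hypothetical engine: `llm` is `None` when `use_llm` is false.
struct Engine {
    llm: Option<LlmTier>,
}

impl Engine {
    /// `Some(true)` configured, `Some(false)` enabled but no key, `None` disabled.
    fn llm_has_api_key(&self) -> Option<bool> {
        self.llm.as_ref().map(LlmTier::has_api_key)
    }
}

/// Resolve a key through an injected lookup so the sketch never touches the
/// real process environment. The variable order is an assumption.
fn resolve_api_key(lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    ["OPENAI_API_KEY", "OPENROUTER_API_KEY"]
        .iter()
        .find_map(|name| lookup(*name).filter(|k| !k.trim().is_empty()))
}

/// Returns the message to log once at startup, if the configuration is suspect.
fn startup_warning(engine: &Engine) -> Option<&'static str> {
    match engine.llm_has_api_key() {
        // Enabled but unusable: the only state worth a warning.
        Some(false) => Some("use_llm is enabled but no API key was resolved"),
        // Correctly configured, or deliberately disabled: stay quiet.
        Some(true) | None => None,
    }
}
```

Keeping "disabled" (`None`) distinct from "enabled but no key" (`Some(false)`) is what lets the caller warn exactly once about misconfiguration without also warning users who turned the LLM tier off on purpose.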
…ckfill, env-guard

Folds in two hive-review findings against 52c2932:

1. `Engine::llm_classify_only` now backfills `ticket_id` from the message when the LLM verdict omits it (`LlmClassifier::classify` always returns `ticket_id: None`). Previously, a low-confidence regex verdict carrying `ticket_id=Some("PROJ-1234")` would lose that ID if the LLM result won the pipeline's overwrite-guard. This is newly observable because this PR is the first time the LLM path actually fires for the catch-all bucket.
2. `llm_has_api_key_signals_misconfiguration` now uses an RAII `EnvVarGuard` (mirrors `core::config::validator::tests`) so an assertion panic between `remove_var` and the restore can't leak the cleared `OPENAI_API_KEY` state to other parallel tests.

Plus a contract-pinning test: `classify_does_not_set_ticket_id` asserts the raw `LlmClassifier` continues to return `ticket_id: None`, so a future change that starts surfacing ticket IDs from the LLM verdict will fail loudly and prompt revisiting the engine-level backfill.

Co-Authored-By: RuFlo <ruv@ruv.net>
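For readers unfamiliar with the RAII pattern mentioned in the second finding, a guard of this kind might be sketched as follows. This is a hypothetical reconstruction, not the `EnvVarGuard` from `core::config::validator::tests`: the constructor name `unset` and the field layout are assumptions. The point it illustrates is that the restore lives in `Drop`, so it runs even when the test body panics.

```rust
use std::env;
use std::ffi::OsString;

/// Hypothetical guard: clears an environment variable on construction and
/// restores the previous state when dropped, including during unwinding.
struct EnvVarGuard {
    key: &'static str,
    previous: Option<OsString>,
}

impl EnvVarGuard {
    /// Remember the current value (if any), then remove the variable.
    fn unset(key: &'static str) -> Self {
        let previous = env::var_os(key);
        // `unsafe` is required on the 2024 edition; on earlier editions the
        // block is merely redundant.
        unsafe { env::remove_var(key) };
        EnvVarGuard { key, previous }
    }
}

impl Drop for EnvVarGuard {
    fn drop(&mut self) {
        // Put back exactly what was there before the guard was created.
        match self.previous.take() {
            Some(value) => unsafe { env::set_var(self.key, value) },
            None => unsafe { env::remove_var(self.key) },
        }
    }
}
```

A test would bind the guard to a local (`let _guard = EnvVarGuard::unset("OPENAI_API_KEY");`) and run its assertions while the guard is alive. Note that the guard only guarantees restoration; the environment is still process-global, so tests that mutate the same variable concurrently can still observe each other's changes while a guard is held.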
Summary
- The pipeline-level LLM fallback was calling `Engine::classify`, which was re-running `classify_sync` and short-circuiting on the same low-confidence tier-1-3 verdict that triggered the fallback (issue #99, "LLM fallback in classify pipeline never fires for low-confidence catch-all verdicts").
- Adds `Engine::llm_classify_only` (direct LLM dispatch, backfills `ticket_id` from the message) and `Engine::llm_has_api_key` (so the pipeline can warn once at startup when `use_llm: true` is set but no API key is resolved, instead of silently producing one warn-per-commit).
- Tests: a wiremock test on `LlmClassifier::classify`, a regression test on `Engine::llm_classify_only`, a contract-pinning test on the raw classifier's `ticket_id: None` behavior, and a panic-safe `EnvVarGuard` (mirrors `core::config::validator::tests`) for the env-mutating test.

Why this matters
On a 2,926-commit corpus reported in the issue, 2,079 commits (71%) sat at the catch-all `maintenance / 0.3` verdict that should have been bounced to the LLM. Every fallback attempt produced log lines reading `original_conf=0.3 new_conf=0.3`, the smoking gun that `classify_sync` was what came back, not the LLM. Average classification confidence was 0.43 instead of the 0.84 obtainable when the LLM actually fires.

What changed
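In outline, the change swaps a fallback that re-enters the cascade for one that dispatches straight to the LLM tier. The toy model below is a sketch only: `Verdict`, `llm_tier`, `classify_full`, and `apply_guard` are hypothetical stand-ins, the LLM is faked with a pure function, and the exact short-circuit condition is an assumption. Only the names `classify_sync` and `llm_classify_only` and the guard comparison `r.confidence > original_conf` come from this PR.

```rust
/// Toy stand-in for a classification verdict (not the crate's type).
#[derive(Debug, Clone, PartialEq)]
struct Verdict {
    category: String,
    confidence: f64,
}

impl Verdict {
    fn unclassified() -> Self {
        Verdict { category: "unclassified".to_string(), confidence: 0.0 }
    }
}

/// Stand-in for the rule-based tiers: everything lands in the catch-all bucket.
fn classify_sync(_message: &str) -> Verdict {
    Verdict { category: "maintenance".to_string(), confidence: 0.3 }
}

/// Fake LLM tier; `None` models "disabled, no key, or failed call".
fn llm_tier(message: &str, llm_available: bool) -> Option<Verdict> {
    if !llm_available {
        return None;
    }
    let category = if message.starts_with("fix") { "bugfix" } else { "feature" };
    Some(Verdict { category: category.to_string(), confidence: 0.9 })
}

/// Old fallback shape: re-enters the cascade, so the sync verdict wins again.
fn classify_full(message: &str, llm_available: bool) -> Verdict {
    let sync = classify_sync(message);
    if sync.confidence > 0.0 {
        return sync; // short-circuit: the LLM tier below is never reached
    }
    llm_tier(message, llm_available).unwrap_or_else(Verdict::unclassified)
}

/// New fallback shape: skips the sync tiers and asks the LLM tier directly.
fn llm_classify_only(message: &str, llm_available: bool) -> Option<Verdict> {
    llm_tier(message, llm_available)
}

/// Overwrite-guard: replace the original only on a strict confidence gain.
fn apply_guard(original: Verdict, candidate: Verdict) -> Verdict {
    if candidate.confidence > original.confidence { candidate } else { original }
}
```

With the old shape, the candidate handed to the guard is the same `maintenance / 0.3` verdict, which is how `original_conf=0.3 new_conf=0.3` arises. With the new shape, a missing key yields `None`, which becomes a 0.0-confidence verdict and is rejected by the same guard without special handling.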
`src/classify/classifier.rs`
- `pub async fn llm_classify_only(&self, message: &str) -> Option<ClassificationResult>`: bypasses tiers 0–3.5, resolves `taxonomy.top_level` on the verdict, and backfills `ticket_id` via `RegexMatcher::extract_ticket_id` so a ticket reference carried by the original tier-1-3 verdict isn't silently lost when the LLM result wins the pipeline's overwrite-guard.
- `pub fn llm_has_api_key(&self) -> Option<bool>`: `Some(true)` configured, `Some(false)` enabled-but-no-key, `None` disabled.
- `Engine::classify` now delegates its LLM arm to `llm_classify_only` (no behavior change).

`src/classify/pipeline.rs`
- `if engine.config().use_llm` block: one-shot `warn!` when `engine.llm_has_api_key() == Some(false)`.
- The fan-out now calls `engine_ref.llm_classify_only(&message).await.unwrap_or_else(ClassificationResult::unclassified)` instead of `engine_ref.classify(&message, is_merge).await`. The existing overwrite-guard (`r.confidence > original_conf`) already handles the "LLM returned empty / no API key" case correctly because `unwrap_or_else(...)` produces `confidence=0.0`.

`src/classify/tiers/llm.rs`
- `LlmClassifier::has_api_key()` accessor.
- Tests: `has_api_key_reflects_key_state`, `classify_returns_none_without_api_key`, `classify_dispatches_to_endpoint_when_keyed` (wiremock), `classify_does_not_set_ticket_id` (pins the contract that justifies the engine-level backfill).

`src/classify/mod.rs`
- Tests: `llm_classify_only_returns_none_when_disabled` (issue #99 regression), `llm_has_api_key_signals_misconfiguration` (uses RAII `EnvVarGuard`).

Test plan
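To make the `ticket_id` backfill that the contract-pinning test protects more concrete, here is a minimal sketch. It is not the crate's implementation: `Verdict`, `is_ticket`, and `backfill_ticket_id` are hypothetical, the extractor is a hand-rolled stand-in rather than `RegexMatcher::extract_ticket_id`, and the `ABC-123` ticket shape is assumed from the `PROJ-1234` example in the commit message.

```rust
/// Toy verdict carrying an optional ticket reference (not the crate's type).
#[derive(Debug, Clone, PartialEq)]
struct Verdict {
    category: String,
    confidence: f64,
    ticket_id: Option<String>,
}

/// Assumed ticket shape: uppercase ASCII letters, a hyphen, then digits.
fn is_ticket(token: &str) -> bool {
    match token.split_once('-') {
        Some((project, number)) => {
            !project.is_empty()
                && project.chars().all(|c| c.is_ascii_uppercase())
                && !number.is_empty()
                && number.chars().all(|c| c.is_ascii_digit())
        }
        None => false,
    }
}

/// Stand-in extractor: returns the first token in the message shaped like a ticket.
fn extract_ticket_id(message: &str) -> Option<String> {
    message
        .split(|c: char| !(c.is_ascii_alphanumeric() || c == '-'))
        .find(|token| is_ticket(token))
        .map(str::to_string)
}

/// Fill `ticket_id` from the message only when the verdict did not supply one.
fn backfill_ticket_id(mut verdict: Verdict, message: &str) -> Verdict {
    if verdict.ticket_id.is_none() {
        verdict.ticket_id = extract_ticket_id(message);
    }
    verdict
}
```

Because the backfill only runs when `ticket_id` is `None`, it stays harmless if the raw classifier ever starts returning ticket IDs itself; the `classify_does_not_set_ticket_id` test exists so that such a change is noticed rather than silently masked.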
- `cargo build` clean
- `cargo test`: 341 tests pass (was 340; +6 new, -5 covered elsewhere; see commits)
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --check` clean
- `cargo doc --no-deps` clean
- `llm_classify_only_returns_none_when_disabled` asserts that when `use_llm: false`, the method returns `None` even for messages the catch-all would match; otherwise the pipeline's overwrite-guard would see the same low-confidence verdict back and the LLM tier would never run.

Notes for reviewers
- Hive review flagged the `ticket_id` data-loss path and the parallel-test env-var hazard; both fixes are in the second commit on this branch.
- No `Cargo.toml` or `CHANGELOG.md` bumps in contributor PRs: none here.
- Not included here: narrowing the `llm_*` methods to `pub(crate)` (kept `pub` for library symmetry), and per-verdict accept/reject counters on the fallback path.

🤖 Generated with RuFlo