fix(classify): LLM fallback bypasses cascade short-circuit (closes #99) by maui314159 · Pull Request #100 · bobmatnyc/trusty-git-analytics

maui314159 · 2026-05-18T22:52:37Z

Summary

Pipeline-level LLM fallback now dispatches directly to the LLM tier instead of going through Engine::classify, which was re-running classify_sync and short-circuiting on the same low-confidence tier-1-3 verdict that triggered the fallback (issue LLM fallback in classify pipeline never fires for low-confidence catch-all verdicts — engine.classify short-circuits on classify_sync #99).
Adds Engine::llm_classify_only (direct LLM dispatch, backfills ticket_id from the message) and Engine::llm_has_api_key (so the pipeline can warn once at startup when use_llm: true is set but no API key is resolved, instead of silently producing one warn-per-commit).
Adds wiremock test on LlmClassifier::classify, regression test on Engine::llm_classify_only, contract-pinning test on the raw classifier's ticket_id: None behavior, and a panic-safe EnvVarGuard (mirrors core::config::validator::tests) for the env-mutating test.

Why this matters

On a 2,926-commit corpus reported in the issue, 2,079 commits (71%) sat at the catch-all maintenance / 0.3 verdict that should have been bounced to the LLM. Every fallback attempt produced log lines original_conf=0.3 new_conf=0.3 — the smoking gun that classify_sync was what came back, not the LLM. Average classification confidence was 0.43 instead of the 0.84 obtainable when the LLM actually fires.

What changed

src/classify/classifier.rs

New: pub async fn llm_classify_only(&self, message: &str) -> Option<ClassificationResult> — bypasses tiers 0–3.5, resolves taxonomy.top_level on the verdict, and backfills ticket_id via RegexMatcher::extract_ticket_id so a ticket reference carried by the original tier-1-3 verdict isn't silently lost when the LLM result wins the pipeline's overwrite-guard.
New: pub fn llm_has_api_key(&self) -> Option<bool> — Some(true) configured, Some(false) enabled-but-no-key, None disabled.
Engine::classify now delegates its LLM arm to llm_classify_only (no behavior change).

src/classify/pipeline.rs

Inside the if engine.config().use_llm block: one-shot warn! when engine.llm_has_api_key() == Some(false).
The futures stream calls engine_ref.llm_classify_only(&message).await.unwrap_or_else(ClassificationResult::unclassified) instead of engine_ref.classify(&message, is_merge).await. The existing overwrite-guard (r.confidence > original_conf) already handles the "LLM returned empty / no API key" case correctly because unwrap_or_else(...) produces confidence=0.0.

src/classify/tiers/llm.rs

New: LlmClassifier::has_api_key() accessor.
Tests: has_api_key_reflects_key_state, classify_returns_none_without_api_key, classify_dispatches_to_endpoint_when_keyed (wiremock), classify_does_not_set_ticket_id (pins the contract that justifies the engine-level backfill).

src/classify/mod.rs

Tests: llm_classify_only_returns_none_when_disabled (issue LLM fallback in classify pipeline never fires for low-confidence catch-all verdicts — engine.classify short-circuits on classify_sync #99 regression), llm_has_api_key_signals_misconfiguration (uses RAII EnvVarGuard).

Test plan

cargo build clean
cargo test — 341 tests pass (was 340; +6 new, -5 covered elsewhere; see commits)
cargo clippy --all-targets -- -D warnings clean
cargo fmt --check clean
cargo doc --no-deps clean
Regression test pins the bug: llm_classify_only_returns_none_when_disabled asserts that when use_llm: false, the method returns None even for messages the catch-all would match — otherwise the pipeline's overwrite-guard would see the same low-confidence verdict back and the LLM tier would never run.

Notes for reviewers

A hive-mode code review on the original fix flagged the ticket_id data-loss path and the parallel-test env-var hazard; both fixes are in the second commit on this branch.
Repo convention: no Cargo.toml or CHANGELOG.md bumps in contributor PRs — none here.
Follow-ups considered nice-to-have and not in this PR: a higher-level pipeline-end-to-end test with a mock LLM endpoint (requires adding an engine-level test seam), tightening llm_* methods to pub(crate) (kept pub for library symmetry), per-verdict accept/reject counters on the fallback path.

🤖 Generated with RuFlo

…bmatnyc#99) The pipeline-level LLM fallback called `Engine::classify`, which re-runs `classify_sync` first and returns the same low-confidence tier-1-3 verdict that triggered the fallback. The HTTP call to the LLM was therefore never made — every catch-all `maintenance / 0.3` commit kept its original verdict and the overwrite-guard logged "did not improve confidence" once per commit. - Add `Engine::llm_classify_only` for direct LLM dispatch, bypassing tiers 0–3.5. - Add `LlmClassifier::has_api_key` and `Engine::llm_has_api_key` so the pipeline can warn once at startup when `use_llm: true` is set but no `OPENAI_API_KEY` / `OPENROUTER_API_KEY` is resolved (previously this was silent). - Switch the pipeline's fan-out to `llm_classify_only`. The overwrite- guard (`r.confidence > original_conf`) already handles empty/failed LLM verdicts correctly. - Add a wiremock test proving the LLM endpoint is actually hit, plus unit tests for the new accessors. Co-Authored-By: RuFlo <ruv@ruv.net>

…ckfill, env-guard Folds in two hive-review findings against 52c2932: 1. `Engine::llm_classify_only` now backfills `ticket_id` from the message when the LLM verdict omits it (`LlmClassifier::classify` always returns `ticket_id: None`). Previously, a low-confidence regex verdict carrying `ticket_id=Some("PROJ-1234")` would lose that ID if the LLM result won the pipeline's overwrite-guard — newly observable because this PR is the first time the LLM path actually fires for the catch-all bucket. 2. `llm_has_api_key_signals_misconfiguration` now uses an RAII `EnvVarGuard` (mirrors `core::config::validator::tests`) so an assertion panic between `remove_var` and the restore can't leak the cleared `OPENAI_API_KEY` state to other parallel tests. Plus a contract-pinning test: `classify_does_not_set_ticket_id` asserts the raw `LlmClassifier` continues to return `ticket_id: None`, so a future change that starts surfacing ticket IDs from the LLM verdict will fail loudly and prompt revisiting the engine-level backfill. Co-Authored-By: RuFlo <ruv@ruv.net>

maui314159 and others added 2 commits May 18, 2026 18:32

maui314159 requested a review from bobmatnyc as a code owner May 18, 2026 22:52

bobmatnyc merged commit 640b95e into bobmatnyc:main May 19, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(classify): LLM fallback bypasses cascade short-circuit (closes #99)#100

fix(classify): LLM fallback bypasses cascade short-circuit (closes #99)#100
bobmatnyc merged 2 commits into
bobmatnyc:mainfrom
maui314159:fix/issue-99-llm-fallback-shortcircuit

maui314159 commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

maui314159 commented May 18, 2026

Summary

Why this matters

What changed

Test plan

Notes for reviewers

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants