Skip to content
This repository was archived by the owner on May 19, 2026. It is now read-only.

fix(classify): LLM fallback bypasses cascade short-circuit (closes #99)#100

Merged
bobmatnyc merged 2 commits into
bobmatnyc:mainfrom
maui314159:fix/issue-99-llm-fallback-shortcircuit
May 19, 2026
Merged

fix(classify): LLM fallback bypasses cascade short-circuit (closes #99)#100
bobmatnyc merged 2 commits into
bobmatnyc:mainfrom
maui314159:fix/issue-99-llm-fallback-shortcircuit

Conversation

@maui314159
Copy link
Copy Markdown
Contributor

Summary

  • Pipeline-level LLM fallback now dispatches directly to the LLM tier instead of going through Engine::classify, which was re-running classify_sync and short-circuiting on the same low-confidence tier-1-3 verdict that triggered the fallback (issue LLM fallback in classify pipeline never fires for low-confidence catch-all verdicts — engine.classify short-circuits on classify_sync #99).
  • Adds Engine::llm_classify_only (direct LLM dispatch, backfills ticket_id from the message) and Engine::llm_has_api_key (so the pipeline can warn once at startup when use_llm: true is set but no API key is resolved, instead of silently producing one warn-per-commit).
  • Adds wiremock test on LlmClassifier::classify, regression test on Engine::llm_classify_only, contract-pinning test on the raw classifier's ticket_id: None behavior, and a panic-safe EnvVarGuard (mirrors core::config::validator::tests) for the env-mutating test.

Why this matters

On a 2,926-commit corpus reported in the issue, 2,079 commits (71%) sat at the catch-all maintenance / 0.3 verdict that should have been bounced to the LLM. Every fallback attempt produced log lines original_conf=0.3 new_conf=0.3 — the smoking gun that classify_sync was what came back, not the LLM. Average classification confidence was 0.43 instead of the 0.84 obtainable when the LLM actually fires.

What changed

src/classify/classifier.rs

  • New: pub async fn llm_classify_only(&self, message: &str) -> Option<ClassificationResult> — bypasses tiers 0–3.5, resolves taxonomy.top_level on the verdict, and backfills ticket_id via RegexMatcher::extract_ticket_id so a ticket reference carried by the original tier-1-3 verdict isn't silently lost when the LLM result wins the pipeline's overwrite-guard.
  • New: pub fn llm_has_api_key(&self) -> Option<bool>Some(true) configured, Some(false) enabled-but-no-key, None disabled.
  • Engine::classify now delegates its LLM arm to llm_classify_only (no behavior change).

src/classify/pipeline.rs

  • Inside the if engine.config().use_llm block: one-shot warn! when engine.llm_has_api_key() == Some(false).
  • The futures stream calls engine_ref.llm_classify_only(&message).await.unwrap_or_else(ClassificationResult::unclassified) instead of engine_ref.classify(&message, is_merge).await. The existing overwrite-guard (r.confidence > original_conf) already handles the "LLM returned empty / no API key" case correctly because unwrap_or_else(...) produces confidence=0.0.

src/classify/tiers/llm.rs

  • New: LlmClassifier::has_api_key() accessor.
  • Tests: has_api_key_reflects_key_state, classify_returns_none_without_api_key, classify_dispatches_to_endpoint_when_keyed (wiremock), classify_does_not_set_ticket_id (pins the contract that justifies the engine-level backfill).

src/classify/mod.rs

Test plan

  • cargo build clean
  • cargo test — 341 tests pass (was 340; +6 new, -5 covered elsewhere; see commits)
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt --check clean
  • cargo doc --no-deps clean
  • Regression test pins the bug: llm_classify_only_returns_none_when_disabled asserts that when use_llm: false, the method returns None even for messages the catch-all would match — otherwise the pipeline's overwrite-guard would see the same low-confidence verdict back and the LLM tier would never run.

Notes for reviewers

  • A hive-mode code review on the original fix flagged the ticket_id data-loss path and the parallel-test env-var hazard; both fixes are in the second commit on this branch.
  • Repo convention: no Cargo.toml or CHANGELOG.md bumps in contributor PRs — none here.
  • Follow-ups considered nice-to-have and not in this PR: a higher-level pipeline-end-to-end test with a mock LLM endpoint (requires adding an engine-level test seam), tightening llm_* methods to pub(crate) (kept pub for library symmetry), per-verdict accept/reject counters on the fallback path.

🤖 Generated with RuFlo

maui314159 and others added 2 commits May 18, 2026 18:32
…bmatnyc#99)

The pipeline-level LLM fallback called `Engine::classify`, which re-runs
`classify_sync` first and returns the same low-confidence tier-1-3
verdict that triggered the fallback. The HTTP call to the LLM was
therefore never made — every catch-all `maintenance / 0.3` commit kept
its original verdict and the overwrite-guard logged "did not improve
confidence" once per commit.

- Add `Engine::llm_classify_only` for direct LLM dispatch, bypassing
  tiers 0–3.5.
- Add `LlmClassifier::has_api_key` and `Engine::llm_has_api_key` so the
  pipeline can warn once at startup when `use_llm: true` is set but no
  `OPENAI_API_KEY` / `OPENROUTER_API_KEY` is resolved (previously this
  was silent).
- Switch the pipeline's fan-out to `llm_classify_only`. The overwrite-
  guard (`r.confidence > original_conf`) already handles empty/failed
  LLM verdicts correctly.
- Add a wiremock test proving the LLM endpoint is actually hit, plus
  unit tests for the new accessors.

Co-Authored-By: RuFlo <ruv@ruv.net>
…ckfill, env-guard

Folds in two hive-review findings against 52c2932:

1. `Engine::llm_classify_only` now backfills `ticket_id` from the message
   when the LLM verdict omits it (`LlmClassifier::classify` always
   returns `ticket_id: None`). Previously, a low-confidence regex
   verdict carrying `ticket_id=Some("PROJ-1234")` would lose that ID if
   the LLM result won the pipeline's overwrite-guard — newly observable
   because this PR is the first time the LLM path actually fires for
   the catch-all bucket.

2. `llm_has_api_key_signals_misconfiguration` now uses an RAII
   `EnvVarGuard` (mirrors `core::config::validator::tests`) so an
   assertion panic between `remove_var` and the restore can't leak the
   cleared `OPENAI_API_KEY` state to other parallel tests.

Plus a contract-pinning test: `classify_does_not_set_ticket_id` asserts
the raw `LlmClassifier` continues to return `ticket_id: None`, so a
future change that starts surfacing ticket IDs from the LLM verdict
will fail loudly and prompt revisiting the engine-level backfill.

Co-Authored-By: RuFlo <ruv@ruv.net>
@maui314159 maui314159 requested a review from bobmatnyc as a code owner May 18, 2026 22:52
@bobmatnyc bobmatnyc merged commit 640b95e into bobmatnyc:main May 19, 2026
4 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants