T046: iris-adapter-llm-anthropic - standalone Anthropic Messages API adapter#45
T046: iris-adapter-llm-anthropic - standalone Anthropic Messages API adapter#45likith1908 wants to merge 3 commits into
Conversation
Standalone AnthropicProvider (no OpenAICompatProvider base). Bearer auth via x-api-key header; structured output via Anthropic tool-use pattern; model routing via model_hint. 30 unit tests covering C-LLM-001..010; 3 live tests verified against real API (IRIS_LLM_LIVE_ANTHROPIC=1). Fixes from code review: is-not-None guard for max_output_tokens; correct stop_reason=max_tokens message; empty content list raises LLMUnavailable; content-filter detection checks error.type + broader keyword set; tool_use null-input distinguished from missing block.
anmolg1997
left a comment
There was a problem hiding this comment.
Reviewed on the stack (own diff against T045-llm-openai). The standalone implementation is well done: it mirrors the base class shape (complete then with_retry then _do_complete then _call then _map_result), reuses with_retry and instrument_complete from the shared package rather than forking them, sets llm.retry_count on the span, and the tool-use structured-output path is correct - forced tool_choice on a structured_output tool, validates the tool_use input via model_validate, and C-LLM-005 / C-LLM-006 are tested (success, missing block, null input, invalid input). Good model default (claude-haiku-4-5-20251001) and max_tokens defaulting since Anthropic requires it. One change before merge, plus a forward note and a nit.
1. Test the standalone error-mapping branches
Because Anthropic does not inherit OpenAICompatProvider, it reimplements _raise_for_status and the response-shape guards - and several of those branches are currently untested (the adapter sits at 90 percent with exactly these lines uncovered):
- plain 400 -> LLMInvalidRequest (the non-content-filter path)
- 500 -> LLMUnavailable, and the catch-all 4xx -> LLMUnavailable
- non-JSON body -> LLMUnavailable (_parse_response)
- missing keys / wrong shape -> LLMUnavailable (_map_result)
- stop_sequences wiring when request.stop is set
The llm-shared base class has an explicit test for every one of these, which is why Azure and OpenAI get them for free. The standalone path should meet the same bar so a regression in the typed-error mapping is actually caught. These are cheap mock-response tests in the style you already have for the 401/429 cases. (Side benefit: it lifts the thin coverage margin - the stack reads 94.95 percent on my machine only because tesseract is not installed locally so ocr-local is undercovered here; CI has the binary and test-cov is green, but the Anthropic branches are the real drag and testing them helps both correctness and the margin.)
2. C-LLM-009 reading - flag for T048, not a blocker here
All three adapters (and the merged base) map the truncation signal (stop_reason max_tokens / finish_reason length) to LLMContextWindowExceeded with a generic message, and map a too-large input to LLMInvalidRequest via the 400 path. But C-LLM-009's wording is 'a request whose prompt plus max_output_tokens exceeds the model's window raises LLMContextWindowExceeded' with 'the requested total and the model's limit' in the message - that reads like the pre-flight input-too-large rejection, which today surfaces as LLMInvalidRequest with no numbers. How are you reading that clause: the truncation signal, or the input-too-large rejection? Worth nailing down before T048 so the contract suite and the adapters agree. Not introduced by this PR and not in T046's acceptance, so it does not block here.
3. Nit (optional)
_require_env is now copy-pasted in all three adapter clients. Could hoist to iris-llm-shared as a tiny helper. Trivial, your call.
Net: solid adapter. Add the error-branch tests and this is good to go. The C-LLM-009 question is the more interesting one but it belongs to the contract suite.
|
Re: C-LLM-009 reading The clause wording describes the pre-flight case (input too large before generation starts), but that is not what is implemented. What the adapters actually do:
So Implementing the pre-flight case properly would need token counting before each request (tokenizer library or dry-run API call) plus the model's context limit at runtime for the error message. Do we want the pre-flight check as part of T048, or leave it as a future task? |
…tests - Add iris_adapter_llm_shared/env.py with require_env helper - Re-export require_env from iris_adapter_llm_shared.__init__ - Remove local _require_env from llm-anthropic, llm-openai, llm-azure-openai - Add 5 missing error-branch tests to llm-anthropic: server error (5xx), unexpected 4xx, non-JSON body, missing content key, stop_sequences forwarding
|
Pushed the changes from the review:
|
Summary
Implements T046 - the
iris-adapter-llm-anthropicworkspace package providing a standalone Anthropic LLM adapter. Does not inherit fromOpenAICompatProvider; calls the Anthropic Messages API directly. Stacked on #44.packages/iris-adapters/llm-anthropic/src/iris_adapter_llm_anthropic/client.py:AnthropicProviderimplementing the fullLLMProviderprotocol. SendsPOST https://api.anthropic.com/v1/messageswithx-api-keyandanthropic-version: 2023-06-01headers. Structured output via Anthropic tool use: forcestool_choice: {type: "tool", name: "structured_output"}and extracts thetool_usecontent block.packages/iris-adapters/llm-anthropic/tests/test_unit.py: 30 unit tests covering all C-LLM-001..010 contract points using a mockhttpx.AsyncClient. Includes dual content-filter detection (by error type and by message keyword),tool_use_foundsentinel logic, andstop_reason: "max_tokens"handling.packages/iris-adapters/llm-anthropic/tests/test_live.py: 3 live tests gated onIRIS_LLM_LIVE_ANTHROPIC=1, verified against the real Anthropic API.pyproject.toml:iris_adapter_llm_anthropicregistered in importlinter root_packages, layers, independence, forbidden, mypy strict overrides, and coverage source. Renamed from scaffoldiris_llm_anthropic.tests/001-project-scaffold/: workspace, import-linter, and mypy scaffold tests updated for the renamed module..env.example:IRIS_LLM_ANTHROPIC_API_KEYand optional model env vars documented.Task reference
T046004-llm-adapter-setAcceptance criteria
T046
AnthropicProvider.from_env()constructs from env vars; raisesRuntimeErrorifIRIS_LLM_ANTHROPIC_API_KEYis missinghttps://api.anthropic.com/v1/messageswithx-api-keyheader andanthropic-version: 2023-06-01tools+tool_choice: {type: "tool", name: "structured_output"}; response validated from thetool_usecontent blockstop_reason: "max_tokens"raisesLLMContextWindowExceededoutput_blocked,content_policy_violation) and message keyword fallbackIRIS_LLM_LIVE_ANTHROPIC=1): 3/3 passedPR checklist
tasks.mdentry is satisfied.docs-ciworkflow passes (markdown lint + tasks structural check).Rebase note
This PR is stacked on #44. Once #44 merges, rebase onto
mainbefore merging this one.Notes for the reviewer
No
OpenAICompatProviderinheritance: Anthropic uses a different request/response shape, auth headers, and content structure. The adapter is standalone but still importsRetryConfigandwith_retryfromiris_adapter_llm_shared(the independence contract only covers concrete adapters, not the shared helper).Tool use for structured output: Unlike OpenAI JSON-mode, Anthropic structured output works by defining a tool schema and forcing the model to call it. The
tool_use_foundboolean sentinel distinguishes two null cases: tool block missing entirely vs. block present with null input.stop_reasonsemantics:"max_tokens"means the output budget was exhausted (not the context window). HTTP 400 indicates context window overflow. The error message reflects this distinction.Default model:
claude-haiku-4-5-20251001for both chat and extraction slots. Overridable viaIRIS_LLM_ANTHROPIC_MODEL_CHATandIRIS_LLM_ANTHROPIC_MODEL_EXTRACT.