Skip to content

T046: iris-adapter-llm-anthropic - standalone Anthropic Messages API adapter#45

Open
likith1908 wants to merge 3 commits into
T045-llm-openaifrom
T046-llm-anthropic
Open

T046: iris-adapter-llm-anthropic - standalone Anthropic Messages API adapter#45
likith1908 wants to merge 3 commits into
T045-llm-openaifrom
T046-llm-anthropic

Conversation

@likith1908

Copy link
Copy Markdown
Contributor

Summary

Implements T046 - the iris-adapter-llm-anthropic workspace package providing a standalone Anthropic LLM adapter. Does not inherit from OpenAICompatProvider; calls the Anthropic Messages API directly. Stacked on #44.

  • packages/iris-adapters/llm-anthropic/src/iris_adapter_llm_anthropic/client.py: AnthropicProvider implementing the full LLMProvider protocol. Sends POST https://api.anthropic.com/v1/messages with x-api-key and anthropic-version: 2023-06-01 headers. Structured output via Anthropic tool use: forces tool_choice: {type: "tool", name: "structured_output"} and extracts the tool_use content block.
  • packages/iris-adapters/llm-anthropic/tests/test_unit.py: 30 unit tests covering all C-LLM-001..010 contract points using a mock httpx.AsyncClient. Includes dual content-filter detection (by error type and by message keyword), tool_use_found sentinel logic, and stop_reason: "max_tokens" handling.
  • packages/iris-adapters/llm-anthropic/tests/test_live.py: 3 live tests gated on IRIS_LLM_LIVE_ANTHROPIC=1, verified against the real Anthropic API.
  • pyproject.toml: iris_adapter_llm_anthropic registered in importlinter root_packages, layers, independence, forbidden, mypy strict overrides, and coverage source. Renamed from scaffold iris_llm_anthropic.
  • tests/001-project-scaffold/: workspace, import-linter, and mypy scaffold tests updated for the renamed module.
  • .env.example: IRIS_LLM_ANTHROPIC_API_KEY and optional model env vars documented.

Task reference

  • Task ID: T046
  • Workstream: 004-llm-adapter-set

Acceptance criteria

T046

  • AnthropicProvider.from_env() constructs from env vars; raises RuntimeError if IRIS_LLM_ANTHROPIC_API_KEY is missing
  • Endpoint: https://api.anthropic.com/v1/messages with x-api-key header and anthropic-version: 2023-06-01
  • Structured output uses Anthropic tool use: tools + tool_choice: {type: "tool", name: "structured_output"}; response validated from the tool_use content block
  • stop_reason: "max_tokens" raises LLMContextWindowExceeded
  • Content filter detected via error type (output_blocked, content_policy_violation) and message keyword fallback
  • All C-LLM-001..010 contract points covered by 30 unit tests
  • Live tests pass against real Anthropic API (IRIS_LLM_LIVE_ANTHROPIC=1): 3/3 passed

PR checklist

  • Every acceptance criterion in the task's tasks.md entry is satisfied.
  • docs-ci workflow passes (markdown lint + tasks structural check).
  • No em-dashes introduced in new prose.
  • All internal links resolve.
  • No secrets, credentials, or personal names introduced.

Rebase note

This PR is stacked on #44. Once #44 merges, rebase onto main before merging this one.

Notes for the reviewer

No OpenAICompatProvider inheritance: Anthropic uses a different request/response shape, auth headers, and content structure. The adapter is standalone but still imports RetryConfig and with_retry from iris_adapter_llm_shared (the independence contract only covers concrete adapters, not the shared helper).

Tool use for structured output: Unlike OpenAI JSON-mode, Anthropic structured output works by defining a tool schema and forcing the model to call it. The tool_use_found boolean sentinel distinguishes two null cases: tool block missing entirely vs. block present with null input.

stop_reason semantics: "max_tokens" means the output budget was exhausted (not the context window). HTTP 400 indicates context window overflow. The error message reflects this distinction.

Default model: claude-haiku-4-5-20251001 for both chat and extraction slots. Overridable via IRIS_LLM_ANTHROPIC_MODEL_CHAT and IRIS_LLM_ANTHROPIC_MODEL_EXTRACT.

Standalone AnthropicProvider (no OpenAICompatProvider base). Bearer auth
via x-api-key header; structured output via Anthropic tool-use pattern;
model routing via model_hint. 30 unit tests covering C-LLM-001..010;
3 live tests verified against real API (IRIS_LLM_LIVE_ANTHROPIC=1).

Fixes from code review: is-not-None guard for max_output_tokens; correct
stop_reason=max_tokens message; empty content list raises LLMUnavailable;
content-filter detection checks error.type + broader keyword set; tool_use
null-input distinguished from missing block.

@anmolg1997 anmolg1997 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed on the stack (own diff against T045-llm-openai). The standalone implementation is well done: it mirrors the base class shape (complete then with_retry then _do_complete then _call then _map_result), reuses with_retry and instrument_complete from the shared package rather than forking them, sets llm.retry_count on the span, and the tool-use structured-output path is correct - forced tool_choice on a structured_output tool, validates the tool_use input via model_validate, and C-LLM-005 / C-LLM-006 are tested (success, missing block, null input, invalid input). Good model default (claude-haiku-4-5-20251001) and max_tokens defaulting since Anthropic requires it. One change before merge, plus a forward note and a nit.

1. Test the standalone error-mapping branches

Because Anthropic does not inherit OpenAICompatProvider, it reimplements _raise_for_status and the response-shape guards - and several of those branches are currently untested (the adapter sits at 90 percent with exactly these lines uncovered):

  • plain 400 -> LLMInvalidRequest (the non-content-filter path)
  • 500 -> LLMUnavailable, and the catch-all 4xx -> LLMUnavailable
  • non-JSON body -> LLMUnavailable (_parse_response)
  • missing keys / wrong shape -> LLMUnavailable (_map_result)
  • stop_sequences wiring when request.stop is set

The llm-shared base class has an explicit test for every one of these, which is why Azure and OpenAI get them for free. The standalone path should meet the same bar so a regression in the typed-error mapping is actually caught. These are cheap mock-response tests in the style you already have for the 401/429 cases. (Side benefit: it lifts the thin coverage margin - the stack reads 94.95 percent on my machine only because tesseract is not installed locally so ocr-local is undercovered here; CI has the binary and test-cov is green, but the Anthropic branches are the real drag and testing them helps both correctness and the margin.)

2. C-LLM-009 reading - flag for T048, not a blocker here

All three adapters (and the merged base) map the truncation signal (stop_reason max_tokens / finish_reason length) to LLMContextWindowExceeded with a generic message, and map a too-large input to LLMInvalidRequest via the 400 path. But C-LLM-009's wording is 'a request whose prompt plus max_output_tokens exceeds the model's window raises LLMContextWindowExceeded' with 'the requested total and the model's limit' in the message - that reads like the pre-flight input-too-large rejection, which today surfaces as LLMInvalidRequest with no numbers. How are you reading that clause: the truncation signal, or the input-too-large rejection? Worth nailing down before T048 so the contract suite and the adapters agree. Not introduced by this PR and not in T046's acceptance, so it does not block here.

3. Nit (optional)

_require_env is now copy-pasted in all three adapter clients. Could hoist to iris-llm-shared as a tiny helper. Trivial, your call.

Net: solid adapter. Add the error-branch tests and this is good to go. The C-LLM-009 question is the more interesting one but it belongs to the contract suite.

@likith1908

Copy link
Copy Markdown
Contributor Author

Re: C-LLM-009 reading

The clause wording describes the pre-flight case (input too large before generation starts), but that is not what is implemented. What the adapters actually do:

  • stop_reason: max_tokens / finish_reason: length in a 200 response raises LLMContextWindowExceeded - output truncation signal.
  • HTTP 400 raises LLMInvalidRequest, regardless of whether the rejection was caused by an oversized prompt or a bad parameter.

So LLMContextWindowExceeded currently means "response was cut short mid-generation", not "input was rejected before generation."

Implementing the pre-flight case properly would need token counting before each request (tokenizer library or dry-run API call) plus the model's context limit at runtime for the error message.

Do we want the pre-flight check as part of T048, or leave it as a future task?

…tests

- Add iris_adapter_llm_shared/env.py with require_env helper
- Re-export require_env from iris_adapter_llm_shared.__init__
- Remove local _require_env from llm-anthropic, llm-openai, llm-azure-openai
- Add 5 missing error-branch tests to llm-anthropic: server error (5xx),
  unexpected 4xx, non-JSON body, missing content key, stop_sequences forwarding
@likith1908

Copy link
Copy Markdown
Contributor Author

Pushed the changes from the review:

  • require_env hoisted to iris-llm-shared/env.py and re-exported from __init__; local _require_env removed from all three adapter clients.
  • 5 error-branch tests added to llm-anthropic: server error (5xx), unexpected 4xx, non-JSON body, missing content key, and stop_sequences forwarding.

@likith1908 likith1908 marked this pull request as ready for review June 26, 2026 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants