3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,9 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The

### Added

- **Image content blocks for user messages (proposal 0015, spec v0.13.0).** `UserMessage.content` now accepts `str | list[ContentBlock]`. The block surface introduces `TextBlock`, `ImageBlock`, `ImageSourceURL`, `ImageSourceInline`, and the `ContentBlock` / `ImageSource` discriminated unions over the block / source `type` field. `ImageBlock` carries a `media_type` (required for inline sources; ignored for URL sources; typed as `str | None` so callers MAY pass any `image/*` type the bound model supports) and an optional `detail` hint (`"auto"` / `"low"` / `"high"`; `None` default omits the field from the wire so providers apply their own default). System, assistant, and tool messages stay text-string-only; image inputs are user-only in v1.
- **`OpenAIProvider` content-array wire mapping.** When `UserMessage.content` is a content-block sequence, the wire body uses OpenAI's `content` array per §8.1.1. `TextBlock → {type: "text", text}`. `ImageBlock` with a URL source maps to `{type: "image_url", image_url: {url, detail?}}`. `ImageBlock` with an inline source constructs an RFC 2397 `data:<media_type>;base64,<base64_data>` URI and goes through the same `image_url` entry shape. Inline bytes pass through unchanged — no inspection, transcoding, or re-encoding.
- **New error category `ProviderUnsupportedContentBlock` (non-transient).** Raised when the bound model rejects a content block type / media variant. Distinct from `ProviderInvalidRequest` (which covers spec-shape malformation): this category surfaces a *capability* mismatch, letting callers route differently (e.g., fall back to a multimodal-capable provider) without overloading the malformed-request category. Carries `block_type` ("image" / "audio" / "video") and `reason` (provider's human-readable message) when those are recoverable from the rejection. `OpenAIProvider` detects content rejection heuristically from HTTP 400 response bodies, matching on `error.code` (known set: `image_content_not_supported`, `unsupported_image_media_type`, `audio_content_not_supported`, etc.), `error.type` (`image_parse_error`), and `error.message` ("does not support" plus image/audio/video).
- **Structured output (proposal 0016, spec v0.14.0).** `Provider.complete()` now accepts an optional `response_schema` parameter — either a JSON Schema dict or a Pydantic `BaseModel` subclass. When supplied, the provider constrains the model's output to the schema and populates `Response.parsed` with the validated value (`dict` for dict-schema input, a `BaseModel` instance for class input). New `StructuredOutputInvalid` error category (non-transient by default) raises on JSON parse failure or schema validation failure; carries the requested schema, the raw response content, and a failure description.
- **`OpenAIProvider` native `response_format` wire path.** When `response_schema` is supplied, the chat-completions request body carries `response_format: { type: "json_schema", json_schema: { name, schema, strict } }`. The `strict` flag is determined by a deep recursive walk over the schema (object-property required-coverage rule across `anyOf` / `oneOf` / `allOf` and `$ref` targets, with cycle protection); unresolvable refs fall through to `strict: false`. The `name` field uses `schema.title` when present, otherwise a deterministic sha256-prefix hash.
- **`OpenAIProvider` prompt-augmentation fallback.** Constructor flag `force_prompt_augmentation_fallback: bool` (default `False`) and read-only inspect property `uses_prompt_augmentation_fallback: bool`. When the flag is on, structured-output calls build a fresh message list with a system directive containing the serialized schema, omit `response_format` from the wire, and validate the response post-receive. The caller's original `messages` list is never mutated. Use for OpenAI-compatible servers (older vLLM, some LM Studio releases, llama.cpp variants) that reject or silently ignore `response_format`.
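The content-array mapping described above can be sketched as a pure function. This is a minimal illustration, not `OpenAIProvider`'s code: the dataclasses are hypothetical stand-ins for the spec types, their field names (notably `base64_data` on the inline source) are assumptions, and `to_openai_content` is a hypothetical helper name.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Literal


# Hypothetical stand-ins for the spec types; field names are assumptions.
@dataclass
class TextBlock:
    text: str
    type: Literal["text"] = "text"


@dataclass
class ImageSourceURL:
    url: str
    type: Literal["url"] = "url"


@dataclass
class ImageSourceInline:
    base64_data: str  # already base64-encoded by the caller
    type: Literal["inline"] = "inline"


@dataclass
class ImageBlock:
    source: ImageSourceURL | ImageSourceInline
    media_type: str | None = None
    detail: Literal["auto", "low", "high"] | None = None
    type: Literal["image"] = "image"


def to_openai_content(blocks: list[TextBlock | ImageBlock]) -> list[dict[str, Any]]:
    """Map spec blocks to OpenAI chat-completions content-array entries, preserving order."""
    entries: list[dict[str, Any]] = []
    for block in blocks:
        if isinstance(block, TextBlock):
            entries.append({"type": "text", "text": block.text})
            continue
        if isinstance(block.source, ImageSourceURL):
            url = block.source.url  # passed through unchanged
        else:
            if block.media_type is None:
                raise ValueError("media_type is required for inline image sources")
            # RFC 2397 data URI built from the caller's base64 payload as-is.
            url = f"data:{block.media_type};base64,{block.source.base64_data}"
        image_url: dict[str, Any] = {"url": url}
        if block.detail is not None:  # None omits the key from the wire
            image_url["detail"] = block.detail
        entries.append({"type": "image_url", "image_url": image_url})
    return entries
```

Both source variants converge on the same `image_url` entry shape; only the `url` value differs.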
111 changes: 111 additions & 0 deletions docs/concepts/llms.md
@@ -221,6 +221,117 @@ on every object. Pydantic-derived schemas may need `model_config =
ConfigDict(extra="forbid")` on the class to get the
`additionalProperties: false` in the generated JSON Schema.

## Content blocks (multimodal user messages)

User messages carry content in one of two shapes: a plain text string,
or an ordered sequence of typed content blocks. The string form is the
common case. Blocks are how you mix non-text modalities into a single
turn. v1 defines two block types: text and image. Audio and video are
deferred to future proposals.

System, assistant, and tool messages stay text-string only. Image
inputs are user-only in v1; image outputs (assistant-message-borne
images, e.g. DALL-E-style generation) are out of scope.

### Text and image blocks

A text block is the array-form equivalent of a text-string message:
`TextBlock(text="describe this")`. A user message holding a single
text block is normatively equivalent to one with `content="describe
this"`.
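One way to read the equivalence: normalize both content shapes to the array form before comparing. A minimal sketch; `TextBlock` here is a hypothetical frozen-dataclass stand-in and `as_blocks` is a hypothetical helper, not part of `openarmature.llm`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:  # hypothetical stand-in for openarmature.llm.TextBlock
    text: str


def as_blocks(content: str | list[TextBlock]) -> list[TextBlock]:
    """Normalize string-or-blocks user content to the array form."""
    if isinstance(content, str):
        return [TextBlock(text=content)]
    return list(content)
```

Under this normalization, `as_blocks("describe this")` and `as_blocks([TextBlock(text="describe this")])` compare equal.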

An image block carries one source — URL or inline base64 — plus an
optional `detail` hint:

```python
from openarmature.llm import (
ImageBlock,
ImageSourceInline,
ImageSourceURL,
OpenAIProvider,
TextBlock,
UserMessage,
)


async def describe_image(provider: OpenAIProvider) -> str:
response = await provider.complete(
[
UserMessage(
content=[
ImageBlock(
source=ImageSourceURL(url="https://example.com/diagram.png"),
detail="high", # optional; omitted from wire when None
),
TextBlock(text="What does this diagram show?"),
]
)
]
)
return response.message.content
```

Block order is preserved on the wire. Providers vary in whether they
treat order as semantically meaningful (an image followed by its
describing text is a different signal from text followed by the
image); construct the sequence in the order you want the model to
perceive it.

### URL vs inline sources

- **URL source** (`ImageSourceURL`): the provider fetches the URL. Any
  scheme the provider documents as supported is valid (`http(s)://`,
  `data:`, etc.). The framework passes the URL through unchanged.
- **Inline source** (`ImageSourceInline`): the image is sent as
base64-encoded bytes in the request body. The `media_type` field on
the surrounding `ImageBlock` is **required** for inline sources (and
  ignored for URL sources). The framework constructs an RFC 2397
  `data:<media_type>;base64,<base64_data>` URI for the wire; it does not
  inspect, transcode, or re-encode the bytes.

OpenAI, Anthropic, and Google all accept `image/png`, `image/jpeg`,
and `image/webp` as guaranteed media types. `media_type` is typed as
`str | None`, so callers MAY pass additional `image/*` types when
they know the bound model supports them; portable code sticks to the
three.
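A caller-side sketch of preparing an inline image: base64-encode the raw bytes and, by default, restrict `media_type` to the three portable types. `PORTABLE_IMAGE_TYPES` and `inline_image_fields` are hypothetical names, and the returned keys mirror assumed `ImageSourceInline` / `ImageBlock` field names rather than confirmed constructor signatures.

```python
from __future__ import annotations

import base64

# The three media types listed above as accepted by OpenAI, Anthropic, and Google.
PORTABLE_IMAGE_TYPES = frozenset({"image/png", "image/jpeg", "image/webp"})


def inline_image_fields(raw: bytes, media_type: str, *, portable_only: bool = True) -> dict[str, str]:
    """Return the base64 payload and media type for an inline image source."""
    if not media_type.startswith("image/"):
        raise ValueError(f"not an image media type: {media_type!r}")
    if portable_only and media_type not in PORTABLE_IMAGE_TYPES:
        raise ValueError(
            f"{media_type!r} is outside the portable set; "
            "pass portable_only=False if the bound model supports it"
        )
    # Standard-alphabet base64 (RFC 2045), which RFC 2397 data URIs use.
    # "base64_data" is an assumed field name.
    return {"base64_data": base64.b64encode(raw).decode("ascii"), "media_type": media_type}
```

Defaulting to the portable set keeps code provider-neutral; the opt-out matches the `str | None` typing that lets callers pass other `image/*` types deliberately.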

### The `detail` hint

`detail` is a per-image hint to the provider about processing
fidelity: `"auto"`, `"low"`, or `"high"`. The class default is `None`,
which **omits the field from the wire** and lets the provider apply
its own default (conceptually `"auto"`). Setting `detail="auto"`
explicitly on the spec block forces the wire to carry an explicit
`"auto"` — usually unnecessary, since the provider's default is the
same value.

### When the model can't handle the block

`ProviderUnsupportedContentBlock` (category `provider_unsupported_content_block`)
is raised when the bound model rejects a content block type or media
variant. Concrete cases:

- A text-only model (e.g., `gpt-3.5-turbo`) received an image block.
- The model supports images but not the requested `media_type`.
- The model supports the type but rejected the specific source variant
(a URL the provider can't fetch, for example).

The error category is **non-transient**: retrying without changing
the request, the bound model, or the provider won't succeed. Userland
fallback patterns (e.g., a middleware that routes to a multimodal
provider on this category) compose cleanly against it.

`ProviderUnsupportedContentBlock` carries `block_type` ("image",
"audio", "video") and `reason` (the provider's human-readable
message) when those are recoverable from the rejection.
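A userland fallback of the kind described above might look like the following sketch. Assumptions: the exception class is a local stand-in mirroring `ProviderUnsupportedContentBlock`'s constructor and `block_type` / `reason` attributes, `complete_with_fallback` is hypothetical, and each provider is any object with an async `complete(messages)` method.

```python
from __future__ import annotations

from typing import Any, Protocol


class ProviderUnsupportedContentBlock(Exception):
    """Local stand-in mirroring openarmature.llm.ProviderUnsupportedContentBlock."""

    def __init__(self, *args: Any, block_type: str | None = None, reason: str | None = None) -> None:
        super().__init__(*args)
        self.block_type = block_type
        self.reason = reason


class CompletesMessages(Protocol):
    async def complete(self, messages: list[Any]) -> Any: ...


async def complete_with_fallback(
    primary: CompletesMessages, fallback: CompletesMessages, messages: list[Any]
) -> Any:
    """Try the primary provider; on a capability mismatch, reroute once."""
    try:
        return await primary.complete(messages)
    except ProviderUnsupportedContentBlock:
        # Non-transient: resending the same request to `primary` cannot succeed,
        # so send the unchanged request to a multimodal-capable provider instead.
        return await fallback.complete(messages)
```

Other exceptions propagate untouched, so retry middleware can still handle transient failures separately.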

`OpenAIProvider` detects content rejection via the response body —
HTTP 400 with an error code like `image_content_not_supported` or a
message like "does not support image inputs." Pre-send capability
checks (failing fast before the wire trip when you know the model
doesn't support images) live above the provider as userland
middleware — the provider doesn't ship a static model-capability
catalog.
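The body heuristic can be approximated as a pure function over a parsed 400 response. This sketch follows the signals named in the changelog (`error.code`, `error.type`, and a "does not support" message naming image/audio/video). The function name, the `(block_type, reason)` return shape, and the exact code table are assumptions (the changelog's known-code list ends in "etc."); it is not `OpenAIProvider`'s implementation.

```python
from __future__ import annotations

from typing import Any

# Codes named in the changelog; the real set is larger.
KNOWN_CODES = {
    "image_content_not_supported": "image",
    "unsupported_image_media_type": "image",
    "audio_content_not_supported": "audio",
}
KNOWN_TYPES = {"image_parse_error": "image"}
MODALITIES = ("image", "audio", "video")


def classify_content_rejection(
    status: int, body: dict[str, Any]
) -> tuple[str | None, str | None] | None:
    """Return (block_type, reason) if the body looks like a content rejection, else None."""
    if status != 400:
        return None
    error = body.get("error")
    if not isinstance(error, dict):
        return None
    message = error.get("message") or None
    code = error.get("code")
    if code in KNOWN_CODES:
        return KNOWN_CODES[code], message
    err_type = error.get("type")
    if err_type in KNOWN_TYPES:
        return KNOWN_TYPES[err_type], message
    if isinstance(message, str) and "does not support" in message.lower():
        for modality in MODALITIES:
            if modality in message.lower():
                return modality, message
    return None
```

A `None` result means the 400 does not look like a capability mismatch and falls through to whatever generic bad-request handling applies.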

## Routing on parsed fields

A conditional edge is a function `state -> str` that names the next
10 changes: 10 additions & 0 deletions docs/model-providers/authoring.md
@@ -198,6 +198,16 @@ of:
- **Tool calls.** Wire-mapping the `tool_calls` array on
`AssistantMessage` to the Provider's expected shape, parsing tool
results back from `ToolMessage`s.
- **Content blocks (multimodal user input).** Wire-mapping the
`list[ContentBlock]` form of `UserMessage.content` to the provider's
multimodal shape (OpenAI's `image_url` content-array entries,
Anthropic's image blocks, Google's `inlineData` parts, etc.). The
spec types (`TextBlock`, `ImageBlock`, `ImageSourceURL`,
`ImageSourceInline`) are stable across providers; only the wire
shape differs. Provider authors targeting non-multimodal models
MUST surface `ProviderUnsupportedContentBlock` when the request
carries blocks the bound model can't serve — pre-send or
post-receive per §7.
- **Structured output.** Threading `response_schema` through the
request body (native `response_format` if the underlying wire
supports it; prompt-augmentation fallback otherwise) and validating
39 changes: 25 additions & 14 deletions docs/model-providers/index.md
@@ -64,24 +64,35 @@ class Provider(Protocol):

## Errors

Nine canonical error categories cover every failure mode:

| Error | Trigger |
| ---------------------------------- | ---------------------------------------------------------------------- |
| `ProviderAuthentication` | 401 / 403 (bad key, expired token) |
| `ProviderUnavailable` | 5xx, network failure, timeout |
| `ProviderInvalidModel` | Bound model doesn't exist on the provider |
| `ProviderModelNotLoaded` | Model known but not currently serving |
| `ProviderRateLimit` | 429 (with `Retry-After` exposed) |
| `ProviderInvalidResponse` | 200 OK that fails to parse |
| `ProviderInvalidRequest` | Malformed request (per-message or list-level) |
| `ProviderUnsupportedContentBlock` | Bound model rejected a content block type or media variant             |
| `StructuredOutputInvalid` | Response failed to parse as JSON or failed to validate against schema |

Three of these (`Unavailable`, `RateLimit`, `ModelNotLoaded`) are
exported in `TRANSIENT_CATEGORIES`, the canonical "safe to retry"
set used by the default retry-middleware classifier.
`StructuredOutputInvalid` and `ProviderUnsupportedContentBlock` are
non-transient by default. See [Content blocks](../concepts/llms.md#content-blocks-multimodal-user-messages)
in the LLMs concept page for the multimodal contract; see
[Structured output](#structured-output) below for the
`response_schema` path.
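The retry split reduces to a set-membership check on the error's class-level `category` string. A minimal sketch: the string values for the three transient categories are assumed to follow the `provider_<name>` pattern visible in `errors.py` (only `provider_rate_limit` appears verbatim in this diff), and `is_retryable` is a hypothetical helper, not the retry middleware's actual classifier.

```python
from __future__ import annotations

# Assumed values; real code should import TRANSIENT_CATEGORIES from openarmature.llm.
TRANSIENT_CATEGORIES = frozenset({
    "provider_unavailable",
    "provider_rate_limit",
    "provider_model_not_loaded",
})


def is_retryable(error: BaseException) -> bool:
    """True only when the error's `category` is in the 'safe to retry' set."""
    # LlmProviderError subclasses carry a class-level `category` string;
    # errors without one are never treated as transient.
    return getattr(error, "category", None) in TRANSIENT_CATEGORIES
```

Under this check, `provider_unsupported_content_block` and `structured_output_invalid` fall outside the set, so neither is retried by default.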

`OpenAIProvider` detects unsupported-content-block rejections via
the response body (HTTP 400 with an error code or message indicating
content rejection) — a post-receive mapping rather than a static
pre-send capability check. Pre-send protection is a userland
middleware pattern when callers know the bound model's capabilities
up front.
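A pre-send guard of this kind only needs to scan the outgoing messages for image blocks and compare the bound model against a capability set the caller maintains. Everything below is hypothetical: the stand-in message, block, and exception classes, the `IMAGE_CAPABLE_MODELS` set, and `guard_image_support`. In real code the raised error would be `openarmature.llm.ProviderUnsupportedContentBlock`, and the capability data comes from the caller, since the provider ships no catalog.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImageBlock:  # stand-in; only the class identity matters here
    media_type: str | None = None


@dataclass
class UserMessage:  # stand-in: content is a string or a block list
    content: str | list[object] = field(default_factory=list)


class UnsupportedContentBlock(Exception):  # stand-in for ProviderUnsupportedContentBlock
    def __init__(self, msg: str, *, block_type: str | None = None, reason: str | None = None) -> None:
        super().__init__(msg)
        self.block_type = block_type
        self.reason = reason


# Caller-maintained capability knowledge; example value only.
IMAGE_CAPABLE_MODELS = {"my-vision-model"}


def guard_image_support(model: str, messages: list[object]) -> None:
    """Raise before the wire trip if `model` is not known to accept images."""
    if model in IMAGE_CAPABLE_MODELS:
        return
    for msg in messages:
        content = getattr(msg, "content", None)
        if isinstance(content, list) and any(isinstance(b, ImageBlock) for b in content):
            raise UnsupportedContentBlock(
                f"{model} is not in the caller's image-capable set",
                block_type="image",
                reason="pre-send capability check",
            )
```

Raising the same category as the post-receive path lets one fallback handler serve both.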

## Structured output

16 changes: 16 additions & 0 deletions src/openarmature/llm/__init__.py
@@ -30,6 +30,7 @@
PROVIDER_MODEL_NOT_LOADED,
PROVIDER_RATE_LIMIT,
PROVIDER_UNAVAILABLE,
PROVIDER_UNSUPPORTED_CONTENT_BLOCK,
STRUCTURED_OUTPUT_INVALID,
TRANSIENT_CATEGORIES,
LlmProviderError,
@@ -40,12 +41,19 @@
ProviderModelNotLoaded,
ProviderRateLimit,
ProviderUnavailable,
ProviderUnsupportedContentBlock,
StructuredOutputInvalid,
)
from .messages import (
AssistantMessage,
ContentBlock,
ImageBlock,
ImageSource,
ImageSourceInline,
ImageSourceURL,
Message,
SystemMessage,
TextBlock,
Tool,
ToolCall,
ToolMessage,
@@ -69,10 +77,16 @@
"PROVIDER_MODEL_NOT_LOADED",
"PROVIDER_RATE_LIMIT",
"PROVIDER_UNAVAILABLE",
"PROVIDER_UNSUPPORTED_CONTENT_BLOCK",
"STRUCTURED_OUTPUT_INVALID",
"TRANSIENT_CATEGORIES",
"AssistantMessage",
"ContentBlock",
"FinishReason",
"ImageBlock",
"ImageSource",
"ImageSourceInline",
"ImageSourceURL",
"LlmProviderError",
"Message",
"OpenAIProvider",
@@ -85,10 +99,12 @@
"ProviderModelNotLoaded",
"ProviderRateLimit",
"ProviderUnavailable",
"ProviderUnsupportedContentBlock",
"Response",
"RuntimeConfig",
"StructuredOutputInvalid",
"SystemMessage",
"TextBlock",
"Tool",
"ToolCall",
"ToolMessage",
45 changes: 45 additions & 0 deletions src/openarmature/llm/errors.py
@@ -29,6 +29,7 @@
PROVIDER_RATE_LIMIT = "provider_rate_limit"
PROVIDER_INVALID_RESPONSE = "provider_invalid_response"
PROVIDER_INVALID_REQUEST = "provider_invalid_request"
PROVIDER_UNSUPPORTED_CONTENT_BLOCK = "provider_unsupported_content_block"
STRUCTURED_OUTPUT_INVALID = "structured_output_invalid"


@@ -137,6 +138,48 @@ class ProviderInvalidRequest(LlmProviderError):
category = PROVIDER_INVALID_REQUEST


# Non-transient by default — the bound model's capability set does
# not change between calls, so retrying without changing the request
# (the message list, the bound model, or the provider) will not
# succeed.
#
# Distinct from ProviderInvalidRequest. ProviderInvalidRequest covers
# spec-shape violations (the request is malformed at the wire layer);
# ProviderUnsupportedContentBlock covers capability mismatches (the
# request is well-formed but the bound model can't fulfill it).
# Splitting them lets callers route the unsupported-content case
# differently (e.g., fall back to a multimodal-capable provider)
# without overloading the malformed-request category.
class ProviderUnsupportedContentBlock(LlmProviderError):
"""Raised when the bound model does not support a content block
type used in the request.

Examples: a text-only model received an image block, or the model
supports images but not the requested ``media_type`` or ``source``
variant.

Attributes:
block_type: The block type that was rejected (e.g., ``"image"``),
when the provider's response makes this identifiable.
reason: The provider's human-readable description of the
rejection, when available.
"""

category = PROVIDER_UNSUPPORTED_CONTENT_BLOCK
block_type: str | None
reason: str | None

def __init__(
self,
*args: Any,
block_type: str | None = None,
reason: str | None = None,
) -> None:
super().__init__(*args)
self.block_type = block_type
self.reason = reason


# Non-transient by default — a model that fails schema compliance on a
# given prompt usually fails the same way on retry. The default
# RetryMiddleware classifier does NOT retry this category. Users wanting
@@ -184,6 +227,7 @@ def __init__(
"PROVIDER_MODEL_NOT_LOADED",
"PROVIDER_RATE_LIMIT",
"PROVIDER_UNAVAILABLE",
"PROVIDER_UNSUPPORTED_CONTENT_BLOCK",
"STRUCTURED_OUTPUT_INVALID",
"TRANSIENT_CATEGORIES",
"LlmProviderError",
@@ -194,5 +238,6 @@ def __init__(
"ProviderModelNotLoaded",
"ProviderRateLimit",
"ProviderUnavailable",
"ProviderUnsupportedContentBlock",
"StructuredOutputInvalid",
]