LunarCommand · chris-colinsky · May 15, 2026 · May 15, 2026 · May 15, 2026 · May 15, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -6,6 +6,24 @@ The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). The
 
 ## [Unreleased]
 
+### Added
+
+- **Structured output (proposal 0016, spec v0.14.0).** `Provider.complete()` now accepts an optional `response_schema` parameter — either a JSON Schema dict or a Pydantic `BaseModel` subclass. When supplied, the provider constrains the model's output to the schema and populates `Response.parsed` with the validated value (`dict` for dict-schema input, a `BaseModel` instance for class input). New `StructuredOutputInvalid` error category (non-transient by default) raises on JSON parse failure or schema validation failure; carries the requested schema, the raw response content, and a failure description.
+- **`OpenAIProvider` native response_format wire path.** When `response_schema` is supplied, the chat-completions request body carries `response_format: { type: "json_schema", json_schema: { name, schema, strict } }`. The `strict` flag is determined by a deep recursive walk over the schema (object-property required-coverage rule across `anyOf` / `oneOf` / `allOf` and `$ref` targets, with cycle protection); unresolvable refs fall through to `strict: false`. The `name` field uses `schema.title` when present, otherwise a deterministic sha256-prefix hash.
+- **`OpenAIProvider` prompt-augmentation fallback.** Constructor flag `force_prompt_augmentation_fallback: bool` (default `False`) and read-only inspect property `uses_prompt_augmentation_fallback: bool`. When the flag is on, structured-output calls build a fresh message list with a system directive containing the serialized schema, omit `response_format` from the wire, and validate the response post-receive. The caller's original `messages` list is never mutated. Use for OpenAI-compatible servers (older vLLM, some LM Studio releases, llama.cpp variants) that reject or silently ignore `response_format`.
+- **Provider-agnostic schema helpers.** `openarmature.llm.validate_response_schema(schema)` (raises `ProviderInvalidRequest` when the schema is not a dict with a top-level `type: "object"`) and `openarmature.llm.strict_mode_supported(schema)` (the deep-tree strict-mode constraint check) are exported for reuse by future Anthropic/Gemini providers.
+- **Capability-agnostic conformance harness helpers.** `tests/conformance/harness/wire.py` adds `match_wire_body` (recursive deep-equal with `"*"` wildcard support), `assert_response_format_absent`, `assert_system_references_schema`, and `assert_error_carries` for the `expected_wire_request[_checks]` and `expected.raises.carries.{...}` fixture shapes. Used by the 0016 fixtures; available for the upcoming 0014 / 0015 / 0017 fixture sets.
+- **Runtime dependency: `jsonschema>=4.0`.** Used by the dict-schema validation path. The Pydantic-class path uses Pydantic's native validator and does not need `jsonschema`.
+
+### Changed
+
+- **Pinned spec version: 0.10.0 → 0.15.0.** Adopts the skip-ahead governance principle: the submodule jumps across v0.11.0–v0.15.0 (proposals 0009, 0011, 0014, 0015, 0016, 0017) in one bump. Only the surface introduced by proposal 0016 is implemented in this changelog entry; fixtures from 0011 / 0014 / 0015 / 0017 are marked deferred-skip in the conformance suite and unmark as their respective PRs land.
+
+### Notes
+
+- **Release gate: do not tag until all of {0011, 0014, 0015, 0016, 0017} are merged.** This batch implements one proposal per PR and lands a consolidated release after the fifth PR. Cutting a release tag before the batch is complete would ship a partial spec implementation against the v0.15.0 pin.
+- **Pre-1.0 MINOR.** Existing free-form callers (no `response_schema`) see no behavior change — the new field defaults to `None`, the wire body omits `response_format`, and `Response.parsed` remains absent.
+
 ## [0.5.0] — 2026-05-10
 
 First release on real PyPI. Catches the implementation up from spec v0.5.x to v0.10.0 across six phases — the spec accepted eight proposals while the python lib was at v0.3.1, and v0.5.0 lands all of them in one curated drop.

diff --git a/README.md b/README.md
@@ -55,26 +55,34 @@ The OpenTelemetry mapping mandates a private `TracerProvider`. That prevents the
 
 ## Hello World
 
-About fifty lines that show the engine in action. Three reducer policies declared on one state class. Routing as a pure function of state, not a hidden state machine. An observer attached at compile time that sees every node boundary the engine emits. No LLM, no API key, no boilerplate. Copy it, run it, watch the events fire. Requires Python 3.12 or later.
+About a hundred lines that show the engine in action. Three reducer policies declared on one state class. Three LLM calls each returning typed structured output (Pydantic class on two, raw JSON Schema dict on the third). Conditional routing as a pure function of state, not a hidden state machine. An observer attached at compile time that sees every node boundary the engine emits. Requires Python 3.12 or later and an OpenAI-compatible endpoint (defaults to OpenAI public API; works against any local server too).
 
 ```python
 import asyncio
-from typing import Annotated
-
-from openarmature.graph import (
-    END,
-    GraphBuilder,
-    NodeEvent,
-    State,
-    append,
-    merge,
-)
-from pydantic import Field
+import os
+from collections.abc import Mapping
+from typing import Annotated, Any, Literal
+
+from openarmature.graph import END, GraphBuilder, NodeEvent, State, append, merge
+from openarmature.llm import OpenAIProvider, UserMessage
+from pydantic import BaseModel, Field
+
+
+class Classification(BaseModel):
+    intent: Literal["research", "summarize"]
+    rationale: str
+
+
+class Summary(BaseModel):
+    one_liner: str
+    confidence: float
 
 
 class PipelineState(State):
     query: str                                                # last_write_wins (default)
-    classification: str = ""                                  # last_write_wins
+    classification: Classification | None = None              # set by classify
+    research_plan: dict[str, Any] | None = None               # set by research (dict-schema form)
+    summary: Summary | None = None                            # set by summarize
     sources: Annotated[list[str], append] = Field(            # appends across writes
         default_factory=list
     )
@@ -83,34 +91,56 @@ class PipelineState(State):
     )
 
 
-async def classify(state: PipelineState) -> dict:
-    decision = "research" if "?" in state.query else "summarize"
-    return {
-        "classification": decision,
-        "metadata": {"classified_by": "rule"},
-    }
+provider = OpenAIProvider(
+    base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com"),  # host root; impl adds /v1
+    model=os.environ.get("LLM_MODEL", "gpt-4o-mini"),
+    api_key=os.environ.get("LLM_API_KEY"),
+)
 
 
-async def research(state: PipelineState) -> dict:
+async def classify(state: PipelineState) -> Mapping[str, Any]:
+    response = await provider.complete(
+        [UserMessage(content=f"Route to 'research' or 'summarize': {state.query!r}")],
+        response_schema=Classification,                                  # class → instance
+    )
+    return {"classification": response.parsed, "metadata": {"classified_by": "llm"}}
+
+
+async def research(state: PipelineState) -> Mapping[str, Any]:
+    response = await provider.complete(
+        [UserMessage(content=f"Plan research for {state.query!r}: list topics + follow-ups.")],
+        response_schema={                                                # dict → dict
+            "type": "object",
+            "properties": {
+                "topics": {"type": "array", "items": {"type": "string"}},
+                "follow_up_questions": {"type": "array", "items": {"type": "string"}},
+            },
+            "required": ["topics", "follow_up_questions"],
+            "additionalProperties": False,
+        },
+    )
     return {
+        "research_plan": response.parsed,
         "sources": ["wikipedia", "arxiv"],
-        "metadata": {"tool": "search"},
+        "metadata": {"tool": "research"},
     }
 
 
-async def summarize(state: PipelineState) -> dict:
-    return {
-        "sources": ["cache"],
-        "metadata": {"tool": "summarizer"},
-    }
+async def summarize(state: PipelineState) -> Mapping[str, Any]:
+    response = await provider.complete(
+        [UserMessage(content=f"Summarize {state.query!r} in one sentence with confidence 0-1.")],
+        response_schema=Summary,                                         # class → instance
+    )
+    return {"summary": response.parsed, "sources": ["cache"], "metadata": {"tool": "summarize"}}
 
 
 def route(state: PipelineState) -> str:
-    return state.classification
+    assert state.classification is not None
+    return state.classification.intent
 
 
 async def trace(event: NodeEvent) -> None:
-    if event.phase == "completed" and event.error is None:
+    if event.phase == "completed" and event.error is None and event.post_state is not None:
         print(f"{event.node_name}: sources={event.post_state.sources}")
 
 
@@ -127,22 +157,29 @@ graph = (
 )
 graph.attach_observer(trace)
 
+
 async def main() -> None:
     try:
-        await graph.invoke(PipelineState(query="what is RAG?"))
+        final = await graph.invoke(PipelineState(query="what is RAG?"))
+        print(f"\nclassification: {final.classification}")
+        if final.research_plan is not None:
+            print(f"research_plan: {final.research_plan}")
+        if final.summary is not None:
+            print(f"summary: {final.summary}")
     finally:
         await graph.drain()
 
 
 asyncio.run(main())
-# classify: sources=[]
-# research: sources=['wikipedia', 'arxiv']
 ```
 
-A few things to notice in this short example:
+Set `LLM_API_KEY=sk-...` and run. To swap providers, point `LLM_BASE_URL` and `LLM_MODEL` at OpenRouter, vLLM, LM Studio, llama.cpp, or anything else that speaks the OpenAI Chat Completions wire format. The example also lives at [`examples/00-hello-world/main.py`](./examples/00-hello-world/main.py); see [`examples/`](./examples/) for more runnable demos.
+
+A few things to notice:
 
-- **Three reducer policies on one state schema.** `query` and `classification` get the default `last_write_wins`. `sources` is `Annotated[list[str], append]`, so successive writes concatenate. `metadata` is `Annotated[dict[str, str], merge]`, so successive writes shallow-merge. The merge policy lives on the schema, once.
-- **Conditional routing as a state function.** `route` reads `state.classification` and returns a node name. The graph engine doesn't care that this happens to be deterministic; it would accept an LLM-driven router with the same shape.
+- **Three reducer policies on one state schema.** `query` / `classification` / `research_plan` / `summary` get the default `last_write_wins`. `sources` is `Annotated[list[str], append]`, so successive writes concatenate. `metadata` is `Annotated[dict[str, str], merge]`, so successive writes shallow-merge. The merge policy lives on the schema, once.
+- **Structured output, two forms.** `response_schema=Classification` (a Pydantic class) returns `Response.parsed` as a validated `Classification` instance, typed end-to-end. `response_schema={...}` (a raw JSON Schema dict) returns `Response.parsed` as a plain dict. Same wire shape underneath; pick the form that fits.
+- **Conditional routing on a parsed field.** `route` reads `state.classification.intent` and returns the next node's name. The graph engine doesn't care the discriminator came from an LLM; it would accept a deterministic rule with the same shape.
 - **Observer sees both phases.** `trace` filters to `completed` events for brevity; the engine also delivers `started` events.
 - **The graph either compiles or it doesn't.** Remove `.set_entry()` and `.compile()` raises `NoDeclaredEntry` before `invoke()` runs.
 

diff --git a/examples/00-hello-world/main.py b/examples/00-hello-world/main.py
@@ -0,0 +1,203 @@
+"""Hello-world demo: a 3-node graph where each node makes an LLM call
+with structured output. Classify a query, then either plan research or
+write a one-sentence summary.
+
+**Demonstrates:**
+
+- Typed ``State`` with three reducer policies (``last_write_wins``,
+  ``append``, ``merge``).
+- ``OpenAIProvider`` from ``openarmature.llm`` against any
+  OpenAI-compatible endpoint.
+- Both ``response_schema`` forms:
+  - Pydantic class (``Classification``, ``Summary``): typed
+    instance on ``Response.parsed``.
+  - JSON Schema dict (``research``): raw dict on ``Response.parsed``.
+- Conditional routing on a parsed field (``route`` reads
+  ``state.classification.intent``).
+- ``attach_observer`` for boundary visibility.
+
+**Configuration** (env vars; OpenAI defaults shown):
+
+- ``LLM_BASE_URL``: defaults to ``https://api.openai.com``. **Host
+  root only**; the impl adds ``/v1/chat/completions`` and
+  ``/v1/models`` itself, so do NOT include ``/v1`` in this value.
+- ``LLM_MODEL``: defaults to ``gpt-4o-mini``.
+- ``LLM_API_KEY``: required (your OpenAI API key, or empty for
+  local servers that don't authenticate).
+
+Run with:
+
+    uv sync --group examples
+    LLM_API_KEY=sk-... uv run python examples/00-hello-world/main.py
+"""
+
+from __future__ import annotations
+
+import asyncio
+import os
+from collections.abc import Mapping
+from typing import Annotated, Any, Literal
+
+from pydantic import BaseModel, Field
+
+from openarmature.graph import (
+    END,
+    CompiledGraph,
+    GraphBuilder,
+    NodeEvent,
+    State,
+    append,
+    merge,
+)
+from openarmature.llm import OpenAIProvider, UserMessage
+
+
+# Pydantic schemas the model is constrained to produce. Passing a
+# class as ``response_schema`` makes the framework convert to JSON
+# Schema, instruct the provider to return matching content, validate
+# the response, and yield an instance via ``Response.parsed``.
+class Classification(BaseModel):
+    intent: Literal["research", "summarize"]
+    rationale: str
+
+
+class Summary(BaseModel):
+    one_liner: str
+    confidence: float
+
+
+# State holds intermediate artifacts from each LLM call. ``research``
+# uses a dict schema (rather than a class), so its parsed value is a
+# raw dict, typed here as ``dict[str, Any] | None``.
+class PipelineState(State):
+    query: str
+    classification: Classification | None = None
+    research_plan: dict[str, Any] | None = None
+    summary: Summary | None = None
+    sources: Annotated[list[str], append] = Field(default_factory=list)
+    metadata: Annotated[dict[str, str], merge] = Field(default_factory=dict)
+
+
+_provider = OpenAIProvider(
+    base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com"),
+    model=os.environ.get("LLM_MODEL", "gpt-4o-mini"),
+    api_key=os.environ.get("LLM_API_KEY"),
+)
+
+
+async def classify(state: PipelineState) -> Mapping[str, Any]:
+    # response_schema=class form: parsed comes back as a Classification
+    # instance. The model picks the branch (research vs summarize) and
+    # the routing function below reads it as a typed field.
+    response = await _provider.complete(
+        [
+            UserMessage(
+                content=(
+                    f"Route this query to either 'research' (find new information) "
+                    f"or 'summarize' (condense known material): {state.query!r}"
+                )
+            )
+        ],
+        response_schema=Classification,
+    )
+    return {"classification": response.parsed, "metadata": {"classified_by": "llm"}}
+
+
+async def research(state: PipelineState) -> Mapping[str, Any]:
+    # response_schema=dict form: parsed comes back as a plain dict.
+    # Same wire shape as the class form: the framework converts a
+    # class via .model_json_schema() under the hood. Use dict when
+    # you want raw shape without declaring a Pydantic model.
+    response = await _provider.complete(
+        [
+            UserMessage(
+                content=(
+                    f"Plan research for the query {state.query!r}. List up to 3 "
+                    f"specific topics to investigate and up to 3 follow-up questions."
+                )
+            )
+        ],
+        response_schema={
+            "type": "object",
+            "properties": {
+                "topics": {"type": "array", "items": {"type": "string"}},
+                "follow_up_questions": {"type": "array", "items": {"type": "string"}},
+            },
+            "required": ["topics", "follow_up_questions"],
+            "additionalProperties": False,
+        },
+    )
+    return {
+        "research_plan": response.parsed,
+        "sources": ["wikipedia", "arxiv"],
+        "metadata": {"tool": "research"},
+    }
+
+
+async def summarize(state: PipelineState) -> Mapping[str, Any]:
+    # Pydantic-class form again: parsed is a Summary instance with
+    # a typed one_liner and a confidence float.
+    response = await _provider.complete(
+        [
+            UserMessage(
+                content=(
+                    f"Summarize {state.query!r} in one sentence. Set confidence "
+                    f"between 0 and 1 reflecting how well-established the answer is."
+                )
+            )
+        ],
+        response_schema=Summary,
+    )
+    return {
+        "summary": response.parsed,
+        "sources": ["cache"],
+        "metadata": {"tool": "summarize"},
+    }
+
+
+def route(state: PipelineState) -> str:
+    if state.classification is None:
+        raise RuntimeError("classify did not populate state.classification")
+    return state.classification.intent
+
+
+async def trace(event: NodeEvent) -> None:
+    # OpenAIProvider emits NodeEvent-shaped events for LLM-span
+    # tracking under a sentinel namespace; those have post_state=None.
+    # Filter to events that carry a state snapshot before reading it.
+    if event.phase == "completed" and event.error is None and event.post_state is not None:
+        print(f"{event.node_name}: sources={event.post_state.sources}")
+
+
+def build_graph() -> CompiledGraph[PipelineState]:
+    return (
+        GraphBuilder(PipelineState)
+        .add_node("classify", classify)
+        .add_node("research", research)
+        .add_node("summarize", summarize)
+        .add_conditional_edge("classify", route)
+        .add_edge("research", END)
+        .add_edge("summarize", END)
+        .set_entry("classify")
+        .compile()
+    )
+
+
+async def main() -> None:
+    graph = build_graph()
+    graph.attach_observer(trace)
+    try:
+        final = await graph.invoke(PipelineState(query="what is RAG?"))
+        print(f"\nclassification: {final.classification}")
+        if final.research_plan is not None:
+            print(f"research_plan: {final.research_plan}")
+        if final.summary is not None:
+            print(f"summary: {final.summary}")
+        print(f"sources: {final.sources}")
+        print(f"metadata: {final.metadata}")
+    finally:
+        await graph.drain()
+
+
+if __name__ == "__main__":
+    asyncio.run(main())