diff --git a/docs/examples/00-hello-world.md b/docs/examples/00-hello-world.md
new file mode 100644
index 0000000..246bf34
--- /dev/null
+++ b/docs/examples/00-hello-world.md
@@ -0,0 +1,104 @@
+# 00 - Hello, world
+
+The smallest possible LLM-routed pipeline: classify a query, then
+either plan research on it or summarize it in one sentence. Sets the
+shape every other example builds on.
+
+## Overview
+
+You ask a question. A classifier LLM decides whether the question
+wants new information or a summary of known material. Depending on
+the answer, the run either calls a research-planner node (returns
+topics to investigate plus follow-up questions) or a summarizer node
+(returns one sentence plus a confidence score).
+
+The demo query is *"why did Apollo 13 abort its lunar landing?"*,
+which the model usually routes to `summarize` because the facts are
+well-established.
+
+## What it teaches
+
+- A typed [`State`](../concepts/state-and-reducers.md) holding query
+  plus per-node artifacts, with three reducer policies in one model
+  (`last_write_wins`, `append`, `merge`).
+- The [`OpenAIProvider`](../concepts/llms.md) talking to any
+  OpenAI-compatible endpoint.
+- Both forms of [structured output](../concepts/llms.md): pass a
+  Pydantic class as `response_schema` (`Classification`, `Summary`)
+  and get an instance back on `Response.parsed`; pass a JSON Schema
+  dict (`research`) and get a raw dict.
+- `RuntimeConfig` for per-call sampling knobs. Every `complete()`
+  passes `RuntimeConfig(temperature=0.0)` so the run is as
+  reproducible as the API allows.
+- A [conditional edge](../concepts/graphs.md) reading a parsed field
+  off state (`route` returns `state.classification.intent`).
+- A function-shaped [observer](../concepts/observability.md) attached
+  after compile.
+
+## How to run
+
+```bash
+uv sync --group examples
+LLM_API_KEY=sk-... uv run python examples/00-hello-world/main.py
+```
+
+To point at a local OpenAI-compatible server, override `LLM_BASE_URL`
+and (often) `LLM_MODEL`:
+
+```bash
+LLM_BASE_URL=http://localhost:8000 LLM_MODEL=Qwen2.5-7B-Instruct \
+  LLM_API_KEY= \
+  uv run python examples/00-hello-world/main.py
+```
+
+## The graph
+
+```mermaid
+flowchart TD
+  start([start])
+  classify[classify]
+  research[research]
+  summarize[summarize]
+  stop([end])
+
+  start --> classify
+  classify -->|"intent == 'research'"| research
+  classify -->|"intent == 'summarize'"| summarize
+  research --> stop
+  summarize --> stop
+```
+
+Three nodes, one conditional edge. `classify` is the entry; `route`
+inspects `state.classification.intent` and returns the name of the
+next node.
+
+## Reading the output
+
+A clean run prints two lines from the observer and then the final
+state:
+
+```
+classify: sources=[]
+summarize: sources=['cache']
+
+classification: intent='summarize' rationale='...'
+summary: one_liner='...' confidence=0.92
+sources: ['cache']
+metadata: {'classified_by': 'llm', 'tool': 'summarize'}
+```
+
+- `classify: sources=[]` - the classifier ran, no sources have been
+  appended yet because only the chosen follow-up node adds them.
+- `summarize: sources=['cache']` - the second node ran (since the
+  classifier picked `summarize`). The `append` reducer on the
+  `sources` field merged the new entry into the existing list.
+- `classification` and `summary` are the parsed Pydantic instances,
+  not raw model output. Compare with `research_plan`, which would
+  show as a plain dict if the classifier had picked `research`.
+- `metadata: {...}` shows the `merge` reducer in action. Each node
+  contributed one key (`classified_by`, `tool`); the final map has
+  both.
+
+If the classifier picks `research` instead, you'll see `research`
+in the second observer line and a `research_plan` dict (with
+`topics` and `follow_up_questions`) in the final printout.
diff --git a/docs/examples/01-routing-and-subgraphs.md b/docs/examples/01-routing-and-subgraphs.md
new file mode 100644
index 0000000..b3064bd
--- /dev/null
+++ b/docs/examples/01-routing-and-subgraphs.md
@@ -0,0 +1,109 @@
+# 01 - Routing and subgraphs
+
+A question-answering assistant. Classify the question, then either
+give a one-shot quick answer or run a multi-step research
+sub-pipeline, then lightly copy-edit the result.
+
+## Overview
+
+You ask a question. A classifier LLM decides whether it can be
+answered in one or two sentences ("quick") or whether it benefits
+from considering multiple angles ("research"). Quick questions go
+through a single `quick_answer` node. Research questions descend
+into a subgraph that plans three angles, gathers a short note for
+each, and synthesizes them into a paragraph. Either way, a final
+`format_final` node copy-edits the answer.
+
+Demo questions: *"what year did the moon landing happen"*
+(usually routes to quick) and *"why is the lunar south pole
+strategically important?"* (usually routes to research).
+
+## What it teaches
+
+- [Conditional edges](../concepts/graphs.md) routing on a state
+  field. `classify` writes `state.route`; the conditional edge
+  function reads it and returns the next node's name.
+- [Subgraph composition](../concepts/composition.md). The research
+  pipeline is itself a compiled graph, wrapped as a single node in
+  the outer graph via `add_subgraph_node`.
+- A custom
+  [`ProjectionStrategy`](../concepts/composition.md). The default
+  `FieldNameMatching` only carries fields back *out* of a subgraph;
+  carrying the parent's question *in* requires writing a small
+  `ProjectionStrategy` class. The `QuestionProjection` here is the
+  canonical pattern for non-trivial subgraph boundaries.
+- The [`merge` reducer](../concepts/state-and-reducers.md) for dict
+  accumulation. Every node returns a small `tallies` fragment; the
+  reducer accumulates them into one dict on the final state.
+
+## How to run
+
+```bash
+uv sync --group examples
+LLM_API_KEY=sk-... uv run python examples/01-routing-and-subgraphs/main.py \
+  "why is the lunar south pole strategically important?"
+```
+
+The first positional arg becomes the question. With no arg, it falls
+back to the lunar-south-pole question above.
+
+## The graph
+
+```mermaid
+flowchart TD
+  start([start])
+  classify[classify]
+  quick_answer[quick_answer]
+  format_final[format_final]
+  stop([end])
+
+  subgraph research [research subgraph]
+    direction TB
+    plan_research[plan_research]
+    gather[gather]
+    synthesize[synthesize]
+    plan_research --> gather --> synthesize
+  end
+
+  start --> classify
+  classify -->|"route == 'quick'"| quick_answer
+  classify -->|"route == 'research'"| research
+  quick_answer --> format_final
+  research --> format_final
+  format_final --> stop
+```
+
+The research box is a separate compiled graph with its own state
+schema (`ResearchState`). The `QuestionProjection` carries
+`parent.question` in as `subgraph.question`, and brings
+`subgraph.answer` plus `subgraph.trace` back out.
+
+## Reading the output
+
+For a research-route run, expect:
+
+```
+question: why is the lunar south pole strategically important?
+route:    research
+
+answer:
+<paragraph synthesized from three angles>
+
+trace:   ['classify', 'plan_research', 'gather', 'synthesize', 'format_final']
+tallies: {'classify_calls': 1, 'research_runs': 1, 'formatted': 1}
+```
+
+- `route` is the field `classify` wrote that the conditional edge
+  read.
+- `trace` lists nodes in invocation order. Subgraph nodes appear
+  inline; that's the projection's `trace` field flowing back out
+  through the parent's `append` reducer.
+- `tallies` has one entry per node that contributed: `classify` set
+  `classify_calls`, the subgraph projection's `project_out` set
+  `research_runs`, `format_final` set `formatted`. `quick_answer`
+  would have contributed `quick_answers: 1` if the run had gone the
+  other way.
+
+For a quick-route run, `trace` drops to `['classify',
+'quick_answer', 'format_final']` and `tallies` has
+`quick_answers: 1` in place of `research_runs: 1`.
diff --git a/docs/examples/02-explicit-subgraph-mapping.md b/docs/examples/02-explicit-subgraph-mapping.md
new file mode 100644
index 0000000..d3addf0
--- /dev/null
+++ b/docs/examples/02-explicit-subgraph-mapping.md
@@ -0,0 +1,103 @@
+# 02 - Explicit subgraph mapping
+
+Compare two topics by running the *same* compiled analysis subgraph
+on each, with each call site writing into disjoint parent fields.
+This is the canonical use of `ExplicitMapping`.
+
+## Overview
+
+You give the pipeline two topics ("Apollo 11" vs "Apollo 17", or
+"Apollo program vs Artemis program"). One compiled subgraph
+(`summarize → score`) is registered twice in the outer graph. The
+first registration analyzes topic A and writes its results into
+`a_summary` / `a_score`; the second analyzes topic B and writes
+into `b_summary` / `b_score`. A final `synthesize` node reads both
+sides and renders a verdict.
+
+Without explicit mapping the two sites would both write to a single
+`parent.summary` field under default name matching, and the second
+call would clobber the first.
+
+## What it teaches
+
+- [`ExplicitMapping`](../concepts/composition.md) for reusing one
+  compiled subgraph at multiple parent sites with disjoint parent
+  fields. Each site declares its own `inputs` and `outputs` dicts;
+  the same compiled subgraph value is registered twice.
+- The encapsulation property that makes this work: the subgraph
+  speaks in neutral field names (`topic`, `summary`, `score`) and
+  has no idea which side of the comparison it's running for. The
+  mapping at each call site is what wires the subgraph's neutral
+  names to the parent's per-side fields.
+- The contrast with example 01: there a custom
+  `ProjectionStrategy` carried one field in. Here the two sites
+  need to be similar-but-different, and `ExplicitMapping` is the
+  zero-boilerplate way to express that.
+
+## How to run
+
+```bash
+uv sync --group examples
+LLM_API_KEY=sk-... uv run python examples/02-explicit-subgraph-mapping/main.py "Apollo 11" "Apollo 17"
+```
+
+Or pass a single `"X vs Y"` arg, or no args (defaults to
+`"Apollo 11"` vs `"Apollo 17"`).
+
+## The graph
+
+```mermaid
+flowchart TD
+  start([start])
+  synthesize[synthesize]
+  stop([end])
+
+  subgraph analysis_a [analyze_a: analysis subgraph]
+    direction TB
+    sa[summarize]
+    ra[score]
+    sa --> ra
+  end
+
+  subgraph analysis_b [analyze_b: analysis subgraph]
+    direction TB
+    sb[summarize]
+    rb[score]
+    sb --> rb
+  end
+
+  start --> analysis_a --> analysis_b --> synthesize --> stop
+```
+
+Both subgraph boxes are the *same* compiled value, registered twice
+under different names with different mappings. The `analyze_a` site
+maps `parent.topic_a → subgraph.topic` and back out as `summary →
+a_summary`, `score → a_score`. The `analyze_b` site does the same
+thing on the B-side parent fields.
+
+## Reading the output
+
+```
+topic A: Apollo 11
+  summary: <one-sentence summary of Apollo 11>
+  score:   8/10
+
+topic B: Apollo 17
+  summary: <one-sentence summary of Apollo 17>
+  score:   7/10
+
+verdict:
+<paragraph picking a winner or calling it a tie, citing the summaries>
+
+trace: ['summarize', 'score', 'summarize', 'score', 'synthesize']
+```
+
+- The per-side summary and score fields are populated by separate
+  invocations of the same subgraph, routed by the mappings.
+- `trace` shows the subgraph's nodes running **twice**, interleaved
+  with the outer `synthesize`. Both invocations contribute to the
+  same parent `trace` list because each `outputs` mapping includes
+  `"trace": "trace"`.
+- `verdict` is whatever `synthesize` produced from reading both
+  sides. The outer node knows nothing about which side ran first;
+  it just reads four parent fields.
diff --git a/docs/examples/03-observer-hooks.md b/docs/examples/03-observer-hooks.md
new file mode 100644
index 0000000..ca74567
--- /dev/null
+++ b/docs/examples/03-observer-hooks.md
@@ -0,0 +1,124 @@
+# 03 - Observer hooks
+
+Add observability to a `draft → review → finalize` pipeline without
+touching any node code. Three observer flavors run side-by-side: a
+console tracer, a per-invocation metrics collector, and the
+OpenTelemetry observer wired to a console span exporter.
+
+## Overview
+
+You ask a question. The outer graph drafts an answer, then descends
+into a `review` subgraph that critiques the draft and produces a
+revision, then runs a `finalize` node that marks the run done.
+
+Three observers watch every node boundary:
+
+1. A **graph-attached console tracer** prints one structured line
+   per node boundary to stderr.
+2. An **invocation-scoped metrics collector** counts events,
+   errors, and unique namespaces seen on *this* call only.
+3. The **`OTelObserver`** opens and closes spans on a private
+   `TracerProvider`, and a console span exporter prints the JSON
+   span at close.
+
+The observers share one `Observer` Protocol; nothing in the node
+bodies knows or cares that they're attached.
+
+## What it teaches
+
+- [`attach_observer`](../concepts/observability.md) for
+  graph-attached observers (fire on every invocation until removed).
+- The `observers=[...]` kwarg on `invoke()` for invocation-scoped
+  observers (fire only for that call).
+- The [`NodeEvent`](../concepts/observability.md) shape: `phase`,
+  `step`, `namespace`, `pre_state`, `post_state`, `error`.
+- Namespace chaining across subgraph boundaries. The subgraph's
+  events arrive with their parent node name prepended to the
+  namespace tuple.
+- Function-shaped versus class-shaped observers (both satisfy the
+  Protocol structurally).
+- The [`OTelObserver`](../concepts/observability.md) from the
+  `[otel]` extra, registered like any other observer. Same hook,
+  spans instead of prints.
+- The `await graph.drain()` requirement for short-lived processes:
+  events go through a background queue, and `invoke()` returns when
+  the graph hits `END`, not when the queue empties.
+
+## How to run
+
+```bash
+uv sync --group examples --all-extras
+LLM_API_KEY=sk-... uv run python examples/03-observer-hooks/main.py \
+  "what year did the moon landing happen"
+```
+
+`--all-extras` pulls in `opentelemetry-sdk` for the OTel observer.
+The first positional arg becomes the question.
+
+## The graph
+
+```mermaid
+flowchart TD
+  start([start])
+  draft[draft]
+  finalize[finalize]
+  stop([end])
+
+  subgraph review [review subgraph]
+    direction TB
+    critique[critique]
+    revise[revise]
+    critique --> revise
+  end
+
+  start --> draft --> review --> finalize --> stop
+```
+
+The `review` subgraph is wired with an `ExplicitMapping` that
+carries `draft` IN; the default field-name matching brings
+`revised` and `trace` back OUT.
+
+## Reading the output
+
+Both observers subscribe to `started` and `completed` events by
+default, so for each node the engine fires two events sharing the
+same `step`. The console tracer prints one line per event to
+**stderr** in `[step=N] namespace → fields_changed` form; `started`
+events print as `→ {}` since `post_state` isn't populated yet. The
+trimmed sample below shows only the completed lines for
+readability, interleaved with OTel JSON spans on **stdout**:
+
+```
+[step=1] draft → {'draft': '...', 'trace': ['draft']}
+{"name": "draft", "context": {...}, "kind": "SpanKind.INTERNAL", ...}
+[step=2] review.critique → {'critique': '...', 'trace': ['critique']}
+[step=3] review.revise → {'revised': '...', 'trace': ['critique', 'revise']}
+[step=4] finalize → {'trace': ['draft', 'critique', 'revise', 'finalize']}
+
+question: what year did the moon landing happen
+draft:   <two or three sentence draft>
+revised: <copy-edited version>
+
+per-invocation metrics:
+  events seen:        8
+  errors observed:    0
+  unique namespaces:  4
+  trace order:        ['draft', 'critique', 'revise', 'finalize']
+```
+
+- **`namespace` chains across subgraphs.** `draft` and `finalize`
+  fire at the top level (single-element namespace). `critique` and
+  `revise` fire inside the subgraph and arrive with namespace
+  `('review', 'critique')` / `('review', 'revise')`, which the
+  tracer joins with `.`.
+- **`fields_changed` is the diff** between `pre_state` and
+  `post_state`. Each node's "what did it do?" is visible without
+  the node logging anything itself.
+- **The metrics observer is per-invocation.** It counts only events
+  from this single `invoke()` call. The tracer and OTel observer
+  would persist across further `invoke()` calls on the same
+  compiled graph.
+- **OTel spans appear as JSON on stdout** because we wired a
+  `ConsoleSpanExporter`. The `OTelObserver` uses a private
+  `TracerProvider`, so it does not pollute any global OTel setup
+  the surrounding application might have.
diff --git a/docs/examples/04-nested-subgraphs.md b/docs/examples/04-nested-subgraphs.md
new file mode 100644
index 0000000..c839ab7
--- /dev/null
+++ b/docs/examples/04-nested-subgraphs.md
@@ -0,0 +1,135 @@
+# 04 - Nested subgraphs
+
+Question answering against a tiny baked-in document corpus, with
+two levels of subgraph nesting: outer coordinator, middle doc-QA,
+inner section-extract.
+
+## Overview
+
+You ask a question. A small in-memory corpus of three documents
+(Apollo 11, Apollo 13, Artemis II) sits at module level. The
+pipeline runs in three layers:
+
+1. **Outer (coordinator).** Receives the question, delegates to the
+   doc-QA subgraph, polishes the final answer.
+2. **Middle (doc-QA).** Picks the single most relevant document
+   from the corpus, hands it to the section-extract subgraph, then
+   synthesizes a clean answer from what came back.
+3. **Inner (section-extract).** Given one document and the
+   question, finds the relevant paragraph and pulls out the answer
+   text.
+
+Each layer has its own state schema scoped to its job: the outer
+cares about a question and a final answer, the middle picks one
+document and synthesizes, the inner narrows to a paragraph and
+extracts a span.
+
+## What it teaches
+
+- [Nested subgraphs](../concepts/composition.md): a compiled
+  subgraph is just a value, so it can be embedded inside another
+  compiled subgraph, recursively.
+- Layer-scoped state schemas. Each compiled subgraph has its own
+  `State` subclass. Boundaries between layers are explicit
+  projections, not aliased namespaces.
+- [`ExplicitMapping`](../concepts/composition.md) at every
+  parent ↔ child boundary, plumbing the question down through three
+  layers and the answer back up.
+- A depth-aware [observer](../concepts/observability.md). The
+  observer prints `namespace` as a `>`-joined breadcrumb and indents
+  by depth. Useful for seeing the descent into nested subgraphs and
+  the return.
+
+## How to run
+
+```bash
+uv sync --group examples
+LLM_API_KEY=sk-... uv run python examples/04-nested-subgraphs/main.py \
+  "what happened on Apollo 13?"
+```
+
+Other demo questions: *"what year did humans first land on the
+moon?"* (routes to Apollo 11) and *"who was on the Artemis II
+crew?"* (routes to Artemis II). With no arg the default is the
+moon-landing year question.
+
+## The graph
+
+```mermaid
+flowchart TD
+  start([start])
+  receive[receive]
+  format_final[format_final]
+  stop([end])
+
+  subgraph doc_qa [doc_qa subgraph]
+    direction TB
+    pick_doc[pick_doc]
+    synthesize[synthesize]
+
+    subgraph section_extract [section_extract subgraph]
+      direction TB
+      find_section[find_section]
+      extract_answer[extract_answer]
+      find_section --> extract_answer
+    end
+
+    pick_doc --> section_extract --> synthesize
+  end
+
+  start --> receive --> doc_qa --> format_final --> stop
+```
+
+The doc-QA box wraps the section-extract box. The outer's
+`ExplicitMapping` carries `question` down into `doc_qa` and brings
+`answer` plus `trace` back. Inside `doc_qa`, a second
+`ExplicitMapping` carries `question` plus `selected_body` into the
+section-extract subgraph (as its `question` and `doc_body` fields)
+and brings the extracted text back as `raw_answer`.
+
+## Reading the output
+
+The depth observer indents each event by its namespace depth, so
+the descent and return are visually obvious. A trimmed run:
+
+```
+[step 1] depth=1  receive
+    started   question='what happened on Apollo 13?' answer=''
+    completed question='what happened on Apollo 13?' answer=''
+
+  [step 2] depth=2  doc_qa > pick_doc
+      started   question='...' selected_title='' selected_body=''
+      completed question='...' selected_title='Apollo 13' selected_body='...'
+
+    [step 3] depth=3  doc_qa > section_extract > find_section
+        started   question='...' doc_body='...' relevant_section=''
+        completed question='...' doc_body='...' relevant_section='...'
+
+    [step 4] depth=3  doc_qa > section_extract > extract_answer
+        started   relevant_section='...' extracted=''
+        completed relevant_section='...' extracted='aborted after an oxygen tank ruptured'
+
+  [step 5] depth=2  doc_qa > synthesize
+      started   raw_answer='aborted after...' answer=''
+      completed raw_answer='aborted after...' answer='Apollo 13's lunar landing was aborted after...'
+
+[step 6] depth=1  format_final
+    started   answer='Apollo 13's lunar landing was aborted...'
+    completed answer='Apollo 13's lunar landing was aborted after an oxygen tank ruptured.'
+
+Answer: Apollo 13's lunar landing was aborted after an oxygen tank ruptured.
+
+Trace: ['receive', 'pick_doc', 'find_section', 'extract_answer', 'synthesize', 'format_final']
+```
+
+- **`depth`** counts the namespace tuple. Top-level nodes are
+  depth=1; doc-QA subgraph nodes are depth=2; section-extract nodes
+  are depth=3. The indent makes the levels obvious at a glance.
+- **`namespace`** chains: `doc_qa > section_extract > find_section`
+  means "find_section running inside section_extract running inside
+  doc_qa." Observability backends can use this same chain for
+  trace correlation.
+- **`trace`** in final state shows the order of node completions,
+  flattened across all layers. Each subgraph's projection
+  contributed its trace back to the parent through the parent's
+  `append` reducer; the outer's `trace` ends up as the concatenation.
diff --git a/docs/examples/05-fan-out-with-retry.md b/docs/examples/05-fan-out-with-retry.md
new file mode 100644
index 0000000..35911a0
--- /dev/null
+++ b/docs/examples/05-fan-out-with-retry.md
@@ -0,0 +1,137 @@
+# 05 - Fan-out with retry
+
+Summarize a batch of lunar-mission headlines in parallel, with
+per-headline retry and timing middleware wrapping each instance's
+subgraph run.
+
+## Overview
+
+You have a list of news headlines. Each one needs a one-sentence
+summary plus a topic tag. The headlines are independent, so the
+work parallelizes naturally: dispatch one per-headline subgraph
+run per headline, bounded concurrency, retry transient LLM failures
+on a per-instance basis.
+
+The per-instance subgraph is small (`summarize → classify`) and
+would also run standalone against a single headline. Fan-out
+multiplies it out across the batch.
+
+A second mode, controlled by the `COLLECT_MODE` env var, exercises
+the failure path. With `COLLECT_MODE=1` the demo prepends a
+sentinel headline that always raises `ProviderUnavailable`; under
+`error_policy="collect"` the failure lands in
+`state.instance_errors` and the rest of the batch completes.
+
+## What it teaches
+
+- [`add_fan_out_node`](../concepts/fan-out.md) in `items_field`
+  mode: one subgraph invocation per element of `state.headlines`.
+  `item_field` names the per-instance input field on the subgraph's
+  state.
+- `collect_field` and `extra_outputs` for harvesting per-instance
+  results into parent lists. The two lists (`summaries`, `topics`)
+  end up index-aligned.
+- `instance_middleware`: middleware wrapped around each instance's
+  subgraph run. `RetryMiddleware` (3 attempts, deterministic
+  backoff) plus `TimingMiddleware` (captures duration per
+  instance). Retries are per-instance: a transient failure on
+  headline 3 doesn't restart 0-2.
+- `concurrency=3` capping how many instances run in flight at once.
+- `error_policy="fail_fast"` (default, first exhausted-retry
+  failure aborts the batch) vs `"collect"` (failures land in
+  `errors_field` and the batch produces partial results).
+- A `fan_out_config_observer` reads
+  `NodeEvent.fan_out_config` on the fan-out node's dispatch event,
+  recording the resolved `item_count` / `concurrency` /
+  `error_policy` at runtime. Inner-instance events carry
+  `fan_out_index` but not the config.
+
+## How to run
+
+```bash
+uv sync --group examples
+LLM_API_KEY=sk-... uv run python examples/05-fan-out-with-retry/main.py
+```
+
+To exercise the collect path with a synthetic failure:
+
+```bash
+COLLECT_MODE=1 LLM_API_KEY=sk-... \
+  uv run python examples/05-fan-out-with-retry/main.py
+```
+
+## The graph
+
+```mermaid
+flowchart TD
+  start([start])
+  announce[announce]
+  present[present]
+  stop([end])
+
+  subgraph headline_runs [headline_runs: fan-out, concurrency=3]
+    direction TB
+    note["N instances of:<br/>summarize -> classify<br/>(retry + timing middleware)"]
+  end
+
+  start --> announce --> headline_runs --> present --> stop
+```
+
+`headline_runs` is the fan-out node. At dispatch time it expands
+into N copies of the per-instance subgraph, one per headline.
+`RetryMiddleware` and `TimingMiddleware` wrap each instance.
+
+## Reading the output
+
+A clean default-mode run (`fail_fast`, all instances succeed):
+
+```
+========================================================================
+Summarizing 5 headlines in parallel (concurrency=3)
+error_policy='fail_fast'
+========================================================================
+
+  [observer] fan-out node 'headline_runs' dispatching: item_count=5 concurrency=3 error_policy='fail_fast'
+
+Results (in input order):
+
+  [0] Artemis II splashes down in Pacific after ten-day lunar flyby
+       summary: <one-sentence rewrite>
+       topic:   crew
+
+  [1] NASA pauses Lunar Gateway program in favor of crewed surface base
+       summary: <one-sentence rewrite>
+       topic:   policy
+
+  ...
+
+Per-instance timings (in completion order):
+  #0   812.3 ms  outcome=success
+  #1   941.7 ms  outcome=success
+  #2   876.2 ms  outcome=success
+  #3   903.4 ms  outcome=success
+  #4   1012.8 ms  outcome=success
+
+  wall-clock total:         2089.3 ms
+  sum of per-instance:      4546.4 ms
+  → concurrency speedup:    2.18x
+```
+
+- **The observer line** is the `fan_out_config_observer` printing
+  the dispatch-time config. Useful when `count` or `concurrency`
+  are callable resolvers whose runtime value isn't visible in code.
+- **Per-input order vs completion order.** The result loop walks
+  `final.headlines` in input order; `final.summaries` and
+  `final.topics` are index-aligned with it. The timings list is in
+  completion order, not input order (instance 2 may finish before
+  instance 1 under concurrency).
+- **Concurrency speedup.** `sum of per-instance / wall-clock`. A
+  speedup near `concurrency` indicates the work parallelized well;
+  a value near 1.0 indicates concurrency didn't help (the upstream
+  serialized you, or instances themselves are short).
+
+With `COLLECT_MODE=1`, the output includes the sentinel headline
+at index 0 with a `(failed after retries; ...)` marker, plus a
+`Captured 1 per-instance error(s):` block listing the failed
+`fan_out_index` and error category. The other instances complete
+as usual.
diff --git a/docs/examples/06-parallel-branches.md b/docs/examples/06-parallel-branches.md
new file mode 100644
index 0000000..4150012
--- /dev/null
+++ b/docs/examples/06-parallel-branches.md
@@ -0,0 +1,134 @@
+# 06 - Parallel branches
+
+Enrich a lunar-mission news article with three independent
+analyses (one-sentence summary, sentiment label, topic tags)
+running concurrently as separate subgraphs.
+
+## Overview
+
+Where fan-out (example 05) runs N copies of *one* subgraph against
+different inputs, parallel-branches runs M *heterogeneous*
+subgraphs against the same input. Different state schemas,
+different middleware, different topologies per branch, one
+dispatch.
+
+The article goes into three branches in parallel:
+
+- **summary**: bare subgraph, one node, writes `summary` back.
+- **sentiment**: subgraph wrapped in `RetryMiddleware` (the
+  classification call is short and cheap to retry), writes a
+  `label` back into the parent's `sentiment` field.
+- **topics**: bare subgraph, writes a `tags` list back into the
+  parent's `topics` field.
+
+The branches don't depend on each other, so they fire concurrently
+and the parent fans in once all three complete.
+
+## What it teaches
+
+- [`add_parallel_branches_node`](../concepts/parallel-branches.md):
+  M named `BranchSpec`s under one node. Each spec carries its own
+  compiled subgraph plus per-branch input/output projection plus
+  optional per-branch middleware.
+- Branches with *different* state schemas. The summary subgraph's
+  state has a `summary` field; the sentiment subgraph's has
+  `label`; the topics subgraph's has a `tags` list. The projection
+  mappings translate between the branch's vocabulary and the
+  parent's.
+- Heterogeneous per-branch middleware. The sentiment branch wraps
+  its subgraph in retry; the other two run bare. A production
+  pipeline often wants different retry policies, timing windows, or
+  custom middleware per branch.
+- Branch insertion order = fan-in order. When two branches write to
+  the same parent field, the parent's reducer applies them in the
+  order they were declared in the `branches` mapping (not in
+  completion order). The three branches here write disjoint parent
+  fields, so the order doesn't affect the result, but the property
+  holds.
+- A `branch_attribution_observer` reads
+  `NodeEvent.branch_name` on inner-node events. `branch_name` is
+  populated only for events *inside* a branch's subgraph;
+  outer-graph nodes carry `branch_name=None`. This is the
+  per-event attribution that lets observability backends route
+  metrics and spans by branch.
+
+## How to run
+
+```bash
+uv sync --group examples
+LLM_API_KEY=sk-... uv run python examples/06-parallel-branches/main.py
+```
+
+The article is baked into the example.
+
+## The graph
+
+```mermaid
+flowchart TD
+  start([start])
+  receive[receive]
+  present[present]
+  stop([end])
+
+  subgraph enrich [enrich: parallel-branches]
+    direction TB
+    summary[summary branch]
+    sentiment[sentiment branch<br/>retry middleware]
+    topics[topics branch]
+  end
+
+  start --> receive --> enrich --> present --> stop
+```
+
+`enrich` is the parallel-branches node; the three branches inside
+the box dispatch concurrently against the same `article` field on
+parent state. The sentiment branch is the only one with middleware
+attached.
+
+## Reading the output
+
+```
+========================================================================
+Lunar-mission article enrichment - three independent analyses in parallel
+========================================================================
+
+Article (642 chars):
+
+NASA's Artemis II crew capsule Integrity splashed down in the Pacific
+Ocean this evening, ending a ten-day flight that carried four astronauts
+on a free-return trajectory around the Moon and back...
+
+  [observer] (branch=summary) inner node 'write_summary' started
+  [observer] (branch=sentiment) inner node 'classify_sentiment' started
+  [observer] (branch=topics) inner node 'extract_topics' started
+
+========================================================================
+Enrichment results
+========================================================================
+
+  summary:   <one-sentence summary>
+  sentiment: positive
+  topics:    ['Artemis II', 'splashdown', 'lunar program']
+
+  wall-clock: 1142.6 ms
+
+The three branches ran in parallel - wall-clock is closer to the
+slowest single branch than to the sum of all three.
+```
+
+- **The three observer lines** fire close together (often within a
+  few ms of each other), confirming the branches dispatched in
+  parallel rather than serially.
+- **`branch_name` attribution** is what makes per-branch
+  observability tractable. `write_summary` knows nothing about
+  `branch_name`; it's the engine that tags the event for the
+  observer.
+- **Wall-clock under 1500 ms** for three sequential LLM calls is
+  the clearest indicator of parallelism. Three serial calls at
+  roughly 1s each would land near 3 seconds; under parallel
+  dispatch the wall-clock approaches the slowest branch's duration.
+- **Disjoint output fields** mean the reducer order at fan-in
+  doesn't matter here. If two branches both wrote to `summary`, the
+  declared branch order (`summary` before `sentiment` before
+  `topics`) would determine which value won under the default
+  `last_write_wins` reducer.
diff --git a/docs/examples/07-multimodal-prompt.md b/docs/examples/07-multimodal-prompt.md
new file mode 100644
index 0000000..4838f5a
--- /dev/null
+++ b/docs/examples/07-multimodal-prompt.md
@@ -0,0 +1,126 @@
+# 07 - Multimodal prompt
+
+Two independent analyses of a lunar-mission photograph using
+versioned prompt templates, a fallback prompt backend, and a
+multimodal user message that carries both text and image.
+
+## Overview
+
+Given a photograph of a lunar mission (default: Buzz Aldrin on the
+lunar surface during Apollo 11), run two independent analyses
+against the same image:
+
+- **describe-surface**: describe what's visible of the lunar
+  surface.
+- **describe-equipment**: identify the equipment in the frame.
+
+Both prompts take the mission name as their only variable; neither
+depends on the other's output. Both rendered prompts are grouped
+under one observability `PromptGroup` so a trace UI can show the
+analyses as one logical unit.
+
+The image source can be a URL (default) or a local file. Setting
+`IMAGE_PATH` switches the demo to inline base64 transport.
+
+## What it teaches
+
+- [`PromptManager`](../concepts/prompts.md) configured with two
+  `FilesystemPromptBackend`s: a primary `prompts/` directory and a
+  fallback `prompts_fallback/` directory. The manager tries them in
+  order on every `get`; the fallback fires only when the primary
+  raises `PromptStoreUnavailable`. Typical production shape is
+  "remote primary + filesystem fallback".
+- [`FilesystemPromptBackend`](../concepts/prompts.md) with the
+  `<root>/<label>/<name>.j2` layout, here under the `production`
+  label.
+- [`PromptGroup`](../concepts/prompts.md) wrapping the two prompt
+  results under one identifier. `with_active_prompt_group` and
+  `with_active_prompt` are nested ContextVar scopes that propagate
+  group and per-call prompt identifiers to observers.
+- [Multimodal `UserMessage`](../concepts/llms.md) with a list of
+  content blocks (`TextBlock` + `ImageBlock`), and the two image
+  sources (`ImageSourceURL` for hosted images, `ImageSourceInline`
+  with base64 data for local files).
+
+## How to run
+
+```bash
+uv sync --group examples
+LLM_API_KEY=sk-... uv run python examples/07-multimodal-prompt/main.py
+```
+
+To use a local file instead of the default URL:
+
+```bash
+IMAGE_PATH=./photo.jpg LLM_API_KEY=sk-... \
+  uv run python examples/07-multimodal-prompt/main.py
+```
+
+Override `MISSION` to set the prompt variable for a different
+mission name:
+
+```bash
+MISSION="Apollo 17" LLM_API_KEY=sk-... \
+  uv run python examples/07-multimodal-prompt/main.py
+```
+
+The model must be vision-capable. `gpt-4o-mini` (the default) is.
+
+## The graph
+
+```mermaid
+flowchart LR
+  start([start])
+  describe_surface[describe_surface]
+  describe_equipment[describe_equipment]
+  stop([end])
+
+  start --> describe_surface --> describe_equipment --> stop
+```
+
+Two sequential nodes. Each fetches its own prompt from the
+manager, renders it with the `mission` variable, builds a
+multimodal `UserMessage` with the rendered text plus the image
+block, and calls the LLM inside a `with_active_prompt(rendered)`
+scope. Both nodes run inside a `with_active_prompt_group(group)`
+scope set up by `main()`.
+
+## Reading the output
+
+```
+========================================================================
+Lunar-mission image analysis (surface + equipment)
+========================================================================
+
+  mission:   Apollo 11
+  image:     https://upload.wikimedia.org/.../Aldrin_Apollo_11_original.jpg (url)
+
+  group:                lunar-image-analysis
+  describe-surface:     describe-surface @ <version-hash>
+  describe-equipment:   describe-equipment @ <version-hash>
+
+  surface description:
+    <model's description of the lunar regolith, footprints, terrain, etc.>
+
+  equipment description:
+    <model's identification of the LM, EASEP, suit, camera, etc.>
+```
+
+- **`describe-surface @ <hash>`** lines show the prompt identifiers
+  resolved through the manager. The hash is the template digest;
+  it changes if the template file is edited. Observability backends
+  use this to correlate LLM-call spans to specific template
+  versions.
+- **`group: lunar-image-analysis`** is the group identifier from
+  `PromptGroup`. Inside the `with_active_prompt_group` scope, OTel
+  observers would stamp this on every LLM-call span fired by either
+  node.
+- **Per-call prompt scope**. The inner `with_active_prompt(rendered)`
+  block adds the per-call identifiers (name, version, label,
+  template_hash, rendered_hash) on top of the group identifier. Two
+  layers, both stamped on the same span.
+- **Fallback path** isn't visible in a clean run because the
+  primary backend serves both prompts. To observe it, point the
+  primary at a directory missing one of the templates and re-run;
+  the manager logs a fallback transition and the call succeeds with
+  the variant from `prompts_fallback/`.
diff --git a/docs/examples/08-checkpointing-and-migration.md b/docs/examples/08-checkpointing-and-migration.md
new file mode 100644
index 0000000..80b9004
--- /dev/null
+++ b/docs/examples/08-checkpointing-and-migration.md
@@ -0,0 +1,157 @@
+# 08 - Checkpointing and migration
+
+A lunar-mission planning pipeline that writes a SQLite checkpoint
+after every step, then resumes the saved invocation under an
+upgraded state schema with a v1->v2 migration backfilling new
+fields.
+
+## Overview
+
+A planning pipeline drafts a lunar mission plan in three steps:
+
+1. `define_objective` - state the primary objective in one
+   sentence.
+2. `size_crew` - pick a crew size between 2 and 8.
+3. `draft_timeline` - draft a one-sentence timeline.
+
+The graph is wired with a `SQLiteCheckpointer` in JSON mode, so the
+engine writes a record after every `completed` event. The whole v1
+pipeline runs once, and the saved record persists on disk in a
+temporary directory.
+
+Then "some time later" - in the same script for demo purposes - a
+v2 schema lands. It adds a `risk_assessment` field and a new
+`assess_risks` node at the end. A migration function backfills
+`risk_assessment=""` for v1 records. The v2 graph resumes the
+saved v1 invocation, the migration runs once on load, and
+execution continues at `assess_risks`. The original three nodes do
+not re-execute.
+
+## What it teaches
+
+- [`SQLiteCheckpointer(path, serialization="json")`](../concepts/checkpointing.md)
+  writing records to a SQLite file. JSON is the
+  migration-eligible serialization; `pickle` mode is faster but
+  can't bridge schemas.
+- [`with_checkpointer`](../concepts/checkpointing.md) wiring the
+  checkpointer to the graph. The engine fires a save at every
+  `completed` event for outermost and subgraph-internal nodes.
+- [`State.schema_version`](../concepts/checkpointing.md) as a
+  `ClassVar[str]` declaration. Empty string opts the class out of
+  migration support; any non-empty value opts it in.
+- [`with_state_migration(from_version, to_version, migrate)`](../concepts/checkpointing.md)
+  registering one edge of the migration chain. The `migrate`
+  callable is pure (dict in, dict out, no I/O).
+- [`invoke(state, resume_invocation=<id>)`](../concepts/checkpointing.md)
+  resuming from a saved record. The engine reads the record,
+  applies the migration chain, re-deserializes against the current
+  state class, and continues at the first uncompleted node.
+- The migration registry's BFS resolution: with a v3 schema and
+  two migration edges (`v1→v2`, `v2→v3`), the registry walks the
+  shortest chain automatically. A v1 record loaded under a v3
+  graph runs `v1→v2` then `v2→v3` without caller-side composition.
+
+## How to run
+
+```bash
+uv sync --group examples
+LLM_API_KEY=sk-... uv run python examples/08-checkpointing-and-migration/main.py
+```
+
+The SQLite database is created in a `TemporaryDirectory` that's
+cleaned up automatically. The demo runs both phases in one
+invocation so you can see the resume behavior end-to-end without
+manual orchestration.
+
+## The graph
+
+V1 graph:
+
+```mermaid
+flowchart LR
+  start([start])
+  define_objective[define_objective]
+  size_crew[size_crew]
+  draft_timeline[draft_timeline]
+  stop([end])
+
+  start --> define_objective --> size_crew --> draft_timeline --> stop
+```
+
+V2 graph (adds `assess_risks` at the end):
+
+```mermaid
+flowchart LR
+  start([start])
+  define_objective[define_objective]
+  size_crew[size_crew]
+  draft_timeline[draft_timeline]
+  assess_risks[assess_risks]
+  stop([end])
+
+  start --> define_objective --> size_crew --> draft_timeline --> assess_risks --> stop
+```
+
+The v2 graph also registers `with_state_migration("v1", "v2",
+migrate_v1_to_v2)`. The migration function takes the saved state
+as a plain dict and returns a dict at the new schema (here, just
+`{**state_dict, "risk_assessment": ""}`).
+
+## Reading the output
+
+```
+========================================================================
+Phase 1 - invoke v1 graph; checkpoints save after every node
+========================================================================
+
+  destination:       Lunar South Pole
+  checkpoint db:     /tmp/oa-checkpoint-demo-.../checkpoints.sqlite
+
+v1 result:
+  objective:  <objective sentence>
+  crew_size:  4
+  timeline:   <timeline sentence>
+
+  v1 invocation_id: <uuid>
+
+========================================================================
+Phase 2 - invoke v2 graph with resume; v1->v2 migration runs
+========================================================================
+
+  v2 adds:    risk_assessment field + assess_risks node
+  migration:  backfills risk_assessment='' for v1 records
+
+v2 result after resume:
+  objective:        <same objective sentence from v1>
+  crew_size:        4
+  timeline:         <same timeline sentence from v1>
+  risk_assessment:  <new sentence from assess_risks>
+
+  trace: ['assess_risks']
+
+The v1 nodes appear once each in v1's trace and NOT in v2's
+trace - they were skipped on resume because completed_positions
+already covered them. Only assess_risks ran in phase 2.
+```
+
+- **`v1 invocation_id`** is the saved record's correlation key. We
+  passed a deterministic `correlation_id` to `invoke()` so the
+  checkpoint filter can find the right record; in production, the
+  caller usually owns the correlation_id and persists it alongside
+  the request that produced the run.
+- **`trace: ['assess_risks']`** on the v2 result is the key signal.
+  The v1 nodes (`define_objective`, `size_crew`, `draft_timeline`)
+  did not re-execute on resume. Their `completed_positions` entries
+  in the saved record told the engine they were already done; the
+  v2 pipeline began at the first uncompleted position, which is
+  `assess_risks`.
+- **`crew_size: 4`** and the other v1 fields are present on the v2
+  result because the migration preserved them via `{**state_dict,
+  ...}`. A migration that *changed* an existing field (e.g.,
+  splitting `name` into `first_name` + `last_name`) would
+  transform the dict more thoroughly.
+- **JSON serialization** is what made this possible. With
+  `serialization="pickle"`, the saved record would be a pickled
+  v1 instance that couldn't be re-deserialized against
+  `MissionPlanStateV2`; JSON makes the saved state a plain dict
+  that the migration function can rewrite freely.
diff --git a/docs/examples/09-tool-use.md b/docs/examples/09-tool-use.md
new file mode 100644
index 0000000..e3c5f9b
--- /dev/null
+++ b/docs/examples/09-tool-use.md
@@ -0,0 +1,151 @@
+# 09 - Tool use
+
+A lunar-mission assistant that calls local Python tools to answer
+questions mixing fact recall and physics arithmetic about Apollo
+and Artemis missions.
+
+## Overview
+
+A user asks something that mixes a factual recall ("when did
+Apollo 13 splash down?") with a small computation ("what's the
+delta-v for a Hohmann transfer from a 300 km parking orbit to
+lunar distance?"). Neither belongs in the model's prompt: facts
+get stale and arithmetic is unreliable from the model alone. The
+agent defines two local tools and lets the model call them.
+
+The agent loop:
+
+1. Send the running message list (plus tool definitions) to the
+   model.
+2. If the model returns `tool_calls`, dispatch each to its local
+   Python function, append the results as `ToolMessage`s, and call
+   the model again.
+3. When the model returns a normal assistant message (no
+   `tool_calls`), present it as the final answer.
+
+A `MAX_TURNS` cap (5 here) prevents runaway loops.
+
+## What it teaches
+
+- [`Tool(name, description, parameters)`](../concepts/llms.md)
+  defining each function as a JSON Schema. The two tools here use
+  the standard `type: object` shape with `required` properties; the
+  model receives them via `complete(messages, tools=TOOLS)`.
+- [`ToolCall(id, name, arguments)`](../concepts/llms.md) records
+  parsed from the model's response. The framework guarantees
+  `arguments` is a parsed dict matching the tool's parameters
+  schema (or `None` only under `finish_reason="error"`).
+- [`ToolMessage(content, tool_call_id)`](../concepts/llms.md)
+  round-trip. The `tool_call_id` must match the `id` the model
+  emitted so it can pair its request with the response.
+- The **agent loop as a graph cycle**:
+  `call_llm → dispatch_tools → call_llm → present → END`
+  with a [conditional edge](../concepts/graphs.md) from `call_llm`
+  routing to either `dispatch_tools` (more tool calls requested),
+  `present` (model is done), or `present` (turn cap exceeded). No
+  special "agent framework" abstraction; tool-calling composes with
+  existing graph mechanics.
+- Defensive handling: `tc.arguments is None` returns a clear error
+  string the model can react to; `try/except` around `dispatch`
+  catches `KeyError` / `ValueError` / `TypeError` and surfaces them
+  the same way.
+
+## How to run
+
+```bash
+uv sync --group examples
+LLM_API_KEY=sk-... uv run python examples/09-tool-use/main.py
+```
+
+Pass a question on the command line:
+
+```bash
+LLM_API_KEY=sk-... uv run python examples/09-tool-use/main.py \
+  "When was Apollo 17 launched?"
+```
+
+With no arg, the default is a multi-part question combining a
+mission lookup with a delta-v computation, which usually forces at
+least two tool calls before the model presents an answer.
+
+## The graph
+
+```mermaid
+flowchart TD
+  start([start])
+  call_llm[call_llm]
+  dispatch_tools[dispatch_tools]
+  present[present]
+  stop([end])
+
+  start --> call_llm
+  call_llm -->|tool_calls present| dispatch_tools
+  call_llm -->|no tool_calls or turn cap hit| present
+  dispatch_tools --> call_llm
+  present --> stop
+```
+
+The cycle is `call_llm → dispatch_tools → call_llm`, exited by the
+conditional edge from `call_llm` to `present`. `route_after_llm`
+returns `"present"` if `s.turn >= MAX_TURNS` (hard cap) or if the
+last message has no `tool_calls` (model is done); otherwise
+`"dispatch_tools"`.
+
+## Reading the output
+
+For the default multi-part question, expect something like:
+
+```
+========================================================================
+Lunar-mission assistant - tool-calling loop
+========================================================================
+
+  question: Tell me about Apollo 13. Then, separately, if I were
+  planning a similar free-return-style mission and wanted to inject
+  from a 300 km parking orbit to apogee at the Moon's mean distance
+  (384,400 km above Earth's surface), roughly how much delta-v would
+  that take?
+
+  turns:     2
+  tools used: 2
+
+  trace:
+    - call_llm[turn=1]
+    - dispatch_tools[2]
+    - call_llm[turn=2]
+    - present
+
+  final answer:
+    Apollo 13 launched 11 April 1970 with Lovell, Swigert, and Haise
+    aboard; an oxygen tank rupture in the service module aborted the
+    landing and the crew returned safely via free-return on 17 April
+    1970. For your hypothetical free-return injection from a 300 km
+    parking orbit to apogee at the Moon's mean distance...
+    [continues with delta-v values from the tool]
+```
+
+- **`turns: 2`** means `call_llm` ran twice. Turn 1 produced
+  tool_calls (the model decided to use both tools); turn 2
+  produced the final answer with no tool_calls, and the conditional
+  edge routed directly to `present`. (The cap of 5 is never hit on
+  the happy path.)
+- **`tools used: 2`** counts `ToolMessage`s appended. The model
+  emitted two `ToolCall`s in one assistant turn (one for
+  `lookup_mission`, one for `compute_delta_v`); the dispatcher
+  produced two `ToolMessage`s back. A `tool_call_count` over
+  `turns - 1` indicates the model batched calls.
+- **`trace`** shows the loop structure clearly: each `call_llm`
+  step is tagged with its turn number; `dispatch_tools[N]` records
+  how many tools that step ran.
+- **Order of tool calls vs replies.** Inside a `dispatch_tools`
+  step, the implementation preserves order: the assistant message's
+  `tool_calls` list is iterated in order, and one `ToolMessage` is
+  appended per call. The model relies on `tool_call_id` to pair
+  requests with responses, but the in-order appending keeps logs
+  scannable.
+- **Failure mode.** If the model emits a `ToolCall` with
+  unparseable arguments (only happens under
+  `finish_reason="error"`), `dispatch_tools` appends a
+  `ToolMessage` with an error string instead of raising. The next
+  turn lets the model react to the error and either retry or
+  give up.
diff --git a/docs/examples/index.md b/docs/examples/index.md
new file mode 100644
index 0000000..4bdd2c8
--- /dev/null
+++ b/docs/examples/index.md
@@ -0,0 +1,75 @@
+# Examples
+
+End-to-end demos of `openarmature`, each framed around a small but
+plausible use case. Read top to bottom for a guided tour, or jump
+straight to whichever example covers the feature you're learning.
+
+Every demo is a standalone `main.py` you can run against any
+OpenAI-compatible LLM endpoint (OpenAI's public API, vLLM, LM Studio,
+llama.cpp server, etc.). All code lives under
+[`examples/`](https://github.com/LunarCommand/openarmature-python/tree/main/examples)
+in the repo.
+
+## Catalog
+
+- [**00 - Hello, world**](00-hello-world.md). Classify a query and
+  route to one of two follow-up nodes. Smallest possible LLM-routed
+  pipeline; introduces typed state, reducers, conditional edges, and
+  structured output.
+- [**01 - Routing and subgraphs**](01-routing-and-subgraphs.md).
+  Question-answering assistant that branches into a short-answer node
+  or a research subgraph, then copy-edits the result.
+- [**02 - Explicit subgraph mapping**](02-explicit-subgraph-mapping.md).
+  Run the same analysis subgraph against two parent fields by mapping
+  them explicitly at each call site.
+- [**03 - Observer hooks**](03-observer-hooks.md). Attach
+  observability to a `draft → review → finalize` pipeline without
+  touching any node code, including OpenTelemetry spans.
+- [**04 - Nested subgraphs**](04-nested-subgraphs.md). Question
+  answering against a small baked-in document corpus with two levels
+  of subgraph nesting.
+- [**05 - Fan-out with retry**](05-fan-out-with-retry.md). Summarize a
+  batch of news headlines in parallel, with retry + timing middleware
+  wrapping each per-instance subgraph run.
+- [**06 - Parallel branches**](06-parallel-branches.md). Enrich an
+  article with three independent analyses (summary, sentiment, tags)
+  running concurrently, each with its own state schema.
+- [**07 - Multimodal prompt**](07-multimodal-prompt.md). Two analyses
+  of a lunar-mission photograph using versioned prompt templates,
+  multimodal user messages, and a prompt-group trace.
+- [**08 - Checkpointing and migration**](08-checkpointing-and-migration.md).
+  Resume a saved invocation under an upgraded state schema, with a
+  v1→v2 migration backfilling new fields.
+- [**09 - Tool use**](09-tool-use.md). Lunar-mission assistant that
+  calls local Python tools to answer questions mixing fact recall and
+  physics arithmetic.
+
+## Configuration
+
+All demos read their LLM client config from environment variables.
+The OpenAI public-API defaults are:
+
+| Env var         | Default                    | Notes                                                                            |
+| --------------- | -------------------------- | -------------------------------------------------------------------------------- |
+| `LLM_BASE_URL`  | `https://api.openai.com`   | Host root only. The provider adds the path.                                      |
+| `LLM_MODEL`     | `gpt-4o-mini`              | Any model the bound endpoint exposes.                                            |
+| `LLM_API_KEY`   | (none)                     | Required. Pass empty for local servers that don't authenticate.                  |
+
+For a local OpenAI-compatible server (vLLM, LM Studio, llama.cpp,
+etc.), point `LLM_BASE_URL` at the host root (e.g.
+`http://localhost:8000`) and set `LLM_API_KEY` to whatever value the
+server expects.
+
+## Running locally
+
+```bash
+# Install the examples dep group.
+uv sync --group examples
+
+# Demo 03 also wants the OTel SDK for its OTelObserver.
+uv sync --group examples --all-extras
+
+# Run any demo.
+LLM_API_KEY=sk-... uv run python examples/00-hello-world/main.py
+LLM_API_KEY=sk-... uv run python examples/01-routing-and-subgraphs/main.py "what year did the moon landing happen"
+```
diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css
index eacacc1..3eda4f7 100644
--- a/docs/stylesheets/extra.css
+++ b/docs/stylesheets/extra.css
@@ -197,6 +197,17 @@
   display: none;
 }
 
+/* Indent nested nav items so section hierarchy reads at a glance.
+ * Material's ``navigation.sections`` layout renders section labels
+ * and their children flush-left at the same horizontal position;
+ * fine for short sections but hard to scan when one runs long (the
+ * Examples section has 11 entries). A small left-pad on the nested
+ * links shows the hierarchy without altering typography. Scoped to
+ * the primary nav so the right-hand TOC sidebar is untouched. */
+.md-nav--primary .md-nav .md-nav__link {
+  padding-left: 0.375rem;
+}
+
 [data-md-color-scheme="slate"] .md-header,
 [data-md-color-scheme="slate"] .md-footer-meta {
   background-color: var(--md-footer-bg-color);
diff --git a/mkdocs.yml b/mkdocs.yml
index 2a4b3fc..1877e84 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -65,6 +65,10 @@ plugins:
           - getting-started/*.md
         Concepts:
           - concepts/*.md
+        Examples:
+          - examples/*.md
+        Model Providers:
+          - model-providers/*.md
         Reference:
           - reference/*.md
 
@@ -80,7 +84,11 @@ markdown_extensions:
   - pymdownx.emoji:
       emoji_index: !!python/name:material.extensions.emoji.twemoji
       emoji_generator: !!python/name:material.extensions.emoji.to_svg
-  - pymdownx.superfences
+  - pymdownx.superfences:
+      custom_fences:
+        - name: mermaid
+          class: mermaid
+          format: !!python/name:pymdownx.superfences.fence_code_format
   - pymdownx.highlight:
       anchor_linenums: true
   - pymdownx.inlinehilite
@@ -104,6 +112,18 @@ nav:
     - Prompts: concepts/prompts.md
     - Observability: concepts/observability.md
     - Checkpointing: concepts/checkpointing.md
+  - Examples:
+    - examples/index.md
+    - Hello, world: examples/00-hello-world.md
+    - Routing and subgraphs: examples/01-routing-and-subgraphs.md
+    - Explicit subgraph mapping: examples/02-explicit-subgraph-mapping.md
+    - Observer hooks: examples/03-observer-hooks.md
+    - Nested subgraphs: examples/04-nested-subgraphs.md
+    - Fan-out with retry: examples/05-fan-out-with-retry.md
+    - Parallel branches: examples/06-parallel-branches.md
+    - Multimodal prompt: examples/07-multimodal-prompt.md
+    - Checkpointing and migration: examples/08-checkpointing-and-migration.md
+    - Tool use: examples/09-tool-use.md
   - Model Providers:
     - model-providers/index.md
     - Authoring a Provider: model-providers/authoring.md