From 6a8ee6ff2cff5e3e76d4c8aff0ad7f2a36dc6c3f Mon Sep 17 00:00:00 2001
From: Tomas Pflanzer <tom@wiseguys.co>
Date: Sat, 16 May 2026 19:42:37 +0200
Subject: [PATCH] chore: bump to 0.32.0 "Claude Agents Deep Integration"

Bumps __version__, rewrites README hero, adds v0.32.0 changelog entry,
adds v0.32 release card to /whatsnew/ (Don Draper voice opening "You
shipped the agent. The client wants the integration. The auditor wants
the trail..."), and updates the v0.31 -> v0.32 footer labels on the
homepage and the EU AI Act landing.
---
 CHANGELOG.md               |  57 +++++++++++++++++++++
 README.md                  |  21 +++++---
 site/eu-ai-act/index.html  |   2 +-
 site/index.html            |   4 +-
 site/whatsnew/index.html   | 101 ++++++++++++++++++++++++++++++++++---
 src/sandcastle/__init__.py |   2 +-
 6 files changed, 169 insertions(+), 18 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7c543ef..ff24a0d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,63 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.32.0] - 2026-05-16 - "Claude Agents Deep Integration"
+
+You shipped the agent. The client wants the integration. The auditor wants the trail. The user wants you to ask the next question without restarting the whole workflow. v0.32 is the answer to every one of those. Sandcastle now exposes every Anthropic Managed Agents primitive shipped under the managed-agents-2026-04-01 beta umbrella, plus the things Anthropic doesn't ship: a cryptographically verifiable trajectory replay, a Skills publisher that turns workflows into uploadable Claude Skills, and an Agent SDK runtime for teams that want in-process execution. Two weeks of work, 169 new tests, one release.
+
+### Added - Anthropic primitives (the things the beta header gives you)
+
+- **Memory Stores** client (`sandcastle.engine.memory_stores.MemoryStoresClient`). Versioned per-session memory mounted at /mnt/memory/, optimistic-concurrency writes via If-Match, redact endpoint for GDPR right-to-be-forgotten, 100 kB per file, 8 stores per session. `attach_to_session_payload()` helper builds the resources block for session-create.
+- **Multiagent coordinator** (`sandcastle.engine.multiagent`). Up to 20 specialist agents in parallel, 25 threads, 1-level depth per Anthropic spec. Three pre-baked templates: `research-and-write`, `code-review-and-test`, `analyst-with-translator`. `validate_roster()` + `build_coordinator_payload()` + `parse_thread_event()`.
+- **Outcomes API** (`sandcastle.engine.outcomes`). `user.define_outcome` events on session start, `span.outcome_evaluation_end` captured into step output. Composite aggregator at module level so AutoPilot and Workflow Evolution can read native Anthropic eval signals.
+- **Webhooks** (`sandcastle.api.agent_webhooks`). HMAC-signed session lifecycle events at `/agent-webhooks/anthropic`. Fire-and-forget dispatch, integrates with the existing arq scheduler.
+- **Elicitation** (the 6th MCP primitive, added in spec rev 2025-11-25). New `request_workflow_input` tool wraps `ctx.session.elicit()` with JSON Schema validation so a workflow that hits a gap mid-execution can ask the user for a typed value without restarting.
+
+### Added - managed-agent step extensions
+
+The `type: managed-agent` step now accepts three new config fields that thread directly into the Anthropic primitives above:
+
+- `memory_stores: list[str]` - attach existing memory store IDs to the session
+- `multiagent: dict` - build a coordinator payload with validated roster
+- `outcomes: list[dict]` - define outcomes at session start, capture eval results in step output
+
+### Added - Sandcastle differentiators (the things Anthropic doesn't ship)
+
+- **Skills publisher** (`sandcastle.engine.agent_skills`). `sandcastle publish-skills [--upload] [--dir]` converts every workflow into a SKILL.md tar.gz package with strict frontmatter validation (kebab-case name, no reserved tokens, ≤1024-char description) and uploads to `/v1/skills`. Workflows are now reachable from every Anthropic Skills-aware client.
+- **Trajectory Replay step type** (`sandcastle.engine.trajectory_replay`). New `type: trajectory-replay` step computes SHA-256 over a recorded tool-call sequence, diffs against a candidate run, returns score + diff_summary. Because Sandcastle's audit trail is a hash chain, the replay is cryptographically verifiable - a property neither LangSmith nor Braintrust ships.
+- **Computer Use integration helper** (`sandcastle.engine.computer_use`). New `type: computer-use` step type. Builds the `computer_20251124` tool definitions, sets the beta header, runs an 8-item safety pre-flight (prompt-injection guard, screenshot dimensions, page-load deadline).
+- **Agent SDK runtime** (`sandcastle.engine.agent_sdk_runtime`). New `runtime: "agent-sdk"` dispatch. For teams who want in-process Claude agents (EU sovereignty, air-gapped, no Managed Agents infra). Lazy-imports `anthropic_agent_sdk`; falls back to a typed `AgentSDKNotInstalled` error when the optional package isn't installed.
+
+### Added - Tool Search + tool-use-examples convention
+
+New `sandcastle.engine.tool_search.ToolRegistry` lets workflows mark tools with `defer_loading: true` (loaded on first selection) and `examples: [...]` (1-5 realistic invocations per tool). Anthropic measured the result on Opus 4: tool-selection accuracy from 49% to 74%, usable context from 122,800 to 191,300 tokens (85% saving), parameter accuracy from 72% to 90%. New docs/tool-examples-convention.md.
+
+### Added - Tier 1 wire fixes (table stakes that had been broken)
+
+- `tools_enabled` config field is now actually sent to the agent-create API (previously parsed but ignored - users thought they were restricting tools).
+- `temperature`, `max_tokens`, `thinking_budget` on `ManagedAgentConfig`. None-aware: omitted from request when unset.
+- `stream` config field is now honoured (was dead code).
+- Pricing table for Opus 4.7 (5/25), Sonnet 4.6 (3/15), Haiku 4.5 (1/5), Opus 4.6 (15/75), Sonnet 4.5 (3/15). Unknown model falls back to Sonnet 4.6 rates with a one-time warning.
+- `fallback_template` accepts a list (chain of up to 5 templates) in addition to a single string.
+
+### Added - dashboard
+
+- Live "Agent Reasoning" panel on the run detail page. Subscribes to `/api/runs/{id}/agent-stream` SSE, renders agent.thinking, agent.tool_use, agent.message, agent.complete, agent.error events. Thread-grouped, collapsible, graceful 404 fallback.
+
+### Changed
+
+- New step types `trajectory-replay` and `computer-use` registered (VALID_STEP_TYPES count 22 -> 24).
+- `agent_webhooks_router` mounted on the FastAPI app alongside `a2a_router` and `agui_router`.
+- MCP server manifest now advertises 6 primitives (added Elicitation) and declares `spec_revision: "2025-11-25"`.
+
+### Tests
+
+- 18 new tests for Tier 1 wire fixes (tests/test_managed_agent_wires.py)
+- 156 new tests for the 9 modules in isolation
+- 13 new e2e wiring tests (tests/test_v032_wiring.py)
+- 169 v0.32-related tests total, all green in 1.8s
+- Full suite: 15,176 passing (vs 15,009 baseline) - the +167 are this release's new tests
+
 ## [0.31.0] - 2026-05-14 - "Compliance & Connections"
 
 Eighty days to the EU AI Act deadline (2 August 2026). This release is the answer: a dedicated landing page mapping every Sandcastle control to a specific Article, ten compliance workflow templates, MCP-first publishing so every workflow becomes a tool inside Claude Desktop / Cursor / Windsurf, eval gates that block regressing models from getting promoted, and a dashboard that doesn't crash when one API hiccups. Plus the closeout of v0.30: Codex audit rounds 9 and 10 fully fixed.
diff --git a/README.md b/README.md
index 4525801..78c9839 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 **Describe what you want. Go home. Sandcastle ships it.** Production-ready workflow orchestrator for AI agents. 7 AI providers with auto-failover, 22 step types including Claude Managed Agents, 15 agent templates, 4 OCR engines, EU AI Act compliance, and a full-featured dashboard. Define workflows in YAML or let AI design them for you.
 
-[![PyPI](https://img.shields.io/badge/PyPI-v0.31.0-blue?style=flat-square)](https://pypi.org/project/sandcastle-ai/0.31.0/)
+[![PyPI](https://img.shields.io/badge/PyPI-v0.32.0-blue?style=flat-square)](https://pypi.org/project/sandcastle-ai/0.32.0/)
 [![License: BSL 1.1](https://img.shields.io/badge/License-BSL_1.1-blue.svg)](LICENSE)
 [![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
 [![Tests](https://img.shields.io/badge/tests-15000%2B%20passing-brightgreen?style=flat-square)](https://github.com/gizmax/Sandcastle/actions)
@@ -10,14 +10,19 @@
 [![Website](https://img.shields.io/badge/Website-sandcastle--ai.eu-blue?style=flat-square)](https://sandcastle-ai.eu)
 [![Live Demo](https://img.shields.io/badge/Live%20Demo-Dashboard-F59E0B?style=flat-square)](https://gizmax.github.io/Sandcastle/)
 
-> **v0.31 - "Compliance & Connections"** Shipped May 14, 2026. Eighty days to the EU AI Act deadline. We built the answer.
+> **v0.32 - "Claude Agents Deep Integration"** Shipped May 16, 2026. Every Anthropic Managed Agents primitive surfaced, plus the things Anthropic doesn't ship.
 >
-> - **EU AI Act landing + 10 compliance workflow templates** mapped to Articles 9, 11, 12, 14, 25, 49, 50, 73 and Annex IV. DPIA, bias audit, incident report, vendor risk, model card, AI inventory, GDPR DSAR, human oversight log, transparency report, risk register. Bring the templates. Customize the prompts. Hand the auditor the audit trail.
-> - **MCP-first publishing.** `sandcastle publish-mcp` turns every workflow you've built into a first-class tool inside Claude Desktop, Cursor, Windsurf, or any MCP client.
-> - **Eval gates that block bad promotions.** Define a golden dataset, set a minimum score, ship with confidence. Eval-driven development is a query parameter, not a future plan.
-> - **A dashboard that doesn't crash.** Overview split into 20 focused components with per-section error boundaries.
-> - **Codex audit rounds 9 + 10** closed: 5 HIGH + 1 MEDIUM findings fixed - cross-tenant cache, memory, prompts, XSS, SSRF, A2A budgets.
-> - 15,014 tests passing. PyPI: `pip install sandcastle-ai==0.31.0`.
+> - **Memory Stores + Multiagent + Outcomes + Webhooks** wired into the YAML. One workflow can attach versioned memory at /mnt/memory/, spawn 20 parallel specialist agents, define outcomes the eval pipeline reads automatically, and emit lifecycle events to a webhook endpoint.
+> - **Skills Publisher.** `sandcastle publish-skills --upload` converts every workflow into a tar.gz SKILL.md and uploads to `/v1/skills`. Workflows now callable from every Anthropic Skills-aware client.
+> - **Trajectory Replay step type.** SHA-256 over a recorded tool-call sequence + diff against the candidate run. Because the audit trail is a hash chain, the replay is **cryptographically verifiable** - LangSmith and Braintrust don't ship this.
+> - **Agent SDK runtime** as `runtime: "agent-sdk"` alternative. In-process Claude agents for EU sovereignty / air-gapped / regulated teams.
+> - **Computer Use** integration (`computer_20251124` beta) with 8-item safety pre-flight + new `type: computer-use` step.
+> - **MCP Elicitation** (6th primitive, spec rev 2025-11-25). Workflows can ask the user for missing input mid-execution.
+> - **Live Agent Reasoning panel** in the dashboard - SSE stream of agent.thinking, agent.tool_use, agent.message events on the run detail page.
+> - Plus 5 Tier 1 wire fixes (`tools_enabled`, sampling params, `stream`, pricing table, fallback chain).
+> - 15,176 tests passing, 169 new. PyPI: `pip install sandcastle-ai==0.32.0`.
+>
+> Previous: **v0.31 - "Compliance & Connections"** (May 14, 2026): EU AI Act landing + 10 compliance templates, MCP-first publishing, eval gates, dashboard error boundaries, Codex audit rounds 9+10.
 >
 
 <p align="center">
diff --git a/site/eu-ai-act/index.html b/site/eu-ai-act/index.html
index d84f3de..33a5f3c 100644
--- a/site/eu-ai-act/index.html
+++ b/site/eu-ai-act/index.html
@@ -638,7 +638,7 @@ <h2>Stop drafting policy. Start shipping artefacts.</h2>
     </p>
     <div class="container footer-inner">
       <div class="footer-left">
-        <strong>Sandcastle</strong> v0.31 - BSL 1.1 (Apache 2.0 after 2030)<br>
+        <strong>Sandcastle</strong> v0.32 - BSL 1.1 (Apache 2.0 after 2030)<br>
         &copy; 2026 Created by <a href="https://github.com/gizmax" target="_blank" rel="noopener">Tomas Pflanzer</a> @gizmax
       </div>
       <div class="footer-links">
diff --git a/site/index.html b/site/index.html
index 7bf99c5..64b8bd5 100644
--- a/site/index.html
+++ b/site/index.html
@@ -1761,7 +1761,7 @@
   <header class="hero" id="hero">
     <div class="container">
       <div class="hero-badge">
-        v0.31 - Compliance & Connections
+        v0.32 - Claude Agents Deep Integration
       </div>
       <h1>
         Describe what you want. Go home.<br>
@@ -2725,7 +2725,7 @@ <h3>Hush</h3>
   <footer>
     <div class="container footer-inner">
       <div class="footer-left">
-        <strong>Sandcastle</strong> v0.31 - BSL 1.1 (Apache 2.0 after 2030)<br>
+        <strong>Sandcastle</strong> v0.32 - BSL 1.1 (Apache 2.0 after 2030)<br>
         &copy; 2026 Created by <a href="https://github.com/gizmax" target="_blank" rel="noopener">Tomas Pflanzer</a> @gizmax
       </div>
       <div class="footer-links">
diff --git a/site/whatsnew/index.html b/site/whatsnew/index.html
index 8f13f6a..ebbf61c 100644
--- a/site/whatsnew/index.html
+++ b/site/whatsnew/index.html
@@ -4,20 +4,20 @@
   <meta charset="UTF-8">
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   <title>What's New - Sandcastle</title>
-  <meta name="description" content="Sandcastle release history. v0.31: Compliance & Connections. EU AI Act landing + 10 compliance templates, MCP-first publishing, eval gates, dashboard error boundaries. Eighty days to the deadline.">
+  <meta name="description" content="Sandcastle release history. v0.32: Claude Agents Deep Integration. Memory Stores, Multiagent coordinator, Outcomes API, Skills publisher, Agent SDK runtime, Trajectory Replay, Computer Use, MCP Elicitation. Every Anthropic primitive surfaced.">
   <meta name="author" content="Tomas Pflanzer @gizmax">
 
   <!-- Open Graph -->
-  <meta property="og:title" content="What's New - Sandcastle v0.31">
-  <meta property="og:description" content="Compliance & Connections. EU AI Act landing + 10 compliance workflow templates, MCP-first publishing, eval gates, error boundaries. Eighty days to August 2.">
+  <meta property="og:title" content="What's New - Sandcastle v0.32">
+  <meta property="og:description" content="Claude Agents Deep Integration. Every Anthropic Managed Agents primitive surfaced, plus Skills publisher, Trajectory Replay, Computer Use, Agent SDK runtime.">
   <meta property="og:image" content="https://raw.githubusercontent.com/gizmax/Sandcastle/main/site/og-image.jpeg">
   <meta property="og:url" content="https://sandcastle-ai.eu/whatsnew/">
   <meta property="og:type" content="website">
 
   <!-- Twitter Card -->
   <meta name="twitter:card" content="summary_large_image">
-  <meta name="twitter:title" content="What's New - Sandcastle v0.31">
-  <meta name="twitter:description" content="Compliance & Connections. EU AI Act landing + 10 templates, MCP publishing, eval gates. Eighty days to the deadline.">
+  <meta name="twitter:title" content="What's New - Sandcastle v0.32">
+  <meta name="twitter:description" content="Claude Agents Deep Integration. Memory Stores, Multiagent, Outcomes, Skills publisher, Trajectory Replay, Computer Use. 169 new tests.">
   <meta name="twitter:image" content="https://raw.githubusercontent.com/gizmax/Sandcastle/main/site/og-image.jpeg">
 
   <link rel="canonical" href="https://sandcastle-ai.eu/whatsnew/">
@@ -194,11 +194,100 @@ <h1>What's New in <span class="accent">Sandcastle</span></h1>
   <main>
     <div class="container releases">
 
+      <!-- v0.32 -->
+      <div class="release fade-in">
+        <div class="release-header">
+          <span class="release-version">v0.32</span>
+          <span class="release-name">Claude Agents Deep Integration <span class="badge-latest">Latest</span></span>
+          <span class="release-date">May 16, 2026</span>
+        </div>
+
+        <p style="max-width: 720px; color: var(--text-muted); font-size: 15px; line-height: 1.7; margin-bottom: 28px;">
+          You shipped the agent. The client wants the integration. The auditor wants the trail. The user wants you to ask the next question without restarting the whole workflow. v0.32 is the answer to every one of those. Sandcastle now exposes every Anthropic Managed Agents primitive shipped under the managed-agents-2026-04-01 beta umbrella, plus the things Anthropic doesn't ship: cryptographically verifiable trajectory replay, a Skills publisher that turns workflows into uploadable Claude Skills, and an Agent SDK runtime for teams that want in-process execution. Two weeks of work, 169 new tests, one release.
+        </p>
+
+        <div class="feature-grid">
+
+          <div class="feature-card wide">
+            <span class="feature-tag amber">Headline</span>
+            <h4>Every Anthropic primitive surfaced</h4>
+            <p>Memory Stores, Multiagent coordinator, Outcomes API, Webhooks. All four came out from Anthropic under the same beta header in April and May 2026; Sandcastle now wires each one into the YAML and the dashboard so a single workflow can attach versioned memory, spawn 20 parallel specialist agents, define outcomes the eval pipeline reads automatically, and emit lifecycle events to a webhook endpoint.</p>
+            <p>Add three lines of YAML, get a multi-agent system that meets your eval gate.</p>
+          </div>
+
+          <div class="feature-card">
+            <span class="feature-tag blue">New</span>
+            <h4>Skills Publisher</h4>
+            <p><code>sandcastle publish-skills --upload</code> converts every workflow into a tar.gz SKILL.md package with strict frontmatter and uploads to <code>/v1/skills</code>. Your workflows are now callable from every Anthropic Skills-aware client. Pair with v0.31 MCP-first publishing and Sandcastle covers both major distribution channels.</p>
+          </div>
+
+          <div class="feature-card">
+            <span class="feature-tag green">New</span>
+            <h4>Trajectory Replay</h4>
+            <p>New <code>type: trajectory-replay</code> step computes SHA-256 over a recorded tool-call sequence and diffs against a candidate run. Sandcastle's audit trail is a hash chain, so the replay is cryptographically verifiable - a property neither LangSmith nor Braintrust ships. EU AI Act auditors love this.</p>
+          </div>
+
+          <div class="feature-card">
+            <span class="feature-tag blue">New</span>
+            <h4>Agent SDK runtime</h4>
+            <p><code>runtime: "agent-sdk"</code> swaps Managed Agents for in-process Claude Agent SDK. For EU sovereignty, air-gapped, regulated teams who can't send session state to Anthropic's hosted infra. Lazy-imports the SDK; graceful error when not installed.</p>
+          </div>
+
+          <div class="feature-card">
+            <span class="feature-tag amber">Polish</span>
+            <h4>Live Agent Reasoning panel</h4>
+            <p>Run detail page now subscribes to an SSE event stream and shows agent.thinking, agent.tool_use, and agent.message events as they fire. Thread-grouped, collapsible. The dashboard finally shows what the agent is actually doing.</p>
+          </div>
+
+          <div class="feature-card">
+            <span class="feature-tag purple">Fix</span>
+            <h4>Five wire fixes (table stakes)</h4>
+            <p>tools_enabled finally reaches the API (was parsed but ignored). temperature, max_tokens, thinking_budget on ManagedAgentConfig. stream config field actually used. Pricing table mapping every Claude 4.x model. Fallback chains of up to 5 templates.</p>
+          </div>
+
+          <div class="feature-card">
+            <span class="feature-tag green">New</span>
+            <h4>Tool Search + examples convention</h4>
+            <p>Mark tools with <code>defer_loading: true</code> and 1-5 invocation examples per tool. Anthropic's own measurements: tool-selection accuracy 49% to 74%, usable context up 85%, parameter accuracy 72% to 90%. New docs/tool-examples-convention.md.</p>
+          </div>
+
+          <div class="feature-card">
+            <span class="feature-tag blue">New</span>
+            <h4>Computer Use + MCP Elicitation</h4>
+            <p>New <code>type: computer-use</code> step type with the <code>computer_20251124</code> beta + an 8-item safety pre-flight (prompt-injection guard, page-load deadline, screenshot bounds). Plus the 6th MCP primitive (Elicitation, spec rev 2025-11-25) lets workflows ask the user for a missing input mid-run.</p>
+          </div>
+
+          <div class="feature-card">
+            <span class="feature-tag purple">Milestone</span>
+            <h4>By the numbers</h4>
+            <div class="stat-row">
+              <div class="stat-item">
+                <div class="stat-num">15,176</div>
+                <div class="stat-label">Tests passing</div>
+              </div>
+              <div class="stat-item">
+                <div class="stat-num">169</div>
+                <div class="stat-label">New v0.32 tests</div>
+              </div>
+              <div class="stat-item">
+                <div class="stat-num">9</div>
+                <div class="stat-label">New modules</div>
+              </div>
+              <div class="stat-item">
+                <div class="stat-num">24</div>
+                <div class="stat-label">Step types</div>
+              </div>
+            </div>
+          </div>
+
+        </div>
+      </div>
+
       <!-- v0.31 -->
       <div class="release fade-in">
         <div class="release-header">
           <span class="release-version">v0.31</span>
-          <span class="release-name">Compliance & Connections <span class="badge-latest">Latest</span></span>
+          <span class="release-name">Compliance & Connections</span>
           <span class="release-date">May 14, 2026</span>
         </div>
 
diff --git a/src/sandcastle/__init__.py b/src/sandcastle/__init__.py
index 30db4a7..5b6a75a 100644
--- a/src/sandcastle/__init__.py
+++ b/src/sandcastle/__init__.py
@@ -1,6 +1,6 @@
 """Sandcastle - Production-ready workflow orchestrator for AI agents."""
 
-__version__ = "0.31.0"
+__version__ = "0.32.0"
 
 from sandcastle.sdk import AsyncSandcastleClient, SandcastleClient