diff --git a/CHANGELOG.md b/CHANGELOG.md index d83c52b..1f2fb58 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -19,8 +19,8 @@ broke. the Node-only YAML/config-loading path out of edge bundles (Cloudflare Workers), complementing the `createRequire` deferral in `0.2.0a3`. - **Trace mining fails open when its extension isn't bundled.** - `CodeAnalyzer` imported `TraceMiner` unguarded, crashing - `sponsio scan --trace` with `ModuleNotFoundError` in builds without the + `CodeAnalyzer` imported `TraceMiner` unguarded, crashing the + trace-mining path with `ModuleNotFoundError` in builds without the optional `trace_mining` extension; it now degrades to "no contracts mined", matching the other call sites. @@ -29,6 +29,11 @@ broke. - Added an explicit `[tool.ruff]` config to `pyproject.toml` so local lint matches CI, and synced `docs/reference/cli.md` with the real CLI surface (`onboard`/`serve`/`daemon`/`cursor` now documented). +- The CLI now centers on code and policy scanning. `sponsio scan` reads + source code and policy docs; `sponsio check --trace` and `sponsio eval` + still replay traces. Trace-derived contract mining (the `sponsio + refresh` command and `sponsio scan --trace`) is no longer part of this + distribution. --- diff --git a/OSS_PROMISE.md b/OSS_PROMISE.md index d933683..aa42e94 100644 --- a/OSS_PROMISE.md +++ b/OSS_PROMISE.md @@ -46,7 +46,7 @@ Apache 2.0. We will not relicense, gate, or remove them. - `sponsio init` (interactive wizard, the user-facing entry), plus the underlying `sponsio onboard`, `scan`, `validate`, - `check`, `report`, `refresh`, `eval`, `export`, + `check`, `report`, `eval`, `export`, `export-sessions` - `sponsio host` group: install / status / list / trace / uninstall for the Cursor / Claude Code / OpenClaw plugins @@ -67,8 +67,6 @@ Apache 2.0. We will not relicense, gate, or remove them. 
- AST-based code scan (`sponsio scan`) over your own codebase - Document parser (`sponsio scan --policy policy.md`) for natural language → contract -- Trace mining (`sponsio refresh`) over your own traces: finds - repeating unsafe patterns and proposes new contracts - NL → contract parser (deterministic patterns) These will never be relicensed. New work in these areas ships under diff --git a/README.ja.md b/README.ja.md index 27ae950..2f819d4 100644 --- a/README.ja.md +++ b/README.ja.md @@ -66,7 +66,7 @@ pip install sponsio # または: npm install -D @sponsio/sdk sponsio init . # 対話型ウィザード: フレームワーク・IDE ホスト・observe vs enforce を検出 ``` -ウィザードがフレームワークを自動検出し、対応するラップ スニペットを表示します。手動配線は [docs/integrations/](docs/integrations/index.md) を参照。[OpenClaw ユーザー](docs/integrations/openclaw.md)は ClawHavoc + CVE-2026-25253 のカバレッジを最初から利用できます。設定リファレンス、observe → enforce 切替、`sponsio refresh`、CI 配線は[完全ガイド](QUICKSTART.md)を参照。 +ウィザードがフレームワークを自動検出し、対応するラップ スニペットを表示します。手動配線は [docs/integrations/](docs/integrations/index.md) を参照。[OpenClaw ユーザー](docs/integrations/openclaw.md)は ClawHavoc + CVE-2026-25253 のカバレッジを最初から利用できます。設定リファレンス、observe → enforce 切替、CI 配線は[完全ガイド](QUICKSTART.md)を参照。 **自然言語から契約を下書きする。** `sponsio validate "<平易な文のルール>"` は、自然言語のルールを読み返せる契約に変換します。出力はあくまで下書きとして扱い、enforce する前に自分でレビューして調整してください。決定論的なのは契約がランタイムでどう*強制される*かであって、どう下書きされるかではありません。 diff --git a/README.md b/README.md index c42fe12..d1d657d 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ pip install sponsio # or: npm install -D @sponsio/sdk sponsio init . # interactive wizard: detects framework, IDE hosts, observe vs enforce ``` -The wizard auto-detects your framework and prints the right wrap snippet. For manual wiring, see [all supported integrations](docs/integrations/index.md). [OpenClaw users](docs/integrations/openclaw.md) get bundled ClawHavoc and CVE-2026-25253 coverage out of the box. For config reference, observe → enforce flip, `sponsio refresh`, and CI wiring, see the [full walkthrough](QUICKSTART.md). 
+The wizard auto-detects your framework and prints the right wrap snippet. For manual wiring, see [all supported integrations](docs/integrations/index.md). [OpenClaw users](docs/integrations/openclaw.md) get bundled ClawHavoc and CVE-2026-25253 coverage out of the box. For config reference, observe → enforce flip, and CI wiring, see the [full walkthrough](QUICKSTART.md). **Drafting contracts from natural language.** `sponsio validate ""` turns a plain-English rule into a contract you can read back. Treat the output as a starting draft to review and adjust before you enforce. The determinism is in how contracts are *enforced* at runtime, not in how they're drafted. diff --git a/README.zh-CN.md b/README.zh-CN.md index 04d601e..20432c8 100644 --- a/README.zh-CN.md +++ b/README.zh-CN.md @@ -66,7 +66,7 @@ pip install sponsio # 或 npm install -D @sponsio/sdk sponsio init . # 交互式向导:检测框架、选择 IDE host、observe vs enforce ``` -向导会自动检测你的框架并打印对应的接入片段。手动接线见 [docs/integrations/](docs/integrations/index.md)。[OpenClaw 用户](docs/integrations/openclaw.md)开箱即享 ClawHavoc + CVE-2026-25253 覆盖。配置参考、observe → enforce 切换、`sponsio refresh`、CI 接线见[完整指引](QUICKSTART.md)。 +向导会自动检测你的框架并打印对应的接入片段。手动接线见 [docs/integrations/](docs/integrations/index.md)。[OpenClaw 用户](docs/integrations/openclaw.md)开箱即享 ClawHavoc + CVE-2026-25253 覆盖。配置参考、observe → enforce 切换、CI 接线见[完整指引](QUICKSTART.md)。 **用自然语言起草合约。** `sponsio validate "<一句话规则>"` 会把一条自然语言规则转成一份你能读回来的合约。把输出当作起点草稿,enforce 之前先自己 review、按需调整。确定性在于合约在运行时如何被*强制执行*,而不在于它如何被起草。 diff --git a/docs/getting-started/onboard-prompt.md b/docs/getting-started/onboard-prompt.md index 67a150f..24e1caa 100644 --- a/docs/getting-started/onboard-prompt.md +++ b/docs/getting-started/onboard-prompt.md @@ -112,9 +112,8 @@ After merge: "ok" before writing. - sponsio validate --config sponsio.yaml -Done. Host-plugin tuning, refresh from traces, flip to enforce, -or debugging a specific contract: those live in the ``sponsio`` -skill, not in this prompt. +Done. 
Host-plugin tuning, flip to enforce, or debugging a specific +contract: those live in the ``sponsio`` skill, not in this prompt. ``` ## TypeScript project @@ -225,9 +224,8 @@ After merge: "ok" before writing. - npx sponsio validate sponsio.yaml -Done. Host-plugin tuning, refresh from traces, flip to enforce, -or debugging a specific contract: those live in the ``sponsio`` -skill, not in this prompt. +Done. Host-plugin tuning, flip to enforce, or debugging a specific +contract: those live in the ``sponsio`` skill, not in this prompt. ``` ## Why two phases (CLI then agent) diff --git a/docs/reference/cli.md b/docs/reference/cli.md index 2aebabd..1225ce1 100644 --- a/docs/reference/cli.md +++ b/docs/reference/cli.md @@ -9,10 +9,10 @@ Every `sponsio` command exits 0 on success and 1 on failure (parse error, violat ## sponsio scan -Scan source code, policy documents, or execution traces to discover contracts. +Scan source code or policy documents to discover contracts. ```bash -sponsio scan PATHS... [--llm] [--policy DOC] [--trace FILE] [-o sponsio.yaml] +sponsio scan PATHS... [--llm] [--policy DOC] [-o sponsio.yaml] ``` | Option | Description | @@ -25,9 +25,6 @@ sponsio scan PATHS... [--llm] [--policy DOC] [--trace FILE] [-o sponsio.yaml] | `--out`, `-o` | Output file (default: `./sponsio.yaml`; `-o -` for stdout) | | `--append` | Append to existing file instead of overwriting | | `--policy`, `-p` | Policy document(s), repeatable | -| `--trace`, `-t` | Trace file or glob (OTLP, Phoenix, Langfuse, Sponsio session JSONL). No LLM required. 
| -| `--trace-min-support` | Minimum traces a pattern must appear in (default `1`) | -| `--trace-confidence-threshold` | Confidence floor for ordering or sequence mining, 0-1 (default `0.95`) | ### Provider matrix @@ -48,9 +45,6 @@ sponsio scan src/agents/ # With LLM and policy sponsio scan src/agents/ --policy security.md --llm -o sponsio.yaml -# Mine from traces (no LLM) -sponsio scan src/ -t '~/.sponsio/sessions/agent/*.jsonl' - # Local model via Ollama sponsio scan src/ --llm --base-url http://localhost:11434/v1 --model llama3.1 ``` @@ -302,7 +296,7 @@ The walker ignores `mode:` lines nested under unrelated keys (e.g. `judge.fallba Print the agent-facing prompt template for a Sponsio workflow. Used by the `sponsio` skill (W1 initial setup, W2 audit, W3 tune, W4 enforce, W5 troubleshoot). ```bash -sponsio prompt (onboard|refresh|scan) +sponsio prompt (onboard|scan) ``` Output is a copy-pasteable prompt block your AI assistant can run. diff --git a/docs/reference/observability.md b/docs/reference/observability.md index 460cd29..7c41871 100644 --- a/docs/reference/observability.md +++ b/docs/reference/observability.md @@ -16,7 +16,7 @@ ls ~/.sponsio/sessions/support_bot/ # 2026-04-24T10-12-33Z.jsonl ``` -`sponsio report` reads these files. `sponsio scan -t '~/.sponsio/sessions/bot/*.jsonl'` mines them for contract candidates. Disable with `SPONSIO_SESSION_LOG=0` or `sessions_dir: null` in `sponsio.yaml`. +`sponsio report` reads these files. Disable with `SPONSIO_SESSION_LOG=0` or `sessions_dir: null` in `sponsio.yaml`. ## OpenTelemetry diff --git a/docs/reference/oss-scope.md b/docs/reference/oss-scope.md index 3538477..93e0a7f 100644 --- a/docs/reference/oss-scope.md +++ b/docs/reference/oss-scope.md @@ -65,9 +65,6 @@ is no LLM call on the enforcement path. - `sponsio/discovery/starter_pack.py`: static rule matching for starter-pack selection - `sponsio/discovery/trace_replay.py`: `sponsio eval` replay engine -- `sponsio/refresh.py` + `sponsio refresh` CLI. 
Local trace mining - over your own `~/.sponsio/sessions/*` (proposes new contracts from - patterns repeating in your traces). ### Generation - `sponsio/generation/dsl_to_contract.py`: text DSL → contract parser diff --git a/llms-full.txt b/llms-full.txt index 6907ea4..d3728d1 100644 --- a/llms-full.txt +++ b/llms-full.txt @@ -17,7 +17,7 @@ tools can split if needed. 日本語

-![Sponsio](assets/readme-banner.png) +![Sponsio](https://raw.githubusercontent.com/SponsioLabs/Sponsio/main/assets/readme-banner.png)

License @@ -35,19 +35,20 @@ tools can split if needed. # Sponsio

- Same coding agent under a declared code freeze. Without Sponsio it drops the prod users table, back-fills fabricated rows, and files a status report that hides the damage. With Sponsio the first destructive SQL is blocked pre-execution: 35 checks, 100% deterministic, 0 LLM calls, p50 13µs. + Same coding agent under a declared code freeze. Without Sponsio it drops the prod users table, back-fills fabricated rows, and files a status report that hides the damage. With Sponsio the first destructive SQL is blocked pre-execution: 35 checks, 100% deterministic, 0 LLM calls, p50 13µs.

- -**Runtime enforcement for AI agents.** Sponsio checks every agent action against deterministic, pure-code contracts, enforced in under 0.01 ms with zero LLM cost at runtime. Works with LangChain, Claude Agent, OpenAI Agents, Google ADK, CrewAI, Vercel AI, MCP, or any custom tool-calling loop, in Python or TypeScript. +Sponsio provides deterministic contracts for agent procedures over time, enforced in under 0.01 ms with zero LLM cost at runtime. Works with LangChain, Claude Agent, OpenAI Agents, Google ADK, CrewAI, Vercel AI, MCP, or any custom tool-calling loop, in Python or TypeScript. > An **agent contract** is a runtime rule that is checked at every agent action, [backed by formal methods](docs/concepts/formal-methods.md). +> **v0.2.0a3 alpha is out.** `pip install --pre sponsio==0.2.0a3`. Closes a `redirect_to_safe` fail-open bug in non-LangGraph adapters (the unsafe call was running anyway), brings TS `Eq` semantics to Python parity for composite values, and adds Cloudflare Workers compatibility. **Upgrade recommended if you are on 0.2.0a2.** See the [v0.2.0a3 release notes](docs/release-notes/v0.2.0a3.md). + --- ## How Sponsio works

- Sponsio architecture: Agent Flow + (Natural Language + Pattern Library) compile into Contracts (Assumption → Enforcement), enforced by a Fuzzy LTL Monitor (deterministic + stochastic) that decides Pass / Block · Warn · Escalate / Redirect for every function call, with full audit trail logs feeding back to the agent. + Sponsio architecture: Agent Flow + (Natural Language + Pattern Library) compile into Contracts (Assumption → Enforcement), enforced by a Fuzzy LTL Monitor (deterministic + stochastic) that decides Pass / Block · Warn · Escalate / Redirect for every function call, with full audit trail logs feeding back to the agent.

On [ODCV-Bench](https://github.com/McGill-DMaS/ODCV-Bench) (12 frontier LLMs × 80 trajectories), unguarded models cheat in 11.5%–66.7% of runs. **With Sponsio, 95.6% of misalignment is avoided on average; 24/36 high-risk scenarios at 100%.** On the `Financial-Audit-Fraud-Finding` scenario, frontier models commit fraud in 16/24 trials; **Sponsio blocks 18/19**. On RedCode-Exec (1,410 cases), Sponsio reaches **92% combined** (bash 95% · python 90%) across a 60-file clean-code audit. @@ -77,7 +78,7 @@ pip install sponsio # or: npm install -D @sponsio/sdk sponsio init . # interactive wizard: detects framework, IDE hosts, observe vs enforce ``` -The wizard auto-detects your framework and prints the right wrap snippet. For manual wiring, see [all supported integrations](docs/integrations/index.md). [OpenClaw users](docs/integrations/openclaw.md) get bundled ClawHavoc and CVE-2026-25253 coverage out of the box. For config reference, observe → enforce flip, `sponsio refresh`, and CI wiring, see the [full walkthrough](QUICKSTART.md). +The wizard auto-detects your framework and prints the right wrap snippet. For manual wiring, see [all supported integrations](docs/integrations/index.md). [OpenClaw users](docs/integrations/openclaw.md) get bundled ClawHavoc and CVE-2026-25253 coverage out of the box. For config reference, observe → enforce flip, and CI wiring, see the [full walkthrough](QUICKSTART.md). **Drafting contracts from natural language.** `sponsio validate ""` turns a plain-English rule into a contract you can read back. Treat the output as a starting draft to review and adjust before you enforce. The determinism is in how contracts are *enforced* at runtime, not in how they're drafted. @@ -98,7 +99,7 @@ agents: - sponsio:capability/filesystem # if your agent touches files ``` -See the [full bundle reference](docs/reference/contract-lib.md) for all 16 bundles, or the [44 underlying patterns](docs/reference/patterns.md) for the primitives they compose. 
Want a bundle for your agent type? That's currently the highest-leverage way to contribute. [Open an issue](https://github.com/SponsioLabs/Sponsio/issues/new) with your incident, CVE, or pattern. +See the [full bundle reference](docs/reference/contract-lib.md) for all 16 bundles, or the [46 underlying patterns](docs/reference/patterns.md) for the primitives they compose. Want a bundle for your agent type? That is currently the highest-leverage way to contribute. [Open an issue](https://github.com/SponsioLabs/Sponsio/issues/new) with your incident, CVE, or pattern. --- @@ -266,7 +267,7 @@ description: Runtime contracts for LLM agents. Install, integrate, reference. Sponsio is a runtime contract layer for LLM agents. It sits at the action boundary, blocks unsafe tool calls before they fire, and ships every verdict to your observability stack. -If you've never run Sponsio before, start here: +If you have never run Sponsio before, start here: ```bash pip install sponsio @@ -278,7 +279,7 @@ Then go to the [Quickstart](getting-started/quickstart.md). ## Sections - **[Getting started](getting-started/install.md)**: install, run your first guarded agent, write your first contract. Includes paste-ready [IDE-agent prompts](getting-started/onboard-prompt.md) for Claude Code / Cursor / Codex driven setup. -- **[Concepts](concepts/overview.md)**: what contracts are, how the runtime evaluates them, the LTL backbone, OWASP coverage. +- **[Concepts](concepts/overview.md)**: what contracts are, how the runtime evaluates them, the LTL (linear temporal logic) backbone, OWASP coverage. - **[Integrations](integrations/index.md)**: drop-in adapters for LangGraph, Claude Agent, OpenAI Agents, CrewAI, Vercel AI, MCP, and others. - **[Guides](guides/onboarding.md)**: task-oriented walkthroughs. Tuning, observe-vs-enforce, contract sources, reporting, FAQ. - **[Plugins](plugins.md)**: gate an entire Claude Code or OpenClaw session without code changes. 
@@ -403,7 +404,7 @@ description: Write, wire, and test a custom contract against an agent you contro This walkthrough goes from an empty project to a working contract that blocks an unsafe tool call. By the end you will have a `sponsio.yaml`, a wired guard, and a passing test. -Prereqs: Python 3.10+, an agent framework (we use LangGraph in examples; any framework works. See [Integrations](../integrations/index.md)). +Prereqs: Python 3.10+ and an agent framework. We use LangGraph in the examples below; any framework works. See [Integrations](../integrations/index.md). --- @@ -472,11 +473,13 @@ result = agent.invoke({"messages": [("user", The agent tries to call `issue_refund` directly. Sponsio checks the trace, sees no `check_policy` event, and blocks: ```text -✗ enforce must call `check_policy` before `issue_refund` — VIOLATED → blocked +✗ enforce must call `check_policy` before `issue_refund`: VIOLATED, blocked ``` The framework surfaces this as a `SponsioBlocked` exception; the agent can react and retry with a different plan. +Block is the default outcome. Three other strategies are available on the same contract: `RedirectToSafe(safe=...)` substitutes a pre-approved tool, `EscalateToHuman(notify=[...])` blocks and fires a notifier callback, and `WarnOnly` logs the violation without blocking. See [observe vs enforce](../guides/observe-vs-enforce.md) for the picker. + Run the same request with the correct tool order ("check the policy first, then refund customer 42 $50") and the contract passes silently. --- @@ -520,8 +523,8 @@ See [Observe vs. enforce](../guides/observe-vs-enforce.md) for the full rollout. ## What next -- **Add more contracts.** The [pattern catalog](../reference/patterns.md) lists all 44 deterministic patterns with NL examples. Pick the ones that match your failure modes. -- **Generate contracts automatically.** `sponsio scan src/` reads your tool definitions and drafts a `sponsio.yaml` with candidate contracts. 
See [contract sources](../guides/contract-sources.md). +- **Add more contracts.** The [pattern catalog](../reference/patterns.md) lists all 46 deterministic patterns with NL examples. Pick the ones that match your failure modes. +- **Generate contracts automatically.** `sponsio scan src/` reads your tool definitions and drafts a `sponsio.yaml` with candidate contracts. See [config yaml reference: how to populate sponsio.yaml](../reference/config-yaml.md#how-to-populate-sponsioyaml). - **Wire a different framework.** Claude Agent SDK, OpenAI, CrewAI, Google ADK, Vercel AI, MCP. See [Integrations](../integrations/index.md). @@ -578,7 +581,7 @@ Four layers build on each other. **Formula**: an LTL expression over atoms. This is what the evaluator actually checks. Anything expressible in LTL over the available atom vocabulary can be enforced. -**Contract**: an (assumption, guarantee) pair bound to one or more agents, with a strategy for what to do on violation (block or escalate). The assumption tells the engine *when* the rule applies; the guarantee tells it *what must hold* when it does. +**Contract**: an (assumption, guarantee) pair bound to one or more agents, with a strategy for what to do on violation. The four strategies are `DetBlock` (refuse the call), `EscalateToHuman` (refuse and notify on-call), `RedirectToSafe` (substitute a pre-approved tool), and `WarnOnly` (log without blocking). The assumption tells the engine *when* the rule applies; the guarantee tells it *what must hold* when it does. ```python contract("policy gate before refund") @@ -661,7 +664,7 @@ The rule of thumb: keep contracts to things a counter, regex, path, or ordering └─────────────────────────────────────────────────────────────┘ ``` -Deterministic formulas are evaluated in microseconds. A violation routes through a **strategy**: block the call or escalate to a human. +Deterministic formulas are evaluated in microseconds. A violation routes through a **strategy**. 
The four options are `DetBlock` (refuse the call), `EscalateToHuman` (refuse and notify on-call), `RedirectToSafe` (substitute a pre-approved safe tool), and `WarnOnly` (log without blocking). --- @@ -788,6 +791,10 @@ It takes user-friendly arguments, constructs a formula from atoms, and wraps it **Exclusion**: - `mutual_exclusion(A, B)`: at most one ever called across entire trace - `segregation_of_duty(A, B)`: same agent cannot do both +- `tool_allowlist(tools)`: only listed tools may be called + +**Recovery**: +- `redirect_to_safe(unsafe, safe)`: substitute the offending call with a pre-approved safe tool. Bundles a `RedirectToSafe` strategy on the resulting `DetFormula`, so a violation surfaces as `action="redirected"` instead of `"blocked"`. The LangGraph adapter dispatches the substitute call; other adapters surface `result.redirected_to` for the application. **Access control**: - `requires_permission(tool, perm)`: tool needs static permission @@ -916,110 +923,6 @@ Available via: OTEL only (post-hoc, cannot block). Not enforceable in real-time. Keep the pattern library focused on Category A. These are universal, enforceable, and cover the dominant use case. Categories B and C are documented but not prioritized for pattern library expansion. Category C patterns belong in the OTEL consumer module's analysis layer. - - - - ---- -title: Deterministic contracts -description: How deterministic contracts are structured, how they compile, and when to reach for one. ---- - -# Deterministic contracts - -Deterministic contracts are binary pass/fail rules evaluated before each tool call. If a contract is violated, Sponsio blocks the call before any side effect happens. This is the hot path. Zero LLM calls, microsecond latency. - -For the conceptual model (atom → pattern → formula → contract), see [Concepts overview](overview.md). For the full catalog of shipped patterns, see [Pattern catalog](../reference/patterns.md). 
This page is about how det contracts are structured and when to reach for one. - ---- - -## Shape of a det contract - -A det contract has four parts: - -```python -contract("policy gate before refund") # name (for logs, reporting) - .assume("called `issue_refund`") # when the rule applies - .guarantees("must call `check_policy` before `issue_refund`") # what must hold - .strategy("block") # what to do on violation -``` - -- **Name**: a human-readable label; shows up in logs, reports, and error messages. -- **Assumption (A)**: the condition that triggers the rule. The rule only fires when A holds. -- **Guarantee (G)**: the temporal property that must hold when A is true. -- **Strategy**: what happens on violation: `block`, `escalate`, or a custom callable. - -Both A and G are natural-language strings. They compile down to LTL formulas over atoms. You never need to write the LTL by hand, but the engine ultimately checks the LTL. - ---- - -## How it compiles - -``` -NL rule - ─▶ NL parser (regex + pattern matching) - ─▶ Pattern function (must_precede, rate_limit, …) - ─▶ LTL formula over atoms - ─▶ Evaluator (pure Python) - ─▶ True (pass) / False (block) -``` - -Three examples: - -```python -# "tool `A` must precede `B`" -# → must_precede("A", "B") -# → Not(called("B")) Until called("A") - -# "tool `X` at most 3 times" -# → rate_limit("X", 3) -# → G(count("X") <= 3) - -# "bash must not contain `rm -rf`" -# → arg_blacklist("bash", "command", ["rm -rf"]) -# → G(called("bash") → Not(arg_field_has("bash", "command", "rm -rf"))) -``` - ---- - -## When to reach for a det contract - -Use a det contract when the property is **structurally observable**: expressible with counters, regexes, paths, or ordering. Structural properties do not need semantic judgment, so they do not need an LLM. - -Typical det use cases: - -- **Ordering**: A must precede B; after X, Y is forbidden; every A must be followed by B. 
-- **Rate and retry limits**: at most N calls, cooldown between calls, bounded retries, loop detection. -- **Irreversibility gates**: once a commit or approval happens, downstream mutations are forbidden. -- **Argument checks**: blacklisted patterns, path scope limits, length or range caps. -- **Permissions**: static role-based access to certain tools. -- **Exact-regex PII**: SSN, credit card, email patterns that a regex can reliably catch. - -Anti-pattern: do not reach for a det contract for properties that need reading the text semantically (tone, relevance, whether something is *truly* PII). Sponsio's deterministic engine does not evaluate those; keep contracts to what is structurally observable. - ---- - -## Failure strategies - -When a det contract is violated, the call is not passed through. Built-in strategies: - -| Strategy | Behavior | -|---|---| -| `block` | Deny the call and raise a `SponsioBlocked` exception to the framework. Agent can react and retry with a different plan. | -| `escalate` | Deny the call and route to a human-in-the-loop callback. Useful for high-stakes actions where silent blocking would confuse the agent. | -| `(callable)` | Custom callback. Gets the violated contract and the candidate event; returns a new strategy decision. | - -In **observe mode**, no strategy runs. Violations are logged and surfaced in reports, but the call is not blocked. This is how most teams wire Sponsio in first. See [Observe vs. enforce](../guides/observe-vs-enforce.md). - ---- - -## Next - -- [Pattern catalog](../reference/patterns.md). Every det pattern that ships, with NL form. -- [Architecture](architecture.md). LTL semantics, grounding internals, atom vocabulary. -- [Write your first contract](../getting-started/first-contract.md). Hands-on walkthrough. - - @@ -1027,170 +930,6 @@ In **observe mode**, no strategy runs. Violations are logged and surfaced in rep - - - - ---- -title: Contract sources -description: Three ways to populate sponsio.yaml. 
---- - -# Contract sources - -Sponsio reads contracts from three sources. All produce the same output: enforceable contracts loaded via a framework `Sponsio()` factory. - -``` -Source 1: Code scan sponsio scan src/ -o sponsio.yaml -Source 2: Policy documents sponsio scan src/ --policy security.md --llm -Source 3: Hand-written (edit sponsio.yaml directly) - │ - ▼ - sponsio.yaml - │ - ▼ - guard = Sponsio(config="sponsio.yaml") -``` - -The three sources mix freely in one yaml. Each contract entry can carry a `source:` tag for provenance. - -## Source 1: code scan - -Extract tools and infer constraints from agent source. - -```bash -sponsio scan src/agents/ -o sponsio.yaml -``` - -Without `--llm`, the scan is rule-based: - -1. Finds tools (`@tool` decorators, `Agent(tools=[...])`, `graph.add_node()`). -2. Extracts ordering from `graph.add_edge("A", "B")` and call graphs. -3. Generates `must_precede` constraints for each ordering dependency. -4. Outputs tools and constraints in yaml. - -With `--llm`, the LLM sees the full source and discovers constraints static analysis can't: - -- `always_followed_by` (liveness obligations) -- `rate_limit` (from constants like `MAX_RETRIES = 3`) -- `no_reversal` (from business logic semantics) - -```bash -sponsio scan src/agents/ --llm -o sponsio.yaml -sponsio scan src/agents/ --llm --provider gemini -``` - -Provider env vars and the full matrix: [reference/cli.md](../reference/cli.md#provider-matrix). - -## Source 2: policy documents - -Extract contracts from a policy or compliance document, using the tool inventory as context. - -```bash -# Scan code first to populate the tool inventory, then add policy: -sponsio scan src/agents/ -o sponsio.yaml -sponsio scan src/agents/ --policy security_policy.md --llm -o sponsio.yaml --append -``` - -The tool inventory is critical. Without it the LLM produces generic constraints. 
With it, policy maps to specific tools: - -``` -Policy: "All refunds require supervisor approval" -Tool inventory: [check_policy, issue_refund, notify_customer] -Constraint: must_precede(check_policy, issue_refund) -``` - -Supported document formats: `.md`, `.txt`, `.pdf` (`pip install sponsio[pdf]`). - -## Source 3: hand-written - -Edit `sponsio.yaml` directly. Two formats, mixable. - -### NL strings - -```yaml -agents: - customer_bot: - contracts: - - G: "tool `check_policy` must precede `issue_refund`" - - G: "tool `issue_refund` at most 3 times" - - G: "response must not contain PII" -``` - -Each entry is one `(assumption, guarantee)` pair. `A` is optional, `G` is required. Each field can be a scalar or a list (lists are ANDed). The legacy keys `E:` / `enforcement:` are still accepted for back-compat. - -Deterministic syntax (must use backtick-quoted tool names): - -``` -tool `A` must precede `B` -tool `X` at most N times -tool `A` requires permission `perm_name` -tools `A` and `B` are mutually exclusive -after `A`, tool `B` is forbidden -tool `A` cooldown of N steps -``` - -Sponsio parses NL strings through two stages: rule-based first (free), LLM fallback last (requires API key). - -### Structured entries - -```yaml -agents: - customer_bot: - contracts: - - pattern: must_precede - args: [check_policy, issue_refund] - source: scan - - pattern: rate_limit - args: [issue_refund, 3] -``` - -Compiled directly. No NL parsing. Auto-emitted by `sponsio scan`. 
- -## Loading config in Python - -```python -from sponsio.langgraph import Sponsio - -guard = Sponsio(config="sponsio.yaml", agent_id="customer_bot") -agent = create_react_agent(model, guard.wrap(tools)) -``` - -Inline contracts add on top of yaml: - -```python -guard = Sponsio( - config="sponsio.yaml", - agent_id="customer_bot", - contracts=["tool `notify` at most 5 times"], -) -``` - -## Validation - -```bash -sponsio validate --config sponsio.yaml # parse + structural -sponsio validate --config sponsio.yaml --json # CI-friendly -``` - -## End-to-end workflow - -```bash -sponsio scan src/agents/ --llm -o sponsio.yaml # 1. discover -sponsio scan src/agents/ --policy compliance.md --llm --append # 2. policy -# 3. edit sponsio.yaml, add hand-written rules -sponsio validate --config sponsio.yaml # 4. validate -python my_agent.py # 5. run -``` - -## See also - -- [Quickstart](../getting-started/quickstart.md) -- [Config reference](../reference/config-yaml.md) -- [CLI reference](../reference/cli.md) -- [Integrations](../integrations/index.md) - - @@ -1202,7 +941,7 @@ description: Use `sponsio init` to wire framework, host hooks, skill, and mode i # Onboarding an existing agent -`sponsio init` is the 4-axis setup wizard. One run covers every decision that matters on first install. Three surfaces (interactive TTY, `--plan` dry-run, `--apply` non-interactive) share the same dispatch table, so an IDE-agent's preview is guaranteed to match what `--apply` actually runs. +`sponsio init` is the 4-axis setup wizard. One run covers every decision that matters on first install. Three surfaces (interactive TTY, `--plan` dry-run, `--apply` non-interactive) share the same dispatch table, so an IDE-agent's preview matches what `--apply` actually runs (they call into the same code path). ```bash pip install sponsio @@ -1306,7 +1045,7 @@ Paste the snippet. Run your agent. 
Review the observe-mode report (`sponsio repo ## Next -- [Contract sources](contract-sources.md): scan, policy-doc mining, trace mining. +- [Config yaml reference](../reference/config-yaml.md): scan, policy-doc mining, hand-written rules, plus the full schema. - [Observe vs. enforce](observe-vs-enforce.md): shadow mode to production. - [Plugins (Mode A)](../plugins.md): what axis 2's `host install` installs and how it routes tool calls. - [CLI reference](../reference/cli.md): `sponsio init` flags and the underlying commands it calls. @@ -1410,17 +1149,17 @@ You can also promote per-contract with the `mode: enforce` override. Useful for | Mode | Det violation | Sto violation | |---|---|---| | Observe | Logged; call passes through | Logged; response passes through | -| Enforce | Strategy runs (`block`, `escalate`, or custom) | Strategy runs (`retry_with_constraint`, `redirect_to_safe`, or custom) | +| Enforce | Strategy runs (`DetBlock`, `EscalateToHuman` with notifiers, `RedirectToSafe`, `WarnOnly`, or a custom callable) | Strategy runs (`retry_with_constraint` when an external stochastic (LLM-judge) evaluator is wired up; otherwise log-only) | -In enforce mode, a hard-blocked event is **rolled back** from the trace so later checks are not poisoned by it. +In enforce mode, a hard-blocked event is **rolled back** from the trace so later checks do not see it. --- ## Gotchas -- **Observe mode is not free**. Sto contracts still make judge calls in observe. They need the score to log the would-be violation. If judge cost is a concern during shadow, consider a `mode: observe_det_only` override on sto contracts. (Feature-flagged; ask if you need it.) +- **Observe mode is not free**. Stochastic (LLM-judge) contracts still make judge calls in observe. The score is what gets logged for the would-be violation. If judge cost is a concern during shadow, consider a `mode: observe_det_only` override on stochastic contracts. (Feature-flagged; ask if you need it.)
- **Observe reports are only as good as your session log.** Make sure OTEL or local-disk session logging is configured. See [Observability](../reference/observability.md). -- **Enforce mode changes agent behavior.** Once you flip, the agent will see `SponsioBlocked` exceptions and retry loops it never saw in observe. Plan for a day of re-tuning after the flip. +- **Enforce mode changes agent behavior.** Once you flip, the agent starts seeing `SponsioBlocked` exceptions and enters retry loops that did not occur in observe. Plan for a day of re-tuning after the flip. --- @@ -1459,7 +1198,7 @@ Contract Fires Sessions Agents Tools ───────────────────────────────────── ────── ───────── ─────── ────────────── policy gate before refund 3 3 1 issue_refund bash must not contain rm -rf 1 1 1 bash -token_budget(50000) 0 — — — +token_budget(50000) 0 - - - ``` Columns: @@ -1525,7 +1264,7 @@ ls ~/.sponsio/sessions/support_bot/ # 2026-04-24T10-12-33Z.jsonl ``` -`sponsio report` reads these files. `sponsio scan -t '~/.sponsio/sessions/bot/*.jsonl'` mines them for contract candidates. Disable with `SPONSIO_SESSION_LOG=0` or `sessions_dir: null` in `sponsio.yaml`. +`sponsio report` reads these files. Disable with `SPONSIO_SESSION_LOG=0` or `sessions_dir: null` in `sponsio.yaml`. ## OpenTelemetry @@ -1612,10 +1351,10 @@ sponsio.agent_turn (root, one per check_action) | Attribute | Description | |---|---| -| `sponsio.enforcement.strategy` | `DetBlock`, `EscalateToHuman`, `RetryWithConstraint`, `RedirectToSafe`. | -| `sponsio.enforcement.action` | `blocked`, `escalated`, `retrying`, `redirected`, `observed`. | -| `sponsio.enforcement.retry_prompt` | Retry-with-lesson prompt, truncated to 2 KB. | -| `sponsio.enforcement.fallback_action` | Fallback action name for RedirectToSafe. | +| `sponsio.enforcement.strategy` | `DetBlock`, `EscalateToHuman`, `WarnOnly`, `RedirectToSafe`. 
`RetryWithConstraint` emits through the same attribute when the optional stochastic (LLM-judge) pipeline is plugged in. | +| `sponsio.enforcement.action` | `blocked`, `escalated`, `redirected`, `warned`, `observed`. `retrying` is reserved for the stochastic pipeline and not reachable in this OSS build. | +| `sponsio.enforcement.retry_prompt` | Retry-with-lesson prompt, truncated to 2 KB. Only emitted when an external stochastic (LLM-judge) evaluator is wired up. | +| `sponsio.enforcement.fallback_action` | Fallback tool name for `RedirectToSafe` (e.g. `log_refund_request` when the model attempted `issue_refund`). | ## Privacy and cost defaults @@ -1637,7 +1376,7 @@ OtlpHttpExporter(redact_args=False, truncate=False) |---|---| | Per-conversation `shield-trace.jsonl` | Carries raw tool args from prior subprocesses with no verdict context. Internal cross-process trace state. | | `~/.sponsio/cursor-subagents.jsonl` | Internal subagent registry, not user-facing. | -| User prompt original text | Default redacted because user prompts can carry PII or secrets. Opt in to `redact_args=False` only after legal sign-off. | +| User prompt original text | Default redacted because user prompts can carry personally identifiable information (PII) or secrets. Opt in to `redact_args=False` only after legal sign-off. | ## Versioning @@ -1764,51 +1503,183 @@ while not done: guard.guard_after(tool_name, output) ``` -## Framework-specific notes +## Tool policy: default-deny + proactive filtering (v0.2) -### Claude Agent SDK +Sponsio's `tool_policy` section lets you declare an allow-list once and have it surface either reactively (the AI tries a denied tool, gets blocked at call time) or proactively (the denied tool never reaches the AI's tool menu). -`guard.hooks()` plugs into `ClaudeAgentOptions(hooks=...)` directly. No tool wrapping needed. 
+```yaml +tool_policy: + default: deny # allow (default) | deny + approved: [search, read_file, list_dir] + enforcement: reactive # reactive (default) | proactive +``` -### OpenAI SDK +Or inline: -`patch_openai()` returns a guard whose every `client.chat.completions.create(...)` is checked automatically. Set `SPONSIO_OPENAI_STRICT_TOOL_ARGS=1` to fail closed when the model returns malformed JSON in `tool_call.function.arguments`. Default warns and degrades. +```python +import sponsio + +guard = sponsio.Sponsio( + contracts=[...], + tool_policy={"default": "deny", "approved": ["search"], "enforcement": "proactive"}, +) +``` -### Google ADK +### What `proactive` does per adapter -`functools.wraps` preserves the original signatures, so ADK's introspection still works. Both sync and `async` tools are supported. Blocked calls return `{"status": "error", "error_message": "BLOCKED..."}` instead of executing the wrapped function, so the model sees a normal tool result and can self-correct. +The adapter matrix below reflects the tool-listing surface each framework actually exposes. Where an adapter can drop tools before the agent sees them, it does. Where it cannot, the rule still fires reactively via `guard_before`. -### MCP +| Adapter | `proactive` behavior | +|---|---| +| LangGraph, CrewAI, OpenAI Agents SDK, Google ADK | One-shot static filter in `guard.wrap(tools)`. Denied tools never get bound to the agent. Temporal rules (`must_precede`, `count_at_most`) still apply reactively at call time. | +| Claude Agent SDK | Hooks-based: the SDK owns the tool list. `enforcement: proactive` is a no-op here; reactive blocking via `guard.hooks()` is the supported path. | +| OpenAI SDK, Vercel AI SDK | Manual, per call: filter the `tools=[...]` array yourself before each request with `guard.filter_tools([t.name for t in ALL_TOOLS])` (see custom-loop snippet below). | +| Custom loop (no framework) | Per-turn filter using `guard.filter_tools(...)` (see snippet below). Catches everything including temporal rules.
| +| MCP | `MCPContractProxy` already reactive-blocks at `call_tool`. Per-turn filtering of `list_tools` is on the roadmap. | -MCP is a tool transport, not an agent framework. Use `guard_before()` / `guard_after()` directly, or wrap an MCP client transparently: +### Custom loop with per-turn proactive filtering + +`guard.filter_tools(candidates)` returns the subset of candidate tool names whose call would not be blocked right now. The call is pure (no events, logs, callbacks, or perf samples) and evaluates *all* contracts including temporal ones. Call it before each LLM turn: ```python -from sponsio.mcp import MCPContractProxy +import sponsio -proxy = MCPContractProxy(mcp_client=client, system=system) -result = await proxy.call_tool("send_email", {"to": "user@example.com"}) +guard = sponsio.Sponsio( + agent_id="my_agent", + contracts=["must call `verify_identity` before `transfer_funds`"], + tool_policy={"default": "deny", "approved": ["verify_identity", "transfer_funds"]}, +) + +ALL_TOOLS = [verify_identity_tool, transfer_funds_tool, debug_tool, ...] +ALL_NAMES = [t.name for t in ALL_TOOLS] + +while not done: + # Per-turn refresh: returns only tools legal under the current trace. + legal_names = set(guard.filter_tools(ALL_NAMES)) + legal_tools = [t for t in ALL_TOOLS if t.name in legal_names] + tool_name, args = llm_decide_next_action(messages, tools=legal_tools) + result = guard.guard_before(tool_name, args) + if result.blocked: + messages.append(f"Action blocked: {result.det_violations[0].message}") + continue + output = execute_tool(tool_name, args) + guard.guard_after(tool_name, output) ``` -## Config-driven (every framework) +The difference from `wrap()`-time filtering: `filter_tools` is called each turn and consults the live trace, so `must_precede(A, B)` opens B in the menu only *after* A fires. This is the most thorough proactive option Sponsio offers; it requires you to own the agent loop. 
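A toy, framework-free model of what a per-turn filter computes may help show why `must_precede(A, B)` keeps B out of the menu until A has fired. This is a sketch under stated assumptions, not Sponsio's implementation: `ToyPolicy` and its `filter_tools(candidates, trace)` signature are hypothetical (the real `guard.filter_tools` reads the live trace itself), the trace is a plain list of tool names, and only default-deny plus `must_precede` are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for default-deny + must_precede rules (not Sponsio's API)."""

    approved: set[str]
    # Each (a, b) pair means: `b` only becomes legal once `a` appears in the trace.
    must_precede: list[tuple[str, str]] = field(default_factory=list)

    def filter_tools(self, candidates: list[str], trace: list[str]) -> list[str]:
        """Return the candidates that would not be blocked given the trace so far.

        Pure: reads `trace` but never modifies it.
        """
        seen = set(trace)
        legal = []
        for name in candidates:
            if name not in self.approved:
                continue  # default: deny anything not on the allow-list
            if any(b == name and a not in seen for a, b in self.must_precede):
                continue  # an ordering prerequisite has not fired yet
            legal.append(name)
        return legal


policy = ToyPolicy(
    approved={"verify_identity", "transfer_funds"},
    must_precede=[("verify_identity", "transfer_funds")],
)
tools = ["verify_identity", "transfer_funds", "debug_tool"]
print(policy.filter_tools(tools, trace=[]))                   # ['verify_identity']
print(policy.filter_tools(tools, trace=["verify_identity"]))  # ['verify_identity', 'transfer_funds']
```

Because the toy function only reads `trace`, calling it every turn has no side effects; the caller appends to the trace only when a tool actually runs.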
-All integrations support loading contracts from a YAML file: +## Redirect to safe (v0.2) + +`redirect_to_safe(unsafe, safe)` substitutes a forbidden tool call with a pre-approved one instead of blocking the agent outright. The model can continue, just not down the unsafe path. ```python -from sponsio.langgraph import Sponsio +import sponsio +from sponsio import contract +from sponsio.patterns import redirect_to_safe -guard = Sponsio(config="sponsio.yaml", agent_id="my_bot") +guard = sponsio.Sponsio( + contracts=[ + contract("trash instead of rm") + .guarantees(redirect_to_safe("rm_rf", "trash")), + + # Conditional redirect: only large refunds get rerouted. + contract("large refunds go to review") + .assume("called `issue_refund`") + .guarantees(redirect_to_safe("issue_refund", "log_refund_request")), + ], +) ``` -See [Contract sources](../guides/contract-sources.md) for the YAML specification. +When the agent calls `rm_rf`, Sponsio: -## Long-running agents +1. Rolls back the `rm_rf` event from the trace so downstream counters (`rate_limit`, `count_at_most`) don't tick on the attempted call. +2. Surfaces `result.redirected=True` + `result.redirected_to="trash"` from `guard_before`. +3. The adapter invokes `trash` with the model's original arguments. The trace records the `trash` call (via the normal `guard_before(safe, args)` path), so the audit log reflects what actually executed. -The trace is append-only during a session. For 24/7 services, call `guard.rotate_session()` periodically to cap memory and keep the verifier's atom caches fresh. +The model sees the safe tool's result, not an error. Substitution is transparent unless the safe tool returns something the model cannot interpret (schema mismatch). -```python -for turn_idx, user_msg in enumerate(conversation): - response = agent_step(user_msg) - if turn_idx > 0 and turn_idx % 1000 == 0: +### Constraints + +- Both `unsafe` and `safe` must be registered with your framework. Sponsio does NOT synthesize tools.
+- The safe tool should accept the same arguments as the unsafe one. If schemas diverge, the adapter passes args verbatim; the user is responsible for compatibility. +- A `redirect → blocked` chain (safe tool also violates a different contract) raises a hard block. Sponsio does not chain redirects to avoid loops. +- Self-redirect (`unsafe == safe`) is rejected loudly via `ToolCallBlocked`. The pattern factory already rejects `redirect_to_safe("X", "X")` at construction; this guard catches the case where a user wired `RedirectToSafe(safe="X")` directly via `policy={}` and bound it to a contract that triggers on tool `X`. +- A `redirect → redirect` chain (`safe` tool itself has a `redirect_to_safe` contract pointing elsewhere) is also rejected. Resolve the chain by pointing the original `unsafe` directly at the final safe tool. + +### Interaction with other contracts on the same tool + +If a tool has both a `redirect_to_safe` contract AND another contract (e.g. `must_precede`, `count_at_most`) that fires on the same call, the LangGraph adapter takes the **redirect path first** before checking for a block. The model never sees the block message because the call gets substituted; the substitute call is then checked against everything else. + +This means a `must_precede(check_policy, issue_refund)` contract paired with `redirect_to_safe("issue_refund", "log_refund_request")` will effectively skip the ordering check for `issue_refund` (the call gets redirected to `log_refund_request` immediately, and `must_precede` only applies to `issue_refund`'s actual execution which never happens). This is by design: redirecting and refusing are conflicting outcomes, and the redirect was your explicit intent for that tool. 
+ +If you want both behaviors, write the `must_precede` against the safe tool (`must_precede(check_policy, log_refund_request)`), or use the framework-agnostic `guard.guard_before(unsafe_tool, args)` inspection in a custom loop where you can branch on `check.blocked` before `check.redirected`. + +### What `redirect_to_safe` does per adapter + +| Adapter | Redirect behavior | +|---|---| +| LangGraph | Built in. `wrap()` indexes tools by name; on redirect the wrapped `ToolNode` invokes the safe tool's `func` / `coroutine` with the model's original kwargs. Unknown safe tool name raises `ToolCallBlocked`. | +| CrewAI, OpenAI Agents SDK, Google ADK, Vercel AI, Claude Agent SDK | Surface only: `result.redirected_to` is set on the `CheckResult`. Adapter-side dispatch lands in a follow-up release. For now, custom loops can read `result.redirected_to` and call the substitute tool themselves. | +| Custom loop (no framework) | Read `check.redirected_to`, look up the safe tool in your registry, call it with the same args. See snippet below. | + +```python +# Custom loop pattern that honors redirect_to_safe outcomes +# (`registry` maps tool names to callables) +check = guard.guard_before(tool_name, args) +if check.redirected and check.redirected_to: + actual = check.redirected_to + check2 = guard.guard_before(actual, args) + if check2.allowed: + output = registry[actual](**args) + guard.guard_after(actual, output) + else: + # Redirects do not chain: a blocked safe tool is a hard block. + messages.append(f"blocked: {check2.det_violations[0].message}") +elif check.blocked: + messages.append(f"blocked: {check.det_violations[0].message}") +elif check.allowed: + output = registry[tool_name](**args) + guard.guard_after(tool_name, output) +``` + +## Framework-specific notes + +### Claude Agent SDK + +`guard.hooks()` plugs into `ClaudeAgentOptions(hooks=...)` directly. No tool wrapping needed. + +### OpenAI SDK + +`patch_openai()` returns a guard whose every `client.chat.completions.create(...)` is checked automatically. Set `SPONSIO_OPENAI_STRICT_TOOL_ARGS=1` to fail closed when the model returns malformed JSON in `tool_call.function.arguments`.
Default warns and degrades. + +### Google ADK + +`functools.wraps` preserves the original signatures, so ADK's introspection still works. Both sync and `async` tools are supported. Blocked calls return `{"status": "error", "error_message": "BLOCKED..."}` instead of executing the wrapped function, so the model sees a normal tool result and can self-correct. + +### MCP + +MCP is a tool transport, not an agent framework. Use `guard_before()` / `guard_after()` directly, or wrap an MCP client transparently: + +```python +from sponsio.mcp import MCPContractProxy + +proxy = MCPContractProxy(mcp_client=client, system=system) +result = await proxy.call_tool("send_email", {"to": "user@example.com"}) +``` + +## Config-driven (every framework) + +All integrations support loading contracts from a YAML file: + +```python +from sponsio.langgraph import Sponsio + +guard = Sponsio(config="sponsio.yaml", agent_id="my_bot") +``` + +See [Config yaml reference](../reference/config-yaml.md) for the YAML specification. + +## Long-running agents + +The trace is append-only during a session. For 24/7 services, call `guard.rotate_session()` periodically to cap memory and keep the verifier's atom caches fresh. + +```python +for turn_idx, user_msg in enumerate(conversation): + response = agent_step(user_msg) + if turn_idx > 0 and turn_idx % 1000 == 0: guard.rotate_session() ``` @@ -1844,10 +1715,10 @@ Every `sponsio` command exits 0 on success and 1 on failure (parse error, violat ## sponsio scan -Scan source code, policy documents, or execution traces to discover contracts. +Scan source code or policy documents to discover contracts. ```bash -sponsio scan PATHS... [--llm] [--policy DOC] [--trace FILE] [-o sponsio.yaml] +sponsio scan PATHS... [--llm] [--policy DOC] [-o sponsio.yaml] ``` | Option | Description | @@ -1860,9 +1731,6 @@ sponsio scan PATHS... 
[--llm] [--policy DOC] [--trace FILE] [-o sponsio.yaml] | `--out`, `-o` | Output file (default: `./sponsio.yaml`; `-o -` for stdout) | | `--append` | Append to existing file instead of overwriting | | `--policy`, `-p` | Policy document(s), repeatable | -| `--trace`, `-t` | Trace file or glob (OTLP, Phoenix, Langfuse, Sponsio session JSONL). No LLM required. | -| `--trace-min-support` | Minimum traces a pattern must appear in (default `1`) | -| `--trace-confidence-threshold` | Confidence floor for ordering or sequence mining, 0-1 (default `0.95`) | ### Provider matrix @@ -1883,9 +1751,6 @@ sponsio scan src/agents/ # With LLM and policy sponsio scan src/agents/ --policy security.md --llm -o sponsio.yaml -# Mine from traces (no LLM) -sponsio scan src/ -t '~/.sponsio/sessions/agent/*.jsonl' - # Local model via Ollama sponsio scan src/ --llm --base-url http://localhost:11434/v1 --model llama3.1 ``` @@ -1925,6 +1790,30 @@ sponsio init . --plan "framework=crewai;ides=cursor:skill" # dr See [getting-started/quickstart.md](../getting-started/quickstart.md) for the typical interactive flow. +## sponsio onboard + +One-shot project wire-up: composes `init` + `scan` + `doctor` into a single command so first-time users don't have to learn three subcommands. Detects the framework, picks the best available LLM provider (env → `OPENAI_BASE_URL` → local Ollama → none), writes `sponsio.yaml` in observe mode with an inferred contract set, then prints the framework-specific agent-entry patch. + +```bash +sponsio onboard [TARGET] [--agent NAME] [--mode observe|enforce] [--force] +``` + +| Option | Description | +|---|---| +| `TARGET` | File or directory to scan (default: current). | +| `--mode` | Runtime mode written into `sponsio.yaml`. Omit to be prompted; `observe` is the safe default. | +| `--force` | Overwrite an existing `sponsio.yaml` without prompting. | +| `--no-probe-ollama` | Skip the `localhost:11434` liveness probe. 
| +| `--no-doctor` | Skip the post-onboard `sponsio doctor` run. | +| `--emit-context` | Skip the LLM step; emit the structured inputs as JSON for the `sponsio` skill. Pair with `sponsio prompt onboard`. | +| `--json` | Emit the structured `OnboardReport` as JSON. | + +```bash +sponsio onboard +sponsio onboard src/ --agent customer_bot +sponsio onboard --force --no-probe-ollama +``` + ## sponsio validate Parse-check contract strings. CI-friendly. @@ -2020,17 +1909,6 @@ Health checks: install integrity, config syntax, framework wiring. sponsio doctor ``` -## sponsio refresh - -Re-mine `source: trace` contracts from recent sessions. - -```bash -sponsio refresh --since 7d # dry-run -sponsio refresh --since 7d --apply # write back, with .sponsio.bak -``` - -User-written rules and `customized:` blocks pass through unchanged. Mines your own local session log. - ## sponsio packs List shipped contract packs with rule counts and `include:` syntax. @@ -2111,16 +1989,55 @@ sponsio mode (observe|enforce) [--config sponsio.yaml] [--agent NAME] Equivalent to setting `runtime.mode:` in yaml. The `SPONSIO_MODE` env var still wins over both. +**Parent-aware patching (v0.2)**. The CLI walks the yaml line by line tracking the current top-level key, then: + +1. Prefers updating an existing `mode:` line nested under `runtime:`. This is the only line the TypeScript loader reads, so picking the wrong line would silently leave TS stale. +2. Falls back to `mode:` nested under `defaults:` if no `runtime.mode` exists. Both loaders honor this. +3. On a yaml that has neither, appends a fresh `runtime:` block ONLY when target is `observe`. Refuses to append a fresh `enforce` block when no mode line exists and exits 1 with a clear hint. CI scripts that relied on the old exit-1 behavior for malformed configs keep working. To flip a clean yaml to enforce, run `sponsio mode observe` first (which appends the block), then `sponsio mode enforce`. 
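A minimal sketch of the three-step preference order above, assuming a plain line-oriented walk with no YAML parser. `patch_mode` and `ModePatchError` are hypothetical names, not the Sponsio CLI's code; the sketch raises where the CLI exits 1, and it treats any indented `mode:` key under `runtime:` or `defaults:` as the mode line without checking nesting depth. It also skips `mode:` keys under unrelated parents and keeps the patched line's inline comment and line ending.

```python
import re

# A top-level key starts at column 0, e.g. "runtime:" or "defaults:".
TOP_KEY = re.compile(r"^([A-Za-z_][\w-]*)\s*:")
# An indented `mode:` line. Groups: prefix, value, optional inline comment, line ending.
MODE_LINE = re.compile(r"^([ \t]+mode:[ \t]*)([^#\r\n]*?)([ \t]*#[^\r\n]*)?(\r?\n)?$")


class ModePatchError(Exception):
    """Raised (instead of exiting 1) when enforce has no mode line to patch."""


def patch_mode(text: str, target: str) -> str:
    lines = text.splitlines(keepends=True)
    found = {}  # parent key -> index of its first indented `mode:` line
    parent = None
    for i, line in enumerate(lines):
        top = TOP_KEY.match(line)
        if top:
            parent = top.group(1)  # entering a new top-level block
            continue
        if parent in ("runtime", "defaults") and parent not in found and MODE_LINE.match(line):
            found[parent] = i
    # Steps 1 and 2: runtime.mode wins; defaults.mode is the fallback.
    for key in ("runtime", "defaults"):
        if key in found:
            i = found[key]
            m = MODE_LINE.match(lines[i])
            comment = m.group(3) or ""  # keep the inline comment
            eol = m.group(4) or ""      # keep "\n" or "\r\n" (or none on the last line)
            lines[i] = m.group(1).rstrip() + " " + target + comment + eol
            return "".join(lines)
    # Step 3: only an observe block may be appended to a yaml with no mode line.
    if target != "observe":
        raise ModePatchError("no mode line found; run `sponsio mode observe` first")
    sep = "" if not text or text.endswith("\n") else "\n"
    return text + sep + "runtime:\n  mode: observe\n"
```

For example, `patch_mode("agents: {}\n", "observe")` appends a `runtime:` block, while the same call with `"enforce"` raises `ModePatchError`, mirroring the "run `sponsio mode observe` first" advice.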
+ +The walker ignores `mode:` lines nested under unrelated keys (e.g. `judge.fallback_mode:` is not the runtime mode), and preserves inline comments and line endings on the patched line. + ## sponsio prompt Print the agent-facing prompt template for a Sponsio workflow. Used by the `sponsio` skill (W1 initial setup, W2 audit, W3 tune, W4 enforce, W5 troubleshoot). ```bash -sponsio prompt (onboard|refresh|scan) +sponsio prompt (onboard|scan) ``` Output is a copy-pasteable prompt block your AI assistant can run. +## sponsio serve + +Placeholder for the web-dashboard server. This distribution ships the contract runtime + CLI only; the long-lived HTTP backend is not bundled, so the command exits non-zero and points you at the local-inspection alternatives. + +```bash +sponsio serve # prints the alternatives below and exits 2 +``` + +For local observability use `sponsio host trace --follow` (live stream), `sponsio report --since 1h` (session summary), `sponsio replay ` (re-render a recorded session), or `sponsio export-sessions` (ship to a collector). + +## sponsio daemon + +Privileged-process side of the IPC split. The daemon owns the host bucket / per-plugin yaml files and is the only entity the host agent can reach to write them, so self-modify protection becomes an OS-level guarantee (ideally a separate UID under launchd/systemd) rather than a regex-on-tool-args one. + +```bash +sponsio daemon run [--socket PATH] [--mode 0600] # foreground; used by launchd/systemd +sponsio daemon ping [--echo VALUE] # round-trip health check +sponsio daemon status # resolved socket path + reachability +``` + +Socket path resolves to `$SPONSIO_DAEMON_SOCKET`, then `/var/run/sponsio.sock` if writable, else `~/.sponsio/sponsio.sock`. + +## sponsio cursor + +Cursor IDE integration. 
Cursor 1.7+ ships a deny-capable hook system (`hooks.json`); Sponsio plugs in as the command for the relevant pre-* events so every Shell/Read/Write/MCP call is evaluated against the contract library before Cursor executes it. + +```bash +sponsio cursor install-hooks # writes ~/.cursor/hooks.json (or project .cursor/hooks.json) +sponsio cursor guard --event # runtime hook handler; reads payload on stdin, denies via exit 2 +``` + --- ## TypeScript CLI @@ -2162,21 +2079,78 @@ Cross-language scenarios in `tests/cross_language/` validate identical verdicts --- -title: Pattern catalog -description: The full deterministic pattern library. Each pattern's NL form, what it enforces, and the LTL it compiles to. +title: Deterministic contracts and pattern catalog +description: The full deterministic pattern library, contract anatomy, and the failure strategies that run on violation. --- -# Pattern catalog +# Deterministic contracts and pattern catalog -Patterns are named factories that emit LTL formulas over the atom vocabulary. You write a natural-language rule; the parser matches it against these patterns and hands back a compiled contract. Patterns are *sugar*. They do not expand the expressiveness of the language, only the ergonomics. +A deterministic contract is a binary pass/fail rule evaluated before each tool call. If the rule is violated, Sponsio acts before any side effect happens. This is the hot path: zero LLM calls, microsecond latency. -Run `sponsio patterns` on the CLI to browse this catalog interactively with NL examples. +This page covers the shape of a contract, the four failure strategies, the full catalog of patterns that ship with Sponsio, and how to add a new one. For the conceptual model (atom → pattern → formula → contract) see [Concepts overview](../concepts/overview.md). For the full atom vocabulary see [Architecture § Atoms](../concepts/architecture.md). 
--- -## Safety +## Contract anatomy + +A deterministic contract has four parts: + +```python +from sponsio import contract + +rule = ( + contract("policy gate before refund") # name + .assume("called `issue_refund`") # when the rule applies + .guarantees("must call `check_policy` before `issue_refund`") # what must hold + .strategy("block") # what to do on violation +) +``` + +- **Name**: a human-readable label; shows up in logs, reports, and error messages. +- **Assumption (A)**: the condition that triggers the rule. The rule only fires when A holds. Omit for unconditional rules. +- **Guarantee (G)**: the temporal property that must hold when A is true. +- **Strategy**: what happens on violation (`DetBlock`, `EscalateToHuman`, `RedirectToSafe`, `WarnOnly`, or a custom callable). + +Both A and G can be natural-language strings or structured pattern calls. They compile down to LTL formulas over atoms. You never need to write the LTL by hand, but the engine ultimately checks the LTL. + +--- + +## When to reach for a deterministic contract + +Use a deterministic contract when the property is **structurally observable**: expressible with counters, regexes, paths, or ordering. Structural properties do not need semantic judgment, so they do not need an LLM in the hot path. + +Typical use cases: + +- **Ordering**: A must precede B; after X, Y is forbidden; every A must be followed by B. +- **Rate and retry limits**: at most N calls, cooldown between calls, bounded retries, loop detection. +- **Irreversibility gates**: once a commit or approval happens, downstream mutations are forbidden. +- **Argument checks**: blacklisted patterns, path scope limits, length or range caps. +- **Permissions**: static role-based access to certain tools. +- **Exact-regex PII**: SSN, credit card, email patterns that a regex can reliably catch. + +Anti-pattern: do not use a deterministic contract for properties that require reading the text semantically (tone, relevance, whether something is *truly* PII).
The deterministic engine does not evaluate those; keep contracts to what is structurally observable. + +--- + +## Failure strategies + +When a contract is violated, the call routes through a **strategy**. Four ship in the box. + +| Strategy | Behavior | +|---|---| +| `DetBlock` (`block`) | Deny the call and raise `SponsioBlocked` to the framework. The agent can react and retry with a different plan. This is the default. | +| `EscalateToHuman` (`escalate`) | Deny the call AND fire user-supplied notifier callables (Slack webhook, email, oncall pager). Accepts `notify=[callable, ...]`. Notifier failures are isolated: a broken Slack hook does not crash the agent loop and does not silence the remaining notifiers. | +| `RedirectToSafe` (`redirect_to_safe`) | Substitute the offending call with a pre-declared safe tool. The agent continues on a safer path. Both `unsafe` and `safe` must be registered with the framework. The LangGraph adapter dispatches the substitute call transparently; other adapters surface `result.redirected_to` for the application to consume. | +| `WarnOnly` (`warn_only`) | Allow the call and emit a violation event to logs and dashboards. Useful when the contract is informational rather than enforcing. | +| `(callable)` | Custom callback. Receives the violated contract and the candidate event; returns a new strategy decision. | + +In **observe mode**, no strategy runs. Violations are logged and surfaced in reports, but the call is not blocked. This is how most teams wire Sponsio in first. See [Observe vs. enforce](../guides/observe-vs-enforce.md). + +--- + +## Catalog + +Run `sponsio patterns` on the CLI to browse this catalog interactively with NL examples. 
+ +### Safety | Pattern | NL example | What it enforces | |---|---|---| @@ -2185,8 +2159,9 @@ For the conceptual model (atom → pattern → formula → contract) see [Concep | `requires_permission(tool, perm)` | `"tool `transfer` requires permission `manager`"` | Agent must hold a static permission to use the tool | | `no_data_leak(src, dest)` | `"no data leak from `read_db` to `send_email`"` | Data must not flow between two agents/tools | | `destructive_action_gate(action)` | `"destructive action `drop_table` requires confirmation"` | A destructive tool needs an explicit gate step | +| `workflow_step(trigger, next_action)` | `workflow_step(Atom("ctx", "roaming_status", "disabled"), Atom("called", "toggle_roaming"))` | When `trigger` holds, the **next** event must satisfy `next_action`. Prescriptive counterpart to block-style patterns: instead of "you must not do X", it says "you must do X next". Both arguments are arbitrary atoms (`called(...)`, `ctx(k, v)`, `arg_field_has(...)`, etc.), so the same pattern covers tool-ordering, ctx-driven remediation, and arg-conditional follow-ups. 
| -## Compliance +### Compliance | Pattern | NL example | What it enforces | |---|---|---| @@ -2195,7 +2170,7 @@ For the conceptual model (atom → pattern → formula → contract) see [Concep | `always_followed_by(A, B)` | `"every `refund` must be followed by `notify`"` | Whenever A happens, B must eventually happen | | `required_steps_completion(steps)` | `"`aml_check` must complete before `issue_loan`"` | All steps must have completed before a gate is passed | -## Operational +### Operational | Pattern | NL example | What it enforces | |---|---|---| @@ -2206,14 +2181,20 @@ For the conceptual model (atom → pattern → formula → contract) see [Concep | `bounded_retry(action, N)` | `"tool `deploy` at most 3 retries"` | Action limited to N retries | | `loop_detection(action, N)` | `"tool `search` must not loop more than 5 times"` | Detects repeated calls with similar args | -## Exclusion +### Exclusion | Pattern | NL example | What it enforces | |---|---|---| | `mutual_exclusion(A, B)` | `"tools `approve` and `reject` are mutually exclusive"` | At most one of A or B can ever be called | | `tool_allowlist(tools)` | `"agent may only call `search`, `summarize`"` | Only listed tools may be called | -## Argument and path checks +### Recovery + +| Pattern | NL example | What it enforces | +|---|---|---| +| `redirect_to_safe(unsafe, safe)` | `"redirect `issue_refund` to `log_refund_request`"` | Substitute a forbidden tool with a pre-approved alternative. Bundled with the `RedirectToSafe` strategy: a violation surfaces as `action="redirected"` with `fallback_action=safe`, and the trace records the substitute call.
| + +### Argument and path checks | Pattern | NL example | What it enforces | |---|---|---| @@ -2223,7 +2204,7 @@ For the conceptual model (atom → pattern → formula → contract) see [Concep | `arg_value_range(tool, field, lo, hi)` | `"`transfer.amount` between 0 and 10000"` | Numeric argument range | | `data_intact(tool, field)` | `"`aml_report` must not be edited after `aml_check`"` | Payload field is immutable once written | -## Agentic security +### Agentic security | Pattern | NL example | What it enforces | |---|---|---| @@ -2233,14 +2214,14 @@ For the conceptual model (atom → pattern → formula → contract) see [Concep | `dangerous_sql_verbs()` | `"sql must not issue `DROP`, `TRUNCATE`, `ALTER`"` | Built-in SQL verb blacklist | | `irreversible_once(action)` | `"`post_tweet` at most once per session"` | Irreversible actions capped to a single call | -## Resource +### Resource | Pattern | NL example | What it enforces | |---|---|---| | `token_budget(N)` | `"total LLM tokens under 50000"` | Session-wide token cap | | `delegation_depth_limit(N)` | `"sub-agent delegation at most 3 levels"` | Bounds recursive agent delegation | -## Approval and audit +### Approval and audit | Pattern | NL example | What it enforces | |---|---|---| @@ -2251,14 +2232,14 @@ For the conceptual model (atom → pattern → formula → contract) see [Concep | `dry_run_before_commit(dry_run, commit)` | `"`plan` must precede `apply`"` | Plan / preview step required before commit | | `sanitized_before_sink(source, sanitizer, sink)` | `"`untrusted_input` must pass `sanitize` before `db_write`"` | Untrusted input must pass a sanitizer before reaching a sink | -## Identity and context +### Identity and context | Pattern | NL example | What it enforces | |---|---|---| | `ctx_required(tool, key, values)` | `"`publish` requires ctx[`msg_verified`]=`true`"` | A `ctx(k, v)` fact must be set before the tool runs | | `ctx_matches_required(tool, key, regex)` | `"`issue_refund` requires caller_id matching 
`^spiffe://prod/finance-`"` | A `ctx(k, v)` value must match a regex | -## Argument allowlist and content +### Argument allowlist and content | Pattern | NL example | What it enforces | |---|---|---| @@ -2266,9 +2247,9 @@ For the conceptual model (atom → pattern → formula → contract) see [Concep | `duplicate_call_limit(tool, args_pattern, N)` | `"`send_email` to same recipient at most 1 time"` | Cap on repeated calls with similar args | | `time_since(predicate_key, max_seconds)` | `"action within 60s of `user_request`"` | Bounded time window since a referenced predicate | -## Output checks (det) +### Output checks (deterministic) -These are det atoms that match against `llm_response` events. Distinct from the Cloud sto atoms (`tone`, `faithfulness`, etc.) that need an LLM judge. +These are deterministic atoms that match against `llm_response` events via regex or exact string compare. They are distinct from stochastic atoms (judge-backed, like `tone` or `faithfulness`), which need an LLM judge at runtime and are not part of this OSS release. | Pattern | NL example | What it enforces | |---|---|---| @@ -2315,7 +2296,7 @@ Six steps: 2. If it needs a new observable, add atom extraction in [`sponsio/tracer/grounding.py`](../../sponsio/tracer/grounding.py). 3. Register it in the text DSL at [`sponsio/generation/dsl_to_contract.py`](../../sponsio/generation/dsl_to_contract.py). 4. Tests in [`tests/test_patterns.py`](../../tests/test_patterns.py) (formula) and [`tests/test_nl_parser.py`](../../tests/test_nl_parser.py) (NL round-trip). -5. Mirror in [`ts/packages/sdk/src/core/patterns.ts`](../../ts/packages/sdk/src/core/patterns.ts), or add a row to [`ts-sdk-parity.md`](ts-sdk-parity.md) if TS can't ground the atoms it uses. +5. Mirror in [`ts/packages/sdk/src/core/patterns.ts`](../../ts/packages/sdk/src/core/patterns.ts), or add a row to [`ts-sdk-parity.md`](ts-sdk-parity.md) if TS cannot ground the atoms it uses. 6. 
Document a row here, plus a `### Added` entry in `CHANGELOG.md`. For the full worked example end-to-end, with code excerpts from `sanitized_before_sink`, see [CONTRIBUTING § Adding a new pattern](../../CONTRIBUTING.md#adding-a-new-pattern). @@ -2334,12 +2315,14 @@ For the full worked example end-to-end, with code excerpts from `sanitized_befor --- title: sponsio.yaml reference -description: Full schema for the Sponsio config file. Agents, tools, contracts, modes, thresholds, strategies. +description: Full schema for the Sponsio config file plus the three ways to populate it. Agents, tools, contracts, modes, thresholds, strategies. --- # `sponsio.yaml` reference -`sponsio.yaml` is the canonical way to declare contracts. `sponsio scan` writes it; `sponsio init` writes it; `Sponsio(config=…)` reads it. +`sponsio.yaml` is the canonical way to declare contracts. `sponsio scan` writes it, `sponsio init` writes it, and `Sponsio(config=...)` reads it. + +This page covers two things: the three ways to populate the file, and the full schema of what can live inside it. A minimal valid file: @@ -2351,6 +2334,92 @@ agents: G: "must call `check_policy` before `issue_refund`" ``` +--- + +## How to populate sponsio.yaml + +Three sources produce the same output: enforceable contracts loaded via `Sponsio()`. + +``` +Source 1: Code scan sponsio scan src/ -o sponsio.yaml +Source 2: Policy documents sponsio scan src/ --policy security.md --llm +Source 3: Hand-written (edit sponsio.yaml directly) + │ + ▼ + sponsio.yaml + │ + ▼ + guard = Sponsio(config="sponsio.yaml") +``` + +The three sources mix freely in one yaml. Each contract entry can carry a `source:` tag for provenance. + +### Source 1: code scan + +Extract tools and infer constraints from agent source. + +```bash +sponsio scan src/agents/ -o sponsio.yaml +``` + +Without `--llm`, the scan is rule-based: + +1. Finds tools (`@tool` decorators, `Agent(tools=[...])`, `graph.add_node()`). +2. 
Extracts ordering from `graph.add_edge("A", "B")` and call graphs. +3. Generates `must_precede` constraints for each ordering dependency. +4. Outputs tools and constraints in yaml. + +With `--llm`, the LLM sees the full source and discovers constraints a static scan cannot find: + +- `always_followed_by` (liveness obligations) +- `rate_limit` (from constants like `MAX_RETRIES = 3`) +- `no_reversal` (from business logic semantics) + +```bash +sponsio scan src/agents/ --llm -o sponsio.yaml +sponsio scan src/agents/ --llm --provider gemini +``` + +Provider env vars and the full matrix: [reference/cli.md](cli.md#provider-matrix). + +### Source 2: policy documents + +Extract contracts from a policy or compliance document, using the tool inventory as context. + +```bash +# Scan code first to populate the tool inventory, then add policy: +sponsio scan src/agents/ -o sponsio.yaml +sponsio scan src/agents/ --policy security_policy.md --llm -o sponsio.yaml --append +``` + +The tool inventory is critical. Without it the LLM produces generic constraints. With it, policy maps to specific tools: + +``` +Policy: "All refunds require supervisor approval" +Tool inventory: [check_policy, issue_refund, notify_customer] +Constraint: must_precede(check_policy, issue_refund) +``` + +Supported document formats: `.md`, `.txt`, `.pdf` (`pip install sponsio[pdf]`). + +### Source 3: hand-written + +Edit `sponsio.yaml` directly. The two forms (NL strings and structured entries) are described in [Contracts](#contracts) below. + +### End-to-end workflow + +```bash +sponsio scan src/agents/ --llm -o sponsio.yaml # 1. discover +sponsio scan src/agents/ --policy compliance.md --llm --append # 2. policy +# 3. edit sponsio.yaml, add hand-written rules +sponsio validate --config sponsio.yaml # 4. validate +python my_agent.py # 5. 
run +``` + +--- + +## Full schema + A complete file, with every top-level field: ```yaml @@ -2379,9 +2448,7 @@ agents: strategy: block ``` ---- - -## Top-level fields +### Top-level fields | Field | Type | Default | Notes | |---|---|---|---| @@ -2389,10 +2456,32 @@ agents: | `framework` | string | auto-detect | `langgraph`, `claude_agent`, `openai`, `openai_agents`, `crewai`, `google_adk`, `vercel_ai`, `mcp`, or omitted. | | `sessions_dir` | path | `~/.sponsio/sessions/` | Set to `null` to disable local session logging. | | `tools` | map | `{}` | Optional tool metadata; scan populates automatically. | +| `tool_policy` | map | `{}` | Default-deny posture + approved-tool allowlist. See [`tool_policy`](#tool_policy) below. | | `agents` | map | required | Per-agent contract set. | --- +## `tool_policy` + +Declarative default-deny posture. The agent can only call tools in `approved:` when `default: deny` is set. Adding a new tool to the underlying framework does not auto-trust it. + +```yaml +tool_policy: + default: deny # allow (default, backwards-compat) | deny + approved: [search, read_file, list_dir] + enforcement: reactive # reactive (default) | proactive +``` + +| Field | Default | Behavior | +|---|---|---| +| `default` | `allow` | `deny` synthesizes a `tool_allowlist` contract that blocks every tool not in `approved`. `allow` is a no-op (backwards-compat). | +| `approved` | `[]` | Explicit allowlist. Empty plus deny blocks every tool (useful for a complete lockdown). Accepts a flat list or `{tools: [...]}` for future per-host scoping. | +| `enforcement` | `reactive` | `reactive`: the agent still sees the full tool menu; denied calls get blocked at call time via `guard_before`. `proactive`: wrap-time adapters (LangGraph, CrewAI, OpenAI Agents SDK, Google ADK) strip denied tools from the bound toolset before the model ever sees them. | + +Inline equivalent on `Sponsio(tool_policy={...})`. The two paths produce the same synthesized contract. 
+ --- ## `agents.` Each agent has a dedicated contract list. Contracts do not leak across agents. @@ -2415,30 +2504,58 @@ Each entry in `contracts:` has these fields: |---|---|---|---| | `name` | string | no | Human-readable label for logs and reports. | | `A` | string \| object | no | Assumption. When the rule fires. Omit for unconditional rules. | -| `E` | string \| object | yes | Enforcement. The rule itself. | -| `strategy` | string | no | `block`, `escalate`, `retry_with_constraint`, `redirect_to_safe`, or a dotted callable path. | +| `G` | string \| object | yes | Guarantee. The rule itself. | +| `strategy` | string | no | `block` (`DetBlock`), `escalate` (`EscalateToHuman`, accepts `notify:` list of dotted callable paths), `redirect_to_safe` (substitute a pre-approved tool), `warn_only` (log without blocking), or a dotted callable path. | | `mode` | `observe` \| `enforce` | no | Per-contract override. | -### Shorthand form +### Shorthand form (natural-language strings) `A:` and `G:` accept a natural-language string; the parser matches it to a pattern: ```yaml -- G: "tool `check_policy` must precede `issue_refund`" -- G: "bash command must not contain `rm -rf`" -- G: "tool `query_db` at most 5 times" +agents: + customer_bot: + contracts: + - G: "tool `check_policy` must precede `issue_refund`" + - G: "tool `issue_refund` at most 3 times" + - G: "response must not contain PII" +``` + +Each entry is one `(assumption, guarantee)` pair. `A` is optional, `G` is required. Each field can be a scalar or a list (lists are ANDed). The legacy keys `E:` and `enforcement:` are still accepted for backward compatibility. + +Sponsio parses NL strings in two stages: rule-based matching first (free, no LLM call), then an LLM fallback when no rule matches (requires an API key).
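You can picture the rule-based first stage as an ordered table of regular expressions, with the LLM consulted only when no rule matches. The sketch below is hypothetical. `RULES`, `parse_nl`, and the `llm_fallback` callable are illustrative names, and the sketch covers only two NL forms. The pattern names and argument shapes mirror the structured-form entries on this page: `must_precede` takes two tool names, and `rate_limit` takes a tool and a count. Sponsio's real grammar is broader.

```python
from __future__ import annotations

import re
from typing import Callable, Optional

# Hypothetical rule table: (regex, pattern name, builder for the args list).
RULES = [
    (
        re.compile(r"^tool `(\w+)` must precede `(\w+)`$"),
        "must_precede",
        lambda m: [m.group(1), m.group(2)],
    ),
    (
        re.compile(r"^tool `(\w+)` at most (\d+) times?$"),
        "rate_limit",
        lambda m: [m.group(1), int(m.group(2))],
    ),
]


def parse_nl(text: str, llm_fallback: Optional[Callable[[str], dict]] = None) -> dict:
    """Stage 1: deterministic regex rules. Stage 2: optional LLM fallback."""
    sentence = text.strip()
    for regex, pattern, build_args in RULES:
        match = regex.match(sentence)
        if match:
            return {"pattern": pattern, "args": build_args(match)}
    if llm_fallback is None:
        raise ValueError(f"no rule matched and no LLM fallback configured: {text!r}")
    return llm_fallback(sentence)


print(parse_nl("tool `check_policy` must precede `issue_refund`"))
# {'pattern': 'must_precede', 'args': ['check_policy', 'issue_refund']}
print(parse_nl("tool `issue_refund` at most 3 times"))
# {'pattern': 'rate_limit', 'args': ['issue_refund', 3]}
```

A sentence such as `"response must not contain PII"` matches neither rule. It falls through to `llm_fallback`, or raises an error when no fallback is configured. Running the regex stage first means the common forms never need an API key.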
+ +Common NL forms: + +``` +tool `A` must precede `B` +tool `X` at most N times +tool `A` requires permission `perm_name` +tools `A` and `B` are mutually exclusive +after `A`, tool `B` is forbidden +tool `A` cooldown of N steps ``` ### Structured form -For patterns that need typed arguments (lists, regex tuples, threshold floats). Use the structured form: +For patterns that need typed arguments (lists, regex tuples, threshold floats), use the structured form: ```yaml -- G: - pattern: arg_blacklist - args: ["bash", "rm -rf"] +agents: + customer_bot: + contracts: + - pattern: must_precede + args: [check_policy, issue_refund] + source: scan + - pattern: rate_limit + args: [issue_refund, 3] + - G: + pattern: arg_blacklist + args: ["bash", "rm -rf"] ``` +Compiled directly. No NL parsing. Auto-emitted by `sponsio scan`. + See the [pattern catalog](patterns.md) for the full list of deterministic patterns. --- @@ -2455,14 +2572,15 @@ tools: tags: [destructive, financial] ``` -Tags are free-form and can be referenced in patterns (e.g. `destructive_action_gate(tag="destructive")`). `sponsio scan` populates these from your tool definitions automatically. +Tags are arbitrary strings and can be referenced in patterns (for example, `destructive_action_gate(tag="destructive")`). `sponsio scan` populates these from your tool definitions automatically. --- ## Validating a config ```bash -sponsio validate sponsio.yaml +sponsio validate --config sponsio.yaml # parse + structural +sponsio validate --config sponsio.yaml --json # CI-friendly ``` Parses, type-checks, resolves every pattern reference, and reports unresolved names, mis-typed args, or atoms referenced but not registered. @@ -2471,7 +2589,7 @@ Parses, type-checks, resolves every pattern reference, and reports unresolved na sponsio doctor ``` -Broader. Also checks framework detection, provider credentials, session-log writability. +Broader. Also checks framework detection, provider credentials, and session-log writability. 
--- @@ -2481,17 +2599,27 @@ Broader. Also checks framework detection, provider credentials, session-log writ from sponsio.langgraph import Sponsio guard = Sponsio(config="sponsio.yaml", agent_id="support_bot") +agent = create_react_agent(model, guard.wrap(tools)) ``` `agent_id` picks which entry in `agents:` applies. If omitted, the default is the first agent in the file. +Inline contracts add on top of yaml: + +```python +guard = Sponsio( + config="sponsio.yaml", + agent_id="support_bot", + contracts=["tool `notify` at most 5 times"], +) +``` + --- ## Next - [Pattern catalog](patterns.md). Every deterministic pattern with NL form. -- [CLI reference](cli.md), `sponsio scan`, `sponsio validate`, `sponsio doctor`. -- [Contract sources](../guides/contract-sources.md). Scan, policy-doc mining, trace mining. +- [CLI reference](cli.md): `sponsio scan`, `sponsio validate`, `sponsio doctor`. @@ -2537,7 +2665,7 @@ No. Those tools score runs after the fact. Sponsio blocks unsafe calls in the ho ### "Isn't all of this just prompt engineering?" -Prompt engineering defines intent. Sponsio enforces the action boundary. A well-engineered prompt still leaves room for a fabricated AML check, a retry loop that burns budget, or a sudden decision to wire $800k. Contracts catch those regardless of how the prompt is worded. Use both. +Prompt engineering defines intent. Sponsio enforces the action boundary. A well-engineered prompt still leaves room for a fabricated compliance check (e.g. AML, KYC), a retry loop that burns budget, or a sudden decision to wire $800k. Contracts catch those regardless of how the prompt is worded. Use both. --- @@ -2545,7 +2673,7 @@ Prompt engineering defines intent. Sponsio enforces the action boundary. A well- ### Can I enforce a property that isn't in the atom vocabulary? -No, by design. The atom vocabulary is the observation boundary. If you need a new atom, add it (see [Architecture](../concepts/architecture.md)) and then write patterns over it. 
The engine can only reason about facts the grounding layer produces. +No, by design. An *atom* is one observable fact the engine can read from the trace (for example, "tool `X` was called" or "tool `X` was called with an argument `path` containing `/etc`"). The set of atoms is the observation boundary. If you need a new one, add it (see [Architecture](../concepts/architecture.md)) and then write patterns over it. The engine can only reason about facts the grounding layer produces. ### Can OTEL do the blocking? @@ -2565,7 +2693,7 @@ No. If your LLM app calls tools, APIs, databases, or files, you can use Sponsio ### Python and TypeScript. Same semantics? -For deterministic contracts, yes. The Python and TS engines share the same LTL core and produce identical block/allow decisions over the same trace. The DFA/verifier, YAML config, discovery, and OTEL export are Python-only today. +For deterministic contracts, yes. The Python and TS engines share the same LTL (linear temporal logic) core and produce identical block/allow decisions over the same trace. The DFA (deterministic finite automaton) verifier, YAML config, discovery, and OTEL export are Python-only today. --- @@ -2579,6 +2707,12 @@ Two signals: the violation rate has plateaued (you're not discovering new false It will change behavior. Your agent starts seeing `SponsioBlocked` exceptions and has to react (retry, pick a different tool, escalate). Plan for a day of tuning after the flip. +Three soft-landing options when a hard block is too harsh: + +- **`redirect_to_safe(unsafe, safe)`**: substitute the unsafe call with a pre-approved one (e.g. `issue_refund` → `log_refund_request` for review). The agent continues on a safer path instead of bouncing off refusals. +- **`filter_tools(candidates)`**: call this before each model turn to pre-filter the tool menu against the live trace. The model never sees tools that would be blocked, so it does not waste tokens on attempts that will fail.
+- **`tool_policy: { default: deny, enforcement: proactive }`**: the wrap-time variant of the above for adapters that own tool binding (LangGraph, CrewAI, OpenAI Agents SDK, Google ADK). Denied tools never reach the agent's bound toolset. + ### Can I enforce some contracts while observing others? Yes. Set the global `mode: observe` and add `mode: enforce` per-contract for the handful of hard-block rules you are already sure of. @@ -2589,7 +2723,7 @@ Yes. Set the global `mode: observe` and add `mode: enforce` per-contract for the ### Is Sponsio in the hot path of every tool call? -Yes. That's the point. The det pipeline is designed to stay there: pure Python, sub-10μs p99, zero LLM calls. +Yes. That is the point. The deterministic pipeline is designed to stay there: pure Python, sub-10μs at the 99th percentile (p99), zero LLM calls. ### Does it scale with trace length? @@ -2630,7 +2764,7 @@ Sponsio enforces the behavioral layer of all ten OWASP Agentic Top 10 (2026) ris Three risks span two layers. ASI-03 (Identity), ASI-04 (Supply Chain), and ASI-07 (Inter-Agent Comms) have a behavioral side (what the agent does with identities, tools, channels) and an infrastructure side (how identities get issued, packages get signed, channels get encrypted). Sponsio covers behavior. Issuance, signing, and encryption belong to your IAM, build pipeline, and transport stack. -To bridge those upstream systems into a contract, push facts via `guard.observe_context({k: v})` once per request. Contracts then reference them as `ctx(k, v)` atoms. Each affected risk lists its coverage condition. +To bridge those upstream systems into a contract, push facts via `guard.observe_context({k: v})` once per request. Contracts then reference them as `ctx(k, v)` atoms (one observable fact each, like `ctx("caller_id", "alice")`). Each affected risk lists its coverage condition. 
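Here is a minimal Python model of pushed context facts and a `ctx_required`-style gate, to make the `ctx(k, v)` bridge concrete. Everything in it is hypothetical (`ContextFacts`, `ctx_required_allows`) and is not Sponsio's API. It assumes only what this page states. Facts arrive as a `{k: v}` dict, and contracts test them as `ctx(k, v)` atoms. A later push for the same key overwrites the earlier value.

```python
from __future__ import annotations

import re


class ContextFacts:
    """Hypothetical store for facts pushed via observe_context({k: v})."""

    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    def observe_context(self, facts: dict[str, str]) -> None:
        # Merge-on-write: a later value for a key replaces the earlier one.
        self._facts.update(facts)

    def ctx(self, key: str, value: str) -> bool:
        # Atom ctx(k, v): the key currently holds exactly this value.
        return self._facts.get(key) == value

    def ctx_matches(self, key: str, pattern: str) -> bool:
        # Regex variant, in the spirit of ctx_matches_required.
        current = self._facts.get(key)
        return current is not None and re.search(pattern, current) is not None


def ctx_required_allows(facts: ContextFacts, called: str, tool: str,
                        key: str, value: str) -> bool:
    # Only calls to `tool` are gated; every other call passes.
    return called != tool or facts.ctx(key, value)


facts = ContextFacts()
print(ctx_required_allows(facts, "publish", "publish", "msg_verified", "true"))  # False

facts.observe_context({"msg_verified": "true"})
print(ctx_required_allows(facts, "publish", "publish", "msg_verified", "true"))  # True

facts.observe_context({"msg_verified": "false"})  # overwrites the earlier "true"
print(ctx_required_allows(facts, "publish", "publish", "msg_verified", "true"))  # False

facts.observe_context({"caller_id": "spiffe://prod/finance-api"})  # hypothetical ID
print(facts.ctx_matches("caller_id", r"^spiffe://prod/finance-"))  # True
```

The last-write-wins update is the reason for the memory-poisoning caveat on this page. Once a second push replaces a value, the store no longer holds the first one.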
## Coverage summary @@ -2644,7 +2778,7 @@ To bridge those upstream systems into a contract, push facts via `guard.observe_ | [ASI-06](#asi-06-memory-and-context-poisoning) | Memory Poisoning | `G(called(A) → ctx_matches(content_source, π)) ∧ G(arg_has(T, orig) → arg_paths_within(T, P))` | `ctx_matches_required`, `data_intact` | | [ASI-07](#asi-07-inter-agent-comms) | Inter-Agent Comms | `G(called(A) → ctx(msg_verified, "true")) ∧ G(delegation_depth ≤ D)` | `ctx_required`, `delegation_depth_limit` | | [ASI-08](#asi-08-cascading-failures) | Cascading Failures | `G(count(T) ≤ N) ∧ G(token_count ≤ B) ∧ G(consecutive_count(T) ≤ L)` | `rate_limit`, `token_budget`, `loop_detection` | -| [ASI-09](#asi-09-human-agent-trust) | Trust Exploitation | `((¬called(W) U called(Ap)) ∨ G(¬called(W))) ∧ G(arg_numeric(W, amount) ≤ N)` | `must_precede`, `must_confirm`, `arg_value_range` | +| [ASI-09](#asi-09-human-agent-trust) | Trust Exploitation | `((¬called(W) U called(Ap)) ∨ G(¬called(W))) ∧ G(arg_numeric(W, amount) ≤ N)` | `must_precede`, `must_confirm`, `arg_value_range`, `redirect_to_safe` | | [ASI-10](#asi-10-rogue-agents) | Rogue Agents | `G(called(Trig) → ⋀ᵢ F(called(stepᵢ))) ∧ G(count(act) ≤ 1)` | `required_steps_completion`, `irreversible_once` | ## Vocabulary @@ -2727,7 +2861,7 @@ contracts: args: [fetch, url, ["telemetry\\.acme-corp\\.io", "analytics\\..*\\.net"]] ``` -The runtime slice is genuinely thinner here. Sponsio stops unregistered tool calls and known-bad arg shapes. Sigstore, `pip-audit`, `osv-scanner`, Dependabot, and Socket.dev stop the compromised package from being installed in the first place. Run Sponsio on top of a build-time posture and you have defense in depth. Run it alone and "allowlisted tool whose implementation got swapped" stays uncovered. +The runtime slice is genuinely thinner here. Sponsio stops unregistered tool calls and known-bad arg shapes. 
Sigstore, `pip-audit`, `osv-scanner`, Dependabot, and Socket.dev stop the compromised package from being installed in the first place. Run Sponsio on top of those build-time tools and you have defense in depth. Run it alone and "allowlisted tool whose implementation got swapped" stays uncovered. ## ASI-05 Unexpected code execution @@ -2770,7 +2904,7 @@ for chunk in chunks: guard.observe_context({"content_source": chunk.source_uri}) ``` -Caveat. `ctx` is merge-on-write. A `retrieve(poison) → retrieve(canonical) → approve` trace passes the source check even though the poisoned chunk sat in context between the two retrieves. The `data_intact` clause covers most of this gap. A `ctx_ever_seen(k, v)` atom that propagates forward is on the roadmap. +Caveat. `ctx` is merge-on-write: a later `observe_context` value overwrites an earlier one for the same key, so the contract only sees the most recent value. A `retrieve(poison) → retrieve(canonical) → approve` trace passes the source check even though the poisoned chunk sat in context between the two retrieves. The `data_intact` clause covers most of this gap. A `ctx_ever_seen(k, v)` atom that propagates forward is on the roadmap. ## ASI-07 Inter-agent comms @@ -2848,7 +2982,7 @@ contracts: args: [approve_invoice] ``` -On an $847k wire to an unverified vendor, three contracts fire on the same call: amount over cap, no compliance approval, no confirm on file. The `required_steps_completion` rule handles the skipped-onboarding case. +On an $847k wire to an unverified vendor, three contracts fire on the same call: amount over cap, no compliance approval, no confirm on file. The `required_steps_completion` rule handles the skipped-onboarding case. For a softer landing, pair `wire_transfer` with `redirect_to_safe("wire_transfer", "request_supervisor_approval")` so large wires open a review ticket instead of a hard refusal. 
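Here is a minimal sketch of how a redirect could be dispatched for the wire-transfer case. It assumes a violation predicate and a one-hop redirect map. `dispatch`, `resolve_redirect`, `Blocked`, `RedirectRejected`, and the 10,000 cap are all hypothetical. The sketch encodes only behavior the project states elsewhere. The trace records the substitute, not the original call. Chained (A → B → C) and self (A → A) redirects are refused rather than followed.

```python
from __future__ import annotations

from typing import Callable


class Blocked(Exception):
    """Violation with no safe substitute registered (hypothetical)."""


class RedirectRejected(Exception):
    """Redirect map would chain or loop (hypothetical)."""


def resolve_redirect(tool: str, redirects: dict[str, str]) -> str:
    safe = redirects[tool]
    if safe == tool:
        raise RedirectRejected(f"self-redirect: {tool} -> {tool}")
    if safe in redirects:
        raise RedirectRejected(f"chained redirect: {tool} -> {safe} -> {redirects[safe]}")
    return safe


def dispatch(call: dict, redirects: dict[str, str],
             violates: Callable[[dict], bool], trace: list[str]) -> str:
    tool = call["tool"]
    if violates(call):
        if tool not in redirects:
            raise Blocked(tool)
        tool = resolve_redirect(tool, redirects)
    trace.append(tool)  # the trace records what actually runs
    return tool


redirects = {"wire_transfer": "request_supervisor_approval"}


def over_cap(call: dict) -> bool:
    # Hypothetical cap standing in for an arg_value_range contract.
    return call["tool"] == "wire_transfer" and call["args"]["amount"] > 10_000


trace: list[str] = []
print(dispatch({"tool": "wire_transfer", "args": {"amount": 847_000}}, redirects, over_cap, trace))
# request_supervisor_approval
print(dispatch({"tool": "wire_transfer", "args": {"amount": 500}}, redirects, over_cap, trace))
# wire_transfer
print(trace)  # ['request_supervisor_approval', 'wire_transfer']
```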
## ASI-10 Rogue agents @@ -3049,7 +3183,7 @@ touched these: | Change | Update | |--------|--------| -| New pattern | `sponsio/patterns/library.py` + `sponsio/generation/dsl_to_contract.py` + `README.md` Pattern Library table + `docs/concepts/contracts.md` | +| New pattern | `sponsio/patterns/library.py` + `sponsio/generation/dsl_to_contract.py` + `README.md` Pattern Library table + `docs/reference/patterns.md` | | New integration | `sponsio/integrations/` + `README.md` Integrations table + `docs/integrations/index.md` | | New CLI subcommand | `sponsio/cli.py` + `docs/reference/cli.md` + `README.md` | | Public API change | `CHANGELOG.md` under `[Unreleased]` with `### Changed` or `### Added` | @@ -3300,11 +3434,323 @@ broke. ## [Unreleased] -_Nothing yet._ +### Fixed + +- **`@sponsio/sdk` is now edge-runtime safe.** Marked the package + `sideEffects` (narrowed to the CLI entry) so bundlers can tree-shake + the Node-only YAML/config-loading path out of edge bundles (Cloudflare + Workers), complementing the `createRequire` deferral in `0.2.0a3`. +- **Trace mining fails open when its extension isn't bundled.** + `CodeAnalyzer` imported `TraceMiner` unguarded, crashing + `sponsio scan --trace` with `ModuleNotFoundError` in builds without the + optional `trace_mining` extension; it now degrades to "no contracts + mined", matching the other call sites. + +### Changed + +- Added an explicit `[tool.ruff]` config to `pyproject.toml` so local + lint matches CI, and synced `docs/reference/cli.md` with the real CLI + surface (`onboard`/`serve`/`daemon`/`cursor` now documented). --- -## [0.1.1] — 2026-05-22 +## [0.2.0a3]: 2026-06-08 + +Security-relevant fix on top of `0.2.0a2`. If you are on `0.2.0a2` and +use any adapter OTHER than LangGraph with a `redirect_to_safe` +contract, you should upgrade. 
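The core of this fix fits in a few lines. The sketch below uses a simplified, hypothetical stand-in for `CheckResult` with only the `blocked`, `redirected`, and `redirected_to` fields these notes mention; the actual class may carry more. It shows why gating on `blocked` alone let a redirected call through. It also shows how the derived `stop_original = blocked or redirected` closes that gap for adapters that cannot perform the substitution themselves. `old_gate` and `new_gate` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CheckResult:
    """Simplified stand-in; the actual class may carry more fields."""

    blocked: bool = False
    redirected: bool = False
    redirected_to: Optional[str] = None

    @property
    def stop_original(self) -> bool:
        # Derived from existing fields, so the result shape is unchanged.
        return self.blocked or self.redirected


def old_gate(check: CheckResult) -> bool:
    # Pre-fix adapters: run the original tool unless it is blocked.
    return not check.blocked


def new_gate(check: CheckResult) -> bool:
    # Post-fix adapters: a redirect also stops the original tool.
    return not check.stop_original


redirect = CheckResult(redirected=True, redirected_to="log_refund_request")
print(old_gate(redirect))       # True  (fail-open: the unsafe tool would run)
print(new_gate(redirect))       # False (fail-closed)
print(new_gate(CheckResult()))  # True  (ordinary allowed call)
```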
+ +### Fixed + +- **`redirect_to_safe` now fails closed in non-LangGraph adapters** + (`sponsio/integrations/base.py`, `crewai.py`, `agents.py`, + `claude_agent.py`, `google_adk.py`, `vercel_ai.py`, `mcp.py`). + Previously, a `redirect_to_safe` violation returned + `action="redirected"` with `blocked=False`, and every adapter + except LangGraph gated on `if check.blocked`, meaning the + guard rolled the unsafe call out of the trace AND THEN the + adapter executed the original unsafe tool anyway. A new + `CheckResult.stop_original` property (`blocked OR redirected`) + is wired through every non-substituting adapter, so a redirect + now refuses the unsafe call. LangGraph still branches on + `redirected` first and performs the substitution. The Cursor + adapter takes a separate `evaluate_event` path and is tracked + as a follow-up. Regression test added at + `tests/test_redirect_to_safe.py`. + +- **TS `Eq` now matches Python `==` for composite values** + (`ts/packages/sdk/src/core/evaluator.ts`). The previous `===` + comparison was reference equality for arrays and objects, so + `Eq(ArgValue("tool", "field"), CtxValue("expected"))` on + list- or object-valued args could pass in Python and fail in + TS on the same trace. New `valuesEqual` does element-/key-wise + deep comparison; parity test added at + `ts/packages/sdk/src/__tests__/parity.test.ts`. + +- **TS SDK no longer crashes on Cloudflare Workers** at import + time (`ts/packages/sdk/src/core/config-loader.ts`, + `pack-loader.ts`). The eager top-level + `createRequire(import.meta.url)` threw when + `import.meta.url` was undefined (Workers, some edge runtimes). + Now built lazily on first YAML load with a + `?? "file:///sponsio-noop.js"` fallback, so a Worker bundle + that never loads YAML never calls `createRequire`. + +- **Suite-wide pytest setup errors cleared up** + (`tests/conftest.py`).
The autouse rich-style cache reset + invoked `isinstance(obj, Style)` on every live object; lazy + proxies from optional SDK imports (notably OpenAI's + `sounddevice`-pulling submodules) raised from their + `__class__` getter and errored 1684 of 2312 test setups. Now + swallows introspection failures. + +### Changed + +- **`filter_tools` documents O(candidates × trace_length) + re-grounding cost** + (`sponsio/integrations/base.py`). +- **`workflow_step` documents the end-of-trace weak-next + vacuity caveat** for batch verify / replay paths + (`sponsio/patterns/library.py`). +- **`Var.__eq__`, `_warned_missing_vars`, and `arg_value` + retention** all get explicit footgun notes + (`sponsio/formulas/evaluator.py`, `sponsio/formulas/formula.py`, + `sponsio/tracer/grounding.py`). +- **Test infrastructure** moves off the deprecated + `asyncio.get_event_loop().run_until_complete` to `asyncio.run` + (`tests/test_claude_agent_integration.py`). + +### Documentation + +- Several docstrings repaired (artifacts left over from the v0.2 + em-dash sweep, mostly first-line typos that surfaced in + `help()` and IDE hover popups). + +### Compatibility + +No breaking API changes. The `CheckResult` shape is unchanged +(`stop_original` is a new derived property, computed from +existing fields). Existing tests against `blocked` / +`redirected` still hold. + +### Credits + +Thanks to @donalddellapietra for the review pass that surfaced +the fail-open bug, the TS `Eq` parity gap, and the Worker +runtime crash. PR +[#78](https://github.com/SponsioLabs/Sponsio/pull/78). + +--- + +## [0.2.0a2]: 2026-06-07 + +### Added + +- **`Term` abstraction in the formula AST** (`sponsio/formulas/formula.py`). + The arithmetic comparison family (`Eq`, `Le`, `Lt`, `Ge`, `Gt`) now + accepts any `Term`, not just `Var` or `Const`. 
Four new term subclasses + unlock contracts that compare runtime values against each other: + - `ArgValue(tool, field)`: raw value of `args[field]` when the current + event is a call to `tool`. + - `CtxValue(key)`: raw value of an externally pushed context fact + (`guard.observe_context`). + - `ArgLength(tool, field)`: `len(args[field])` shorthand. + - `UnaryFn(fn, term)`: apply a Python callable to another term. + + `Var` and `Const` become `Term` subclasses, so their existing + counter-style semantics (default `0` for missing, numeric-only + coercion) are preserved. `ArithExpr` is now an alias of `Term` so + existing type hints keep working. + +- **`workflow_step(trigger, next_action)` pattern** + (`sponsio/patterns/library.py`). Prescriptive counterpart to the + block-style patterns: when `trigger` holds at the current event, the + next event must satisfy `next_action`. Both arguments are arbitrary + atoms, so the same factory covers tool-ordering, ctx-driven + remediation, and arg-conditional follow-ups. Compiles to + `G(trigger -> X(next_action))`. + +- **Five benchmark contract libraries** + (`sponsio/contracts/benchmark/*.yaml`). Hand-curated YAML libraries + that reproduce Sponsio's published benchmark numbers on RedCode-Exec, + ODCV-Bench, τ²-bench, AgentDojo, and SWE-bench. Loadable via + `include: [sponsio:benchmark/]` like a capability pack but kept + separate in intent (benchmark-reproduction artefacts, not auto-selected + by `onboard`). Documented in + [`docs/reference/benchmark-libraries.md`](docs/reference/benchmark-libraries.md). + +- **NL DSL extensions for the new primitives** + (`sponsio/generation/dsl_to_contract.py`). The natural-language parser + recognises `workflow_step` and the new `Term` comparison forms so + YAML hand-authoring and `sponsio validate` reach the new surface. + +### Changed + +- **Pattern count is now 46** (was 45). Catalog tables and README + callouts are updated to match. 
+ +### Known limitations + +- **TypeScript SDK parity gap.** The `Term` abstraction, the + `workflow_step` factory, and the five benchmark YAML libraries are + Python-only in this release. TS will catch up in a follow-up. See + [`docs/reference/ts-sdk-parity.md`](docs/reference/ts-sdk-parity.md) + for the tracked gap list. + +--- + +## [0.2.0a1]: 2026-06-06 + +PyPI-render fix on top of `0.2.0a0`. No runtime changes; if you are +already on `0.2.0a0` there is no functional reason to upgrade. + +### Fixed + +- **README image references are now absolute GitHub raw URLs** + (`https://raw.githubusercontent.com/SponsioLabs/Sponsio/main/assets/...`). + The PyPI / TestPyPI README renderer does not resolve relative paths, + so the banner / architecture diagram / freeze comparison were + missing on the project page. Three READMEs (en / zh-CN / ja) are + updated for consistency; only `README.md` is what PyPI actually + serves. +- **CI lint regex updated to accept either relative or absolute URL** + for the banner check, so the old `WYSIWYG-stripped-the-banner` + warning keeps working under both URL forms. + +--- + +## [0.2.0a0]: 2026-06-03 + +Three new enforcement primitives plus a sharper failure-strategy +surface. The story: agents shouldn't have to fail catastrophically +when a contract fires. Block is one option, but it's the harshest one. +This release ships three softer-landing options that keep the agent +making progress while still gating the unsafe behavior. + +### Added + +- **`tool_policy` block (YAML + inline kwarg)**: declarative + default-deny posture. `default: deny` + `approved: [search, …]` + synthesizes a `tool_allowlist` contract automatically. Adding a new + tool to your framework does not auto-trust it: the policy is the + single source of truth for which tools the agent can reach. + Available in `sponsio.yaml` and on `Sponsio(tool_policy={…})`. Both + paths share one synthesis point so the resulting contract is + identical. 
+- **`enforcement: proactive` mode**: wrap-time tool filtering on + LangGraph, CrewAI, OpenAI Agents SDK, and Google ADK adapters. + Denied tools never reach the agent's bound toolset. Prompt + injection that tries to call them silently no-ops because the + model literally cannot name them. `enforcement: reactive` (the + default) keeps the legacy "block at call time" behavior. +- **`filter_tools(candidates)`**: pure-probe API on `BaseGuard` that + returns the subset of tool names legal to call given the live + trace. Custom agent loops (no framework) call this before each + model turn to pre-filter the tool menu and avoid wasted attempts + on temporal-precondition tools (`must_precede(A, B)` only allows B + after A has fired). Side-effect free: no log entry, no callback + fanout, no perf sample, no observe-mode wrapping. Implemented via + a `dry_run` flag on `RuntimeMonitor.check_action` that suppresses + every observable side effect under a depth counter. +- **`redirect_to_safe(unsafe, safe)` pattern + `RedirectToSafe` + strategy**: substitute a forbidden tool call with a pre-declared + safe one (`issue_refund` → `log_refund_request`, + `run_sql_destructive` → `select_only_dryrun`). The model keeps + making progress; it just can't do the unsafe thing. Trace honestly + records the substitute call, not the original. LangGraph adapter + dispatches the substitute transparently; other adapters surface + `result.redirected_to` for the application loop to invoke. +- **`EscalateToHuman(notify=[…])`**: strategy now accepts a callable + or a list of notifier callables that fire synchronously on each + violation. Each notifier gets `(violation, context, reason)`. + Notifier failures are isolated per-callback: a broken Slack + webhook does not crash the agent loop and does not silence the + remaining notifiers; the exception becomes a `RuntimeWarning` + naming the offending callable. 
+- **Cross-integration verification script.** + `scripts/verify_v0_2.py` runs 15 checks across the core runtime + and four adapters. Skip-on-missing-SDK rather than fail. Run + before any release to catch the kind of cross-mode bug that + `pytest` misses (conftest pins `SPONSIO_MODE=enforce`, production + default is `observe`). +- **Three workflow case studies.** + `examples/integrations/python/v0_2_*.py`. Refund agent + (LangGraph + `redirect_to_safe` + `filter_tools`), coding agent + (CrewAI + `tool_policy` default-deny + proactive), AP automation + (vanilla `Sponsio` + `EscalateToHuman` with Slack / email / + PagerDuty notifiers). Each exits 0 on success and surfaces FAIL + with detail on regression. + +### Changed + +- **`sponsio mode ` CLI is now parent-aware.** + Prefers updating `runtime.mode` (the only line the TS loader + reads), falls back to `defaults.mode`, refuses to append a fresh + `enforce` block out of thin air on a yaml without an existing + mode line, allows appending `observe` only. CI scripts that + relied on the old exit-1 behavior for malformed configs keep + working. Walk-and-track replaces the naïve `re.subn`. +- **`EscalateToHuman` action semantics documented.** The class + docstring now spells out the two patterns: notify-only (agent + continues, useful for high-stakes-action telemetry) and the + `DetBlock` + `register_callback` pairing for notify-and-refuse. + The runtime layer does NOT gate `CheckResult.allowed` on + `action="escalated"` because the monitor uses + `EscalateToHuman()` as the default strategy for + unfired-assumption verdicts; gating on it would break every + conditional contract whose assumption hasn't fired yet. +- **All pattern factories accept a `desc=` keyword.** + `redirect_to_safe` was the lonely exception; LLM extraction + (`llm_extraction.py:535`) always passes `desc=nl` to the pattern + factory, so the previous signature silently failed any + LLM-extracted `redirect_to_safe` rule. Now uniform. 
+- **TS SDK gets a `redirectToSafe` factory.** Formula side only: + same LTL semantics (`G(Not(called(unsafe)))`) so a TS evaluator + produces the same verdict as the Python verifier. The strategy + bundle and adapter dispatch are Python-only for now; documented + caveat in the TS docstring. +- **`Sponsio` factory + every framework-specific guard class + synthesize the `tool_policy` deny contract uniformly.** The + earlier code path only synthesized in the `Sponsio(framework=…)` + factory; direct framework-specific construction + (`LangGraphGuard(tool_policy=…)`, the idiomatic Python pattern) + silently dropped the policy. Centralized into + `BaseGuard.__init__`. + +### Fixed + +- **`LangGraphGuard` rejects chained redirects (A → B → C) and + self-redirects (A → A) loudly.** Previously a chained redirect + silently executed the intermediate tool, and a self-redirect + would have infinite-looped. Both now raise `ToolCallBlocked` with + a clear message naming the chain. +- **`render/components.py:contracts_table` wraps the name column in + `Text(name)`.** Rich interprets `[…]` as markup; contract descs + containing brackets (e.g. `only [search, read_file] approved`) + were having the bracketed segment silently swallowed. +- **`discovery/trace_replay.py` threads `content_atoms` into + `ground()`.** The previous call site dropped the argument, so + parameterised content predicates (`contains(pii)`, `arg_has(...)`) + were silently false-negative during historical-trace replay. + +### Documentation + +- Per-benchmark deep dives under `docs/reference/benchmarks/` + (agentdojo, odcv, redcode, swebench, tau2). Cross-reference fixed + (the index claimed "Four third-party benchmarks" but had five). 
+- HIGH-priority strategy / pattern enumeration fixes across + `docs/concepts/contracts.md`, `docs/concepts/overview.md`, + `docs/concepts/architecture.md`, `docs/reference/oss-scope.md`, + `docs/reference/config-yaml.md`, `docs/reference/patterns.md`, + `docs/reference/observability.md`, `docs/guides/observe-vs-enforce.md`, + `docs/guides/faq.md`. The strategy taxonomy is consistent across + all of them now: `DetBlock` / `EscalateToHuman` / `WarnOnly` / + `RedirectToSafe`. `RetryWithConstraint` is an extension point. +- `sponsio/tracer/semconv.py` stale comments updated to match. + +--- + +## [0.1.1]: 2026-05-22 ### Fixed @@ -3327,7 +3773,7 @@ _Nothing yet._ --- -## [0.1.0] — 2026-05-06 +## [0.1.0]: 2026-05-06 Open-source launch build. Closes the missing-implementation gap in 0.1.0a3 (CLI imported `sponsio.daemon` / `sponsio.plugin.append_ops` but the wheel @@ -3335,26 +3781,26 @@ shipped without them) and tunes the bundled capability rules. ### Added -- **`sponsio.daemon`** — Unix-socket IPC server + client + handlers; powers +- **`sponsio.daemon`**: Unix-socket IPC server + client + handlers; powers the privileged-process side of `sponsio plugin append` so a system install can give kernel-level (separate-UID) self-modify protection. -- **`sponsio plugin append`** — structurally-additive merge from a staging +- **`sponsio plugin append`**: structurally-additive merge from a staging YAML into a host bucket library; the only blessed write path through the self-modify pack. ### Changed -- **Capability/shell pack** — drop session-wide `rate_limit(exec, 50)` and +- **Capability/shell pack**: drop session-wide `rate_limit(exec, 50)` and `loop_detection(exec, 20)`. The 24-hour cross-session trace store turned these into rolling caps that false-positived heavy interactive work; the targeted `arg_blacklist` and confirm-gate rules already cover the real attacks. 
-- **Capability/self-modify pack** — extend protection to the upstream +- **Capability/self-modify pack**: extend protection to the upstream `sponsio` package (contract bundles + engine `.py`) so an editable / `--user` / venv install can't be used as an "edit the bundle to silence the rule" bypass. Maintainer workflow: override with `customized: {match: {source: "library:tier1.self-modify"}, disabled: true}`. -- **Onboard wizard** — drop redundant trailing "mode flip" hint (axis 3 +- **Onboard wizard**: drop redundant trailing "mode flip" hint (axis 3 already asks); language-aware bare-loop guard API hint (`guardBefore`/`guardAfter` for TS, `guard_before`/`guard_after` for Python). @@ -3369,7 +3815,7 @@ shipped without them) and tunes the bundled capability rules. --- -## [0.1.0a3] — 2026-05-02 +## [0.1.0a3]: 2026-05-02 Pre-launch test build. Sponsio is a runtime contract enforcement layer for AI agents: deterministic LTL contracts evaluated as a compiled DFA @@ -3378,37 +3824,37 @@ and a CLI for scanning, mining, and reporting. ### Added -- **Runtime engine** — LTL → DFA compiler, finite-trace evaluator, +- **Runtime engine**: LTL → DFA compiler, finite-trace evaluator, observe / enforce modes, session log writer, OTel exporter. -- **Pattern library** — 44 deterministic patterns (`must_precede`, +- **Pattern library**: 44 deterministic patterns (`must_precede`, `rate_limit`, `idempotent`, `arg_blacklist`, `arg_allowlist`, `no_data_leak`, `segregation_of_duty`, `cooldown`, `must_confirm`, `bounded_retry`, `loop_detection`, `scope_limit`, `arg_length_limit`, `data_intact`, `destructive_action_gate`, etc.) exposed both as Python factories and as natural-language triggers. -- **Contract bundles** — `sponsio:core/runaway`, `sponsio:core/universal`, +- **Contract bundles**: `sponsio:core/runaway`, `sponsio:core/universal`, `sponsio:capability/shell`, `sponsio:capability/filesystem`, `sponsio:incident/openclaw`. 
-- **Framework integrations** — LangGraph / LangChain.js, Claude Agent +- **Framework integrations**: LangGraph / LangChain.js, Claude Agent SDK, OpenAI SDK, OpenAI Agents SDK, Google ADK, Vercel AI SDK, CrewAI, MCP, plus a no-framework `guard_before` / `guard_after` API. -- **CLI** — `sponsio init` (interactive 4-axis wizard), plus the +- **CLI**: `sponsio init` (interactive 4-axis wizard), plus the underlying `sponsio onboard`, `scan`, `validate`, `check`, `report`, `refresh`, `eval`, `export`, `export-sessions`, `host`, `plugin`, `packs`, `patterns`, `prompt`, `mode`, `doctor`, `skill`, `replay`, `explain`, `demo`. -- **TypeScript SDK** (`@sponsio/sdk`) — deterministic engine + the +- **TypeScript SDK** (`@sponsio/sdk`): deterministic engine + the same set of framework integrations. -- **Static scanner** (`@sponsio/sdk`) — AST-based code scanner +- **Static scanner** (`@sponsio/sdk`): AST-based code scanner for proposing contracts from a TS / JS codebase. -- **Local observability** — session log JSONL writer, +- **Local observability**: session log JSONL writer, `sponsio host trace --follow` live stream, `sponsio report` rich / markdown / HTML / JSON output, OTel HTTP exporter for shipping to your own collector. -- **Plugins** — Claude Code plugin (production), OpenClaw plugin - (beta — type definitions track the public OpenClaw plugin docs; +- **Plugins**: Claude Code plugin (production), OpenClaw plugin + (beta: type definitions track the public OpenClaw plugin docs; end-to-end exercise inside a live OpenClaw runtime is in progress). -- **Benchmarks** — ODCV-Bench (**95.6% high-risk protection across 12 +- **Benchmarks**: ODCV-Bench (**95.6% high-risk protection across 12 LLMs**, 24 of 36 scenarios at 100% across every model) and RedCode-Exec (92% combined detection across 1,410 cases), with **0 FP increase** across 6 ODCV library iterations and 0% utility @@ -3420,5 +3866,5 @@ and a CLI for scanning, mining, and reporting. - Status: alpha. 
APIs may shift before 1.0; the trace event schema and CLI surface follow [SemVer](https://semver.org/) for breaking changes from 0.2 onward. -- Apache 2.0 — see [LICENSE](LICENSE) and the +- Apache 2.0: see [LICENSE](LICENSE) and the [OSS Promise](OSS_PROMISE.md). diff --git a/llms.txt b/llms.txt index c3028d4..5480ba6 100644 --- a/llms.txt +++ b/llms.txt @@ -10,7 +10,7 @@ ## Reference - [docs/concepts/contracts.md](docs/concepts/contracts.md): Contract DSL — deterministic patterns, LTL syntax, YAML schema. -- [docs/reference/cli.md](docs/reference/cli.md): CLI reference — scan, validate, check, report, demo, onboard, serve, refresh. +- [docs/reference/cli.md](docs/reference/cli.md): CLI reference — scan, validate, check, report, demo, onboard, serve. - [docs/integrations/index.md](docs/integrations/index.md): Per-framework integration guides. - [docs/concepts/architecture.md](docs/concepts/architecture.md): Design — LTL evaluator, det/sto pipelines, grounding. - [docs/concepts/owasp-coverage.md](docs/concepts/owasp-coverage.md): OWASP Agentic Top 10 (2026) → Sponsio control mapping. diff --git a/plugins/sponsio-claude-code/skills/configure/SKILL.md b/plugins/sponsio-claude-code/skills/configure/SKILL.md index 7a5de4a..4dc3409 100644 --- a/plugins/sponsio-claude-code/skills/configure/SKILL.md +++ b/plugins/sponsio-claude-code/skills/configure/SKILL.md @@ -231,8 +231,7 @@ LLM the prompt is written for. merge it into the heuristic library (or keep it as a separate `sponsio.semantic.yaml` next to the heuristic one). Each semantic contract should carry `source: agent-extracted` so - future `sponsio refresh` runs can distinguish them from - heuristic rules. + later tooling can distinguish them from heuristic rules. 
The whole loop is fast because (a) introspect + heuristic generation is one CLI call, (b) the prompt is short and on-disk (no network), diff --git a/pyproject.toml b/pyproject.toml index ae64fba..27696c5 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -125,10 +125,10 @@ sponsio = [ # reason over the tool inventory using its own LLM context — no # separate API call needed from the CLI. "plugin/prompts/*.md", - # Workflow prompt templates (onboard / refresh). ``sponsio prompt + # Workflow prompt templates (onboard / scan). ``sponsio prompt # `` prints these for the host agent driving the ``sponsio`` - # skill — pair with ``sponsio onboard --emit-context`` or - # ``sponsio refresh --emit-traces``. + # skill; pair with ``sponsio onboard --emit-context`` or + # ``sponsio scan --emit-context``. "prompts/*.md", # Agent Skills (Cursor / Claude Code / Codex). Bundled so # ``sponsio skill install`` can locate the canonical source after diff --git a/sponsio/cli.py b/sponsio/cli.py index 3ac9c7d..1dfad0a 100644 --- a/sponsio/cli.py +++ b/sponsio/cli.py @@ -2168,41 +2168,6 @@ def serve(host: str, port: int, dev: bool): type=click.Path(exists=True), help="Policy document (.md/.txt) to extract constraints from", ) -@click.option( - "--trace", - "-t", - "traces", - multiple=True, - type=str, - help=( - "Execution trace file, directory, or glob to mine contracts " - "from. Accepts OTLP/JSON, OTLP JSONL, native Sponsio " - "JSON/JSONL, and session-log JSONL " - "(~/.sponsio/sessions//*.jsonl). `~` is expanded. Can " - "be repeated: `-t 'traces/*.jsonl' -t extra.json`. No LLM required." - ), -) -@click.option( - "--trace-min-support", - type=int, - default=1, - show_default=True, - help=( - "Minimum number of traces a pattern must appear in before " - "trace-mining proposes it. Default `1` is loose. bump up " - "(e.g. `5`) when feeding a large production audit log." 
- ), -) -@click.option( - "--trace-confidence-threshold", - type=float, - default=0.95, - show_default=True, - help=( - "Confidence floor for ordering / sequence mining (0–1). " - "Higher = stricter. Default 0.95." - ), -) @click.option( "--push/--no-push", default=False, @@ -2238,7 +2203,7 @@ def serve(host: str, port: int, dev: bool): help=( "Skip the LLM step and instead emit the structured inputs " "(framework / tool inventory / scanned code excerpts / policy " - "docs / trace summaries) as JSON to stdout. Used by the host " + "docs) as JSON to stdout. Used by the host " "agent driving the ``sponsio`` skill: pair with " "``sponsio prompt scan`` and apply in the agent's own LLM " "context. no UnifiedExtractor call, no extra API key." @@ -2254,36 +2219,29 @@ def scan( out: str | None, append: bool, policy: tuple[str, ...], - traces: tuple[str, ...], - trace_min_support: int, - trace_confidence_threshold: float, push: bool, push_url: str, config_path: str | None, emit_context: bool, ): - """Scan source code, policy docs, and traces to propose contracts. + """Scan source code and policy docs to propose contracts. For first-time setup, prefer ``sponsio onboard``. it composes framework detection + scan + ``init``-style provider config + ``doctor`` health checks into a single command. ``scan`` is the library-maintenance tool you reach for *after* you have a - ``sponsio.yaml``: re-mine contracts from new code, append from - a policy doc, or pull in trace-derived ordering rules. + ``sponsio.yaml``: re-mine contracts from new code or append from + a policy doc. Analyzes tool definitions, decorators, and call patterns to infer safety constraints. Optionally extracts constraints from policy - documents (.md/.txt) using the discovered tool inventory as context, - and mines ordering / exclusion / rate-limit patterns from execution - traces (OTLP/JSON, OTLP JSONL, or native Sponsio). + documents (.md/.txt) using the discovered tool inventory as context. 
\b Examples: sponsio scan src/ # writes ./sponsio.yaml (rule-based) sponsio scan src/ --llm # + LLM inference sponsio scan src/ --policy security.md --llm # code + policy - sponsio scan src/ -t 'traces/*.jsonl' # code + trace mining - sponsio scan src/ -t traces/ --trace-min-support 5 sponsio scan src/ -o custom.yaml # write to custom path sponsio scan src/ -o sponsio.yaml --append # merge into existing sponsio scan src/ -o - # print to stdout (pipe) @@ -2300,8 +2258,8 @@ def _scan_progress(msg: str) -> None: # ---- agent-driven path: dump inputs, skip LLM step ------------------ # ``--emit-context`` runs the deterministic scan stages (AST tool - # inventory, policy doc collection, trace summary) and stops short - # of the LLM contract-mining inside ``CodeAnalyzer.generate_yaml``. + # inventory, policy doc collection) and stops short of the LLM + # contract-mining inside ``CodeAnalyzer.generate_yaml``. # The host agent picks up using ``sponsio prompt scan``. if emit_context: analyzer = CodeAnalyzer(use_llm=False) @@ -2320,26 +2278,6 @@ def _scan_progress(msg: str) -> None: except OSError: continue - # Lightweight trace summary: how many traces / events, no full - # event dump (the agent doesn't need every event to write - # sequence-shape contracts; per-pair counts are enough). - trace_summary: dict = {"files": [], "total_events": 0} - if traces: - from sponsio.discovery.trace_replay import ( # noqa: F401 - load_traces_from_paths, - ) - - try: - loaded = load_traces_from_paths(list(traces)) - trace_summary["files"] = sorted( - {str(t.source_path) for t in loaded if hasattr(t, "source_path")} - ) - trace_summary["total_events"] = sum( - len(t.events) for t in loaded if hasattr(t, "events") - ) - except Exception as e: # pragma: no cover. 
best-effort - trace_summary["error"] = str(e) - existing_yaml_text = "" out_path = Path(out) if out and out != "-" else Path("sponsio.yaml") if out_path.exists(): @@ -2355,7 +2293,6 @@ def _scan_progress(msg: str) -> None: "source_paths": source_paths, "tool_inventory": tool_inventory, "policy_docs": policy_docs, - "trace_summary": trace_summary, "existing_yaml": existing_yaml_text, "out_path": str(out_path), "next_steps_hint": ( @@ -2422,9 +2359,6 @@ def _scan_progress(msg: str) -> None: agent_id=agent, policy_paths=list(policy), tool_inventory=tool_inventory, - trace_paths=list(traces) if traces else None, - trace_min_support=trace_min_support, - trace_confidence_threshold=trace_confidence_threshold, ) # --- Auto-validate & drop unparseable contracts --------------------- @@ -4863,11 +4797,11 @@ def _progress(msg: str) -> None: # --------------------------------------------------------------------------- # # Counterpart to ``sponsio plugin prompt ``: prints the agent-facing -# extraction prompt for a top-level workflow (``onboard`` / ``refresh``). +# extraction prompt for a top-level workflow (``onboard`` / ``scan``). # The setup skill at ``sponsio/skills/sponsio/SKILL.md`` calls this so the # host agent (Claude Code, Cursor, Codex) can apply the prompt in its own # LLM context against the JSON emitted by ``sponsio onboard --emit-context`` -# or ``sponsio refresh --emit-traces``. +# or ``sponsio scan --emit-context``. def _patch_mode_in_yaml(text: str, target_mode: str) -> tuple[str, str]: @@ -5033,15 +4967,14 @@ def cmd_mode(target_mode: str, config_path: Path): @cli.command(name="prompt") @click.argument( "flow", - type=click.Choice(["onboard", "refresh", "scan"]), + type=click.Choice(["onboard", "scan"]), ) def cmd_prompt(flow: str): """Print the agent-facing prompt template for a sponsio workflow. Used by the ``sponsio`` skill (``W1``. initial setup, ``W2``. - audit & refine, ``W3b``. 
refresh from traces) to drive the host - agent through contract authoring without burning a separate LLM - API call. + audit & refine) to drive the host agent through contract authoring + without burning a separate LLM API call. Pair with the corresponding ``--emit-*`` flag: @@ -5053,10 +4986,6 @@ def cmd_prompt(flow: str): sponsio scan src/ --emit-context sponsio prompt scan - \b - sponsio refresh sponsio.yaml --emit-traces - sponsio prompt refresh - The agent reads both, applies the prompt to the JSON in its own context, and writes the result via Edit/Write. No ``UnifiedExtractor`` / API key needed for this path. diff --git a/sponsio/discovery/__init__.py b/sponsio/discovery/__init__.py index 01fbd7a..9144c8b 100644 --- a/sponsio/discovery/__init__.py +++ b/sponsio/discovery/__init__.py @@ -121,8 +121,7 @@ def discover( # --- Phase 2: Trace mining --- # Cross-trace pattern mining is an extension point not shipped in - # this build (`sponsio refresh` is its user-facing entry point). - # We skip Phase 2 silently when the module is absent so + # this build. We skip Phase 2 silently when the module is absent so # `discover(documents=[...], code_paths=[...])` still works for # the single-project case. if all_traces: diff --git a/sponsio/discovery/extractors/__init__.py b/sponsio/discovery/extractors/__init__.py index 9f56656..52b54b3 100644 --- a/sponsio/discovery/extractors/__init__.py +++ b/sponsio/discovery/extractors/__init__.py @@ -3,10 +3,10 @@ from sponsio.discovery.extractors.document import DocumentExtractor from sponsio.discovery.extractors.code_analysis import CodeAnalyzer -# ``TraceMiner`` is the cross-trace mining extractor that backs the -# ``sponsio refresh`` CLI; it is not part of this build. Best-effort -# import keeps ``from sponsio.discovery.extractors import TraceMiner`` -# working when a separate implementation is installed alongside. +# ``TraceMiner`` is the cross-trace mining extractor; it is not part of +# this build. 
Best-effort import keeps ``from sponsio.discovery.extractors +# import TraceMiner`` working when a separate implementation is installed +# alongside. try: # pragma: no cover - guarded import from sponsio.discovery.extractors.trace_mining import ( # type: ignore[import-not-found] TraceMiner, diff --git a/sponsio/discovery/extractors/code_analysis.py b/sponsio/discovery/extractors/code_analysis.py index 3a19308..645e680 100644 --- a/sponsio/discovery/extractors/code_analysis.py +++ b/sponsio/discovery/extractors/code_analysis.py @@ -2120,9 +2120,9 @@ def _extract_from_traces( Args: trace_paths: Paths or globs to trace files. min_support: Minimum traces that must exhibit a pattern - before it's proposed. Default **1** (loose) — CLI - callers can tighten via ``--trace-min-support`` when - feeding a large production audit log. + before it's proposed. Default **1** (loose); callers + can tighten via ``min_support`` when feeding a large + production audit log. confidence_threshold: Floor for ordering / sequence confidence (0–1). existing: Proposals already in the list — any trace-mined diff --git a/sponsio/onboard_setup.py b/sponsio/onboard_setup.py index 6b30fd5..3be8eb4 100644 --- a/sponsio/onboard_setup.py +++ b/sponsio/onboard_setup.py @@ -255,7 +255,7 @@ def render_sponsiorc(answers: SetupAnswers) -> str: "", f"framework: {answers.framework}", "", - "# Parse-time LLM — used by `sponsio scan` / `sponsio refresh`", + "# Parse-time LLM used by `sponsio scan`", "# to infer contracts from your tool definitions.", "extractor:", f" provider: {answers.provider or 'none'}", diff --git a/sponsio/plugin/scan.py b/sponsio/plugin/scan.py index 8b5db08..1f78fc2 100644 --- a/sponsio/plugin/scan.py +++ b/sponsio/plugin/scan.py @@ -260,8 +260,8 @@ def _render_library_yaml( Output shape matches what ``plugin init`` already writes for ``_host`` — top-level ``agents:`` with optional ``include:`` and a list of ``contracts:``. 
Each contract gets a - ``source: plugin-scan`` tag so future ``sponsio refresh`` runs - can distinguish heuristic contracts from user-written ones. + ``source: plugin-scan`` tag so later tooling can distinguish + heuristic contracts from user-written ones. """ contracts: list[dict] = [] for p in proposed: diff --git a/sponsio/prompts/__init__.py b/sponsio/prompts/__init__.py index 87d2ea8..5351184 100644 --- a/sponsio/prompts/__init__.py +++ b/sponsio/prompts/__init__.py @@ -1,10 +1,9 @@ """Workflow prompt templates for ``sponsio prompt ``. These markdown files are read by the host agent driving the -``sponsio`` skill's W1 (initial setup / onboard) and W3b (refresh -from traces) workflows. Same pattern as -:mod:`sponsio.plugin.prompts`: agent gets the prompt + structured -context (via ``--emit-context`` / ``--emit-traces``) and applies the -prompt in its own LLM context — no extra API key, no extra round +``sponsio`` skill's W1 (initial setup / onboard) and W2 (scan) +workflows. Same pattern as :mod:`sponsio.plugin.prompts`: agent gets +the prompt + structured context (via ``--emit-context``) and applies +the prompt in its own LLM context: no extra API key, no extra round trip. """ diff --git a/sponsio/prompts/onboard.md b/sponsio/prompts/onboard.md index c496272..c43aa00 100644 --- a/sponsio/prompts/onboard.md +++ b/sponsio/prompts/onboard.md @@ -48,9 +48,8 @@ A JSON object from `sponsio onboard --emit-context`: the user has to clean up. - **Source tagging.** Every contract YOU author carries - `source: agent-extracted` so future `sponsio refresh` runs can - distinguish your additions from pack rules and from - CLI-emitted starter rules. + `source: agent-extracted` so later tooling can distinguish your + additions from pack rules and from CLI-emitted starter rules. - **One contract per concrete failure mode.** Plain-English `desc:` so the user can review by reading. No omnibus rules.
@@ -184,9 +183,8 @@ agents: ## Source tagging Every contract YOU author should carry `source: agent-extracted` so -future `sponsio refresh` runs know they were agent-generated and -can be re-considered. Don't tag pack-included rules — those have -their own source from the pack. +later tooling knows they were agent-generated. Don't tag +pack-included rules; those have their own source from the pack. ## What to do after diff --git a/sponsio/prompts/refresh.md b/sponsio/prompts/refresh.md deleted file mode 100644 index 46ea122..0000000 --- a/sponsio/prompts/refresh.md +++ /dev/null @@ -1,118 +0,0 @@ -# Contract refresh prompt — sponsio refresh from traces - -You are tuning an existing `sponsio.yaml` against accumulated -session traces. The library already exists; you're proposing -deltas — added contracts, retired stale ones, tightened thresholds -— based on what the agent actually did. - -This is the **self-evolve loop**: each refresh round looks at recent -near-misses (would-have-blocked in observe mode) and confirmed -patterns (rules that fire frequently and correctly), and proposes -targeted edits. - -## Input - -A JSON object from `sponsio refresh --emit-traces`: - -```json -{ - "agent": "...", - "since": "7d", - "existing_contracts": [ - {"desc": "...", "pattern": "...", "args": [...], "source": "..."} - ], - "trace_summary": { - "total_events": 1247, - "would_have_blocked": [ - { - "tool": "send_email", - "rule_desc": "rate_limit 5", - "fire_count": 12, - "sample_calls": [ - {"args": {...}, "ts": "...", "agent_outcome": "succeeded"} - ] - } - ], - "blocked_actual": [...], - "uncovered_patterns": [ - { - "tool": "transfer_funds", - "call_count": 3, - "sample_args": [{...}], - "note": "tool was called 3x; no contract covers it" - } - ] - } -} -``` - -## What you produce - -A YAML diff with three sections: - -```yaml -proposed_changes: - add: - - desc: "..." 
- G: {pattern: ..., args: [...]} - source: agent-extracted-from-traces - retire: - - match: { desc: "" } - reason: "fired 0 times in 7d window" - tighten: - - match: { desc: "" } - from: {pattern: rate_limit, args: [send_email, 5]} - to: {pattern: rate_limit, args: [send_email, 10]} - reason: "false-positive rate 12/30 over last 7d" -``` - -## Rules of thumb - -### Adding contracts - -* `uncovered_patterns` with `call_count` ≥ 3 and clear semantics → - add a contract. -* Don't add for tools that fired 1–2x — too sparse. -* Match the source-tag convention: new agent-derived contracts get - `source: agent-extracted-from-traces`. - -### Retiring (only `source: trace` or `source: agent-extracted-*`) - -* Existing contract that fired 0 times in the window → candidate for - retirement, but ONLY if it's `source: trace` or - `source: agent-extracted-*`. -* **Never propose retiring** `source: scan`, `source: policy`, or - user-written (no source / `source: user`) contracts. -* If retiring, include a one-line `reason`. - -### Tightening / loosening - -* High false-positive rate (`would_have_blocked` clusters with - `agent_outcome: succeeded` — the call was actually fine) → - loosen the cap or add `arg_blacklist` carve-outs. -* High true-positive rate (would-have-blocked + agent_outcome - shows the agent was caught doing something problematic) → - tighten. -* Always cite the trace counts in `reason` so the user can verify. - -## Pattern vocabulary - -Same as plugin scan / onboard — `arg_blacklist`, `rate_limit`, -`loop_detection`, `irreversible_once`, `must_precede`, -`arg_value_range`, `arg_length_limit`. - -## What you must not do - -* **Don't** retire user-written or pack-derived contracts. -* **Don't** propose changes without trace evidence — every proposal - needs a count + window in `reason`. -* **Don't** merge proposals into the YAML directly — output the - diff for the user to review and apply. 
Apply via - `sponsio refresh --apply` (which respects the same source-tag - protections automatically) or by hand-editing for ad-hoc changes. - -## Output format - -ONLY the `proposed_changes:` YAML block above. No prose, no -markdown wrapping. The host driver will show it to the user -verbatim. diff --git a/sponsio/prompts/scan.md b/sponsio/prompts/scan.md index 9604e74..56f13b5 100644 --- a/sponsio/prompts/scan.md +++ b/sponsio/prompts/scan.md @@ -105,11 +105,9 @@ to enforce and they wedge the agent. For each tool in here — take them seriously. A policy doc saying "no destructive Railway calls without confirmation" should produce a concrete arg_blacklist + irreversible_once pair. -- **`trace_summary.total_events`** > 0: ordering / sequence rules are - more reliable now (you can run `sponsio refresh --emit-traces` to - mine them properly). Without traces, prefer single-event patterns - (`arg_blacklist`, `rate_limit`, `irreversible_once`) over - trace-aware ones. +- Prefer single-event patterns (`arg_blacklist`, `rate_limit`, + `irreversible_once`) unless the code clearly establishes a required + ordering; only then reach for sequence-shaped rules. - **`existing_yaml`**: don't duplicate rules already there. Read its contracts, only emit gaps. diff --git a/sponsio/refresh.py b/sponsio/refresh.py deleted file mode 100644 index 484e920..0000000 --- a/sponsio/refresh.py +++ /dev/null @@ -1,488 +0,0 @@ -"""``sponsio refresh`` — re-mine contracts from recent traces and -surgically merge into an existing ``sponsio.yaml``. - -Design goals (see chat transcript for the design discussion): - -* **Preserve user tuning**: ``customized:``, ``include:``, ``runtime:``, - ``judge:``, ``workspace:``, ``tool_rename:``, and every contract - without a ``source: trace`` tag are left untouched. -* **Only touch what we own**: MVP updates exclusively ``source: trace`` - contracts. 
``source: scan`` (from code) and ``source: policy`` are - treated as immutable, since a trace-only refresh has no signal about - whether they should stay or go. -* **Dry-run by default**: nothing is written until ``--apply``. Even - with ``--apply`` we backup to ``.sponsio.bak`` first. -* **Two modes**: - - * ``add-only`` — add new contracts, never remove or drift. Safe for - small trace windows. - * ``replace-trace`` (default with ``--apply``) — recent traces are - authoritative for the ``source: trace`` subset. Entries that no - longer show up in the fresh mining run are dropped. - -Identity (for dedup + drift detection) is -``(pattern_name, tuple_of_non_numeric_args)``. A numeric threshold -drift (e.g. ``rate_limit(send_email, 5)`` → ``(send_email, 12)``) is -surfaced as a "drifted" bucket rather than add+remove, because the -user usually wants to see it as a single "threshold moved" line. - -Comments and blank-line structure are NOT preserved through -``--apply`` because PyYAML's safe_dump doesn't round-trip them. The -backup file exists precisely so users can retrieve any prose -annotations they'd inlined. We warn about this on stderr. -""" - -from __future__ import annotations - -import re -import shutil -from dataclasses import dataclass, field -from pathlib import Path -from typing import Any - -__all__ = [ - "RefreshReport", - "compute_refresh", - "render_report", - "apply_refresh", - "DEFAULT_SESSION_GLOB", -] - - -# Trace-source tag values that refresh considers "owned". Keep this -# narrow: the MVP signal (trace mining) can only speak authoritatively -# about ``trace``-sourced contracts. -_REFRESHABLE_SOURCES = frozenset({"trace"}) - -# User-facing glob for the default session log location. The ``{agent}`` -# token is substituted by the CLI wrapper based on ``--agent``. 
-DEFAULT_SESSION_GLOB = "~/.sponsio/sessions/{agent}/*.jsonl" - - -# --------------------------------------------------------------------------- -# Identity / args normalization -# --------------------------------------------------------------------------- - - -def _is_numeric(v: Any) -> bool: - """Return True for ``int`` / ``float`` but NOT bool (since bool is - an int subclass and we want rules keyed on booleans to keep the - boolean in identity).""" - return isinstance(v, (int, float)) and not isinstance(v, bool) - - -def _normalize_arg(a: Any) -> Any: - """Canonical form of a single arg for identity/dedup purposes. - - Lists are recursed; everything else is stringified (so that the - YAML's ``[a, b, c]`` and the in-memory ``["a","b","c"]`` collapse - together).""" - if isinstance(a, list): - return tuple(_normalize_arg(x) for x in a) - if isinstance(a, bool): - return a - if _is_numeric(a): - return a # kept in `value_key` only; stripped from `identity_key` - return str(a) - - -def identity_key( - pattern: str | None, - args: list | tuple | None, - nl: str | None, -) -> tuple: - """Stable dedup key — used to decide whether two contracts refer to - the same rule. - - For structured contracts: ``(pattern, *non_numeric_args)``. Numeric - args are stripped so threshold drift shows up as "drift", not - "add+remove". - - For pure-NL contracts (no ``pattern:``): ``("__nl__", normalized_nl)`` - — we can't do semantic dedup without a parser round-trip, so use - the string itself. Case-folded + whitespace-collapsed so tiny - edits don't spuriously double-count. - """ - if pattern: - if args is None: - args = [] - non_num = tuple(_normalize_arg(a) for a in args if not _is_numeric(a)) - return (str(pattern), non_num) - if nl: - collapsed = re.sub(r"\s+", " ", nl.strip().lower()) - return ("__nl__", collapsed) - return ("__unknown__",) - - -def value_key(args: list | tuple | None) -> tuple: - """Full-args tuple — used for drift detection. 
Two contracts with - the same ``identity_key`` but different ``value_key`` are drifted, - not duplicates.""" - if args is None: - return () - return tuple(_normalize_arg(a) for a in args) - - -# --------------------------------------------------------------------------- -# Contract shape normalization -# --------------------------------------------------------------------------- - - -@dataclass -class _NormalizedContract: - """YAML-dict form flattened enough for diffing. - - We keep the ORIGINAL dict around as ``raw`` so that ``apply_refresh`` - can round-trip exactly what the user wrote — we only touch the - ``identity_key``-matching entries.""" - - raw: dict[str, Any] - source: str | None - pattern: str | None - args: list | None - assumption: str | None # raw A: text (str form only, for MVP) - nl: str | None # the E: text if it's a string, for NL-only entries - - def identity(self) -> tuple: - # For contracts with an A:, include its text in identity so - # ``must_precede(X, Y)`` conditional-on-A is distinct from the - # unconditional version. - base = identity_key(self.pattern, self.args, self.nl) - if self.assumption: - a = re.sub(r"\s+", " ", self.assumption.strip().lower()) - return base + ("A:" + a,) - return base - - def values(self) -> tuple: - return value_key(self.args) - - -def _text_of(field_value: Any) -> str | None: - """Flatten an A: / E: field to a single string when possible. - - The schema accepts either a scalar NL string OR a structured dict - ``{pattern, args, source}``. This helper returns the string form - (or ``None`` when it's a structured dict).""" - if isinstance(field_value, str): - return field_value - if isinstance(field_value, list): - # AND of strings — join with " and " so identity still sees them. 
- parts = [str(x) for x in field_value if isinstance(x, str)] - return " and ".join(parts) if parts else None - return None - - -def _normalize_contract_entry(entry: Any) -> _NormalizedContract | None: - """Collapse an ``agents..contracts[*]`` entry into the shape we - need for diffing. Returns ``None`` for entries we don't understand - (very malformed) — they'll be passed through untouched on apply.""" - if not isinstance(entry, dict): - return None - - # Extract A / assumption (accept both long and short keys). - a_raw = entry.get("A", entry.get("assumption")) - assumption = _text_of(a_raw) - - # Extract G / guarantee. - e_raw = entry.get("G", entry.get("guarantee")) - pattern = None - args: list | None = None - source: str | None = None - nl: str | None = None - if isinstance(e_raw, dict): - pattern = e_raw.get("pattern") - a = e_raw.get("args") - args = list(a) if isinstance(a, (list, tuple)) else None - source = e_raw.get("source") - elif isinstance(e_raw, str): - nl = e_raw - source = entry.get("source") # sometimes attached at entry level - elif isinstance(e_raw, list): - nl = _text_of(e_raw) - source = entry.get("source") - else: - return None - - return _NormalizedContract( - raw=entry, - source=source, - pattern=pattern, - args=args, - assumption=assumption, - nl=nl, - ) - - -# --------------------------------------------------------------------------- -# Diff structure -# --------------------------------------------------------------------------- - - -@dataclass -class RefreshReport: - """Per-agent diff summary produced by ``compute_refresh``. - - All lists hold ``_NormalizedContract`` instances (with ``.raw`` - pointing at the original dict). The "Drifted" bucket carries both - sides as a pair so the renderer can show old→new. 
- """ - - agent: str - added: list[_NormalizedContract] = field(default_factory=list) - drifted: list[tuple[_NormalizedContract, _NormalizedContract]] = field( - default_factory=list - ) - stale: list[_NormalizedContract] = field(default_factory=list) - unchanged_refreshable: list[_NormalizedContract] = field(default_factory=list) - untouched_immutable: list[_NormalizedContract] = field(default_factory=list) - - # Raw counts for programmatic access (e.g. tests). - @property - def net_change(self) -> int: - return len(self.added) - len(self.stale) - - @property - def is_noop(self) -> bool: - return not self.added and not self.drifted and not self.stale - - -# --------------------------------------------------------------------------- -# Core diff -# --------------------------------------------------------------------------- - - -def compute_refresh( - existing_contracts: list[Any], - fresh_contracts: list[Any], - agent: str, -) -> RefreshReport: - """Return a diff between the ``source: trace`` subset of - ``existing_contracts`` and the newly-mined ``fresh_contracts``. - - Entries whose source is NOT in ``_REFRESHABLE_SOURCES`` are bucketed - as ``untouched_immutable`` — they flow through any ``apply_refresh`` - call unchanged. This preserves user-written contracts, ``source: - scan`` (from code), ``source: policy``, and anything the user - hand-edited without a source tag. - """ - report = RefreshReport(agent=agent) - - refreshable_existing: list[_NormalizedContract] = [] - for e in existing_contracts: - nc = _normalize_contract_entry(e) - if nc is None: - # Keep unknowns as-is — we can't diff them but we must - # not drop them. Stash as immutable (use a placeholder - # with raw=e so the writer can round-trip). 
- report.untouched_immutable.append( - _NormalizedContract( - raw=e if isinstance(e, dict) else {"_raw": e}, - source=None, - pattern=None, - args=None, - assumption=None, - nl=None, - ) - ) - continue - if nc.source in _REFRESHABLE_SOURCES: - refreshable_existing.append(nc) - else: - report.untouched_immutable.append(nc) - - fresh_normalized: list[_NormalizedContract] = [] - for e in fresh_contracts: - nc = _normalize_contract_entry(e) - if nc is None: - continue - fresh_normalized.append(nc) - - ex_idx: dict[tuple, _NormalizedContract] = {} - for nc in refreshable_existing: - ex_idx.setdefault(nc.identity(), nc) # first wins on accidental dup - new_idx: dict[tuple, _NormalizedContract] = {} - for nc in fresh_normalized: - new_idx.setdefault(nc.identity(), nc) - - ex_keys = set(ex_idx) - new_keys = set(new_idx) - - for k in sorted(new_keys - ex_keys, key=lambda t: str(t)): - report.added.append(new_idx[k]) - for k in sorted(ex_keys - new_keys, key=lambda t: str(t)): - report.stale.append(ex_idx[k]) - for k in sorted(ex_keys & new_keys, key=lambda t: str(t)): - old = ex_idx[k] - new = new_idx[k] - if old.values() != new.values(): - report.drifted.append((old, new)) - else: - report.unchanged_refreshable.append(old) - - return report - - -# --------------------------------------------------------------------------- -# Rendering -# --------------------------------------------------------------------------- - - -def _fmt_contract(nc: _NormalizedContract) -> str: - """Short, stable one-line rendering for the diff output.""" - if nc.pattern: - args_str = "" - if nc.args: - parts = [a if isinstance(a, str) else repr(a) for a in (nc.args or [])] - args_str = "(" + ", ".join(parts) + ")" - prefix = f"A:{nc.assumption!r} ⇒ " if nc.assumption else "" - return f"{prefix}{nc.pattern}{args_str}" - if nc.nl: - preview = (nc.nl[:80] + "…") if len(nc.nl) > 80 else nc.nl - prefix = f"A:{nc.assumption!r} ⇒ " if nc.assumption else "" - return f"{prefix}NL: {preview}" - return "" - - 
-def render_report(reports: list[RefreshReport], *, color: bool = True) -> str: - """Turn a list of per-agent reports into the stderr-ready diff - summary. Passing ``color=False`` yields a plain string suitable - for tests and non-TTY environments.""" - - def _c(tag: str, text: str) -> str: - if not color: - return text - codes = { - "+": "\033[32m", - "-": "\033[33m", - "~": "\033[36m", - "=": "\033[90m", - "!": "\033[31m", - "reset": "\033[0m", - } - return f"{codes.get(tag, '')}{text}{codes['reset']}" - - lines: list[str] = [] - grand = {"added": 0, "drifted": 0, "stale": 0, "unchanged": 0, "immutable": 0} - for r in reports: - grand["added"] += len(r.added) - grand["drifted"] += len(r.drifted) - grand["stale"] += len(r.stale) - grand["unchanged"] += len(r.unchanged_refreshable) - grand["immutable"] += len(r.untouched_immutable) - - lines.append(f"Agent: {r.agent}") - if r.added: - for nc in r.added: - lines.append(_c("+", f" + new {_fmt_contract(nc)}")) - if r.drifted: - for old, new in r.drifted: - lines.append( - _c( - "~", - f" ~ drifted {_fmt_contract(old)} " - f"→ args {list(new.values())}", - ) - ) - if r.stale: - for nc in r.stale: - lines.append( - _c("-", f" - stale {_fmt_contract(nc)} (not re-observed)") - ) - if r.unchanged_refreshable: - lines.append( - _c( - "=", - f" = {len(r.unchanged_refreshable)} unchanged " - f"(source: trace, re-observed)", - ) - ) - if r.untouched_immutable: - lines.append( - _c( - "=", - f" = {len(r.untouched_immutable)} preserved " - f"(user / scan / policy / overrides — not touched)", - ) - ) - lines.append("") - - lines.append( - f"Total: +{grand['added']} ~{grand['drifted']} -{grand['stale']} " - f"={grand['unchanged']} unchanged ={grand['immutable']} preserved" - ) - return "\n".join(lines) - - -# --------------------------------------------------------------------------- -# Apply -# --------------------------------------------------------------------------- - - -def apply_refresh( - config: dict[str, Any], - reports: 
dict[str, RefreshReport], - fresh_agent_contracts: dict[str, list[Any]], - *, - mode: str = "replace-trace", -) -> dict[str, Any]: - """Return a NEW top-level config dict with each agent's - ``contracts:`` list rewritten per ``mode``. Does NOT mutate the - input. - - * ``add-only``: existing contracts are kept verbatim; only - genuinely-new ``source: trace`` entries are appended. - * ``replace-trace``: every existing ``source: trace`` entry is - dropped, then the full set of freshly-mined contracts is appended - (so drift / re-observed / genuinely-new all land from the fresh - side). Non-refreshable entries (user, scan, policy, overrides) - pass through untouched. - """ - if mode not in ("add-only", "replace-trace"): - raise ValueError(f"mode must be 'add-only' or 'replace-trace', got {mode!r}") - - out = dict(config) - agents = dict(out.get("agents") or {}) - - for agent_id, report in reports.items(): - a_cfg = dict(agents.get(agent_id) or {}) - existing: list = list(a_cfg.get("contracts") or []) - fresh: list = list(fresh_agent_contracts.get(agent_id) or []) - - if mode == "add-only": - # Keep everything existing; append only contracts whose - # identity wasn't seen in the existing refreshable set. - added = [nc.raw for nc in report.added] - a_cfg["contracts"] = existing + added - else: - # replace-trace: strip source:trace entries from existing, - # keep the rest in their original order, then append the - # full fresh set at the bottom. - kept: list = [] - for e in existing: - nc = _normalize_contract_entry(e) - if nc is not None and nc.source in _REFRESHABLE_SOURCES: - continue - kept.append(e) - a_cfg["contracts"] = kept + fresh - - agents[agent_id] = a_cfg - - out["agents"] = agents - return out - - -def backup_then_write( - target: Path, - new_yaml_text: str, - *, - backup_suffix: str = ".sponsio.bak", -) -> Path | None: - """Copy ``target`` → ``target.with_suffix(backup_suffix)``, then - write ``new_yaml_text`` to ``target``. 
Returns the backup path - (or ``None`` if ``target`` didn't exist yet).""" - backup: Path | None = None - if target.exists(): - backup = target.with_name(target.name + backup_suffix) - shutil.copy2(target, backup) - target.write_text(new_yaml_text) - return backup diff --git a/sponsio/runtime/session_log.py b/sponsio/runtime/session_log.py index b00d9b0..2cdfd7f 100644 --- a/sponsio/runtime/session_log.py +++ b/sponsio/runtime/session_log.py @@ -40,9 +40,8 @@ def _resolve_default_base_dir() -> Path: ``SPONSIO_SESSIONS_DIR`` (if set) takes precedence over the user-home default. Used by tests + ops setups that want - sandboxed traces (e.g. ``sponsio refresh --emit-traces`` against - a CI-staged log directory). Resolved per-import — set the env - before launching the sponsio process. + sandboxed traces (e.g. a CI-staged log directory). Resolved + per-import; set the env before launching the sponsio process. """ import os as _os diff --git a/sponsio/skills/sponsio/SKILL.md b/sponsio/skills/sponsio/SKILL.md index 5ed0503..c10ec67 100644 --- a/sponsio/skills/sponsio/SKILL.md +++ b/sponsio/skills/sponsio/SKILL.md @@ -1,6 +1,6 @@ --- name: sponsio -description: Install, observe, tune, enforce, and periodically refresh Sponsio — a runtime contract layer for LLM agents that blocks unsafe tool calls and scores output quality against declared rules. Use when the user wants to set up / add / install Sponsio, add guardrails or runtime safety to an LLM agent, generate or refine a sponsio.yaml, audit tool configurations for risks (data leaks, unguarded writes, missing confirmations), explain or review existing contracts, check what Sponsio would have blocked (`sponsio report`), refresh the contract library from recent traces (`sponsio refresh`), move from observe to enforce mode, or debug why a contract is (or isn't) firing. 
Triggers on phrases like "set up sponsio", "add sponsio", "install sponsio", "add guardrails", "monitor my agent", "harden my agent", "audit my agent", "generate contracts", "explain my sponsio.yaml", "sponsio report", "refresh contracts", "update my sponsio.yaml from traces", "flip to enforce", "false positive", "why is this rule firing". +description: Install, observe, tune, and enforce Sponsio: a runtime contract layer for LLM agents that blocks unsafe tool calls and scores output quality against declared rules. Use when the user wants to set up / add / install Sponsio, add guardrails or runtime safety to an LLM agent, generate or refine a sponsio.yaml, audit tool configurations for risks (data leaks, unguarded writes, missing confirmations), explain or review existing contracts, check what Sponsio would have blocked (`sponsio report`), move from observe to enforce mode, or debug why a contract is (or isn't) firing. Triggers on phrases like "set up sponsio", "add sponsio", "install sponsio", "add guardrails", "monitor my agent", "harden my agent", "audit my agent", "generate contracts", "explain my sponsio.yaml", "sponsio report", "flip to enforce", "false positive", "why is this rule firing". --- # Sponsio — Agent Safety Lifecycle Companion @@ -21,7 +21,6 @@ Dispatch by what the user is trying to do. Pick ONE workflow and follow it; do n | Tightening rules that apply to Task-spawned subagents (Cursor / Claude Code) — they lack user context and need stricter privileges than the main agent | **W2c — Subagent privilege boundary** | | Tuning the IDE's OWN host-plugin library (Claude Code's Bash / Read / Write / MCP gating; Cursor likewise) — different from the user's project sponsio.yaml | hand off to the ``sponsio-claude-code:configure`` skill (or the cursor analog). Don't reimplement here. 
| | Has Sponsio running in observe mode and wants to review violations, tune thresholds, silence false positives | **W3 — Tune in observe** | -| Wants to re-mine contracts from accumulated production traces / periodically maintain the library | **W3b — Refresh from traces** | | Ready to ship — wants to move from observe to enforce, needs regression confidence | **W4 — Flip to enforce** | | Sponsio errored, a rule isn't firing when it should, a rule is firing when it shouldn't | **W5 — Troubleshoot** | @@ -48,8 +47,8 @@ are still a config-correctness bug we'd rather avoid up-front. ### Zone A — project YAML (you may add additively) Path: `/sponsio.yaml` — the file `sponsio onboard` writes -into the user's repo. This evolves through every onboard / scan / -refresh cycle. **Adding** new contracts via `Edit` (extending +into the user's repo. This evolves through every onboard / scan +cycle. **Adding** new contracts via `Edit` (extending `old_string`) is the supported workflow. Three legal write modes: @@ -75,12 +74,11 @@ Three legal write modes: # or disabled: true # to silence (last resort) ``` -3. **Run `sponsio scan`** for bulk additions from code / policy / - traces — merges additively and writes atomically: +3. **Run `sponsio scan`** for bulk additions from code / policy; + it merges additively and writes atomically: ```bash sponsio scan -o ./sponsio.yaml --append - sponsio refresh --since 7d --apply --mode add-only ``` ### Zone B — host bucket + plugin bundle YAMLs (user-only — never write directly) @@ -276,7 +274,6 @@ workflows, not extensions of W1: That skill owns ``sponsio plugin scan`` / ``sponsio plugin append`` / per-MCP-server library generation. W1 doesn't duplicate it. - - **Refresh contracts from accumulated traces** → **W3b**. - **Move from observe to enforce** → **W4**. - **Something doesn't fire / fires wrong** → **W5**.
@@ -397,10 +394,9 @@ Match the user's input to the source(s): - "Explain / review my `sponsio.yaml`" → source 1 and/or others already applied; jump to "Explain contracts" below. - "Scan my agent code" → source 2, code-only. - "We have a security policy document" → source 2, add `--policy --llm`. -- "We already have session logs / OTLP traces" → source 2, add `-t ''` (trace mining; no LLM needed). - "I know the pattern I want but not the syntax" → source 4, then show them the yaml entry. -If ambiguous: ask ONE question — "(a) scan your code, (b) extract from a policy document, or (c) mine from traces?" +If ambiguous: ask ONE question — "(a) scan your code, or (b) extract from a policy document?" ### Run scan (when extraction is needed) @@ -408,14 +404,14 @@ You ARE the LLM. Use the agent-mediated path — Sponsio collects deterministic ```bash # 1. Sponsio dumps the deterministic inputs (AST tool inventory, policy -# docs, trace summary, existing yaml) as JSON: -sponsio scan --agent [--policy ] [-t ''] --emit-context +# docs, existing yaml) as JSON: +sponsio scan --agent [--policy ] --emit-context # 2. Sponsio prints the contract-authoring prompt template: sponsio prompt scan ``` -Read both, apply the prompt to the JSON in your own context, and produce contract YAML entries. Source-tag every entry you author with `source: agent-extracted` so future `sponsio refresh` can re-consider them. Trace-mined entries (when `-t` was passed) carry `source: trace` and are the subset W3b maintains over time. +Read both, apply the prompt to the JSON in your own context, and produce contract YAML entries. Source-tag every entry you author with `source: agent-extracted` so later tooling can distinguish them from heuristic rules. **Decide the write target by intent BEFORE writing**, using the W1 step-4 (A) vs (B) test. 
Write path differs by destination: @@ -674,69 +670,6 @@ With the assumption, the rule stays silent until the integration actually emits --- -## W3b — Refresh from traces - -Goal: treat `sponsio.yaml` as a **living library**, not a one-shot output. As real session logs accumulate, re-mine them to discover new patterns and retire stale ones, without clobbering anything the user wrote or tuned. - -### When to run - -- Weekly / sprint boundary — recommended cadence for an agent seeing live traffic. -- After a material behavior change (new tools, new workflow, new integration). -- Before flipping to enforce (W4) — catches late-arriving trace-sourced rules that weren't present in the original scan. - -### Steps - -You ARE the LLM here too. Use the agent-mediated path: Sponsio mines deterministic candidates from the trace, you decide which to keep / drop / adjust in your own context. - -1. Dry-run first. Always. - - ```bash - sponsio refresh --since 7d --emit-traces - sponsio prompt refresh - ``` - - The first dumps recent trace events + the current `source: trace` rules as JSON; the second prints the merge / dedup / drift prompt. Apply the prompt to the JSON in your context — you produce a structured diff per agent. Show the user the diff in this shape: - - ``` - Agent: support_bot - + new must_precede(validate_payment, charge_card) - ~ drifted rate_limit(send_email, 5) → args [send_email, 12] - - stale idempotent(list_users) (not re-observed) - = 8 unchanged (source: trace, re-observed) - = 12 preserved (user / scan / policy / customized — not touched) - ``` - -2. Review with the user. For each bucket: - - `+ new` — "the agent started doing X that your current yaml doesn't cover." User usually wants this. - - `~ drifted` — threshold moved. If new value is bigger, the rule was too tight; if smaller, production tightened up. User decides. - - `- stale` — rule hasn't fired in this window. **Not necessarily dead** — could just be rare. 
Conservative default is `--mode add-only` (never remove), which we recommend for the first few refreshes until the user trusts the window size. - - `= preserved` — these include every user rule, `source: scan`, `source: policy`, and anything under `customized:`. The count being non-zero is the load-bearing invariant: refresh **only** ever changes `source: trace` entries. - -3. Apply via Edit/Write. You produced the YAML changes in step 1 in your own context — write them to `sponsio.yaml`, back the old file up to `sponsio.yaml.sponsio.bak`, then validate: - - ```bash - sponsio validate --config sponsio.yaml - ``` - - **Fallback** (bare CLI, no host agent): `sponsio refresh --since 7d --apply [--mode add-only|replace-trace]` — uses Sponsio's own LLM via API key. Only that path needs a key. - - **Comments in the YAML are not preserved** — the backup is how users recover any prose annotations they'd inlined. Warn them of this before running. - -### Mode selection - -| Situation | Mode | -|---|---| -| First time running refresh, or small trace window | `add-only` (never removes) | -| Healthy trace volume, stable agent behavior | `replace-trace` (default — recent traces are authoritative) | -| Window just covers a launch / migration / incident (abnormal traffic) | Skip — artifact traces mislead the miner | - -### Do NOT - -- Do NOT run refresh on traces from an agent still in early iteration. Wait until the workflow is stable; otherwise every sprint invalidates the library. -- Do NOT remove `source: scan` / `source: policy` entries in the yaml by hand just because refresh doesn't touch them — they represent knowledge refresh can't reconstruct (code AST, policy docs). If you think one is wrong, edit `customized:`, don't delete. - ---- - ## W4 — Flip to enforce Goal: move from observe to enforce with regression confidence — no more logging, actual blocking. **This is a production change**; don't skip the checks. 
@@ -954,7 +887,7 @@ If asked for something out of scope (e.g., "also check my DB schema"), say so an This skill only uses these. Internal refactors are safe as long as these stay stable. -1. **CLI**: `sponsio onboard`, `sponsio scan PATHS [--agent N] [--llm] [--policy P] [-t GLOB] [-o FILE] [--append]`, `sponsio refresh [-c FILE] [-a AGENT] [-t GLOB] [--since DUR] [--mode add-only|replace-trace] [--apply]`, `sponsio validate [--config FILE | "NL string"] [--json]`, `sponsio check --trace FILE --config FILE --agent ID`, `sponsio report --agent ID --since DUR`, `sponsio doctor`, `sponsio patterns`, `sponsio packs`, `sponsio skill install [--tool cursor|claude|codex|both|auto]`. Exit 0 on success. +1. **CLI**: `sponsio onboard`, `sponsio scan PATHS [--agent N] [--llm] [--policy P] [-o FILE] [--append]`, `sponsio validate [--config FILE | "NL string"] [--json]`, `sponsio check --trace FILE --config FILE --agent ID`, `sponsio report --agent ID --since DUR`, `sponsio doctor`, `sponsio patterns`, `sponsio packs`, `sponsio skill install [--tool cursor|claude|codex|both|auto]`. Exit 0 on success. 2. **YAML**: top-level `agents:` as dict; each agent has optional `include:` / `tool_rename:` / `customized:` / `workspace:` and required `contracts:`; top-level `runtime:`. 3. **Patterns**: names in the table above keep their semantics. Renaming is a breaking change for this skill. 4. **`validate --json` shape**: per-contract `ok` / `type` / `pattern` / `formula` / `agent`. 
diff --git a/tests/test_emit_context_loop.py b/tests/test_emit_context_loop.py index 70eb0f7..bfe6ee6 100644 --- a/tests/test_emit_context_loop.py +++ b/tests/test_emit_context_loop.py @@ -16,7 +16,6 @@ Coverage: W1 — onboard agent-driven path (`--emit-context` + `prompt onboard`) - W3b — refresh agent-driven path (`--emit-traces` + `prompt refresh`) Mode A — plugin scan agent-driven (`--introspect` + `plugin prompt `) If any prompt template is renamed, deleted, or changes the @@ -66,7 +65,7 @@ def _run_cli(*args: str, timeout: int = 60) -> subprocess.CompletedProcess: (("plugin", "prompt"), "openclaw"), (("plugin", "prompt"), "mcp-bare"), (("prompt",), "onboard"), - (("prompt",), "refresh"), + (("prompt",), "scan"), ], ) def test_prompt_template_prints_well_formed(subcmd, flow): diff --git a/tests/test_skill_doc_sync.py b/tests/test_skill_doc_sync.py index df5d161..d759c35 100644 --- a/tests/test_skill_doc_sync.py +++ b/tests/test_skill_doc_sync.py @@ -65,13 +65,8 @@ def _skill_md_path() -> Path: (("onboard",), []), ( ("scan",), - ["--agent", "--llm", "--policy", "-t", "-o", "--append"], + ["--agent", "--llm", "--policy", "-o", "--append"], ), - # ``refresh`` is not part of this build: cross-trace pattern - # mining is an extension point that ``sponsio refresh`` would back. - # The SKILL.md still describes the workflow narratively (W3b) so - # contract authors know it exists, but the CLI no longer exposes - # the subcommand. (("validate",), ["--config", "--json"]), (("check",), ["--trace", "--config", "--agent"]), (("report",), ["--agent", "--since"]), @@ -141,11 +136,9 @@ def test_skill_surface_subcommand_exists_and_has_required_flags( "runtime", "auto", "skill", # bare "sponsio skill" — the group, not a subcommand invocation - # ``refresh`` and ``bench`` are not part of this build: refresh - # is an extension point (cross-trace pattern mining), bench - # deleted. 
SKILL.md still mentions them in narrative context to - # explain the broader surface to users authoring contracts. - "refresh", + # ``bench`` is not part of this build (deleted). SKILL.md may + # still mention it in narrative context to explain the broader + # surface to users authoring contracts. "bench", } ) diff --git a/ts/packages/sdk/prompts/onboard.md b/ts/packages/sdk/prompts/onboard.md index c496272..c43aa00 100644 --- a/ts/packages/sdk/prompts/onboard.md +++ b/ts/packages/sdk/prompts/onboard.md @@ -48,9 +48,8 @@ A JSON object from `sponsio onboard --emit-context`: the user has to clean up. - **Source tagging.** Every contract YOU author carries - `source: agent-extracted` so future `sponsio refresh` runs can - distinguish your additions from pack rules and from - CLI-emitted starter rules. + `source: agent-extracted` so later tooling can distinguish your + additions from pack rules and from CLI-emitted starter rules. - **One contract per concrete failure mode.** Plain-English `desc:` so the user can review by reading. No omnibus rules. @@ -184,9 +183,8 @@ agents: ## Source tagging Every contract YOU author should carry `source: agent-extracted` so -future `sponsio refresh` runs know they were agent-generated and -can be re-considered. Don't tag pack-included rules — those have -their own source from the pack. +later tooling knows they were agent-generated. Don't tag +pack-included rules; those have their own source from the pack. ## What to do after diff --git a/ts/packages/sdk/prompts/refresh.md b/ts/packages/sdk/prompts/refresh.md deleted file mode 100644 index 46ea122..0000000 --- a/ts/packages/sdk/prompts/refresh.md +++ /dev/null @@ -1,118 +0,0 @@ -# Contract refresh prompt — sponsio refresh from traces - -You are tuning an existing `sponsio.yaml` against accumulated -session traces. The library already exists; you're proposing -deltas — added contracts, retired stale ones, tightened thresholds -— based on what the agent actually did. 
- -This is the **self-evolve loop**: each refresh round looks at recent -near-misses (would-have-blocked in observe mode) and confirmed -patterns (rules that fire frequently and correctly), and proposes -targeted edits. - -## Input - -A JSON object from `sponsio refresh --emit-traces`: - -```json -{ - "agent": "...", - "since": "7d", - "existing_contracts": [ - {"desc": "...", "pattern": "...", "args": [...], "source": "..."} - ], - "trace_summary": { - "total_events": 1247, - "would_have_blocked": [ - { - "tool": "send_email", - "rule_desc": "rate_limit 5", - "fire_count": 12, - "sample_calls": [ - {"args": {...}, "ts": "...", "agent_outcome": "succeeded"} - ] - } - ], - "blocked_actual": [...], - "uncovered_patterns": [ - { - "tool": "transfer_funds", - "call_count": 3, - "sample_args": [{...}], - "note": "tool was called 3x; no contract covers it" - } - ] - } -} -``` - -## What you produce - -A YAML diff with three sections: - -```yaml -proposed_changes: - add: - - desc: "..." - G: {pattern: ..., args: [...]} - source: agent-extracted-from-traces - retire: - - match: { desc: "" } - reason: "fired 0 times in 7d window" - tighten: - - match: { desc: "" } - from: {pattern: rate_limit, args: [send_email, 5]} - to: {pattern: rate_limit, args: [send_email, 10]} - reason: "false-positive rate 12/30 over last 7d" -``` - -## Rules of thumb - -### Adding contracts - -* `uncovered_patterns` with `call_count` ≥ 3 and clear semantics → - add a contract. -* Don't add for tools that fired 1–2x — too sparse. -* Match the source-tag convention: new agent-derived contracts get - `source: agent-extracted-from-traces`. - -### Retiring (only `source: trace` or `source: agent-extracted-*`) - -* Existing contract that fired 0 times in the window → candidate for - retirement, but ONLY if it's `source: trace` or - `source: agent-extracted-*`. -* **Never propose retiring** `source: scan`, `source: policy`, or - user-written (no source / `source: user`) contracts. 
-* If retiring, include a one-line `reason`. - -### Tightening / loosening - -* High false-positive rate (`would_have_blocked` clusters with - `agent_outcome: succeeded` — the call was actually fine) → - loosen the cap or add `arg_blacklist` carve-outs. -* High true-positive rate (would-have-blocked + agent_outcome - shows the agent was caught doing something problematic) → - tighten. -* Always cite the trace counts in `reason` so the user can verify. - -## Pattern vocabulary - -Same as plugin scan / onboard — `arg_blacklist`, `rate_limit`, -`loop_detection`, `irreversible_once`, `must_precede`, -`arg_value_range`, `arg_length_limit`. - -## What you must not do - -* **Don't** retire user-written or pack-derived contracts. -* **Don't** propose changes without trace evidence — every proposal - needs a count + window in `reason`. -* **Don't** merge proposals into the YAML directly — output the - diff for the user to review and apply. Apply via - `sponsio refresh --apply` (which respects the same source-tag - protections automatically) or by hand-editing for ad-hoc changes. - -## Output format - -ONLY the `proposed_changes:` YAML block above. No prose, no -markdown wrapping. The host driver will show it to the user -verbatim. diff --git a/ts/packages/sdk/prompts/scan.md b/ts/packages/sdk/prompts/scan.md index 9604e74..56f13b5 100644 --- a/ts/packages/sdk/prompts/scan.md +++ b/ts/packages/sdk/prompts/scan.md @@ -105,11 +105,9 @@ to enforce and they wedge the agent. For each tool in here — take them seriously. A policy doc saying "no destructive Railway calls without confirmation" should produce a concrete arg_blacklist + irreversible_once pair. -- **`trace_summary.total_events`** > 0: ordering / sequence rules are - more reliable now (you can run `sponsio refresh --emit-traces` to - mine them properly). Without traces, prefer single-event patterns - (`arg_blacklist`, `rate_limit`, `irreversible_once`) over - trace-aware ones. 
+- Prefer single-event patterns (`arg_blacklist`, `rate_limit`, + `irreversible_once`) unless the code clearly establishes a required + ordering; only then reach for sequence-shaped rules. - **`existing_yaml`**: don't duplicate rules already there. Read its contracts, only emit gaps. diff --git a/ts/packages/sdk/skills/SKILL.md b/ts/packages/sdk/skills/SKILL.md index 2ba932c..82b0c7f 100644 --- a/ts/packages/sdk/skills/SKILL.md +++ b/ts/packages/sdk/skills/SKILL.md @@ -1,6 +1,6 @@ --- name: sponsio -description: Install, observe, tune, enforce, and periodically refresh Sponsio — a runtime contract layer for LLM agents that blocks unsafe tool calls and scores output quality against declared rules. Use when the user wants to set up / add / install Sponsio, add guardrails or runtime safety to an LLM agent, generate or refine a sponsio.yaml, audit tool configurations for risks (data leaks, unguarded writes, missing confirmations), explain or review existing contracts, check what Sponsio would have blocked (`sponsio report`), refresh the contract library from recent traces (`sponsio refresh`), move from observe to enforce mode, or debug why a contract is (or isn't) firing. Triggers on phrases like "set up sponsio", "add sponsio", "install sponsio", "add guardrails", "monitor my agent", "harden my agent", "audit my agent", "generate contracts", "explain my sponsio.yaml", "sponsio report", "refresh contracts", "update my sponsio.yaml from traces", "flip to enforce", "false positive", "why is this rule firing". +description: Install, observe, tune, and enforce Sponsio: a runtime contract layer for LLM agents that blocks unsafe tool calls and scores output quality against declared rules. 
Use when the user wants to set up / add / install Sponsio, add guardrails or runtime safety to an LLM agent, generate or refine a sponsio.yaml, audit tool configurations for risks (data leaks, unguarded writes, missing confirmations), explain or review existing contracts, check what Sponsio would have blocked (`sponsio report`), move from observe to enforce mode, or debug why a contract is (or isn't) firing. Triggers on phrases like "set up sponsio", "add sponsio", "install sponsio", "add guardrails", "monitor my agent", "harden my agent", "audit my agent", "generate contracts", "explain my sponsio.yaml", "sponsio report", "flip to enforce", "false positive", "why is this rule firing". --- # Sponsio — Agent Safety Lifecycle Companion @@ -18,7 +18,6 @@ Dispatch by what the user is trying to do. Pick ONE workflow and follow it; do n | Setting up Sponsio for the first time in a project ("add sponsio", "install sponsio", "add guardrails") | **W1 — Initial setup** | | Handing you a codebase and asking "what could go wrong?" / wants a fresh contract file from scratch / has a policy doc to encode | **W2 — Audit & refine** | | Has Sponsio running in observe mode and wants to review violations, tune thresholds, silence false positives | **W3 — Tune in observe** | -| Wants to re-mine contracts from accumulated production traces / periodically maintain the library | **W3b — Refresh from traces** | | Ready to ship — wants to move from observe to enforce, needs regression confidence | **W4 — Flip to enforce** | | Sponsio errored, a rule isn't firing when it should, a rule is firing when it shouldn't | **W5 — Troubleshoot** | @@ -99,10 +98,9 @@ Match the user's input to the source(s): - "Explain / review my `sponsio.yaml`" → source 1 and/or others already applied; jump to "Explain contracts" below. - "Scan my agent code" → source 2, code-only. - "We have a security policy document" → source 2, add `--policy --llm`. 
-- "We already have session logs / OTLP traces" → source 2, add `-t ''` (trace mining; no LLM needed). - "I know the pattern I want but not the syntax" → source 4, then show them the yaml entry. -If ambiguous: ask ONE question — "(a) scan your code, (b) extract from a policy document, or (c) mine from traces?" +If ambiguous: ask ONE question — "(a) scan your code, or (b) extract from a policy document?" ### Run scan (when extraction is needed) @@ -115,12 +113,9 @@ sponsio scan --agent --llm -o ./sponsio.yaml # + policy doc: sponsio scan --policy --llm -o ./sponsio.yaml - -# + trace mining (works on OTLP/JSON, OTLP JSONL, native, Sponsio session logs): -sponsio scan -t '~/.sponsio/sessions//*.jsonl' -o ./sponsio.yaml ``` -Scan auto-validates before writing; only contracts that parse cleanly are saved. Source-tagged with `source: scan` / `source: policy` / `source: trace`. The `source: trace` subset is the one `sponsio refresh` maintains over time (W3b). +Scan auto-validates before writing; only contracts that parse cleanly are saved. Source-tagged with `source: scan` / `source: policy`. ### Validate existing yaml (explain-only path) @@ -211,66 +206,6 @@ With the assumption, the rule stays silent until the integration actually emits --- -## W3b — Refresh from traces - -Goal: treat `sponsio.yaml` as a **living library**, not a one-shot output. As real session logs accumulate, re-mine them to discover new patterns and retire stale ones, without clobbering anything the user wrote or tuned. - -### When to run - -- Weekly / sprint boundary — recommended cadence for an agent seeing live traffic. -- After a material behavior change (new tools, new workflow, new integration). -- Before flipping to enforce (W4) — catches late-arriving trace-sourced rules that weren't present in the original scan. - -### Steps - -1. Dry-run first. Always. 
- - ```bash - sponsio refresh --since 7d - ``` - - Prints a structured diff per agent: - - ``` - Agent: support_bot - + new must_precede(validate_payment, charge_card) - ~ drifted rate_limit(send_email, 5) → args [send_email, 12] - - stale idempotent(list_users) (not re-observed) - = 8 unchanged (source: trace, re-observed) - = 12 preserved (user / scan / policy / overrides — not touched) - ``` - -2. Review with the user. For each bucket: - - `+ new` — "the agent started doing X that your current yaml doesn't cover." User usually wants this. - - `~ drifted` — threshold moved. If new value is bigger, the rule was too tight; if smaller, production tightened up. User decides. - - `- stale` — rule hasn't fired in this window. **Not necessarily dead** — could just be rare. Conservative default is `--mode add-only` (never remove), which we recommend for the first few refreshes until the user trusts the window size. - - `= preserved` — these include every user rule, `source: scan`, `source: policy`, and anything under `overrides:`. The count being non-zero is the load-bearing invariant: refresh **only** ever changes `source: trace` entries. - -3. Apply. - - ```bash - sponsio refresh --since 7d --apply # default mode: replace-trace - sponsio refresh --since 7d --apply --mode add-only # conservative: never remove - ``` - - Writes `sponsio.yaml` and backs the old file up to `sponsio.yaml.sponsio.bak`. **Comments in the YAML are not preserved** — the backup is how users recover any prose annotations they'd inlined. Warn them of this before running. 
- -### Mode selection - -| Situation | Mode | -|---|---| -| First time running refresh, or small trace window | `add-only` (never removes) | -| Healthy trace volume, stable agent behavior | `replace-trace` (default — recent traces are authoritative) | -| Window just covers a launch / migration / incident (abnormal traffic) | Skip — artifact traces mislead the miner | - -### Do NOT - -- Do NOT run refresh on traces from an agent still in early iteration. Wait until the workflow is stable; otherwise every sprint invalidates the library. -- Do NOT use `--apply` without first showing the dry-run diff. -- Do NOT remove `source: scan` / `source: policy` entries in the yaml by hand just because refresh doesn't touch them — they represent knowledge refresh can't reconstruct (code AST, policy docs). If you think one is wrong, edit `overrides:`, don't delete. - ---- - ## W4 — Flip to enforce Goal: move from observe to enforce with regression confidence — no more logging, actual blocking. **This is a production change**; don't skip the checks. @@ -487,7 +422,7 @@ If asked for something out of scope (e.g., "also check my DB schema"), say so an This skill only uses these. Internal refactors are safe as long as these stay stable. -1. **CLI**: `sponsio onboard [--apply]`, `sponsio scan PATHS [--agent N] [--llm] [--policy P] [-t GLOB] [-o FILE] [--append]`, `sponsio refresh [-c FILE] [-a AGENT] [-t GLOB] [--since DUR] [--mode add-only|replace-trace] [--apply]`, `sponsio validate [--config FILE | "NL string"] [--json]`, `sponsio check --trace FILE --config FILE --agent ID`, `sponsio report --agent ID --since DUR`, `sponsio doctor`, `sponsio patterns`, `sponsio packs`, `sponsio skill install [--tool cursor|claude|codex|both|auto]`. Exit 0 on success. +1. 
**CLI**: `sponsio onboard [--apply]`, `sponsio scan PATHS [--agent N] [--llm] [--policy P] [-o FILE] [--append]`, `sponsio validate [--config FILE | "NL string"] [--json]`, `sponsio check --trace FILE --config FILE --agent ID`, `sponsio report --agent ID --since DUR`, `sponsio doctor`, `sponsio patterns`, `sponsio packs`, `sponsio skill install [--tool cursor|claude|codex|both|auto]`. Exit 0 on success. 2. **YAML**: top-level `agents:` as dict; each agent has optional `include:` / `tool_rename:` / `overrides:` / `workspace:` and required `contracts:`; top-level `runtime:`. 3. **Patterns**: names in the table above keep their semantics. Renaming is a breaking change for this skill. 4. **`validate --json` shape**: per-contract `ok` / `type` / `pattern` / `formula` / `agent`. diff --git a/ts/packages/sdk/src/cli/prompt.ts b/ts/packages/sdk/src/cli/prompt.ts index f159d11..ce0b9cc 100644 --- a/ts/packages/sdk/src/cli/prompt.ts +++ b/ts/packages/sdk/src/cli/prompt.ts @@ -1,9 +1,9 @@ /** * ``sponsio prompt`` — print the agent-facing prompt template for a - * sponsio workflow (onboard / scan / refresh). + * sponsio workflow (onboard / scan). * - * Mirrors the Python ``sponsio prompt`` command. The same three .md - * files live in ``ts/packages/sdk/prompts/`` (mirrored from + * Mirrors the Python ``sponsio prompt`` command. The same .md files + * live in ``ts/packages/sdk/prompts/`` (mirrored from * ``sponsio/prompts/``); this command just reads and prints the right * one. Used by the Sponsio skill to drive contract authoring without * a separate LLM API call. 
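The doc comment above says `sponsio prompt` just reads and prints the right template. Below is a minimal TypeScript sketch of that lookup after this change, where only `onboard` and `scan` remain valid flows. It assumes each flow's template is stored as `<flow>.md` in a single prompts directory, as `scan.md` is in this diff. `loadPrompt` is a hypothetical name, not the SDK's actual export.

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Flows the prompt command still serves; "refresh" is intentionally absent.
const FLOWS: ReadonlySet<string> = new Set(["onboard", "scan"]);

// Hypothetical helper (not the SDK's real export): resolve and read the
// template for `flow`, assuming it is stored as `<promptsDir>/<flow>.md`.
function loadPrompt(flow: string, promptsDir: string): string {
  if (!FLOWS.has(flow)) {
    // Reject retired or unknown flows up front and list the valid ones.
    const valid = Array.from(FLOWS).join(" | ");
    throw new Error(`unknown flow "${flow}" (expected: ${valid})`);
  }
  const file = join(promptsDir, `${flow}.md`);
  if (!existsSync(file)) {
    // A valid flow with no template on disk likely points to a packaging problem.
    throw new Error(`prompt template not found: ${file}`);
  }
  return readFileSync(file, "utf8");
}
```

A caller would then print the result, for example `process.stdout.write(loadPrompt("scan", promptsDir))`. With `refresh` removed from the set, `loadPrompt("refresh", promptsDir)` fails fast instead of looking for a deleted template.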
@@ -11,7 +11,7 @@ import { readFileSync, existsSync } from "node:fs"; import { join } from "node:path"; -const FLOWS = new Set(["onboard", "scan", "refresh"]); +const FLOWS = new Set(["onboard", "scan"]); const HELP = "sponsio prompt — print the contract-authoring prompt for a workflow\n" + @@ -20,7 +20,7 @@ const HELP = " sponsio prompt \n" + "\n" + "ARGUMENTS:\n" + - " onboard | scan | refresh\n" + + " onboard | scan\n" + "\n" + "EXAMPLES:\n" + " sponsio prompt onboard\n" + diff --git a/ts/packages/sdk/src/cli/scan.ts b/ts/packages/sdk/src/cli/scan.ts index 1eeeacd..c359d9f 100644 --- a/ts/packages/sdk/src/cli/scan.ts +++ b/ts/packages/sdk/src/cli/scan.ts @@ -4,8 +4,8 @@ * Mirrors the Python ``sponsio scan`` library-maintenance flow: AST * scan a path, infer deterministic contracts heuristically, write a * sponsio.yaml (or append to an existing one). Used *after* ``onboard`` - * has set up the project — when you've added new tools and want to - * refresh the contract list. + * has set up the project, when you've added new tools and want to + * update the contract list. * * Differs from the default ``npx sponsio `` mode (which * emits a tools.json inventory) and from ``onboard`` (first-time diff --git a/ts/packages/sdk/src/cli/skill.ts b/ts/packages/sdk/src/cli/skill.ts index 8666bc7..8ad989a 100644 --- a/ts/packages/sdk/src/cli/skill.ts +++ b/ts/packages/sdk/src/cli/skill.ts @@ -1,7 +1,7 @@ /** * ``sponsio skill install`` — drop the universal SKILL.md * into Cursor / Claude Code / Codex skill directories so your - * coding agent knows how to ``onboard`` / ``scan`` / ``refresh`` / + * coding agent knows how to ``onboard`` / ``scan`` / * flip-to-enforce on every future project without re-pasting the * one-prompt setup. *