Modular execution substrate for governed agentic workloads: enforces behavioral contracts and resource budgets between untrusted model outputs and system capabilities, with pluggable memory backends and portable skill bundles.
Landed on main: the L1 framework and the following work (see
docs/backlog.md, ADR 0007, and ADRs 0010-0020):

- the full L2 implementation wave;
- the L3 default-path wiring + audit wave;
- the third-audit + L3 capability wave;
- the run-provenance + provider-batch wave;
- the fifth-code-audit hardening, plus the post-audit approval-resume
  binding fix (BL-193);
- the BL-180 durable-adapter MVCC + transactional Protocol wave;
- the sixth-code-audit hardening (BL-197-BL-208, plus the ADR 0015
  deferred close BL-209-BL-211);
- the BL-133 skill execution isolation Protocol + subprocess reference;
- the BL-212-BL-214 / BL-224 / BL-225 sweeper size-bound wave
  (`BoundedSweepableStore` extension Protocol with in-memory, SQLite,
  and opt-in Redis / DynamoDB / S3 references);
- the seventh-code-audit hardening (BL-215-BL-218);
- the eighth-code-audit hardening (BL-219-BL-222);
- the ninth-code-audit hardening (BL-223);
- the tenth-code-audit hardening (BL-226 / BL-227), against the
  just-merged `BoundedS3Store`.

Every L2/L3 change is additive to the L1 Protocols: new optional
parameters, new modules, and side-by-side Protocols; nothing in the L1
surface was removed. The package imports and type-checks with no
optional dependencies installed.
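Keeping the package importable and type-checkable without optional SDKs usually comes down to deferring third-party imports to the point of use. The following is a minimal sketch of that pattern, assuming a hypothetical `require_extra` helper and `make_redis_client` factory (neither is this package's documented API). The only real library calls it relies on are `importlib.import_module` and redis-py's `Redis.from_url`.

```python
import importlib
from types import ModuleType
from typing import Any


def require_extra(module: str, extra: str) -> ModuleType:
    """Import an optional dependency at first use (hypothetical helper).

    Re-raises as an ImportError that names the extra to install, instead of a
    bare ModuleNotFoundError surfacing from deep inside an adapter.
    """
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise ImportError(
            f"optional dependency {module!r} is missing; "
            f"install it with: pip install 'agents[{extra}]'"
        ) from exc


def make_redis_client(url: str) -> Any:
    # The SDK is imported here, when an adapter is built, so importing the
    # package itself never touches redis.
    redis = require_extra("redis", "redis")
    return redis.Redis.from_url(url)
```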
See CLAUDE.md for repository structure and conventions.
| Directory | Contents |
|---|---|
| `agents/` | operator CLI (`python -m agents`) |
| `workloads/` | individual agent workloads + loader (in-tree and out-of-tree) |
| `skills/` | Agent Skills bundles, registry, dispatchers, install sources |
| `harness/` | contracts, enforcement, runtime adapter, budgets, events |
| `memory/` | namespace-bound stores and production adapters |
| `evaluation/` | behavioral regression gate (dispatch P@1/MRR, trajectory) |
| `tests/` | test suite (mirrors the source layout) |
| `docs/` | architecture, ADRs, the L2 backlog, generated JSON Schema |
| `scripts/` | operational and developer scripts |
- Harness. Behavioral contracts (pre/invariant/post/governance,
  hard/soft severity); `run_under_contract` enforcement with opt-in
  default-path wiring (skill-contract composition, drift recording +
  threshold events, recovery directives, run-scoped lifecycles); action
  budgets (steps/tokens/wall-clock/tool-calls, per-tool quotas, plus a
  cost dimension and per-tool token/wall-clock caps, cumulative across an
  approval pause); structured OTel-ready events; Jensen-Shannon
  distributional drift; and opt-in self-attesting run-provenance records
  (`record_sink`, `contract_digest`, `verify_run_record`, and the
  `scripts/check_run_records.py` offline gate).
- Provider batch capabilities (optional extras). `AnthropicBatchProcessor`
  (Message Batches) and `cache_control_system` (prefix-stable prompt
  caching) under the `anthropic` extra; `OpenAIBatchProcessor` (OpenAI
  Batch API) under the `openai` extra. The batch processors run
  asynchronous bulk jobs at roughly 50% of the standard token price. All
  of these are lazily imported, and the package type-checks without
  either SDK.
- Runtime adapter. `PydanticAIRuntime` wires the guard and budget into
  the tool-call path: every local and MCP tool call passes through the
  same guard gate (approve / reject / require-approval). The adapter also
  provides a wall-clock watchdog (which preempts at an await boundary),
  streaming budget enforcement, a pause / `ResumableState` / resume
  approval flow, an opt-in `RetryPolicy` (backoff + circuit breaker), and
  an opt-in structured soft-reject. Provider selection and credentials:
  docs/runtime-providers.md.
- Memory. Namespace-bound `MemoryStore` with the `InMemoryStore`
  reference plus `SQLiteStore`, `RedisStore`, `S3Store`, and
  `DynamoDBStore` adapters; extension Protocols for batch, cursor scan,
  content-addressing, CAS, MVCC version tokens (`VersionedMemoryStore`),
  and similarity query (`SemanticMemoryStore` + `InMemorySemanticStore`);
  `TTLSweeper`; a transparent `EncryptedStore` (AES-256-GCM) with static /
  env / file / rotating (`VersionedKeyProvider`) key providers; and
  `ACLStore` with role- and attribute-based (`AttributeACL`) policies and
  an audited `AccessDenied` event. Both wrappers come with `wrap_encrypted`
  / `wrap_acl` helpers that truthfully forward the wrapped backend's
  extension Protocols. Audit events are optional.
- Evaluation. A behavioral regression gate: `evaluate_dispatch` (P@1 /
  MRR over a JSON golden set) and `evaluate_trajectory` (expected vs.
  actual contract terminal outcome). Both are deterministic and
  network-free, and run as a blocking CI job via `scripts/eval.py`.
- Skills. An Agent Skills spec-compliant loader and registry; skill
  versioning (`name@version`); seven router dispatchers (five core:
  keyword, LLM, lane, routing-chain, and skill-based; plus the L2
  multi-ensemble and embedding dispatchers); an `InstrumentedDispatcher`
  telemetry wrapper and a `default_dispatcher` factory for the
  recommended instrumented chain; a deterministic
  `HashingEmbeddingProvider`; skill-level contracts; and pluggable install
  sources (local, GitHub, marketplace) with bounded, symlink-safe
  extraction, optional checksum and signature verification, and gated
  contract execution for untrusted bundles.
- CLI. `python -m agents workloads list | skills list | skills install <name> --from <src> | run <wl> <q> [--json]`.
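The Harness bullet lists action budgets over steps, tokens, wall-clock time, and tool calls, with per-tool quotas. The README does not show that API, so what follows is a hypothetical sketch: `ActionBudget`, `BudgetExceeded`, and the `charge_*` methods are invented names. It checks every limit before recording any usage, so a rejected action leaves the counters untouched, and it takes an injectable clock so the wall-clock limit is testable. The cost dimension and the per-tool token/wall-clock caps are omitted.

```python
from __future__ import annotations

import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable


class BudgetExceeded(Exception):
    """Raised instead of performing an action that would overrun a limit."""


@dataclass
class ActionBudget:
    # Limits (hypothetical field names); None leaves a dimension unbounded.
    max_steps: int | None = None
    max_tokens: int | None = None
    max_tool_calls: int | None = None
    max_wall_clock_s: float | None = None
    per_tool_quota: dict[str, int] = field(default_factory=dict)
    clock: Callable[[], float] = time.monotonic
    # Usage counters, accumulated across calls.
    steps: int = 0
    tokens: int = 0
    tool_calls: Counter[str] = field(default_factory=Counter)
    started_at: float | None = None

    def _check_wall_clock(self) -> None:
        now = self.clock()
        if self.started_at is None:
            self.started_at = now  # the clock starts at the first charge
        elapsed = now - self.started_at
        if self.max_wall_clock_s is not None and elapsed > self.max_wall_clock_s:
            raise BudgetExceeded(f"wall-clock limit: {elapsed:.1f}s elapsed")

    def charge_step(self, tokens: int = 0) -> None:
        """Check every limit first, then record the step (all-or-nothing)."""
        self._check_wall_clock()
        if self.max_steps is not None and self.steps + 1 > self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.max_tokens is not None and self.tokens + tokens > self.max_tokens:
            raise BudgetExceeded("token limit reached")
        self.steps += 1
        self.tokens += tokens

    def charge_tool_call(self, tool: str) -> None:
        """Enforce the global tool-call limit, then the per-tool quota."""
        self._check_wall_clock()
        used = sum(self.tool_calls.values())
        if self.max_tool_calls is not None and used + 1 > self.max_tool_calls:
            raise BudgetExceeded("tool-call limit reached")
        quota = self.per_tool_quota.get(tool)
        if quota is not None and self.tool_calls[tool] + 1 > quota:
            raise BudgetExceeded(f"per-tool quota reached for {tool!r}")
        self.tool_calls[tool] += 1
```

A runtime adapter would call `charge_tool_call` before dispatching each tool and translate `BudgetExceeded` into a reject.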
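Jensen-Shannon divergence, the drift measure named in the Harness bullet, compares two distributions P and Q through their mixture M = (P + Q) / 2: JS(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M). With base-2 logarithms it is symmetric and bounded in [0, 1]. Below is a standalone sketch over categorical counts. The function names, the use of per-tool call frequencies as the signal, and any alert threshold are assumptions; the README does not describe how the harness windows samples or sets thresholds.

```python
import math
from collections import Counter
from collections.abc import Mapping


def _normalise(counts: Mapping[str, float], support: set[str]) -> dict[str, float]:
    total = float(sum(counts.values()))
    if total <= 0:
        raise ValueError("a distribution needs positive total mass")
    return {k: counts.get(k, 0) / total for k in support}


def jensen_shannon(p_counts: Mapping[str, float], q_counts: Mapping[str, float]) -> float:
    """Jensen-Shannon divergence in bits; symmetric and bounded in [0, 1]."""
    support = set(p_counts) | set(q_counts)
    p = _normalise(p_counts, support)
    q = _normalise(q_counts, support)
    m = {k: (p[k] + q[k]) / 2 for k in support}

    def kl_to_m(a: dict[str, float]) -> float:
        # KL(a || m); zero-probability terms contribute 0, and m[k] > 0 whenever a[k] > 0.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in support if a[k] > 0)

    return (kl_to_m(p) + kl_to_m(q)) / 2


# Usage: compare a baseline tool-call mix with the latest window.
baseline = Counter(["search", "search", "fetch", "summarise"])
window = Counter(["search", "delete", "delete", "delete"])
drift = jensen_shannon(baseline, window)
```

Because the score is bounded, a single fixed threshold can apply across workloads, which makes it a convenient trigger for threshold events.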
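The Memory bullet lists CAS and MVCC version tokens as extension Protocols but does not show their signatures. The sketch below illustrates only the optimistic-concurrency idea behind them: every write produces a fresh opaque token, and a write succeeds only if the caller presents the token it last read. `ToyVersionedStore`, `Versioned`, `compare_and_set`, and `append_with_retry` are hypothetical names. A durable adapter would push the check-and-write into a single conditional write on the backend (for example, a DynamoDB condition expression) rather than relying on a process-local lock.

```python
from __future__ import annotations

import threading
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: bytes
    version: str  # opaque token; callers compare it, never parse it


class ToyVersionedStore:
    """In-memory key/value store with optimistic concurrency (hypothetical)."""

    def __init__(self) -> None:
        self._data: dict[str, Versioned] = {}
        self._lock = threading.Lock()  # makes check-and-write atomic in-process

    def get(self, key: str) -> Versioned | None:
        with self._lock:
            return self._data.get(key)

    def compare_and_set(self, key: str, value: bytes, expected: str | None) -> str | None:
        """Write only if the key's current version equals `expected`.

        expected=None means "the key must not exist yet". Returns the new
        version token on success, or None if another writer got there first.
        """
        with self._lock:
            current = self._data.get(key)
            if (current.version if current else None) != expected:
                return None
            token = uuid.uuid4().hex
            self._data[key] = Versioned(value, token)
            return token


def append_with_retry(store: ToyVersionedStore, key: str, suffix: bytes) -> None:
    # Optimistic read-modify-write: re-read and retry if a concurrent writer won.
    while True:
        current = store.get(key)
        new_value = (current.value if current else b"") + suffix
        if store.compare_and_set(key, new_value, current.version if current else None):
            return
```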
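The Evaluation bullet says `evaluate_dispatch` computes P@1 and MRR over a JSON golden set. P@1 is the fraction of queries whose expected skill is ranked first. MRR averages 1/rank of the expected skill and scores 0 when the skill is absent from the ranking. Here is a hypothetical, self-contained version of those two metrics. The golden-set schema, `GoldenCase`, `load_golden`, and `dispatch_metrics` are assumptions, not the real `evaluate_dispatch` signature.

```python
import json
from collections.abc import Callable, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldenCase:
    query: str
    expected: str  # the skill a correct dispatcher should rank first


def load_golden(text: str) -> list[GoldenCase]:
    # Assumed schema: [{"query": ..., "expected": ...}, ...]
    return [GoldenCase(row["query"], row["expected"]) for row in json.loads(text)]


def dispatch_metrics(
    cases: Sequence[GoldenCase], rank: Callable[[str], Sequence[str]]
) -> tuple[float, float]:
    """Return (P@1, MRR); a missing expected skill scores reciprocal rank 0."""
    if not cases:
        raise ValueError("empty golden set")
    top1 = 0
    rr_total = 0.0
    for case in cases:
        ranking = list(rank(case.query))
        if ranking[:1] == [case.expected]:
            top1 += 1
        if case.expected in ranking:
            rr_total += 1.0 / (ranking.index(case.expected) + 1)
    return top1 / len(cases), rr_total / len(cases)


golden = load_golden(
    '[{"query": "refund order 42", "expected": "billing"},'
    ' {"query": "reset my password", "expected": "auth"},'
    ' {"query": "where is my parcel", "expected": "shipping"}]'
)
canned = {
    "refund order 42": ["billing", "shipping"],
    "reset my password": ["billing", "auth"],
    "where is my parcel": ["auth"],
}
p_at_1, mrr = dispatch_metrics(golden, lambda q: canned[q])
# p_at_1 == 1/3 and mrr == 0.5; a CI gate would fail the build if either
# score drops below a chosen floor.
```

Because the ranker is a plain callable and the golden set is static JSON, the check stays deterministic and network-free, which is what lets it run as a blocking CI job.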
```bash
uv sync --all-extras  # dev: every adapter + test doubles
```

Production backends are optional extras, lazily imported:

```bash
pip install 'agents[redis]'   # RedisStore
pip install 'agents[aws]'     # S3Store, DynamoDBStore
pip install 'agents[crypto]'  # EncryptedStore (AES-256-GCM)
pip install 'agents[otel]'    # OTelSink (OTLP/HTTP)
```

```bash
make check                     # ruff + mypy + pytest
make schema                    # regenerate docs/schema/*.json from the models
uv run python scripts/eval.py  # the BL-130 dispatch regression gate
```

Pre-1.0 infrastructure. See STATUS.md for phase and document maturity, LIMITATIONS.md for explicit scope boundaries and known gaps, CHANGELOG.md for material changes, docs/releasing.md for the versioning, release, and operations policy, and SECURITY.md for the hardening posture and disclosure process. Roadmap: docs/backlog.md; decisions: docs/adr/.