rmednitzer/agents

Agents

Modular execution substrate for governed agentic workloads: enforces behavioral contracts and resource budgets between untrusted model outputs and system capabilities, with pluggable memory backends and portable skill bundles.

Status

Merged on main:

  • the L1 framework
  • the full L2 implementation wave
  • the L3 default-path wiring + audit wave
  • the third-audit + L3 capability wave
  • the run-provenance + provider-batch wave
  • the fifth-code-audit hardening (plus the post-audit approval-resume binding fix, BL-193)
  • the BL-180 durable-adapter MVCC + transactional Protocol wave
  • the sixth-code-audit hardening (BL-197 to BL-208, plus the ADR 0015 deferred close BL-209 to BL-211)
  • the BL-133 skill execution isolation Protocol + subprocess reference
  • the BL-212 to BL-214 / BL-224 / BL-225 sweeper size-bound wave (BoundedSweepableStore extension Protocol with in-memory, SQLite, and opt-in Redis / DynamoDB / S3 references)
  • the seventh-code-audit hardening (BL-215 to BL-218)
  • the eighth-code-audit hardening (BL-219 to BL-222)
  • the ninth-code-audit hardening (BL-223)
  • the tenth-code-audit hardening (BL-226 / BL-227, against the just-merged BoundedS3Store)

See docs/backlog.md, ADR 0007, and ADR 0010 through ADR 0020.

Every L2/L3 change is additive to the L1 Protocols: new optional parameters, new modules, and side-by-side Protocols; nothing in the L1 surface was removed. The package imports and type-checks with no optional dependencies installed.
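As an illustration of what "additive, side-by-side Protocols" can look like in Python, here is a minimal sketch. The names Store, BatchStore, DictStore, BatchDictStore, and fetch_all are hypothetical and are not taken from this repository; the point is only that an extension Protocol can sit next to a base Protocol without changing it, and callers can probe for the extension at runtime.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class Store(Protocol):
    """Hypothetical base Protocol (stands in for an L1-style surface)."""

    def get(self, key: str) -> bytes | None: ...
    def put(self, key: str, value: bytes) -> None: ...


@runtime_checkable
class BatchStore(Protocol):
    """Hypothetical side-by-side extension; Store itself is unchanged."""

    def get_many(self, keys: list[str]) -> dict[str, bytes]: ...


class DictStore:
    """Implements only the base Protocol."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value


class BatchDictStore(DictStore):
    """Implements the base Protocol and the batch extension."""

    def get_many(self, keys: list[str]) -> dict[str, bytes]:
        return {k: self._data[k] for k in keys if k in self._data}


def fetch_all(store: Store, keys: list[str]) -> dict[str, bytes]:
    # Use the extension when the backend offers it, else fall back to get().
    if isinstance(store, BatchStore):
        return store.get_many(keys)
    return {k: v for k in keys if (v := store.get(k)) is not None}
```

Because the extension is a separate Protocol, existing implementations of the base surface keep working untouched, which is the property the paragraph above describes.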

See CLAUDE.md for repository structure and conventions.

Layout

  • agents/ operator CLI (python -m agents)
  • workloads/ individual agent workloads + loader (in-tree and out-of-tree)
  • skills/ Agent Skills bundles, registry, dispatchers, install sources
  • harness/ contracts, enforcement, runtime adapter, budgets, events
  • memory/ namespace-bound stores and production adapters
  • evaluation/ behavioural regression gate (dispatch P@1/MRR, trajectory)
  • tests/ test suite (mirrors the source layout)
  • docs/ architecture, ADRs, the L2 backlog, generated JSON Schema
  • scripts/ operational and developer scripts

Capabilities

  • Harness. Behavioral contracts (pre/invariant/post/governance, hard/soft severity), run_under_contract enforcement with opt-in default-path wiring (skill-contract composition, drift recording + threshold events, recovery directives, run-scoped lifecycles), action budgets (steps/tokens/wall-clock/tool-calls, per-tool quotas, plus a cost dimension and per-tool token/wall-clock caps, cumulative across an approval pause), structured OTel-ready events, Jensen-Shannon distributional drift, and opt-in self-attesting run-provenance records (record_sink, contract_digest, verify_run_record, the scripts/check_run_records.py offline gate).
  • Provider batch capabilities (optional extras). AnthropicBatchProcessor (Message Batches) and cache_control_system (prefix-stable prompt caching) under the anthropic extra; OpenAIBatchProcessor (OpenAI Batch API) under the openai extra. Both batch processors run asynchronous bulk requests at roughly 50% of the standard token price. The SDKs are lazily imported, so the package type-checks without either one installed.
  • Runtime adapter. PydanticAIRuntime wires the guard and budget into the tool-call path: every local and MCP tool call passes the same guard gate (approve / reject / require-approval), a wall-clock watchdog (preempts at an await boundary), streaming budget enforcement, a pause/ResumableState/resume approval flow, an opt-in RetryPolicy (backoff + circuit breaker), and an opt-in structured soft-reject. Provider selection and credentials: docs/runtime-providers.md.
  • Memory. Namespace-bound MemoryStore with InMemoryStore reference plus SQLiteStore, RedisStore, S3Store, DynamoDBStore adapters; extension Protocols for batch, cursor scan, content-addressing, CAS, MVCC version tokens (VersionedMemoryStore), and similarity query (SemanticMemoryStore + InMemorySemanticStore); TTLSweeper; transparent EncryptedStore (AES-256-GCM) with static / env / file / rotating (VersionedKeyProvider) key providers, and ACLStore with role and attribute-based (AttributeACL) policies and an audited AccessDenied event, both with wrap_encrypted / wrap_acl forwarding the wrapped backend's extension Protocols truthfully; optional audit events.
  • Evaluation. A behavioural regression gate: evaluate_dispatch (P@1 / MRR over a JSON golden set) and evaluate_trajectory (expected vs actual contract terminal outcome), deterministic and network-free, run as a blocking CI job via scripts/eval.py.
  • Skills. Agent Skills spec-compliant loader/registry, skill versioning (name@version), seven router dispatchers (five core: keyword, LLM, lane, routing-chain, and skill-based; two from L2: multi-ensemble and embedding), an InstrumentedDispatcher telemetry wrapper, and a default_dispatcher factory for the recommended instrumented chain; a deterministic HashingEmbeddingProvider; skill-level contracts; and pluggable install sources (local, GitHub, marketplace) with bounded symlink-safe extraction, optional checksum and signature verification, and gated contract execution for untrusted bundles.
  • CLI. python -m agents workloads list | skills list | skills install <name> --from <src> | run <wl> <q> [--json].
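The Harness entry above mentions Jensen-Shannon distributional drift. For readers unfamiliar with the measure, the sketch below computes the base-2 Jensen-Shannon divergence between two tool-call distributions. It is a generic illustration, not this repository's implementation; the function names tool_distribution and js_divergence and the idea of comparing tool-call frequencies are assumptions made for the example.

```python
import math
from collections import Counter


def tool_distribution(calls: list[str]) -> dict[str, float]:
    """Turn a list of tool names into a relative-frequency distribution."""
    if not calls:
        raise ValueError("need at least one call to form a distribution")
    counts = Counter(calls)
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()}


def js_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint support."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of outcomes.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict[str, float]) -> float:
        # KL(a || m); terms with a(k) == 0 contribute nothing.
        return sum(
            a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0.0
        )

    return 0.5 * kl(p) + 0.5 * kl(q)


baseline = tool_distribution(["search", "search", "read", "write"])
observed = tool_distribution(["search", "delete", "delete", "delete"])
drift = js_divergence(baseline, observed)
```

A threshold on a value like drift is one plausible way to decide when to emit a drift event; the actual thresholds and inputs used by the harness are documented in the repository, not here.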

Install

uv sync --all-extras        # dev: every adapter + test doubles

Production backends are optional extras, lazily imported:

pip install 'agents[redis]'   # RedisStore
pip install 'agents[aws]'     # S3Store, DynamoDBStore
pip install 'agents[crypto]'  # EncryptedStore (AES-256-GCM)
pip install 'agents[otel]'    # OTelSink (OTLP/HTTP)
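One common way to implement lazily imported optional extras is to defer the import until the adapter is actually used and to raise an error that names the missing extra. The helper below is a minimal sketch of that pattern; require_extra is a hypothetical name and is not part of this package's API.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use (hypothetical helper).

    Nothing is imported at package import time, so the package still loads
    when the extra is not installed.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is not installed; "
            f"try: pip install 'agents[{extra}]'"
        ) from exc
```

An adapter constructor could call such a helper with, for example, the module name of its client library and the matching extra, so the failure appears only when that adapter is instantiated.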

Build and test

make check     # ruff + mypy + pytest
make schema    # regenerate docs/schema/*.json from the models
uv run python scripts/eval.py   # the BL-130 dispatch regression gate
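The dispatch gate reports P@1 and MRR over a JSON golden set. The sketch below shows how those two metrics are conventionally computed. The golden-set shape (an "expected" skill name and a "ranked" list per case) and the function names are assumptions for illustration; the real schema and evaluate_dispatch signature are defined in the repository.

```python
import json

# Hypothetical golden set: each case names the expected skill and the
# dispatcher's ranked candidates (best first).
GOLDEN = json.loads("""
[
  {"expected": "summarize", "ranked": ["summarize", "translate"]},
  {"expected": "translate", "ranked": ["summarize", "translate"]},
  {"expected": "extract",   "ranked": ["summarize", "translate"]}
]
""")


def precision_at_1(cases: list[dict]) -> float:
    """Fraction of cases whose top-ranked candidate is the expected one."""
    hits = sum(1 for c in cases if c["ranked"][:1] == [c["expected"]])
    return hits / len(cases)


def mean_reciprocal_rank(cases: list[dict]) -> float:
    """Mean of 1/rank of the expected item; 0 when it is absent."""
    total = 0.0
    for c in cases:
        if c["expected"] in c["ranked"]:
            total += 1.0 / (c["ranked"].index(c["expected"]) + 1)
    return total / len(cases)


p_at_1 = precision_at_1(GOLDEN)      # 1 of 3 cases correct at rank 1
mrr = mean_reciprocal_rank(GOLDEN)   # (1 + 1/2 + 0) / 3 = 0.5
```

A regression gate would then compare these values against stored minimums and fail the CI job when either drops below its floor.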

Project status and security

Pre-1.0 infrastructure. See STATUS.md for phase and document maturity, LIMITATIONS.md for explicit scope boundaries and known gaps, CHANGELOG.md for material changes, docs/releasing.md for the versioning, release, and operations policy, and SECURITY.md for the hardening posture and disclosure process. Roadmap: docs/backlog.md; decisions: docs/adr/.

License

Apache License 2.0. See LICENSE and NOTICE.
