Yes, if you experience any of these pain points:
- Agent gives confident but wrong answers (no grounding)
- Debugging agent decisions takes hours (no audit trail)
- Workflow failures corrupt data without rollback (no checkpoints)
- Audit asks "why did the agent do X?" and you can't answer (no lineage)
- Multiple data sources give conflicting answers (no trust model)
- Integration between agent steps is brittle (no typed contracts)
Even one checked box means this standard can help. Three or more, and adoption is strongly recommended.
Lead with concrete problems this solves:
- Reference past incidents: "Remember when [agent failure] happened? This standard would have prevented that."
- Connect to compliance: "Audit asked for decision lineage—this provides it automatically."
- Quantify debug time: "We spent X hours debugging that agent issue. With grounding and audit trails, it's traceable in minutes."
- Show the validator: Run `python tools/validate_workflows.py` on an example. Seeing errors caught statically is compelling.
Incremental adoption works. You don't need to adopt everything at once.
| Timeline | Action | Outcome |
|---|---|---|
| Week 1 | Run validator on existing workflows | Find hidden issues |
| Week 2 | Add checkpoints to mutation-heavy workflows | Enable rollback |
| Month 1 | Implement grounding for high-risk capabilities | Traceable decisions |
| Quarter 1 | Achieve L2 conformance | Full type safety |
| Quarter 2 | Implement trust model for multi-source workflows | Conflict resolution |
The standard is modular. Common partial adoptions:
- Checkpoints only: Just add `checkpoint` before mutations. Immediate rollback capability.
- Validation only: Use the validator without changing runtime. Catch errors statically.
- Audit only: Add `audit` steps for compliance without other changes.
- Full safety layer: Checkpoints + verification + rollback for high-risk workflows.
Start where the pain is worst, then expand as you see value.
Issues #3, #4, and #5 proposed additional capabilities for these layers. After first-principles evaluation, all were rejected because they failed the criteria in docs/methodology/FIRST_PRINCIPLES_REASSESSMENT.md:
Proposals that are workflow patterns:
- `listen` = loop of `receive`
- `coordinate` = `decompose` + `delegate` + `synchronize` + `integrate`
- `compress` = `recall` + `integrate` + `transform` + `persist`
Proposals that are parameter variations:
- `sample` = `search` with `limit`
- `scan` = `search` with exhaustive query
- `broadcast` = `send` with multiple destinations
Proposals covered by existing capabilities:
- `observe` already exists
- `negotiate` covered by `synchronize`
- `handoff` covered by `delegate` with context
- `forget` covered by `persist(ttl: 0)` or `mutate(delete)`
- `index` and `associate` covered by MODEL layer capabilities
The 36-capability model represents genuinely irreducible cognitive operations. Workflow patterns belong in schemas/workflow_catalog.yaml, not as atomic capabilities.
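To make the "parameter variation" argument concrete, here is a minimal Python sketch. The `search` and `send` functions are hypothetical in-memory stand-ins written for this example; only the capability names come from the standard, and the real capabilities would follow its typed I/O contracts.

```python
from typing import List, Optional

# Hypothetical stand-ins for two atomic capabilities (illustration only).
def search(items: List[str], query: str, limit: Optional[int] = None) -> List[str]:
    """Return items containing `query`, optionally capped at `limit` results."""
    hits = [item for item in items if query in item]
    return hits if limit is None else hits[:limit]


def send(destination: str, message: str) -> str:
    """Pretend to deliver a message and return a delivery receipt string."""
    return f"{destination}<-{message}"


# The rejected proposals reduce to calls on the primitives above.
def sample(items: List[str], query: str, n: int) -> List[str]:
    return search(items, query, limit=n)  # search with a limit


def scan(items: List[str]) -> List[str]:
    return search(items, query="")  # the empty query matches every item


def broadcast(destinations: List[str], message: str) -> List[str]:
    return [send(d, message) for d in destinations]  # send, repeated
```

None of the three wrappers needs behavior that the primitives lack, which is the sense in which they are variations rather than new atoms.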
A formal specification for building reliable AI agent systems. It defines:
- 36 atomic capabilities with typed I/O contracts across 9 cognitive layers
- A workflow DSL for composing capabilities with safety semantics
- Schemas for grounded world state and trust-aware conflict resolution
- Validation tools and conformance tests
Grounded Agency is the design philosophy behind the standard. It emphasizes that every agent action should be:
- Grounded — backed by evidence, not hallucination
- Auditable — with provenance and lineage
- Safe — mutations require checkpoints
- Composable — typed contracts between capabilities
Most AI agent systems fail in production because:
- Composition is implicit (no contracts between capabilities)
- State is ungrounded (no provenance for claims)
- Conflict resolution is undefined (no trust model)
- Safety is retrofitted (no checkpoints or rollback)
This standard makes reliability structural, not optional.
- Agent developers building production systems
- Platform engineers designing agent frameworks
- Researchers studying agent architectures
- Organizations deploying AI agents in critical applications
The number 36 emerges from first-principles derivation:
- Foundation: Cognitive architectures (BDI, ReAct, SOAR) provide the theoretical basis
- 9 cognitive layers: PERCEIVE, UNDERSTAND, REASON, MODEL, SYNTHESIZE, EXECUTE, VERIFY, REMEMBER, COORDINATE
- Domain parameterization: Instead of 99 domain-specific skills (detect-anomaly, detect-entity), we use 36 atomic verbs with domain parameters (detect with domain: anomaly)
- Atomicity: Each capability is truly irreducible and serves a single purpose
For the full derivation, see docs/methodology/FIRST_PRINCIPLES_REASSESSMENT.md.
The original 99-capability model included many domain-specific variants. Through first-principles analysis, we discovered these could be unified:
| Old Model (99) | New Model (36) | Pattern |
|---|---|---|
| detect-anomaly, detect-entity, detect-person | detect (domain: anomaly/entity/person) | Domain parameterization |
| estimate-risk, estimate-impact | measure (metric: risk/impact) | Metric parameterization |
| forecast-risk, forecast-time | predict (horizon: risk/time) | Horizon parameterization |
The archived 99-capability model is in _archive/skills/ for reference.
The ontology is stable but extensible. A 37th capability may be added if:
- It cannot be expressed as a composition of existing capabilities
- It passes atomicity tests (irreducible, single purpose, typed contract)
- It's used in at least one reference workflow
- It fits clearly into exactly one layer
See Extension Governance for the RFC process.
Yes, through the governance process:
- Open a GitHub issue proposing the capability
- Create an RFC if the issue gains support
- Demonstrate the capability meets all criteria (non-composable, atomic, useful)
- After review, capability may be accepted
For most use cases, composing existing capabilities or parameterizing with domains is preferred.
Instead of many domain-specific capabilities, use the atomic capability with a domain parameter:
```yaml
# Old approach (99 capabilities)
- capability: detect-anomaly
  store_as: anomaly_out

# New approach (36 capabilities)
- capability: detect
  domain: anomaly
  store_as: anomaly_out
```

This keeps the ontology small while preserving expressiveness.
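One way a runtime might implement domain parameterization is a single `detect` entry point that dispatches on the domain. The sketch below is an assumption about how that could look, not the reference implementation: the `DETECTORS` registry, the `register` decorator, and both toy detectors are hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: domain name -> detector function.
DETECTORS: Dict[str, Callable[[Any], list]] = {}


def register(domain: str):
    """Decorator that registers a detector under a domain name."""
    def wrap(fn):
        DETECTORS[domain] = fn
        return fn
    return wrap


@register("anomaly")
def _detect_anomaly(values):
    """Toy rule: flag indices whose value exceeds twice the mean."""
    if not values:
        return []
    mean = sum(values) / len(values)
    return [i for i, v in enumerate(values) if v > 2 * mean]


@register("entity")
def _detect_entity(tokens):
    """Toy rule: treat capitalized tokens as entities."""
    return [t for t in tokens if t[:1].isupper()]


def detect(domain: str, data: Any) -> dict:
    """Single atomic verb; the domain parameter selects the behavior."""
    if domain not in DETECTORS:
        raise ValueError(f"no detector registered for domain {domain!r}")
    return {"domain": domain, "detections": DETECTORS[domain](data)}
```

Adding a new domain then means registering one more function, with no change to the capability list.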
The analogy captures the design philosophy:
| Chemistry | Grounded Agency |
|---|---|
| ~118 elements | 36 capabilities |
| Atoms are irreducible | Capabilities are atomic |
| Molecules are compositions | Workflows are compositions |
| Element groups (metals, gases) | Capability layers (9 cognitive layers) |
What the analogy means: Capabilities are composable primitives. The goal is better workflows (molecules), not more capabilities (atoms).
What it doesn't mean: We don't claim physical law derivation or that 36 is fixed forever.
Those are frameworks for building agents. This is a specification for agent capabilities and their composition. Key differences:
| Aspect | Frameworks | This Standard |
|---|---|---|
| Focus | Implementation | Specification |
| Contracts | Implicit | Explicit I/O schemas |
| Safety | Optional | By construction |
| Validation | Runtime | Static + runtime |
| Provenance | Rare | Required |
The standard can be implemented within existing frameworks.
The standard is language-agnostic. The reference implementation uses Python, but the specification can be implemented in any language. The key artifacts are:
- JSON/YAML schemas
- Capability ontology
- Workflow definitions
```bash
# Validate against the specification
python tools/validate_workflows.py path/to/your/workflow.yaml
# Generate patch suggestions
python tools/validate_workflows.py --emit-patch
```

| Level | What It Validates |
|---|---|
| L1 | Capability existence, prerequisites |
| L2 | Schema resolution, type inference |
| L3 | Binding types vs consumer contracts |
| L4 | Patch suggestions, coercion registry |
See CONFORMANCE.md for details.
Bindings reference outputs from earlier steps:
```yaml
- capability: observe
  store_as: observe_out
- capability: detect
  domain: anomaly
  input_bindings:
    context: ${observe_out.observation}  # References observe's output
```

Typed bindings add explicit type annotations:
```yaml
observations: ${integrate_out.merged.observations: array<object>}
```

Gates are conditional checks that can halt or redirect execution:
```yaml
gates:
  - when: ${checkpoint_out.checkpoint_id} == null
    action: stop
    message: "No checkpoint created. Do not mutate."
```

- Before any mutation, create a checkpoint
- Execute the mutation
- Verify the outcome
- If verification fails, rollback to the checkpoint
The standard enforces this pattern: mutate requires checkpoint.
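The ordering rule can be checked statically. The following Python sketch is an assumption about what such a check might look like, not the logic in `tools/validate_workflows.py`; the function name `find_unprotected_mutations` is hypothetical, and steps are modeled as plain dicts with a `capability` key, mirroring the workflow YAML.

```python
from typing import Dict, List


def find_unprotected_mutations(steps: List[Dict[str, str]]) -> List[int]:
    """Return indices of `mutate` steps that have no earlier `checkpoint` step."""
    unprotected = []
    checkpoint_seen = False
    for index, step in enumerate(steps):
        capability = step.get("capability")
        if capability == "checkpoint":
            checkpoint_seen = True
        elif capability == "mutate" and not checkpoint_seen:
            unprotected.append(index)
    return unprotected


workflow_steps = [
    {"capability": "observe"},
    {"capability": "mutate"},      # flagged: no checkpoint yet
    {"capability": "checkpoint"},
    {"capability": "mutate"},      # allowed: a checkpoint precedes it
]
problems = find_unprotected_mutations(workflow_steps)
```

This sketch accepts any earlier checkpoint. A stricter policy, such as one fresh checkpoint per mutation, would need a different rule.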
Yes. Define a new capability in your ontology extension:
```json
{
  "id": "my-custom-capability",
  "layer": "UNDERSTAND",
  "description": "What it does...",
  "input_schema": { ... },
  "output_schema": { ... },
  "risk": "low",
  "mutation": false
}
```

Then create a corresponding skill in skills/my-custom-capability/SKILL.md.
Yes. Define a workflow in YAML:
```yaml
my_workflow:
  goal: What this workflow achieves
  risk: medium
  steps:
    - capability: observe
      purpose: First step
      store_as: observe_out
    - capability: plan
      purpose: Create a plan
      store_as: plan_out
    - capability: checkpoint
      purpose: Save state
      store_as: checkpoint_out
    - capability: mutate
      purpose: Execute changes
      store_as: result
```

See TUTORIAL.md for a guided walkthrough.
The validator is in tools/validate_workflows.py. You can:
- Add new validation rules
- Extend the coercion registry
- Add custom patch suggestions
The standard focuses on validation and specification. Runtime execution depends on your agent framework. The validator ensures workflows are valid; execution is implementation-dependent.
Checkpoints enable rollback if something goes wrong. Without checkpoints:
- Failures are permanent
- Partial execution leaves inconsistent state
- Recovery requires manual intervention
The standard makes this protection automatic.
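Since runtime execution is implementation-dependent, the following is only a minimal sketch of how a runtime might provide that protection. `CheckpointedState` and `safe_mutate` are hypothetical names, and state is a plain in-memory dict rather than a real data store.

```python
import copy


class CheckpointedState:
    """Toy in-memory state with checkpoint and rollback (illustrative only)."""

    def __init__(self, data):
        self.data = data
        self._checkpoints = {}

    def checkpoint(self):
        """Snapshot the current data and return an id for the snapshot."""
        checkpoint_id = len(self._checkpoints) + 1
        self._checkpoints[checkpoint_id] = copy.deepcopy(self.data)
        return checkpoint_id

    def rollback(self, checkpoint_id):
        """Restore the data saved under `checkpoint_id`."""
        self.data = copy.deepcopy(self._checkpoints[checkpoint_id])


def safe_mutate(state, mutation, verify):
    """Checkpoint, apply `mutation`, and roll back if it raises or `verify` fails."""
    checkpoint_id = state.checkpoint()
    try:
        mutation(state.data)
        ok = bool(verify(state.data))
    except Exception:
        ok = False
    if not ok:
        state.rollback(checkpoint_id)
    return ok
```

A failed or rejected mutation leaves `state.data` as it was at the checkpoint, so partial execution does not persist.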
The standard addresses this through:
- Typed contracts (schema validation)
- Grounded claims (evidence requirements)
- Audit trails (traceability)
However, prompt injection defense is primarily a runtime concern. The standard provides the structure; implementation provides the defense.
Trust scores combine:
- Source authority: Configured weights per source
- Recency: Time decay with configurable half-life
- Field-specific authority: Sources may be authoritative for specific fields
See schemas/authority_trust_model.yaml for configuration.
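The text does not give the exact formula, so the Python sketch below assumes a multiplicative combination: an authority weight times an exponential recency decay, with a field-specific weight overriding the source weight when one is configured. The function name, parameter names, and the override behavior are all assumptions for illustration; the actual model in the schema may differ.

```python
from typing import Dict, Optional


def trust_score(
    source_weight: float,
    age_days: float,
    half_life_days: float,
    field: Optional[str] = None,
    field_weights: Optional[Dict[str, float]] = None,
) -> float:
    """Authority weight multiplied by a half-life recency decay (assumed formula)."""
    authority = source_weight
    # Assumption: a field-specific weight replaces the source-wide weight.
    if field is not None and field_weights and field in field_weights:
        authority = field_weights[field]
    # The score halves every `half_life_days` of age.
    recency = 0.5 ** (age_days / half_life_days)
    return authority * recency


# Two sources disagree: an authoritative but stale one and a weaker fresh one.
stale = trust_score(source_weight=0.9, age_days=60, half_life_days=30)
fresh = trust_score(source_weight=0.5, age_days=0, half_life_days=30)
winner = "fresh" if fresh > stale else "stale"
```

With these numbers the fresh source wins, because two half-lives cut the stale source's score to a quarter of its weight.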
Yes. The standard includes modality-specific domain profiles and grounding requirements:
- Vision: Spatial grounding with bounding boxes and confidence scores
- Audio: Temporal grounding with timestamps and speaker diarization
- Multimodal: Cross-modal consistency checks and fusion strategies
See the Modality Handling Guide for full details.
Modality profiles add source-type-specific trust weights and grounding requirements. For example, the vision profile requires spatial coordinates for all detections, while the audio profile requires temporal anchors.
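A grounding requirement of this kind can be expressed as a required-field check per modality. In the Python sketch below, the field names (`bounding_box`, `confidence`, `start_time`, `end_time`) and the `REQUIRED_ANCHORS` table are hypothetical; the actual profile schemas define the real keys.

```python
from typing import Any, Dict, List

# Hypothetical required grounding fields per modality profile.
REQUIRED_ANCHORS = {
    "vision": ("bounding_box", "confidence"),  # spatial grounding
    "audio": ("start_time", "end_time"),       # temporal grounding
}


def missing_grounding(modality: str, detection: Dict[str, Any]) -> List[str]:
    """Return the required grounding fields that are absent or null."""
    if modality not in REQUIRED_ANCHORS:
        raise ValueError(f"unknown modality profile: {modality!r}")
    return [key for key in REQUIRED_ANCHORS[modality] if detection.get(key) is None]


vision_hit = {"label": "cat", "bounding_box": [0, 0, 10, 10], "confidence": 0.9}
audio_hit = {"label": "speech", "start_time": 1.5}  # no end_time

vision_gaps = missing_grounding("vision", vision_hit)
audio_gaps = missing_grounding("audio", audio_hit)
```

A detection with a non-empty result from `missing_grounding` would count as ungrounded under this sketch.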
- Follow the Quickstart Guide (10 min)
- Complete the Tutorial (30 min)
- Read the Specification
Yes. The standard is Apache 2.0 licensed. You can use, modify, and distribute it in commercial products.
- Read CONTRIBUTING.md
- Follow the RFC process in GOVERNANCE.md
- Submit PRs for issues or enhancements
- GitHub Issues: Report bugs or ask questions
- Discussions: Join the conversation
OpenAI function calling defines how to invoke tools. This standard defines:
- What capabilities exist (ontology)
- How they compose (workflow DSL)
- What contracts they satisfy (schemas)
Function calling is one way to execute capabilities; this standard defines the specification.
MCP defines how to connect tools to language models. This standard defines:
- Capability semantics and contracts
- Workflow composition and safety
- World state and trust models
They are complementary: MCP for connection, this standard for capability semantics.
The standard provides an interoperability layer with OASF. Of the 36 capabilities, 23 are fully mapped to OASF skill codes, 7 are partially mapped, and 6 use synthetic GA-extension codes. See OASF Coverage Report for the full mapping.
A2A defines how agents communicate with each other (inter-agent transport). This standard defines what agents can do (intra-agent capability semantics). They are complementary:
- A2A: "How do agents talk to each other?"
- Agent Capability Standard: "What can each agent do, and how do capabilities compose safely?"
The key distinction is spec vs. framework:
| Aspect | Frameworks (LangChain, AutoGPT, CrewAI) | This Standard |
|---|---|---|
| What it is | Implementation library | Specification |
| Contracts | Implicit (runtime discovery) | Explicit (typed I/O schemas) |
| Safety | Optional middleware | Structural (checkpoint required for mutation) |
| Validation | Runtime errors | Static analysis (4 conformance levels) |
| Provenance | Rarely tracked | Required (evidence anchors) |
The standard can be implemented within existing frameworks. It's not a replacement—it's a layer of reliability on top.
Still have questions? Open an issue.