diff --git a/README.md b/README.md
index 0aec036..7585f6f 100644
--- a/README.md
+++ b/README.md
@@ -2,9 +2,9 @@
A curated, implementation-first list of **agent harness engineering** resources, with GitHub projects as the primary focus.
-- Total entries: **171**
-- GitHub entries: **146 (85.4%)**
-- GitHub in project categories (excluding readings): **142/142 (100.0%)**
+- Total entries: **172**
+- GitHub entries: **147 (85.5%)**
+- GitHub in project categories (excluding readings): **143/143 (100.0%)**
- Categories: **9**
- Last verified: **2026-05-08**
- Language: [English](./README.md) | [中文](./README_zh.md)
@@ -51,7 +51,7 @@ A curated, implementation-first list of **agent harness engineering** resources,
| Evaluation Harnesses & Benchmarks | 21 |
| Observability & Reliability Operations | 14 |
| Guardrails, Security & Governance | 12 |
-| Reference Harness Implementations | 36 |
+| Reference Harness Implementations | 37 |
| Essential Readings & Ecosystem Maps | 29 |
## Catalog
@@ -66,41 +66,41 @@ Notes:
| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
-| DeerFlow | [GitHub](https://github.com/bytedance/deer-flow) | [](https://github.com/bytedance/deer-flow) | long-horizon, memory, subagents | Long-horizon super-agent harness integrating memory, tools, subagents, and sandboxes. |
-| AutoGen | [GitHub](https://github.com/microsoft/autogen) | [](https://github.com/microsoft/autogen) | multi-agent, orchestration, framework | Programming framework for agentic AI with multi-agent interaction and orchestration. |
-| Agno | [GitHub](https://github.com/agno-agi/agno) | [](https://github.com/agno-agi/agno) | scale, runtime, management | Agent software runtime focused on running and managing agentic systems at scale. |
-| LangGraph | [GitHub](https://github.com/langchain-ai/langgraph) | [](https://github.com/langchain-ai/langgraph) | graph, workflow, runtime | Graph-based runtime for resilient stateful agents and deterministic workflow control. |
-| Semantic Kernel | [GitHub](https://github.com/microsoft/semantic-kernel) | [](https://github.com/microsoft/semantic-kernel) | enterprise, orchestration, plugins | Enterprise-grade agentic application framework with orchestration and plugin patterns. |
-| OpenAI Agents SDK (Python) | [GitHub](https://github.com/openai/openai-agents-python) | [](https://github.com/openai/openai-agents-python) | sdk, handoff, workflows | Lightweight framework for multi-agent workflows, handoffs, and production patterns. |
-| deepagents | [GitHub](https://github.com/langchain-ai/deepagents) | [](https://github.com/langchain-ai/deepagents) | runtime, orchestration, long-running | Open-source harness for long-running, tool-using agents with planning and subagent patterns. |
-| Archon | [GitHub](https://github.com/coleam00/Archon) | [](https://github.com/coleam00/Archon) | workflow-engine, worktrees, validation | Workflow engine for AI coding agents with YAML-defined phases, isolated worktrees, and validation gates. |
-| Google ADK (Python) | [GitHub](https://github.com/google/adk-python) | [](https://github.com/google/adk-python) | toolkit, deployment, evaluation | Code-first toolkit to build, evaluate, and deploy advanced AI agents. |
-| PydanticAI | [GitHub](https://github.com/pydantic/pydantic-ai) | [](https://github.com/pydantic/pydantic-ai) | python, typing, schema | Type-safe Python framework for agents with strong schema contracts and tooling. |
-| Hive | [GitHub](https://github.com/aden-hive/hive) | [](https://github.com/aden-hive/hive) | harness, orchestration, runtime | Outcome-driven agent runtime harness with explicit control loops and orchestration blocks. |
-| Microsoft Agent Framework | [GitHub](https://github.com/microsoft/agent-framework) | [](https://github.com/microsoft/agent-framework) | multi-agent, workflows, observability | Multi-language framework for building, orchestrating, and deploying AI agents with graph workflows and observability. |
-| VoltAgent | [GitHub](https://github.com/VoltAgent/voltagent) | [](https://github.com/VoltAgent/voltagent) | typescript, platform, runtime | TypeScript agent engineering platform built around open runtime abstractions. |
-| mcp-agent | [GitHub](https://github.com/lastmile-ai/mcp-agent) | [](https://github.com/lastmile-ai/mcp-agent) | mcp, runtime, workflow | Practical agent framework centered on MCP tool ecosystems and workflow composition. |
-| Yao | [GitHub](https://github.com/YaoApp/yao) | [](https://github.com/YaoApp/yao) | single-binary, runtime, autonomous | Single-binary runtime for defining and running autonomous agents. |
-| Cloudflare Agents | [GitHub](https://github.com/cloudflare/agents) | [](https://github.com/cloudflare/agents) | platform, deployment, runtime | Platform runtime for building and deploying agents with production infrastructure primitives. |
-| Docker Agent | [GitHub](https://github.com/docker/docker-agent) | [](https://github.com/docker/docker-agent) | docker, runtime, container | Agent builder and runtime stack emphasizing container-native execution. |
-| NeMo Agent Toolkit | [GitHub](https://github.com/NVIDIA/NeMo-Agent-Toolkit) | [](https://github.com/NVIDIA/NeMo-Agent-Toolkit) | multi-agent, optimization, toolkit | Open toolkit for connecting and optimizing teams of AI agents. |
-| Scion | [GitHub](https://github.com/GoogleCloudPlatform/scion) | [](https://github.com/GoogleCloudPlatform/scion) | multi-agent, containers, orchestration | Experimental multi-agent orchestration testbed that runs isolated agent harnesses in containers, worktrees, and remote runtimes. |
-| deepagentsjs | [GitHub](https://github.com/langchain-ai/deepagentsjs) | [](https://github.com/langchain-ai/deepagentsjs) | typescript, langgraph, subagents | TypeScript agent harness with built-in planning, filesystem tools, subagents, and LangGraph-native runtime hooks. |
-| hankweave | [GitHub](https://github.com/SouthBridgeAI/hankweave-runtime) | [](https://github.com/SouthBridgeAI/hankweave-runtime) | long-horizon, runtime, checkpoints | Headless-first long-horizon runtime that orchestrates existing agent harnesses with sentinels, loops, checkpoints, and event journals. |
+| DeerFlow | [GitHub](https://github.com/bytedance/deer-flow) | [](https://github.com/bytedance/deer-flow) | long-horizon, memory, subagents | Long-horizon super-agent harness integrating memory, tools, subagents, and sandboxes. |
+| AutoGen | [GitHub](https://github.com/microsoft/autogen) | [](https://github.com/microsoft/autogen) | multi-agent, orchestration, framework | Programming framework for agentic AI with multi-agent interaction and orchestration. |
+| Agno | [GitHub](https://github.com/agno-agi/agno) | [](https://github.com/agno-agi/agno) | scale, runtime, management | Agent software runtime focused on running and managing agentic systems at scale. |
+| LangGraph | [GitHub](https://github.com/langchain-ai/langgraph) | [](https://github.com/langchain-ai/langgraph) | graph, workflow, runtime | Graph-based runtime for resilient stateful agents and deterministic workflow control. |
+| Semantic Kernel | [GitHub](https://github.com/microsoft/semantic-kernel) | [](https://github.com/microsoft/semantic-kernel) | enterprise, orchestration, plugins | Enterprise-grade agentic application framework with orchestration and plugin patterns. |
+| OpenAI Agents SDK (Python) | [GitHub](https://github.com/openai/openai-agents-python) | [](https://github.com/openai/openai-agents-python) | sdk, handoff, workflows | Lightweight framework for multi-agent workflows, handoffs, and production patterns. |
+| deepagents | [GitHub](https://github.com/langchain-ai/deepagents) | [](https://github.com/langchain-ai/deepagents) | runtime, orchestration, long-running | Open-source harness for long-running, tool-using agents with planning and subagent patterns. |
+| Archon | [GitHub](https://github.com/coleam00/Archon) | [](https://github.com/coleam00/Archon) | workflow-engine, worktrees, validation | Workflow engine for AI coding agents with YAML-defined phases, isolated worktrees, and validation gates. |
+| Google ADK (Python) | [GitHub](https://github.com/google/adk-python) | [](https://github.com/google/adk-python) | toolkit, deployment, evaluation | Code-first toolkit to build, evaluate, and deploy advanced AI agents. |
+| PydanticAI | [GitHub](https://github.com/pydantic/pydantic-ai) | [](https://github.com/pydantic/pydantic-ai) | python, typing, schema | Type-safe Python framework for agents with strong schema contracts and tooling. |
+| Hive | [GitHub](https://github.com/aden-hive/hive) | [](https://github.com/aden-hive/hive) | harness, orchestration, runtime | Outcome-driven agent runtime harness with explicit control loops and orchestration blocks. |
+| Microsoft Agent Framework | [GitHub](https://github.com/microsoft/agent-framework) | [](https://github.com/microsoft/agent-framework) | multi-agent, workflows, observability | Multi-language framework for building, orchestrating, and deploying AI agents with graph workflows and observability. |
+| VoltAgent | [GitHub](https://github.com/VoltAgent/voltagent) | [](https://github.com/VoltAgent/voltagent) | typescript, platform, runtime | TypeScript agent engineering platform built around open runtime abstractions. |
+| mcp-agent | [GitHub](https://github.com/lastmile-ai/mcp-agent) | [](https://github.com/lastmile-ai/mcp-agent) | mcp, runtime, workflow | Practical agent framework centered on MCP tool ecosystems and workflow composition. |
+| Yao | [GitHub](https://github.com/YaoApp/yao) | [](https://github.com/YaoApp/yao) | single-binary, runtime, autonomous | Single-binary runtime for defining and running autonomous agents. |
+| Cloudflare Agents | [GitHub](https://github.com/cloudflare/agents) | [](https://github.com/cloudflare/agents) | platform, deployment, runtime | Platform runtime for building and deploying agents with production infrastructure primitives. |
+| Docker Agent | [GitHub](https://github.com/docker/docker-agent) | [](https://github.com/docker/docker-agent) | docker, runtime, container | Agent builder and runtime stack emphasizing container-native execution. |
+| NeMo Agent Toolkit | [GitHub](https://github.com/NVIDIA/NeMo-Agent-Toolkit) | [](https://github.com/NVIDIA/NeMo-Agent-Toolkit) | multi-agent, optimization, toolkit | Open toolkit for connecting and optimizing teams of AI agents. |
+| Scion | [GitHub](https://github.com/GoogleCloudPlatform/scion) | [](https://github.com/GoogleCloudPlatform/scion) | multi-agent, containers, orchestration | Experimental multi-agent orchestration testbed that runs isolated agent harnesses in containers, worktrees, and remote runtimes. |
+| deepagentsjs | [GitHub](https://github.com/langchain-ai/deepagentsjs) | [](https://github.com/langchain-ai/deepagentsjs) | typescript, langgraph, subagents | TypeScript agent harness with built-in planning, filesystem tools, subagents, and LangGraph-native runtime hooks. |
+| hankweave | [GitHub](https://github.com/SouthBridgeAI/hankweave-runtime) | [](https://github.com/SouthBridgeAI/hankweave-runtime) | long-horizon, runtime, checkpoints | Headless-first long-horizon runtime that orchestrates existing agent harnesses with sentinels, loops, checkpoints, and event journals. |
### Context & Working-State Engineering
| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
-| everything-claude-code | [GitHub](https://github.com/affaan-m/everything-claude-code) | [](https://github.com/affaan-m/everything-claude-code) | context, skills, harness-practices | Large open repository of harness practices around memory, skills, and context control for coding agents. |
-| claude-mem | [GitHub](https://github.com/thedotmack/claude-mem) | [](https://github.com/thedotmack/claude-mem) | memory, context, session | Plugin-style memory layer that captures session history and reinjects relevant context into future coding runs. |
-| planning-with-files | [GitHub](https://github.com/OthmanAdi/planning-with-files) | [](https://github.com/OthmanAdi/planning-with-files) | planning, skills, persistence | Skill package for persistent file-based planning in coding-agent workflows. |
-| Agent Skills for Context Engineering | [GitHub](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering) | [](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering) | skills, context, production | Large skill library oriented around context engineering and production agents. |
-| Context-Engineering Handbook | [GitHub](https://github.com/davidkimai/Context-Engineering) | [](https://github.com/davidkimai/Context-Engineering) | context-engineering, handbook, practices | First-principles handbook focused on practical context engineering for agent systems. |
-| CCPM | [GitHub](https://github.com/automazeio/ccpm) | [](https://github.com/automazeio/ccpm) | planning, github-issues, parallel-execution | Spec-driven project-manager skill that turns PRDs and GitHub issues into persistent context and parallel agent execution. |
-| Trellis | [GitHub](https://github.com/mindfold-ai/Trellis) | [](https://github.com/mindfold-ai/Trellis) | specs, memory, workflow | Multi-platform coding-agent workflow framework with task context, project memory, and spec injection. |
-| Awesome Context Engineering | [GitHub](https://github.com/Meirtz/Awesome-Context-Engineering) | [](https://github.com/Meirtz/Awesome-Context-Engineering) | awesome-list, context, survey | Survey-style list for context engineering resources and frameworks. |
+| everything-claude-code | [GitHub](https://github.com/affaan-m/everything-claude-code) | [](https://github.com/affaan-m/everything-claude-code) | context, skills, harness-practices | Large open repository of harness practices around memory, skills, and context control for coding agents. |
+| claude-mem | [GitHub](https://github.com/thedotmack/claude-mem) | [](https://github.com/thedotmack/claude-mem) | memory, context, session | Plugin-style memory layer that captures session history and reinjects relevant context into future coding runs. |
+| planning-with-files | [GitHub](https://github.com/OthmanAdi/planning-with-files) | [](https://github.com/OthmanAdi/planning-with-files) | planning, skills, persistence | Skill package for persistent file-based planning in coding-agent workflows. |
+| Agent Skills for Context Engineering | [GitHub](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering) | [](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering) | skills, context, production | Large skill library oriented around context engineering and production agents. |
+| Context-Engineering Handbook | [GitHub](https://github.com/davidkimai/Context-Engineering) | [](https://github.com/davidkimai/Context-Engineering) | context-engineering, handbook, practices | First-principles handbook focused on practical context engineering for agent systems. |
+| CCPM | [GitHub](https://github.com/automazeio/ccpm) | [](https://github.com/automazeio/ccpm) | planning, github-issues, parallel-execution | Spec-driven project-manager skill that turns PRDs and GitHub issues into persistent context and parallel agent execution. |
+| Trellis | [GitHub](https://github.com/mindfold-ai/Trellis) | [](https://github.com/mindfold-ai/Trellis) | specs, memory, workflow | Multi-platform coding-agent workflow framework with task context, project memory, and spec injection. |
+| Awesome Context Engineering | [GitHub](https://github.com/Meirtz/Awesome-Context-Engineering) | [](https://github.com/Meirtz/Awesome-Context-Engineering) | awesome-list, context, survey | Survey-style list for context engineering resources and frameworks. |
| context-space | [GitHub](https://github.com/context-space/context-space) | [](https://github.com/context-space/context-space) | context, infrastructure, mcp | Infrastructure project focused on context engineering building blocks and MCP-centric integrations. |
@@ -108,67 +108,67 @@ Notes:
| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
-| Daytona | [GitHub](https://github.com/daytonaio/daytona) | [](https://github.com/daytonaio/daytona) | sandbox, execution, infra | Secure and elastic sandbox infrastructure for running AI-generated code with file, Git, LSP, and execution APIs. |
-| CUA | [GitHub](https://github.com/trycua/cua) | [](https://github.com/trycua/cua) | computer-use, sandbox, infra | Infrastructure stack for computer-use agents with sandbox, SDK, and benchmark support. |
-| E2B | [GitHub](https://github.com/e2b-dev/E2B) | [](https://github.com/e2b-dev/E2B) | cloud-sandbox, execution, enterprise | Secure cloud environments with real tools for production-grade agent execution. |
-| Browser Harness | [GitHub](https://github.com/browser-use/browser-harness) | [](https://github.com/browser-use/browser-harness) | browser, cdp, self-healing | Thin editable CDP harness that connects LLMs directly to real browsers and lets agents extend helpers in flight. |
-| OpenSandbox | [GitHub](https://github.com/alibaba/OpenSandbox) | [](https://github.com/alibaba/OpenSandbox) | sandbox, security, runtime | Secure and extensible sandbox runtime built for agent workloads. |
-| agent-infra sandbox | [GitHub](https://github.com/agent-infra/sandbox) | [](https://github.com/agent-infra/sandbox) | all-in-one, browser, shell | All-in-one sandbox combining browser, shell, files, MCP, and IDE server. |
-| Judge0 | [GitHub](https://github.com/judge0/judge0) | [](https://github.com/judge0/judge0) | code-execution, sandbox, backend | Scalable sandboxed code execution system usable as an agent execution backend. |
-| Agent Sandbox | [GitHub](https://github.com/kubernetes-sigs/agent-sandbox) | [](https://github.com/kubernetes-sigs/agent-sandbox) | kubernetes, sandbox, stateful | Kubernetes-native sandbox control plane for isolated, stateful agent runtimes with stable identity, persistence, and warm-pool support. |
-| stakpak/agent | [GitHub](https://github.com/stakpak/agent) | [](https://github.com/stakpak/agent) | always-on, autonomous, ops | Always-on open agent that runs on your machines with autonomous operational loops. |
+| Daytona | [GitHub](https://github.com/daytonaio/daytona) | [](https://github.com/daytonaio/daytona) | sandbox, execution, infra | Secure and elastic sandbox infrastructure for running AI-generated code with file, Git, LSP, and execution APIs. |
+| CUA | [GitHub](https://github.com/trycua/cua) | [](https://github.com/trycua/cua) | computer-use, sandbox, infra | Infrastructure stack for computer-use agents with sandbox, SDK, and benchmark support. |
+| E2B | [GitHub](https://github.com/e2b-dev/E2B) | [](https://github.com/e2b-dev/E2B) | cloud-sandbox, execution, enterprise | Secure cloud environments with real tools for production-grade agent execution. |
+| Browser Harness | [GitHub](https://github.com/browser-use/browser-harness) | [](https://github.com/browser-use/browser-harness) | browser, cdp, self-healing | Thin editable CDP harness that connects LLMs directly to real browsers and lets agents extend helpers in flight. |
+| OpenSandbox | [GitHub](https://github.com/alibaba/OpenSandbox) | [](https://github.com/alibaba/OpenSandbox) | sandbox, security, runtime | Secure and extensible sandbox runtime built for agent workloads. |
+| agent-infra sandbox | [GitHub](https://github.com/agent-infra/sandbox) | [](https://github.com/agent-infra/sandbox) | all-in-one, browser, shell | All-in-one sandbox combining browser, shell, files, MCP, and IDE server. |
+| Judge0 | [GitHub](https://github.com/judge0/judge0) | [](https://github.com/judge0/judge0) | code-execution, sandbox, backend | Scalable sandboxed code execution system usable as an agent execution backend. |
+| Agent Sandbox | [GitHub](https://github.com/kubernetes-sigs/agent-sandbox) | [](https://github.com/kubernetes-sigs/agent-sandbox) | kubernetes, sandbox, stateful | Kubernetes-native sandbox control plane for isolated, stateful agent runtimes with stable identity, persistence, and warm-pool support. |
+| stakpak/agent | [GitHub](https://github.com/stakpak/agent) | [](https://github.com/stakpak/agent) | always-on, autonomous, ops | Always-on open agent that runs on your machines with autonomous operational loops. |
| OSS-Fuzz Gen | [GitHub](https://github.com/google/oss-fuzz-gen) | [](https://github.com/google/oss-fuzz-gen) | fuzzing, security, execution | LLM-powered fuzzing workflows integrated with controlled execution contexts. |
| E2B Desktop Sandbox | [GitHub](https://github.com/e2b-dev/desktop) | [](https://github.com/e2b-dev/desktop) | desktop, sandbox, computer-use | Secure virtual desktop sandbox for computer-use agents with SDK control and screen streaming. |
-| Tensorlake | [GitHub](https://github.com/tensorlakeai/tensorlake) | [](https://github.com/tensorlakeai/tensorlake) | microvm, sandbox, orchestration | Serverless runtime for agent sandboxes with MicroVM isolation, snapshots, suspend-resume, and background orchestration. |
-| Arrakis | [GitHub](https://github.com/abshkbh/arrakis) | [](https://github.com/abshkbh/arrakis) | sandbox, microvm, snapshots | Self-hosted sandbox substrate with MicroVM isolation, snapshot restore, and REST, SDK, and MCP interfaces for agent code execution and computer use. |
-| AgentScope Runtime | [GitHub](https://github.com/agentscope-ai/agentscope-runtime) | [](https://github.com/agentscope-ai/agentscope-runtime) | runtime, sandbox, deployment | Production runtime for agent apps with secure tool sandboxes, deployment APIs, observability, and state services. |
-| SWE-ReX | [GitHub](https://github.com/SWE-agent/SWE-ReX) | [](https://github.com/SWE-agent/SWE-ReX) | sandbox, execution, coding-agent | Sandboxed execution infrastructure for AI coding agents at local and cloud scale. |
-| sandboxed.sh | [GitHub](https://github.com/Th0rgal/sandboxed.sh) | [](https://github.com/Th0rgal/sandboxed.sh) | self-hosted, isolation, orchestrator | Self-hosted orchestrator running coding agents inside isolated Linux workspaces. |
+| Tensorlake | [GitHub](https://github.com/tensorlakeai/tensorlake) | [](https://github.com/tensorlakeai/tensorlake) | microvm, sandbox, orchestration | Serverless runtime for agent sandboxes with MicroVM isolation, snapshots, suspend-resume, and background orchestration. |
+| Arrakis | [GitHub](https://github.com/abshkbh/arrakis) | [](https://github.com/abshkbh/arrakis) | sandbox, microvm, snapshots | Self-hosted sandbox substrate with MicroVM isolation, snapshot restore, and REST, SDK, and MCP interfaces for agent code execution and computer use. |
+| AgentScope Runtime | [GitHub](https://github.com/agentscope-ai/agentscope-runtime) | [](https://github.com/agentscope-ai/agentscope-runtime) | runtime, sandbox, deployment | Production runtime for agent apps with secure tool sandboxes, deployment APIs, observability, and state services. |
+| SWE-ReX | [GitHub](https://github.com/SWE-agent/SWE-ReX) | [](https://github.com/SWE-agent/SWE-ReX) | sandbox, execution, coding-agent | Sandboxed execution infrastructure for AI coding agents at local and cloud scale. |
+| sandboxed.sh | [GitHub](https://github.com/Th0rgal/sandboxed.sh) | [](https://github.com/Th0rgal/sandboxed.sh) | self-hosted, isolation, orchestrator | Self-hosted orchestrator running coding agents inside isolated Linux workspaces. |
| Capsule | [GitHub](https://github.com/capsulerun/capsule) | [](https://github.com/capsulerun/capsule) | wasm, sandbox, task-runtime | Durable runtime that coordinates agent tasks inside isolated WebAssembly sandboxes with retries and lifecycle tracking. |
-| terminal-bench-env | [GitHub](https://github.com/ucsb-mlsec/terminal-bench-env) | [](https://github.com/ucsb-mlsec/terminal-bench-env) | terminal, benchmark-env, sandbox | Environment layer for terminal-agent benchmark execution. |
+| terminal-bench-env | [GitHub](https://github.com/ucsb-mlsec/terminal-bench-env) | [](https://github.com/ucsb-mlsec/terminal-bench-env) | terminal, benchmark-env, sandbox | Environment layer for terminal-agent benchmark execution. |
### Protocols, Tool Interfaces & Agent Contracts
| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
-| GitHub Spec Kit | [GitHub](https://github.com/github/spec-kit) | [](https://github.com/github/spec-kit) | spec-driven, workflows, tooling | Toolkit for spec-driven development to guide deterministic agent execution. |
-| MCP Servers | [GitHub](https://github.com/modelcontextprotocol/servers) | [](https://github.com/modelcontextprotocol/servers) | mcp, servers, implementations | Official collection of MCP server implementations across tools and domains. |
-| AGENTS.md | [GitHub](https://github.com/agentsmd/agents.md) | [](https://github.com/agentsmd/agents.md) | spec, agent-file, instructions | Open format for repository-local instructions that coding agents can follow. |
-| Model Context Protocol | [GitHub](https://github.com/modelcontextprotocol/modelcontextprotocol) | [](https://github.com/modelcontextprotocol/modelcontextprotocol) | mcp, protocol, interoperability | Core specification and docs for MCP-based tool and context interoperability. |
-| directories (rules and MCP indexes) | [GitHub](https://github.com/leerob/directories) | [](https://github.com/leerob/directories) | directories, mcp, rules | Curated directories of agent rules and MCP servers for tool discovery. |
-| LangChain MCP Adapters | [GitHub](https://github.com/langchain-ai/langchain-mcp-adapters) | [](https://github.com/langchain-ai/langchain-mcp-adapters) | mcp, adapters, integration | Adapters connecting LangChain components with MCP servers. |
-| Microsoft MCP Servers | [GitHub](https://github.com/microsoft/mcp) | [](https://github.com/microsoft/mcp) | mcp, enterprise, servers | Microsoft's official MCP server catalog for enterprise data and tools. |
-| ACPX | [GitHub](https://github.com/openclaw/acpx) | [](https://github.com/openclaw/acpx) | acp, client, sessions | Headless CLI client for stateful Agent Client Protocol sessions. |
-| Microsoft Learn MCP | [GitHub](https://github.com/MicrosoftDocs/mcp) | [](https://github.com/MicrosoftDocs/mcp) | mcp, docs, grounding | MCP server and CLI for grounding agents with Microsoft documentation sources. |
+| GitHub Spec Kit | [GitHub](https://github.com/github/spec-kit) | [](https://github.com/github/spec-kit) | spec-driven, workflows, tooling | Toolkit for spec-driven development to guide deterministic agent execution. |
+| MCP Servers | [GitHub](https://github.com/modelcontextprotocol/servers) | [](https://github.com/modelcontextprotocol/servers) | mcp, servers, implementations | Official collection of MCP server implementations across tools and domains. |
+| AGENTS.md | [GitHub](https://github.com/agentsmd/agents.md) | [](https://github.com/agentsmd/agents.md) | spec, agent-file, instructions | Open format for repository-local instructions that coding agents can follow. |
+| Model Context Protocol | [GitHub](https://github.com/modelcontextprotocol/modelcontextprotocol) | [](https://github.com/modelcontextprotocol/modelcontextprotocol) | mcp, protocol, interoperability | Core specification and docs for MCP-based tool and context interoperability. |
+| directories (rules and MCP indexes) | [GitHub](https://github.com/leerob/directories) | [](https://github.com/leerob/directories) | directories, mcp, rules | Curated directories of agent rules and MCP servers for tool discovery. |
+| LangChain MCP Adapters | [GitHub](https://github.com/langchain-ai/langchain-mcp-adapters) | [](https://github.com/langchain-ai/langchain-mcp-adapters) | mcp, adapters, integration | Adapters connecting LangChain components with MCP servers. |
+| Microsoft MCP Servers | [GitHub](https://github.com/microsoft/mcp) | [](https://github.com/microsoft/mcp) | mcp, enterprise, servers | Microsoft's official MCP server catalog for enterprise data and tools. |
+| ACPX | [GitHub](https://github.com/openclaw/acpx) | [](https://github.com/openclaw/acpx) | acp, client, sessions | Headless CLI client for stateful Agent Client Protocol sessions. |
+| Microsoft Learn MCP | [GitHub](https://github.com/MicrosoftDocs/mcp) | [](https://github.com/MicrosoftDocs/mcp) | mcp, docs, grounding | MCP server and CLI for grounding agents with Microsoft documentation sources. |
| IBM MCP | [GitHub](https://github.com/IBM/mcp) | [](https://github.com/IBM/mcp) | mcp, clients, tooling | IBM collection of MCP servers, clients, and developer tooling. |
-| AGENT.md | [GitHub](https://github.com/agentmd/agent.md) | [](https://github.com/agentmd/agent.md) | standard, agent-file, interoperability | Standardized machine-readable file format for agentic coding tools. |
+| AGENT.md | [GitHub](https://github.com/agentmd/agent.md) | [](https://github.com/agentmd/agent.md) | standard, agent-file, interoperability | Standardized machine-readable file format for agentic coding tools. |
### Evaluation Harnesses & Benchmarks
| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
-| Promptfoo | [GitHub](https://github.com/promptfoo/promptfoo) | [](https://github.com/promptfoo/promptfoo) | eval, red-team, ci | Config-driven prompt/agent/RAG testing, comparison, and red-team evaluation tool. |
-| DeepEval | [GitHub](https://github.com/confident-ai/deepeval) | [](https://github.com/confident-ai/deepeval) | evaluation, framework, testing | LLM evaluation framework supporting agent and workflow quality testing. |
-| RAGAS | [GitHub](https://github.com/vibrantlabsai/ragas) | [](https://github.com/vibrantlabsai/ragas) | rag, metrics, evaluation | Open evaluation toolkit for LLM and RAG quality metrics. |
-| lm-evaluation-harness | [GitHub](https://github.com/EleutherAI/lm-evaluation-harness) | [](https://github.com/EleutherAI/lm-evaluation-harness) | benchmark, harness, llm | Popular benchmark harness for consistent LLM evaluation across tasks. |
-| SWE-bench | [GitHub](https://github.com/SWE-bench/SWE-bench) | [](https://github.com/SWE-bench/SWE-bench) | benchmark, swe, evaluation | Standard benchmark for evaluating issue-fixing software engineering agents. |
-| verifiers | [GitHub](https://github.com/PrimeIntellect-ai/verifiers) | [](https://github.com/PrimeIntellect-ai/verifiers) | verifier, rl, evaluation | Library for RL environments and verifier-based evaluation loops. |
-| AgentBench | [GitHub](https://github.com/THUDM/AgentBench) | [](https://github.com/THUDM/AgentBench) | benchmark, cross-domain, agent | Cross-environment benchmark for evaluating LLM agents as tool-using systems. |
-| LangWatch | [GitHub](https://github.com/langwatch/langwatch) | [](https://github.com/langwatch/langwatch) | simulation, evaluation, testing | End-to-end platform for agent simulations, evaluation loops, and production testing. |
-| EvalScope | [GitHub](https://github.com/modelscope/evalscope) | [](https://github.com/modelscope/evalscope) | benchmark, framework, llm | Customizable framework for large-model benchmarking and performance evaluation. |
-| Terminal-Bench | [GitHub](https://github.com/harbor-framework/terminal-bench) | [](https://github.com/harbor-framework/terminal-bench) | terminal, benchmark, long-horizon | Terminal-native benchmark suite for long-horizon, verification-heavy agent tasks. |
-| Harbor | [GitHub](https://github.com/harbor-framework/harbor) | [](https://github.com/harbor-framework/harbor) | evaluation, harness, rl-env | Framework for running agent evaluations and constructing RL-style environments. |
-| tau2-bench | [GitHub](https://github.com/sierra-research/tau2-bench) | [](https://github.com/sierra-research/tau2-bench) | tool-use, interaction, benchmark | Tool-agent-user interaction benchmark emphasizing multi-step execution quality. |
-| NeMo Gym | [GitHub](https://github.com/NVIDIA-NeMo/Gym) | [](https://github.com/NVIDIA-NeMo/Gym) | rl-env, training, evaluation | Toolkit for building RL environments suitable for LLM/agent training and eval. |
+| Promptfoo | [GitHub](https://github.com/promptfoo/promptfoo) | [](https://github.com/promptfoo/promptfoo) | eval, red-team, ci | Config-driven prompt/agent/RAG testing, comparison, and red-team evaluation tool. |
+| DeepEval | [GitHub](https://github.com/confident-ai/deepeval) | [](https://github.com/confident-ai/deepeval) | evaluation, framework, testing | LLM evaluation framework supporting agent and workflow quality testing. |
+| RAGAS | [GitHub](https://github.com/vibrantlabsai/ragas) | [](https://github.com/vibrantlabsai/ragas) | rag, metrics, evaluation | Open evaluation toolkit for LLM and RAG quality metrics. |
+| lm-evaluation-harness | [GitHub](https://github.com/EleutherAI/lm-evaluation-harness) | [](https://github.com/EleutherAI/lm-evaluation-harness) | benchmark, harness, llm | Popular benchmark harness for consistent LLM evaluation across tasks. |
+| SWE-bench | [GitHub](https://github.com/SWE-bench/SWE-bench) | [](https://github.com/SWE-bench/SWE-bench) | benchmark, swe, evaluation | Standard benchmark for evaluating issue-fixing software engineering agents. |
+| verifiers | [GitHub](https://github.com/PrimeIntellect-ai/verifiers) | [](https://github.com/PrimeIntellect-ai/verifiers) | verifier, rl, evaluation | Library for RL environments and verifier-based evaluation loops. |
+| AgentBench | [GitHub](https://github.com/THUDM/AgentBench) | [](https://github.com/THUDM/AgentBench) | benchmark, cross-domain, agent | Cross-environment benchmark for evaluating LLM agents as tool-using systems. |
+| LangWatch | [GitHub](https://github.com/langwatch/langwatch) | [](https://github.com/langwatch/langwatch) | simulation, evaluation, testing | End-to-end platform for agent simulations, evaluation loops, and production testing. |
+| EvalScope | [GitHub](https://github.com/modelscope/evalscope) | [](https://github.com/modelscope/evalscope) | benchmark, framework, llm | Customizable framework for large-model benchmarking and performance evaluation. |
+| Terminal-Bench | [GitHub](https://github.com/harbor-framework/terminal-bench) | [](https://github.com/harbor-framework/terminal-bench) | terminal, benchmark, long-horizon | Terminal-native benchmark suite for long-horizon, verification-heavy agent tasks. |
+| Harbor | [GitHub](https://github.com/harbor-framework/harbor) | [](https://github.com/harbor-framework/harbor) | evaluation, harness, rl-env | Framework for running agent evaluations and constructing RL-style environments. |
+| tau2-bench | [GitHub](https://github.com/sierra-research/tau2-bench) | [](https://github.com/sierra-research/tau2-bench) | tool-use, interaction, benchmark | Tool-agent-user interaction benchmark emphasizing multi-step execution quality. |
+| NeMo Gym | [GitHub](https://github.com/NVIDIA-NeMo/Gym) | [](https://github.com/NVIDIA-NeMo/Gym) | rl-env, training, evaluation | Toolkit for building RL environments suitable for LLM/agent training and eval. |
| TheAgentCompany | [GitHub](https://github.com/TheAgentCompany/TheAgentCompany) | [](https://github.com/TheAgentCompany/TheAgentCompany) | benchmark, workplace, multi-step | Agent benchmark with simulated software-company tasks for evaluating multi-step workplace autonomy. |
-| auto-harness | [GitHub](https://github.com/neosigmaai/auto-harness) | [](https://github.com/neosigmaai/auto-harness) | optimization, regression, evals | Benchmark-gated optimization loop that mines failures, edits agent code, and guards against regressions overnight. |
-| Inspect Evals | [GitHub](https://github.com/UKGovernmentBEIS/inspect_evals) | [](https://github.com/UKGovernmentBEIS/inspect_evals) | inspect, eval-suite, reproducibility | Evaluation suite collection for Inspect AI workflows. |
-| SWE-Bench Pro | [GitHub](https://github.com/scaleapi/SWE-bench_Pro-os) | [](https://github.com/scaleapi/SWE-bench_Pro-os) | swe, benchmark, long-horizon | Long-horizon software-engineering benchmark with reproducible Docker-based evaluation for issue-driven coding agents. |
+| auto-harness | [GitHub](https://github.com/neosigmaai/auto-harness) | [](https://github.com/neosigmaai/auto-harness) | optimization, regression, evals | Benchmark-gated optimization loop that mines failures, edits agent code, and guards against regressions overnight. |
+| Inspect Evals | [GitHub](https://github.com/UKGovernmentBEIS/inspect_evals) | [](https://github.com/UKGovernmentBEIS/inspect_evals) | inspect, eval-suite, reproducibility | Evaluation suite collection for Inspect AI workflows. |
+| SWE-Bench Pro | [GitHub](https://github.com/scaleapi/SWE-bench_Pro-os) | [](https://github.com/scaleapi/SWE-bench_Pro-os) | swe, benchmark, long-horizon | Long-horizon software-engineering benchmark with reproducible Docker-based evaluation for issue-driven coding agents. |
| Agent Evaluation | [GitHub](https://github.com/awslabs/agent-evaluation) | [](https://github.com/awslabs/agent-evaluation) | evaluation, testing, ci | AWS framework for testing virtual agents with evaluator-driven multi-turn conversations, hooks, and CI-friendly workflows. |
-| WorkArena | [GitHub](https://github.com/ServiceNow/WorkArena) | [](https://github.com/ServiceNow/WorkArena) | browser, benchmark, enterprise | Browser benchmark for practical enterprise-like knowledge work tasks. |
-| OpenHands Benchmarks | [GitHub](https://github.com/OpenHands/benchmarks) | [](https://github.com/OpenHands/benchmarks) | openhands, eval, harness | Evaluation harness and benchmark definitions for OpenHands systems. |
+| WorkArena | [GitHub](https://github.com/ServiceNow/WorkArena) | [](https://github.com/ServiceNow/WorkArena) | browser, benchmark, enterprise | Browser benchmark for practical enterprise-like knowledge work tasks. |
+| OpenHands Benchmarks | [GitHub](https://github.com/OpenHands/benchmarks) | [](https://github.com/OpenHands/benchmarks) | openhands, eval, harness | Evaluation harness and benchmark definitions for OpenHands systems. |
| WebArena-Verified | [GitHub](https://github.com/ServiceNow/webarena-verified) | [](https://github.com/ServiceNow/webarena-verified) | web-agent, benchmark, deterministic | Verified web-agent benchmark with deterministic evaluators. |
@@ -176,90 +176,91 @@ Notes:
| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
-| Langfuse | [GitHub](https://github.com/langfuse/langfuse) | [](https://github.com/langfuse/langfuse) | llmops, tracing, metrics | Open-source LLM engineering platform for traces, metrics, prompts, and evals. |
-| MLflow | [GitHub](https://github.com/mlflow/mlflow) | [](https://github.com/mlflow/mlflow) | platform, monitoring, evaluation | Broad AI engineering platform with monitoring and evaluation support for agents. |
-| Opik | [GitHub](https://github.com/comet-ml/opik) | [](https://github.com/comet-ml/opik) | monitoring, eval, tracing | End-to-end debug/eval/monitoring stack for LLM apps and agent workflows. |
-| RagaAI Catalyst | [GitHub](https://github.com/raga-ai-hub/RagaAI-Catalyst) | [](https://github.com/raga-ai-hub/RagaAI-Catalyst) | agentops, analytics, monitoring | Agent observability and monitoring framework with timeline and graph analytics. |
-| TensorZero | [GitHub](https://github.com/tensorzero/tensorzero) | [](https://github.com/tensorzero/tensorzero) | llmops, gateway, optimization | Open LLMOps stack unifying gateway, observability, evaluation, and optimization. |
-| Arize Phoenix | [GitHub](https://github.com/Arize-ai/phoenix) | [](https://github.com/Arize-ai/phoenix) | observability, tracing, evaluation | Open platform for AI observability, tracing, and evaluation analytics. |
-| OpenLLMetry | [GitHub](https://github.com/traceloop/openllmetry) | [](https://github.com/traceloop/openllmetry) | opentelemetry, instrumentation, tracing | OpenTelemetry-based instrumentation for GenAI and LLM applications. |
-| Helicone | [GitHub](https://github.com/Helicone/helicone) | [](https://github.com/Helicone/helicone) | monitoring, traffic, production | Lightweight platform for monitoring and evaluating LLM traffic in production. |
-| AgentOps SDK | [GitHub](https://github.com/AgentOps-AI/agentops) | [](https://github.com/AgentOps-AI/agentops) | agentops, monitoring, cost | Monitoring and benchmarking SDK for agent workflows with cost and trace tracking. |
-| Latitude | [GitHub](https://github.com/latitude-dev/latitude-llm) | [](https://github.com/latitude-dev/latitude-llm) | platform, eval, observability | Open-source agent engineering platform with eval and observability capabilities. |
-| Laminar | [GitHub](https://github.com/lmnr-ai/lmnr) | [](https://github.com/lmnr-ai/lmnr) | observability, tracing, evals | Agent-focused observability stack with tracing, evaluation runs, monitoring, and dashboards. |
-| claude-code-reverse | [GitHub](https://github.com/Yuyz0112/claude-code-reverse) | [](https://github.com/Yuyz0112/claude-code-reverse) | trace, visualization, debugging | Tooling to visualize and inspect Claude Code LLM interaction traces. |
-| OpenInference | [GitHub](https://github.com/Arize-ai/openinference) | [](https://github.com/Arize-ai/openinference) | spec, instrumentation, observability | Open instrumentation specification and tooling for AI observability. |
-| Future AGI | [GitHub](https://github.com/future-agi/future-agi) | [](https://github.com/future-agi/future-agi) | observability, evaluation, guardrails | Self-hostable platform that closes the loop across agent tracing, evaluation, simulation, guardrails, and gateway operations. |
+| Langfuse | [GitHub](https://github.com/langfuse/langfuse) | [](https://github.com/langfuse/langfuse) | llmops, tracing, metrics | Open-source LLM engineering platform for traces, metrics, prompts, and evals. |
+| MLflow | [GitHub](https://github.com/mlflow/mlflow) | [](https://github.com/mlflow/mlflow) | platform, monitoring, evaluation | Broad AI engineering platform with monitoring and evaluation support for agents. |
+| Opik | [GitHub](https://github.com/comet-ml/opik) | [](https://github.com/comet-ml/opik) | monitoring, eval, tracing | End-to-end debug/eval/monitoring stack for LLM apps and agent workflows. |
+| RagaAI Catalyst | [GitHub](https://github.com/raga-ai-hub/RagaAI-Catalyst) | [](https://github.com/raga-ai-hub/RagaAI-Catalyst) | agentops, analytics, monitoring | Agent observability and monitoring framework with timeline and graph analytics. |
+| TensorZero | [GitHub](https://github.com/tensorzero/tensorzero) | [](https://github.com/tensorzero/tensorzero) | llmops, gateway, optimization | Open LLMOps stack unifying gateway, observability, evaluation, and optimization. |
+| Arize Phoenix | [GitHub](https://github.com/Arize-ai/phoenix) | [](https://github.com/Arize-ai/phoenix) | observability, tracing, evaluation | Open platform for AI observability, tracing, and evaluation analytics. |
+| OpenLLMetry | [GitHub](https://github.com/traceloop/openllmetry) | [](https://github.com/traceloop/openllmetry) | opentelemetry, instrumentation, tracing | OpenTelemetry-based instrumentation for GenAI and LLM applications. |
+| Helicone | [GitHub](https://github.com/Helicone/helicone) | [](https://github.com/Helicone/helicone) | monitoring, traffic, production | Lightweight platform for monitoring and evaluating LLM traffic in production. |
+| AgentOps SDK | [GitHub](https://github.com/AgentOps-AI/agentops) | [](https://github.com/AgentOps-AI/agentops) | agentops, monitoring, cost | Monitoring and benchmarking SDK for agent workflows with cost and trace tracking. |
+| Latitude | [GitHub](https://github.com/latitude-dev/latitude-llm) | [](https://github.com/latitude-dev/latitude-llm) | platform, eval, observability | Open-source agent engineering platform with eval and observability capabilities. |
+| Laminar | [GitHub](https://github.com/lmnr-ai/lmnr) | [](https://github.com/lmnr-ai/lmnr) | observability, tracing, evals | Agent-focused observability stack with tracing, evaluation runs, monitoring, and dashboards. |
+| claude-code-reverse | [GitHub](https://github.com/Yuyz0112/claude-code-reverse) | [](https://github.com/Yuyz0112/claude-code-reverse) | trace, visualization, debugging | Tooling to visualize and inspect Claude Code LLM interaction traces. |
+| OpenInference | [GitHub](https://github.com/Arize-ai/openinference) | [](https://github.com/Arize-ai/openinference) | spec, instrumentation, observability | Open instrumentation specification and tooling for AI observability. |
+| Future AGI | [GitHub](https://github.com/future-agi/future-agi) | [](https://github.com/future-agi/future-agi) | observability, evaluation, guardrails | Self-hostable platform that closes the loop across agent tracing, evaluation, simulation, guardrails, and gateway operations. |
### Guardrails, Security & Governance
| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
-| LiteLLM | [GitHub](https://github.com/BerriAI/litellm) | [](https://github.com/BerriAI/litellm) | gateway, proxy, guardrails | Unified LLM gateway/proxy with cost tracking, load balancing, and guardrails. |
-| Kong | [GitHub](https://github.com/Kong/kong) | [](https://github.com/Kong/kong) | gateway, policy, infra | API and AI gateway infrastructure useful for policy enforcement in agent systems. |
-| Portkey Gateway | [GitHub](https://github.com/Portkey-AI/gateway) | [](https://github.com/Portkey-AI/gateway) | gateway, guardrails, routing | AI gateway with routing and guardrails for multi-model production traffic. |
-| CAI (Cybersecurity AI) | [GitHub](https://github.com/aliasrobotics/cai) | [](https://github.com/aliasrobotics/cai) | security, governance, framework | Security-focused agent framework for offensive/defensive AI workflows. |
-| OpenAI Realtime Agents | [GitHub](https://github.com/openai/openai-realtime-agents) | [](https://github.com/openai/openai-realtime-agents) | realtime, orchestration, control | Advanced agentic realtime patterns with structured control and interaction loops. |
-| Plano | [GitHub](https://github.com/katanemo/plano) | [](https://github.com/katanemo/plano) | proxy, safety, data-plane | AI-native proxy and data plane with orchestration, safety, and observability. |
-| OpenAI CS Agents Demo | [GitHub](https://github.com/openai/openai-cs-agents-demo) | [](https://github.com/openai/openai-cs-agents-demo) | demo, handoffs, governance | Customer-service multi-agent demo highlighting handoffs and guardrail-like control points. |
-| ContextForge | [GitHub](https://github.com/IBM/mcp-context-forge) | [](https://github.com/IBM/mcp-context-forge) | gateway, governance, observability | Registry and proxy layer that unifies MCP, A2A, and REST/gRPC endpoints with centralized governance and observability. |
-| Archestra | [GitHub](https://github.com/archestra-ai/archestra) | [](https://github.com/archestra-ai/archestra) | enterprise, guardrails, governance | Enterprise AI platform with guardrails, MCP registry, and orchestration services. |
-| Tracecat | [GitHub](https://github.com/TracecatHQ/tracecat) | [](https://github.com/TracecatHQ/tracecat) | security, automation, policy | AI automation platform for security teams with policy and workflow controls. |
-| AgentGateway | [GitHub](https://github.com/agentgateway/agentgateway) | [](https://github.com/agentgateway/agentgateway) | gateway, mcp, proxy | Agentic proxy gateway for AI agents and MCP server ecosystems. |
-| Haft | [GitHub](https://github.com/m0n0x41d/haft) | [](https://github.com/m0n0x41d/haft) | governance, decisions, mcp | Decision-governance harness that records falsifiable contracts, evidence, and commissions before agents execute. |
+| LiteLLM | [GitHub](https://github.com/BerriAI/litellm) | [](https://github.com/BerriAI/litellm) | gateway, proxy, guardrails | Unified LLM gateway/proxy with cost tracking, load balancing, and guardrails. |
+| Kong | [GitHub](https://github.com/Kong/kong) | [](https://github.com/Kong/kong) | gateway, policy, infra | API and AI gateway infrastructure useful for policy enforcement in agent systems. |
+| Portkey Gateway | [GitHub](https://github.com/Portkey-AI/gateway) | [](https://github.com/Portkey-AI/gateway) | gateway, guardrails, routing | AI gateway with routing and guardrails for multi-model production traffic. |
+| CAI (Cybersecurity AI) | [GitHub](https://github.com/aliasrobotics/cai) | [](https://github.com/aliasrobotics/cai) | security, governance, framework | Security-focused agent framework for offensive/defensive AI workflows. |
+| OpenAI Realtime Agents | [GitHub](https://github.com/openai/openai-realtime-agents) | [](https://github.com/openai/openai-realtime-agents) | realtime, orchestration, control | Advanced agentic realtime patterns with structured control and interaction loops. |
+| Plano | [GitHub](https://github.com/katanemo/plano) | [](https://github.com/katanemo/plano) | proxy, safety, data-plane | AI-native proxy and data plane with orchestration, safety, and observability. |
+| OpenAI CS Agents Demo | [GitHub](https://github.com/openai/openai-cs-agents-demo) | [](https://github.com/openai/openai-cs-agents-demo) | demo, handoffs, governance | Customer-service multi-agent demo highlighting handoffs and guardrail-like control points. |
+| ContextForge | [GitHub](https://github.com/IBM/mcp-context-forge) | [](https://github.com/IBM/mcp-context-forge) | gateway, governance, observability | Registry and proxy layer that unifies MCP, A2A, and REST/gRPC endpoints with centralized governance and observability. |
+| Archestra | [GitHub](https://github.com/archestra-ai/archestra) | [](https://github.com/archestra-ai/archestra) | enterprise, guardrails, governance | Enterprise AI platform with guardrails, MCP registry, and orchestration services. |
+| Tracecat | [GitHub](https://github.com/TracecatHQ/tracecat) | [](https://github.com/TracecatHQ/tracecat) | security, automation, policy | AI automation platform for security teams with policy and workflow controls. |
+| AgentGateway | [GitHub](https://github.com/agentgateway/agentgateway) | [](https://github.com/agentgateway/agentgateway) | gateway, mcp, proxy | Agentic proxy gateway for AI agents and MCP server ecosystems. |
+| Haft | [GitHub](https://github.com/m0n0x41d/haft) | [](https://github.com/m0n0x41d/haft) | governance, decisions, mcp | Decision-governance harness that records falsifiable contracts, evidence, and commissions before agents execute. |
### Reference Harness Implementations
| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
-| OpenCode | [GitHub](https://github.com/anomalyco/opencode) | [](https://github.com/anomalyco/opencode) | terminal, coding-agent, subagents | Open-source coding agent with built-in plan/build roles, subagents, LSP support, and a client-server runtime. |
-| Claude Code | [GitHub](https://github.com/anthropics/claude-code) | [](https://github.com/anthropics/claude-code) | terminal, coding-agent, git-workflows | Official terminal coding agent that understands codebases and executes editing, debugging, and Git workflows through natural language. |
-| Gemini CLI | [GitHub](https://github.com/google-gemini/gemini-cli) | [](https://github.com/google-gemini/gemini-cli) | terminal, coding-agent, mcp | Open-source terminal agent with built-in tools, MCP support, checkpointing, and sandboxing controls. |
-| Codex CLI | [GitHub](https://github.com/openai/codex) | [](https://github.com/openai/codex) | terminal, coding-agent, local-execution | Terminal-native coding agent that runs locally and exposes practical agent workflows for software tasks. |
-| OpenHands | [GitHub](https://github.com/OpenHands/OpenHands) | [](https://github.com/OpenHands/OpenHands) | coding-agent, software-engineering, repo | Open-source AI software engineer focused on repo-level coding task execution. |
-| learn-claude-code | [GitHub](https://github.com/shareAI-lab/learn-claude-code) | [](https://github.com/shareAI-lab/learn-claude-code) | tutorial, harness, claude-code | Hands-on harness tutorial for building Claude Code-like systems from scratch. |
-| OpenManus | [GitHub](https://github.com/FoundationAgents/OpenManus) | [](https://github.com/FoundationAgents/OpenManus) | general-agent, autonomy, workflows | Open foundation for broad autonomous agent workflows with coding-heavy use cases. |
-| pi | [GitHub](https://github.com/earendil-works/pi) | [](https://github.com/earendil-works/pi) | coding-agent, runtime, monorepo | Agent harness monorepo combining a coding-agent CLI, shared runtime, and multi-provider LLM stack. |
-| aider | [GitHub](https://github.com/Aider-AI/aider) | [](https://github.com/Aider-AI/aider) | terminal, repo-map, testing | Terminal coding assistant with repo mapping, git-aware edits, and built-in lint/test feedback loops. |
-| Claude Code Plugins: Orchestration and Automation | [GitHub](https://github.com/wshobson/agents) | [](https://github.com/wshobson/agents) | claude-code, plugins, orchestration | Production-ready Claude Code plugin marketplace bundling agents, skills, tools, and multi-agent workflow orchestrators. |
-| CLI-Anything | [GitHub](https://github.com/HKUDS/CLI-Anything) | [](https://github.com/HKUDS/CLI-Anything) | cli, tool-use, automation | CLI agent system that unifies command-line tool usage in agent loops. |
-| NanoClaw | [GitHub](https://github.com/qwibitai/nanoclaw) | [](https://github.com/qwibitai/nanoclaw) | containers, claude-sdk, scheduling | Container-isolated Claude agent harness with channel routing, scheduled jobs, per-group memory, and small-codebase customization. |
-| Qwen Code | [GitHub](https://github.com/QwenLM/qwen-code) | [](https://github.com/QwenLM/qwen-code) | terminal, coding-agent, cli | Terminal-native open-source coding agent tuned for practical dev loops. |
-| SuperClaude Framework | [GitHub](https://github.com/SuperClaude-Org/SuperClaude_Framework) | [](https://github.com/SuperClaude-Org/SuperClaude_Framework) | config, personas, workflow | Configuration framework adding commands, personas, and method templates to coding agents. |
-| Devika | [GitHub](https://github.com/stitionai/devika) | [](https://github.com/stitionai/devika) | assistant, planning, coding | Open-source coding assistant system for planning and implementing development tasks. |
-| SWE-agent | [GitHub](https://github.com/SWE-agent/SWE-agent) | [](https://github.com/SWE-agent/SWE-agent) | swe, issue-fixing, tooling | Research-grade coding agent that resolves GitHub issues with explicit tooling loops. |
-| cmux | [GitHub](https://github.com/manaflow-ai/cmux) | [](https://github.com/manaflow-ai/cmux) | macos, workspace, browser | Native macOS terminal and browser workspace for AI coding agents with notifications, split panes, and scriptable control. |
-| Aperant | [GitHub](https://github.com/AndyMik90/Aperant) | [](https://github.com/AndyMik90/Aperant) | coding-agent, parallel, memory | Autonomous multi-agent coding framework with parallel execution, isolated workspaces, QA loops, and persistent memory. |
-| Eigent | [GitHub](https://github.com/eigent-ai/eigent) | [](https://github.com/eigent-ai/eigent) | desktop, cowork, productivity | Open-source desktop cowork agent for autonomous task execution and productivity. |
-| IronClaw | [GitHub](https://github.com/nearai/ironclaw) | [](https://github.com/nearai/ironclaw) | security, wasm, routines | Security-first personal agent harness with WASM sandboxing, routines, tool plugins, and persistent memory. |
-| OpenHarness | [GitHub](https://github.com/HKUDS/OpenHarness) | [](https://github.com/HKUDS/OpenHarness) | tool-use, memory, multi-agent | Open agent harness implementation covering tool use, skills, memory, permissions, and multi-agent coordination. |
-| Superset | [GitHub](https://github.com/superset-sh/superset) | [](https://github.com/superset-sh/superset) | worktrees, desktop, parallel | Worktree-based desktop orchestrator for running and reviewing parallel CLI coding agents from one workspace. |
-| GitHub Copilot CLI | [GitHub](https://github.com/github/copilot-cli) | [](https://github.com/github/copilot-cli) | terminal, coding-agent, mcp | Official terminal coding agent built on GitHub's Copilot harness with MCP extensibility, approval controls, and GitHub-native context. |
-| Open SWE | [GitHub](https://github.com/langchain-ai/open-swe) | [](https://github.com/langchain-ai/open-swe) | async, coding-agent, swe | Asynchronous open-source coding agent focused on software issue workflows. |
-| Paseo | [GitHub](https://github.com/getpaseo/paseo) | [](https://github.com/getpaseo/paseo) | coding-agent, daemon, multi-device | Multi-device coding-agent daemon and client stack for orchestrating local agents, parallel runs, and cross-provider workflows. |
-| 1Code | [GitHub](https://github.com/21st-dev/1code) | [](https://github.com/21st-dev/1code) | coding-agent, orchestration, worktrees | Desktop-first coding-agent orchestrator with worktree isolation, background sandboxes, MCP tooling, and automation triggers. |
-| OSAURUS | [GitHub](https://github.com/osaurus-ai/osaurus) | [](https://github.com/osaurus-ai/osaurus) | macos, local-first, memory | Native macOS harness for autonomous coding agents with persistent memory. |
-| holaOS | [GitHub](https://github.com/holaboss-ai/holaOS) | [](https://github.com/holaboss-ai/holaOS) | long-horizon, desktop, durable-state | Desktop-first long-horizon agent environment with runtime, memory, tools, apps, and durable state. |
-| HiClaw | [GitHub](https://github.com/agentscope-ai/HiClaw) | [](https://github.com/agentscope-ai/HiClaw) | multi-agent, human-in-the-loop, shared-state | Collaborative multi-agent OS with manager-worker coordination, shared state, and human-in-the-loop oversight via Matrix rooms. |
-| mini-swe-agent | [GitHub](https://github.com/SWE-agent/mini-swe-agent) | [](https://github.com/SWE-agent/mini-swe-agent) | minimal, swe, coding-agent | Minimal coding agent implementation with strong benchmark competitiveness. |
-| oh-my-pi | [GitHub](https://github.com/can1357/oh-my-pi) | [](https://github.com/can1357/oh-my-pi) | terminal, lsp, subagents | Terminal AI coding agent with edit safety, LSP integration, and subagent support. |
-| TinyAGI | [GitHub](https://github.com/TinyAGI/tinyagi) | [](https://github.com/TinyAGI/tinyagi) | team-orchestration, autonomous, workflows | Team-style agent orchestrator for one-person-company style autonomous workflows. |
+| OpenCode | [GitHub](https://github.com/anomalyco/opencode) | [](https://github.com/anomalyco/opencode) | terminal, coding-agent, subagents | Open-source coding agent with built-in plan/build roles, subagents, LSP support, and a client-server runtime. |
+| Claude Code | [GitHub](https://github.com/anthropics/claude-code) | [](https://github.com/anthropics/claude-code) | terminal, coding-agent, git-workflows | Official terminal coding agent that understands codebases and executes editing, debugging, and Git workflows through natural language. |
+| Gemini CLI | [GitHub](https://github.com/google-gemini/gemini-cli) | [](https://github.com/google-gemini/gemini-cli) | terminal, coding-agent, mcp | Open-source terminal agent with built-in tools, MCP support, checkpointing, and sandboxing controls. |
+| Codex CLI | [GitHub](https://github.com/openai/codex) | [](https://github.com/openai/codex) | terminal, coding-agent, local-execution | Terminal-native coding agent that runs locally and exposes practical agent workflows for software tasks. |
+| OpenHands | [GitHub](https://github.com/OpenHands/OpenHands) | [](https://github.com/OpenHands/OpenHands) | coding-agent, software-engineering, repo | Open-source AI software engineer focused on repo-level coding task execution. |
+| learn-claude-code | [GitHub](https://github.com/shareAI-lab/learn-claude-code) | [](https://github.com/shareAI-lab/learn-claude-code) | tutorial, harness, claude-code | Hands-on harness tutorial for building Claude Code-like systems from scratch. |
+| OpenManus | [GitHub](https://github.com/FoundationAgents/OpenManus) | [](https://github.com/FoundationAgents/OpenManus) | general-agent, autonomy, workflows | Open foundation for broad autonomous agent workflows with coding-heavy use cases. |
+| pi | [GitHub](https://github.com/earendil-works/pi) | [](https://github.com/earendil-works/pi) | coding-agent, runtime, monorepo | Agent harness monorepo combining a coding-agent CLI, shared runtime, and multi-provider LLM stack. |
+| aider | [GitHub](https://github.com/Aider-AI/aider) | [](https://github.com/Aider-AI/aider) | terminal, repo-map, testing | Terminal coding assistant with repo mapping, git-aware edits, and built-in lint/test feedback loops. |
+| Claude Code Plugins: Orchestration and Automation | [GitHub](https://github.com/wshobson/agents) | [](https://github.com/wshobson/agents) | claude-code, plugins, orchestration | Production-ready Claude Code plugin marketplace bundling agents, skills, tools, and multi-agent workflow orchestrators. |
+| CLI-Anything | [GitHub](https://github.com/HKUDS/CLI-Anything) | [](https://github.com/HKUDS/CLI-Anything) | cli, tool-use, automation | CLI agent system that unifies command-line tool usage in agent loops. |
+| NanoClaw | [GitHub](https://github.com/qwibitai/nanoclaw) | [](https://github.com/qwibitai/nanoclaw) | containers, claude-sdk, scheduling | Container-isolated Claude agent harness with channel routing, scheduled jobs, per-group memory, and small-codebase customization. |
+| Qwen Code | [GitHub](https://github.com/QwenLM/qwen-code) | [](https://github.com/QwenLM/qwen-code) | terminal, coding-agent, cli | Terminal-native open-source coding agent tuned for practical dev loops. |
+| SuperClaude Framework | [GitHub](https://github.com/SuperClaude-Org/SuperClaude_Framework) | [](https://github.com/SuperClaude-Org/SuperClaude_Framework) | config, personas, workflow | Configuration framework adding commands, personas, and method templates to coding agents. |
+| Devika | [GitHub](https://github.com/stitionai/devika) | [](https://github.com/stitionai/devika) | assistant, planning, coding | Open-source coding assistant system for planning and implementing development tasks. |
+| SWE-agent | [GitHub](https://github.com/SWE-agent/SWE-agent) | [](https://github.com/SWE-agent/SWE-agent) | swe, issue-fixing, tooling | Research-grade coding agent that resolves GitHub issues with explicit tooling loops. |
+| cmux | [GitHub](https://github.com/manaflow-ai/cmux) | [](https://github.com/manaflow-ai/cmux) | macos, workspace, browser | Native macOS terminal and browser workspace for AI coding agents with notifications, split panes, and scriptable control. |
+| Aperant | [GitHub](https://github.com/AndyMik90/Aperant) | [](https://github.com/AndyMik90/Aperant) | coding-agent, parallel, memory | Autonomous multi-agent coding framework with parallel execution, isolated workspaces, QA loops, and persistent memory. |
+| Eigent | [GitHub](https://github.com/eigent-ai/eigent) | [](https://github.com/eigent-ai/eigent) | desktop, cowork, productivity | Open-source desktop cowork agent for autonomous task execution and productivity. |
+| OpenHarness | [GitHub](https://github.com/HKUDS/OpenHarness) | [](https://github.com/HKUDS/OpenHarness) | tool-use, memory, multi-agent | Open agent harness implementation covering tool use, skills, memory, permissions, and multi-agent coordination. |
+| IronClaw | [GitHub](https://github.com/nearai/ironclaw) | [](https://github.com/nearai/ironclaw) | security, wasm, routines | Security-first personal agent harness with WASM sandboxing, routines, tool plugins, and persistent memory. |
+| Superset | [GitHub](https://github.com/superset-sh/superset) | [](https://github.com/superset-sh/superset) | worktrees, desktop, parallel | Worktree-based desktop orchestrator for running and reviewing parallel CLI coding agents from one workspace. |
+| GitHub Copilot CLI | [GitHub](https://github.com/github/copilot-cli) | [](https://github.com/github/copilot-cli) | terminal, coding-agent, mcp | Official terminal coding agent built on GitHub's Copilot harness with MCP extensibility, approval controls, and GitHub-native context. |
+| Open SWE | [GitHub](https://github.com/langchain-ai/open-swe) | [](https://github.com/langchain-ai/open-swe) | async, coding-agent, swe | Asynchronous open-source coding agent focused on software issue workflows. |
+| Paseo | [GitHub](https://github.com/getpaseo/paseo) | [](https://github.com/getpaseo/paseo) | coding-agent, daemon, multi-device | Multi-device coding-agent daemon and client stack for orchestrating local agents, parallel runs, and cross-provider workflows. |
+| 1Code | [GitHub](https://github.com/21st-dev/1code) | [](https://github.com/21st-dev/1code) | coding-agent, orchestration, worktrees | Desktop-first coding-agent orchestrator with worktree isolation, background sandboxes, MCP tooling, and automation triggers. |
+| holaOS | [GitHub](https://github.com/holaboss-ai/holaOS) | [](https://github.com/holaboss-ai/holaOS) | long-horizon, desktop, durable-state | Desktop-first long-horizon agent environment with runtime, memory, tools, apps, and durable state. |
+| OSAURUS | [GitHub](https://github.com/osaurus-ai/osaurus) | [](https://github.com/osaurus-ai/osaurus) | macos, local-first, memory | Native macOS harness for autonomous coding agents with persistent memory. |
+| HiClaw | [GitHub](https://github.com/agentscope-ai/HiClaw) | [](https://github.com/agentscope-ai/HiClaw) | multi-agent, human-in-the-loop, shared-state | Collaborative multi-agent OS with manager-worker coordination, shared state, and human-in-the-loop oversight via Matrix rooms. |
+| mini-swe-agent | [GitHub](https://github.com/SWE-agent/mini-swe-agent) | [](https://github.com/SWE-agent/mini-swe-agent) | minimal, swe, coding-agent | Minimal coding agent implementation with strong benchmark competitiveness. |
+| oh-my-pi | [GitHub](https://github.com/can1357/oh-my-pi) | [](https://github.com/can1357/oh-my-pi) | terminal, lsp, subagents | Terminal AI coding agent with edit safety, LSP integration, and subagent support. |
+| TinyAGI | [GitHub](https://github.com/TinyAGI/tinyagi) | [](https://github.com/TinyAGI/tinyagi) | team-orchestration, autonomous, workflows | Team-style agent orchestrator for one-person-company style autonomous workflows. |
| Devon | [GitHub](https://github.com/entropy-research/Devon) | [](https://github.com/entropy-research/Devon) | pair-programming, coding-agent, autonomous | Open-source pair programmer agent with autonomous coding execution patterns. |
-| Open Claude Cowork | [GitHub](https://github.com/DevAgentForge/Open-Claude-Cowork) | [](https://github.com/DevAgentForge/Open-Claude-Cowork) | desktop, ui, orchestration | Desktop coding cowork assistant that turns agent orchestration into GUI workflows. |
-| Amazon Bedrock AgentCore Samples | [GitHub](https://github.com/awslabs/agentcore-samples) | [](https://github.com/awslabs/agentcore-samples) | aws, runtime, operations | Official sample suite for deploying and operating agents with runtime, gateway, memory, observability, evaluation, and policy layers. |
-| mini-coding-agent | [GitHub](https://github.com/rasbt/mini-coding-agent) | [](https://github.com/rasbt/mini-coding-agent) | coding-agent, minimal, approvals | Minimal coding agent harness illustrating approvals, memory, bounded delegation, and durable transcripts. |
+| Open Claude Cowork | [GitHub](https://github.com/DevAgentForge/Open-Claude-Cowork) | [](https://github.com/DevAgentForge/Open-Claude-Cowork) | desktop, ui, orchestration | Desktop coding cowork assistant that turns agent orchestration into GUI workflows. |
+| Amazon Bedrock AgentCore Samples | [GitHub](https://github.com/awslabs/agentcore-samples) | [](https://github.com/awslabs/agentcore-samples) | aws, runtime, operations | Official sample suite for deploying and operating agents with runtime, gateway, memory, observability, evaluation, and policy layers. |
+| mini-coding-agent | [GitHub](https://github.com/rasbt/mini-coding-agent) | [](https://github.com/rasbt/mini-coding-agent) | coding-agent, minimal, approvals | Minimal coding agent harness illustrating approvals, memory, bounded delegation, and durable transcripts. |
+| AgentPlane | [GitHub](https://github.com/basilisk-labs/agentplane) | [](https://github.com/basilisk-labs/agentplane) | coding-agent, git-native, workflow-control | Local-first Git-native CLI harness for auditable coding-agent work with task, plan, verification, and finish records. |
### Essential Readings & Ecosystem Maps
| Project | Link | Stars | Tags | Summary |
| --- | --- | --- | --- | --- |
-| awesome-claude-code | [GitHub](https://github.com/hesreallyhim/awesome-claude-code) | [](https://github.com/hesreallyhim/awesome-claude-code) | awesome-list, claude-code, skills | Community collection of Claude Code skills, hooks, and orchestrator tooling. |
-| awesome-agentic-patterns | [GitHub](https://github.com/nibzard/awesome-agentic-patterns) | [](https://github.com/nibzard/awesome-agentic-patterns) | awesome-list, patterns, design | Catalog of reusable agentic design patterns and implementation motifs. |
-| awesome-mcp-servers | [GitHub](https://github.com/wong2/awesome-mcp-servers) | [](https://github.com/wong2/awesome-mcp-servers) | awesome-list, mcp, tools | Curated MCP server index for tool interoperability in agent systems. |
-| awesome-harness-engineering | [GitHub](https://github.com/walkinglabs/awesome-harness-engineering) | [](https://github.com/walkinglabs/awesome-harness-engineering) | awesome-list, curation, harness | Curated list focused on harness engineering articles, benchmarks, and implementations. |
+| awesome-claude-code | [GitHub](https://github.com/hesreallyhim/awesome-claude-code) | [](https://github.com/hesreallyhim/awesome-claude-code) | awesome-list, claude-code, skills | Community collection of Claude Code skills, hooks, and orchestrator tooling. |
+| awesome-agentic-patterns | [GitHub](https://github.com/nibzard/awesome-agentic-patterns) | [](https://github.com/nibzard/awesome-agentic-patterns) | awesome-list, patterns, design | Catalog of reusable agentic design patterns and implementation motifs. |
+| awesome-mcp-servers | [GitHub](https://github.com/wong2/awesome-mcp-servers) | [](https://github.com/wong2/awesome-mcp-servers) | awesome-list, mcp, tools | Curated MCP server index for tool interoperability in agent systems. |
+| awesome-harness-engineering | [GitHub](https://github.com/walkinglabs/awesome-harness-engineering) | [](https://github.com/walkinglabs/awesome-harness-engineering) | awesome-list, curation, harness | Curated list focused on harness engineering articles, benchmarks, and implementations. |
| 12 Factor Agents | [Reference](https://www.humanlayer.dev/blog/12-factor-agents) | - | reading, operations, principles | Operations-oriented principles for building maintainable production agents. |
| Agent Frameworks, Runtimes, and Harnesses, oh my! | [Reference](https://blog.langchain.com/agent-frameworks-runtimes-and-harnesses-oh-my/) | - | reading, langchain, architecture | Clear decomposition of framework vs runtime vs harness responsibilities. |
| An open-source spec for Codex orchestration: Symphony. | [Reference](https://openai.com/index/open-source-codex-orchestration-symphony/) | - | reading, openai, orchestration | OpenAI's orchestration write-up on turning issue trackers into always-on control planes for coding agents. |
diff --git a/README_zh.md b/README_zh.md
index 8b4145b..1ba17e0 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -2,9 +2,9 @@
一个面向 **Agent Harness Engineering** 的工程实践清单,优先收录可直接落地的 GitHub 项目。
-- 当前条目数: **171**
-- GitHub 条目: **146 (85.4%)**
-- 项目分类 GitHub 占比(不含阅读类): **142/142 (100.0%)**
+- 当前条目数: **172**
+- GitHub 条目: **147 (85.5%)**
+- 项目分类 GitHub 占比(不含阅读类): **143/143 (100.0%)**
- 分类数量: **9**
- 最近核对日期: **2026-05-08**
- 语言: [English](./README.md) | [中文](./README_zh.md)
@@ -51,7 +51,7 @@
| Evaluation Harnesses & Benchmarks | 21 |
| Observability & Reliability Operations | 14 |
| Guardrails, Security & Governance | 12 |
-| Reference Harness Implementations | 36 |
+| Reference Harness Implementations | 37 |
| Essential Readings & Ecosystem Maps | 29 |
## 项目清单
@@ -66,41 +66,41 @@
| 项目 | 链接 | Stars | 标签 | 简介 |
| --- | --- | --- | --- | --- |
-| DeerFlow | [GitHub](https://github.com/bytedance/deer-flow) | [](https://github.com/bytedance/deer-flow) | long-horizon, memory, subagents | 面向长任务的 SuperAgent harness,整合记忆、工具、子代理与沙箱。 |
-| AutoGen | [GitHub](https://github.com/microsoft/autogen) | [](https://github.com/microsoft/autogen) | multi-agent, orchestration, framework | 支持多代理交互与编排的 agentic AI 编程框架。 |
-| Agno | [GitHub](https://github.com/agno-agi/agno) | [](https://github.com/agno-agi/agno) | scale, runtime, management | 面向规模化运行与管理的 agent 软件运行时。 |
-| LangGraph | [GitHub](https://github.com/langchain-ai/langgraph) | [](https://github.com/langchain-ai/langgraph) | graph, workflow, runtime | 图结构运行时,用于构建具备状态管理与确定性流程控制的可靠代理。 |
-| Semantic Kernel | [GitHub](https://github.com/microsoft/semantic-kernel) | [](https://github.com/microsoft/semantic-kernel) | enterprise, orchestration, plugins | 面向企业应用的 agentic 框架,支持编排与插件化扩展。 |
-| OpenAI Agents SDK (Python) | [GitHub](https://github.com/openai/openai-agents-python) | [](https://github.com/openai/openai-agents-python) | sdk, handoff, workflows | 轻量级多代理工作流框架,支持交接、编排和生产化模式。 |
-| deepagents | [GitHub](https://github.com/langchain-ai/deepagents) | [](https://github.com/langchain-ai/deepagents) | runtime, orchestration, long-running | 面向长时任务的开源 harness,支持规划、工具调用与子代理协作模式。 |
-| Archon | [GitHub](https://github.com/coleam00/Archon) | [](https://github.com/coleam00/Archon) | workflow-engine, worktrees, validation | 面向 AI 编码代理的工作流引擎,提供 YAML 定义阶段、隔离 worktree 与校验门禁。 |
-| Google ADK (Python) | [GitHub](https://github.com/google/adk-python) | [](https://github.com/google/adk-python) | toolkit, deployment, evaluation | 代码优先的工具包,用于构建、评估和部署复杂 AI 代理。 |
-| PydanticAI | [GitHub](https://github.com/pydantic/pydantic-ai) | [](https://github.com/pydantic/pydantic-ai) | python, typing, schema | 强调类型与结构化约束的 Python agent 框架,适合稳定化 harness 开发。 |
-| Hive | [GitHub](https://github.com/aden-hive/hive) | [](https://github.com/aden-hive/hive) | harness, orchestration, runtime | 以结果驱动的 agent runtime harness,强调控制回路与编排模块。 |
-| Microsoft Agent Framework | [GitHub](https://github.com/microsoft/agent-framework) | [](https://github.com/microsoft/agent-framework) | multi-agent, workflows, observability | 多语言代理框架,支持图工作流、编排、部署与可观测能力。 |
-| VoltAgent | [GitHub](https://github.com/VoltAgent/voltagent) | [](https://github.com/VoltAgent/voltagent) | typescript, platform, runtime | 基于 TypeScript 的 agent 工程平台,提供开放运行时抽象。 |
-| mcp-agent | [GitHub](https://github.com/lastmile-ai/mcp-agent) | [](https://github.com/lastmile-ai/mcp-agent) | mcp, runtime, workflow | 以 MCP 工具体系为核心的实用 agent 框架,强调工作流组合。 |
-| Yao | [GitHub](https://github.com/YaoApp/yao) | [](https://github.com/YaoApp/yao) | single-binary, runtime, autonomous | 单二进制运行时,用于定义并运行自治代理。 |
-| Cloudflare Agents | [GitHub](https://github.com/cloudflare/agents) | [](https://github.com/cloudflare/agents) | platform, deployment, runtime | 提供面向生产基础设施的 agent 构建与部署运行时。 |
-| Docker Agent | [GitHub](https://github.com/docker/docker-agent) | [](https://github.com/docker/docker-agent) | docker, runtime, container | 强调容器原生执行的 agent 构建与运行时栈。 |
-| NeMo Agent Toolkit | [GitHub](https://github.com/NVIDIA/NeMo-Agent-Toolkit) | [](https://github.com/NVIDIA/NeMo-Agent-Toolkit) | multi-agent, optimization, toolkit | 用于连接与优化多代理协作的开源工具包。 |
-| Scion | [GitHub](https://github.com/GoogleCloudPlatform/scion) | [](https://github.com/GoogleCloudPlatform/scion) | multi-agent, containers, orchestration | 实验性多代理编排测试平台,可在容器、git worktree 与远程运行时中隔离运行各类 agent harness。 |
-| deepagentsjs | [GitHub](https://github.com/langchain-ai/deepagentsjs) | [](https://github.com/langchain-ai/deepagentsjs) | typescript, langgraph, subagents | 基于 TypeScript 的 agent harness,内置规划、文件系统工具、子代理与 LangGraph 原生运行时能力。 |
-| hankweave | [GitHub](https://github.com/SouthBridgeAI/hankweave-runtime) | [](https://github.com/SouthBridgeAI/hankweave-runtime) | long-horizon, runtime, checkpoints | 面向长任务的无界面运行时,可编排现有 agent harness,并提供 sentinels、循环、检查点与事件日志。 |
+| DeerFlow | [GitHub](https://github.com/bytedance/deer-flow) | [](https://github.com/bytedance/deer-flow) | long-horizon, memory, subagents | 面向长任务的 SuperAgent harness,整合记忆、工具、子代理与沙箱。 |
+| AutoGen | [GitHub](https://github.com/microsoft/autogen) | [](https://github.com/microsoft/autogen) | multi-agent, orchestration, framework | 支持多代理交互与编排的 agentic AI 编程框架。 |
+| Agno | [GitHub](https://github.com/agno-agi/agno) | [](https://github.com/agno-agi/agno) | scale, runtime, management | 面向规模化运行与管理的 agent 软件运行时。 |
+| LangGraph | [GitHub](https://github.com/langchain-ai/langgraph) | [](https://github.com/langchain-ai/langgraph) | graph, workflow, runtime | 图结构运行时,用于构建具备状态管理与确定性流程控制的可靠代理。 |
+| Semantic Kernel | [GitHub](https://github.com/microsoft/semantic-kernel) | [](https://github.com/microsoft/semantic-kernel) | enterprise, orchestration, plugins | 面向企业应用的 agentic 框架,支持编排与插件化扩展。 |
+| OpenAI Agents SDK (Python) | [GitHub](https://github.com/openai/openai-agents-python) | [](https://github.com/openai/openai-agents-python) | sdk, handoff, workflows | 轻量级多代理工作流框架,支持交接、编排和生产化模式。 |
+| deepagents | [GitHub](https://github.com/langchain-ai/deepagents) | [](https://github.com/langchain-ai/deepagents) | runtime, orchestration, long-running | 面向长时任务的开源 harness,支持规划、工具调用与子代理协作模式。 |
+| Archon | [GitHub](https://github.com/coleam00/Archon) | [](https://github.com/coleam00/Archon) | workflow-engine, worktrees, validation | 面向 AI 编码代理的工作流引擎,提供 YAML 定义阶段、隔离 worktree 与校验门禁。 |
+| Google ADK (Python) | [GitHub](https://github.com/google/adk-python) | [](https://github.com/google/adk-python) | toolkit, deployment, evaluation | 代码优先的工具包,用于构建、评估和部署复杂 AI 代理。 |
+| PydanticAI | [GitHub](https://github.com/pydantic/pydantic-ai) | [](https://github.com/pydantic/pydantic-ai) | python, typing, schema | 强调类型与结构化约束的 Python agent 框架,适合稳定化 harness 开发。 |
+| Hive | [GitHub](https://github.com/aden-hive/hive) | [](https://github.com/aden-hive/hive) | harness, orchestration, runtime | 以结果驱动的 agent runtime harness,强调控制回路与编排模块。 |
+| Microsoft Agent Framework | [GitHub](https://github.com/microsoft/agent-framework) | [](https://github.com/microsoft/agent-framework) | multi-agent, workflows, observability | 多语言代理框架,支持图工作流、编排、部署与可观测能力。 |
+| VoltAgent | [GitHub](https://github.com/VoltAgent/voltagent) | [](https://github.com/VoltAgent/voltagent) | typescript, platform, runtime | 基于 TypeScript 的 agent 工程平台,提供开放运行时抽象。 |
+| mcp-agent | [GitHub](https://github.com/lastmile-ai/mcp-agent) | [](https://github.com/lastmile-ai/mcp-agent) | mcp, runtime, workflow | 以 MCP 工具体系为核心的实用 agent 框架,强调工作流组合。 |
+| Yao | [GitHub](https://github.com/YaoApp/yao) | [](https://github.com/YaoApp/yao) | single-binary, runtime, autonomous | 单二进制运行时,用于定义并运行自治代理。 |
+| Cloudflare Agents | [GitHub](https://github.com/cloudflare/agents) | [](https://github.com/cloudflare/agents) | platform, deployment, runtime | 提供面向生产基础设施的 agent 构建与部署运行时。 |
+| Docker Agent | [GitHub](https://github.com/docker/docker-agent) | [](https://github.com/docker/docker-agent) | docker, runtime, container | 强调容器原生执行的 agent 构建与运行时栈。 |
+| NeMo Agent Toolkit | [GitHub](https://github.com/NVIDIA/NeMo-Agent-Toolkit) | [](https://github.com/NVIDIA/NeMo-Agent-Toolkit) | multi-agent, optimization, toolkit | 用于连接与优化多代理协作的开源工具包。 |
+| Scion | [GitHub](https://github.com/GoogleCloudPlatform/scion) | [](https://github.com/GoogleCloudPlatform/scion) | multi-agent, containers, orchestration | 实验性多代理编排测试平台,可在容器、git worktree 与远程运行时中隔离运行各类 agent harness。 |
+| deepagentsjs | [GitHub](https://github.com/langchain-ai/deepagentsjs) | [](https://github.com/langchain-ai/deepagentsjs) | typescript, langgraph, subagents | 基于 TypeScript 的 agent harness,内置规划、文件系统工具、子代理与 LangGraph 原生运行时能力。 |
+| hankweave | [GitHub](https://github.com/SouthBridgeAI/hankweave-runtime) | [](https://github.com/SouthBridgeAI/hankweave-runtime) | long-horizon, runtime, checkpoints | 面向长任务的无界面运行时,可编排现有 agent harness,并提供 sentinels、循环、检查点与事件日志。 |
### Context & Working-State Engineering
| 项目 | 链接 | Stars | 标签 | 简介 |
| --- | --- | --- | --- | --- |
-| everything-claude-code | [GitHub](https://github.com/affaan-m/everything-claude-code) | [](https://github.com/affaan-m/everything-claude-code) | context, skills, harness-practices | 大型开源实践库,聚焦编码代理的记忆、技能与上下文控制策略。 |
-| claude-mem | [GitHub](https://github.com/thedotmack/claude-mem) | [](https://github.com/thedotmack/claude-mem) | memory, context, session | 插件化记忆层,可记录会话历史并在后续编码任务中注入相关上下文。 |
-| planning-with-files | [GitHub](https://github.com/OthmanAdi/planning-with-files) | [](https://github.com/OthmanAdi/planning-with-files) | planning, skills, persistence | 用于编码代理工作流的持久化文件规划技能包。 |
-| Agent Skills for Context Engineering | [GitHub](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering) | [](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering) | skills, context, production | 面向上下文工程与生产代理的大型技能库。 |
-| Context-Engineering Handbook | [GitHub](https://github.com/davidkimai/Context-Engineering) | [](https://github.com/davidkimai/Context-Engineering) | context-engineering, handbook, practices | 面向代理系统的第一性原理上下文工程手册,强调实践落地。 |
-| CCPM | [GitHub](https://github.com/automazeio/ccpm) | [](https://github.com/automazeio/ccpm) | planning, github-issues, parallel-execution | 规格驱动的项目管理技能,将 PRD 与 GitHub issue 转化为持久上下文和并行代理执行流程。 |
-| Trellis | [GitHub](https://github.com/mindfold-ai/Trellis) | [](https://github.com/mindfold-ai/Trellis) | specs, memory, workflow | 面向多平台编码代理的工作流框架,提供任务上下文、项目记忆与规范注入。 |
-| Awesome Context Engineering | [GitHub](https://github.com/Meirtz/Awesome-Context-Engineering) | [](https://github.com/Meirtz/Awesome-Context-Engineering) | awesome-list, context, survey | 面向上下文工程的综述型清单,覆盖资源与框架。 |
+| everything-claude-code | [GitHub](https://github.com/affaan-m/everything-claude-code) | [](https://github.com/affaan-m/everything-claude-code) | context, skills, harness-practices | 大型开源实践库,聚焦编码代理的记忆、技能与上下文控制策略。 |
+| claude-mem | [GitHub](https://github.com/thedotmack/claude-mem) | [](https://github.com/thedotmack/claude-mem) | memory, context, session | 插件化记忆层,可记录会话历史并在后续编码任务中注入相关上下文。 |
+| planning-with-files | [GitHub](https://github.com/OthmanAdi/planning-with-files) | [](https://github.com/OthmanAdi/planning-with-files) | planning, skills, persistence | 用于编码代理工作流的持久化文件规划技能包。 |
+| Agent Skills for Context Engineering | [GitHub](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering) | [](https://github.com/muratcankoylan/Agent-Skills-for-Context-Engineering) | skills, context, production | 面向上下文工程与生产代理的大型技能库。 |
+| Context-Engineering Handbook | [GitHub](https://github.com/davidkimai/Context-Engineering) | [](https://github.com/davidkimai/Context-Engineering) | context-engineering, handbook, practices | 面向代理系统的第一性原理上下文工程手册,强调实践落地。 |
+| CCPM | [GitHub](https://github.com/automazeio/ccpm) | [](https://github.com/automazeio/ccpm) | planning, github-issues, parallel-execution | 规格驱动的项目管理技能,将 PRD 与 GitHub issue 转化为持久上下文和并行代理执行流程。 |
+| Trellis | [GitHub](https://github.com/mindfold-ai/Trellis) | [](https://github.com/mindfold-ai/Trellis) | specs, memory, workflow | 面向多平台编码代理的工作流框架,提供任务上下文、项目记忆与规范注入。 |
+| Awesome Context Engineering | [GitHub](https://github.com/Meirtz/Awesome-Context-Engineering) | [](https://github.com/Meirtz/Awesome-Context-Engineering) | awesome-list, context, survey | 面向上下文工程的综述型清单,覆盖资源与框架。 |
| context-space | [GitHub](https://github.com/context-space/context-space) | [](https://github.com/context-space/context-space) | context, infrastructure, mcp | 聚焦上下文工程基础设施的项目,强调 MCP 生态集成能力。 |
@@ -108,67 +108,67 @@
| 项目 | 链接 | Stars | 标签 | 简介 |
| --- | --- | --- | --- | --- |
-| Daytona | [GitHub](https://github.com/daytonaio/daytona) | [](https://github.com/daytonaio/daytona) | sandbox, execution, infra | 面向 AI 生成代码的安全弹性沙箱基础设施,提供文件、Git、LSP 与执行 API。 |
-| CUA | [GitHub](https://github.com/trycua/cua) | [](https://github.com/trycua/cua) | computer-use, sandbox, infra | 面向计算机操作代理的基础设施栈,包含沙箱、SDK 与基准支持。 |
-| E2B | [GitHub](https://github.com/e2b-dev/E2B) | [](https://github.com/e2b-dev/E2B) | cloud-sandbox, execution, enterprise | 提供真实工具的安全云端环境,面向生产级代理执行。 |
-| Browser Harness | [GitHub](https://github.com/browser-use/browser-harness) | [](https://github.com/browser-use/browser-harness) | browser, cdp, self-healing | 轻量可编辑的 CDP harness,可将 LLM 直接接入真实浏览器,并允许代理在运行中扩展辅助能力。 |
-| OpenSandbox | [GitHub](https://github.com/alibaba/OpenSandbox) | [](https://github.com/alibaba/OpenSandbox) | sandbox, security, runtime | 面向代理工作负载的安全可扩展沙箱运行时。 |
-| agent-infra sandbox | [GitHub](https://github.com/agent-infra/sandbox) | [](https://github.com/agent-infra/sandbox) | all-in-one, browser, shell | 集成浏览器、Shell、文件、MCP 与 IDE 服务的一体化沙箱。 |
-| Judge0 | [GitHub](https://github.com/judge0/judge0) | [](https://github.com/judge0/judge0) | code-execution, sandbox, backend | 可扩展的沙箱代码执行系统,可作为代理执行后端。 |
-| Agent Sandbox | [GitHub](https://github.com/kubernetes-sigs/agent-sandbox) | [](https://github.com/kubernetes-sigs/agent-sandbox) | kubernetes, sandbox, stateful | 面向隔离且有状态 agent runtime 的 Kubernetes 原生沙箱控制平面,提供稳定身份、持久化与预热池能力。 |
-| stakpak/agent | [GitHub](https://github.com/stakpak/agent) | [](https://github.com/stakpak/agent) | always-on, autonomous, ops | 常驻机器运行的开源自治代理,强调持续运维闭环。 |
+| Daytona | [GitHub](https://github.com/daytonaio/daytona) | [](https://github.com/daytonaio/daytona) | sandbox, execution, infra | 面向 AI 生成代码的安全弹性沙箱基础设施,提供文件、Git、LSP 与执行 API。 |
+| CUA | [GitHub](https://github.com/trycua/cua) | [](https://github.com/trycua/cua) | computer-use, sandbox, infra | 面向计算机操作代理的基础设施栈,包含沙箱、SDK 与基准支持。 |
+| E2B | [GitHub](https://github.com/e2b-dev/E2B) | [](https://github.com/e2b-dev/E2B) | cloud-sandbox, execution, enterprise | 提供真实工具的安全云端环境,面向生产级代理执行。 |
+| Browser Harness | [GitHub](https://github.com/browser-use/browser-harness) | [](https://github.com/browser-use/browser-harness) | browser, cdp, self-healing | 轻量可编辑的 CDP harness,可将 LLM 直接接入真实浏览器,并允许代理在运行中扩展辅助能力。 |
+| OpenSandbox | [GitHub](https://github.com/alibaba/OpenSandbox) | [](https://github.com/alibaba/OpenSandbox) | sandbox, security, runtime | 面向代理工作负载的安全可扩展沙箱运行时。 |
+| agent-infra sandbox | [GitHub](https://github.com/agent-infra/sandbox) | [](https://github.com/agent-infra/sandbox) | all-in-one, browser, shell | 集成浏览器、Shell、文件、MCP 与 IDE 服务的一体化沙箱。 |
+| Judge0 | [GitHub](https://github.com/judge0/judge0) | [](https://github.com/judge0/judge0) | code-execution, sandbox, backend | 可扩展的沙箱代码执行系统,可作为代理执行后端。 |
+| Agent Sandbox | [GitHub](https://github.com/kubernetes-sigs/agent-sandbox) | [](https://github.com/kubernetes-sigs/agent-sandbox) | kubernetes, sandbox, stateful | 面向隔离且有状态 agent runtime 的 Kubernetes 原生沙箱控制平面,提供稳定身份、持久化与预热池能力。 |
+| stakpak/agent | [GitHub](https://github.com/stakpak/agent) | [](https://github.com/stakpak/agent) | always-on, autonomous, ops | 常驻机器运行的开源自治代理,强调持续运维闭环。 |
| OSS-Fuzz Gen | [GitHub](https://github.com/google/oss-fuzz-gen) | [](https://github.com/google/oss-fuzz-gen) | fuzzing, security, execution | 将 LLM 驱动模糊测试与受控执行环境结合的工程实现。 |
| E2B Desktop Sandbox | [GitHub](https://github.com/e2b-dev/desktop) | [](https://github.com/e2b-dev/desktop) | desktop, sandbox, computer-use | 面向 computer-use 代理的安全虚拟桌面沙箱,提供 SDK 控制与屏幕流式能力。 |
-| Tensorlake | [GitHub](https://github.com/tensorlakeai/tensorlake) | [](https://github.com/tensorlakeai/tensorlake) | microvm, sandbox, orchestration | 面向 agent 沙箱的无服务器运行时,提供 MicroVM 隔离、快照、挂起恢复与后台编排能力。 |
-| Arrakis | [GitHub](https://github.com/abshkbh/arrakis) | [](https://github.com/abshkbh/arrakis) | sandbox, microvm, snapshots | 自托管沙箱基座,提供 MicroVM 隔离、快照恢复,以及面向代理代码执行与 computer use 的 REST、SDK 与 MCP 接口。 |
-| AgentScope Runtime | [GitHub](https://github.com/agentscope-ai/agentscope-runtime) | [](https://github.com/agentscope-ai/agentscope-runtime) | runtime, sandbox, deployment | 面向代理应用的生产运行时,提供安全工具沙箱、部署 API、可观测能力与状态服务。 |
-| SWE-ReX | [GitHub](https://github.com/SWE-agent/SWE-ReX) | [](https://github.com/SWE-agent/SWE-ReX) | sandbox, execution, coding-agent | 面向 AI 编码代理的沙箱执行基础设施,支持本地与云端扩展。 |
-| sandboxed.sh | [GitHub](https://github.com/Th0rgal/sandboxed.sh) | [](https://github.com/Th0rgal/sandboxed.sh) | self-hosted, isolation, orchestrator | 在隔离 Linux 工作区中运行编码代理的自托管编排器。 |
+| Tensorlake | [GitHub](https://github.com/tensorlakeai/tensorlake) | [](https://github.com/tensorlakeai/tensorlake) | microvm, sandbox, orchestration | 面向 agent 沙箱的无服务器运行时,提供 MicroVM 隔离、快照、挂起恢复与后台编排能力。 |
+| Arrakis | [GitHub](https://github.com/abshkbh/arrakis) | [](https://github.com/abshkbh/arrakis) | sandbox, microvm, snapshots | 自托管沙箱基座,提供 MicroVM 隔离、快照恢复,以及面向代理代码执行与 computer use 的 REST、SDK 与 MCP 接口。 |
+| AgentScope Runtime | [GitHub](https://github.com/agentscope-ai/agentscope-runtime) | [](https://github.com/agentscope-ai/agentscope-runtime) | runtime, sandbox, deployment | 面向代理应用的生产运行时,提供安全工具沙箱、部署 API、可观测能力与状态服务。 |
+| SWE-ReX | [GitHub](https://github.com/SWE-agent/SWE-ReX) | [](https://github.com/SWE-agent/SWE-ReX) | sandbox, execution, coding-agent | 面向 AI 编码代理的沙箱执行基础设施,支持本地与云端扩展。 |
+| sandboxed.sh | [GitHub](https://github.com/Th0rgal/sandboxed.sh) | [](https://github.com/Th0rgal/sandboxed.sh) | self-hosted, isolation, orchestrator | 在隔离 Linux 工作区中运行编码代理的自托管编排器。 |
| Capsule | [GitHub](https://github.com/capsulerun/capsule) | [](https://github.com/capsulerun/capsule) | wasm, sandbox, task-runtime | 在隔离 WebAssembly 沙箱中协调 agent 任务的耐久运行时,提供重试与生命周期跟踪。 |
-| terminal-bench-env | [GitHub](https://github.com/ucsb-mlsec/terminal-bench-env) | [](https://github.com/ucsb-mlsec/terminal-bench-env) | terminal, benchmark-env, sandbox | 为终端代理基准测试提供执行环境层。 |
+| terminal-bench-env | [GitHub](https://github.com/ucsb-mlsec/terminal-bench-env) | [](https://github.com/ucsb-mlsec/terminal-bench-env) | terminal, benchmark-env, sandbox | 为终端代理基准测试提供执行环境层。 |
### Protocols, Tool Interfaces & Agent Contracts
| 项目 | 链接 | Stars | 标签 | 简介 |
| --- | --- | --- | --- | --- |
-| GitHub Spec Kit | [GitHub](https://github.com/github/spec-kit) | [](https://github.com/github/spec-kit) | spec-driven, workflows, tooling | 面向规范驱动开发的工具包,可引导代理进行确定性执行。 |
-| MCP Servers | [GitHub](https://github.com/modelcontextprotocol/servers) | [](https://github.com/modelcontextprotocol/servers) | mcp, servers, implementations | 官方 MCP Server 实现集合,覆盖多种工具与场景。 |
-| AGENTS.md | [GitHub](https://github.com/agentsmd/agents.md) | [](https://github.com/agentsmd/agents.md) | spec, agent-file, instructions | 面向代码仓库本地代理指令的开放格式规范。 |
-| Model Context Protocol | [GitHub](https://github.com/modelcontextprotocol/modelcontextprotocol) | [](https://github.com/modelcontextprotocol/modelcontextprotocol) | mcp, protocol, interoperability | MCP 的核心规范与文档,定义工具与上下文互操作方式。 |
-| directories (rules and MCP indexes) | [GitHub](https://github.com/leerob/directories) | [](https://github.com/leerob/directories) | directories, mcp, rules | 面向规则与 MCP server 发现的目录索引集合。 |
-| LangChain MCP Adapters | [GitHub](https://github.com/langchain-ai/langchain-mcp-adapters) | [](https://github.com/langchain-ai/langchain-mcp-adapters) | mcp, adapters, integration | 用于连接 LangChain 组件与 MCP server 的适配层。 |
-| Microsoft MCP Servers | [GitHub](https://github.com/microsoft/mcp) | [](https://github.com/microsoft/mcp) | mcp, enterprise, servers | 微软官方 MCP server 目录,连接企业数据与工具。 |
-| ACPX | [GitHub](https://github.com/openclaw/acpx) | [](https://github.com/openclaw/acpx) | acp, client, sessions | 面向有状态 Agent Client Protocol 会话的无头 CLI 客户端。 |
-| Microsoft Learn MCP | [GitHub](https://github.com/MicrosoftDocs/mcp) | [](https://github.com/MicrosoftDocs/mcp) | mcp, docs, grounding | 为代理接入微软文档知识提供的 MCP server 与 CLI。 |
+| GitHub Spec Kit | [GitHub](https://github.com/github/spec-kit) | [](https://github.com/github/spec-kit) | spec-driven, workflows, tooling | 面向规范驱动开发的工具包,可引导代理进行确定性执行。 |
+| MCP Servers | [GitHub](https://github.com/modelcontextprotocol/servers) | [](https://github.com/modelcontextprotocol/servers) | mcp, servers, implementations | 官方 MCP Server 实现集合,覆盖多种工具与场景。 |
+| AGENTS.md | [GitHub](https://github.com/agentsmd/agents.md) | [](https://github.com/agentsmd/agents.md) | spec, agent-file, instructions | 面向代码仓库本地代理指令的开放格式规范。 |
+| Model Context Protocol | [GitHub](https://github.com/modelcontextprotocol/modelcontextprotocol) | [](https://github.com/modelcontextprotocol/modelcontextprotocol) | mcp, protocol, interoperability | MCP 的核心规范与文档,定义工具与上下文互操作方式。 |
+| directories (rules and MCP indexes) | [GitHub](https://github.com/leerob/directories) | [](https://github.com/leerob/directories) | directories, mcp, rules | 面向规则与 MCP server 发现的目录索引集合。 |
+| LangChain MCP Adapters | [GitHub](https://github.com/langchain-ai/langchain-mcp-adapters) | [](https://github.com/langchain-ai/langchain-mcp-adapters) | mcp, adapters, integration | 用于连接 LangChain 组件与 MCP server 的适配层。 |
+| Microsoft MCP Servers | [GitHub](https://github.com/microsoft/mcp) | [](https://github.com/microsoft/mcp) | mcp, enterprise, servers | 微软官方 MCP server 目录,连接企业数据与工具。 |
+| ACPX | [GitHub](https://github.com/openclaw/acpx) | [](https://github.com/openclaw/acpx) | acp, client, sessions | 面向有状态 Agent Client Protocol 会话的无头 CLI 客户端。 |
+| Microsoft Learn MCP | [GitHub](https://github.com/MicrosoftDocs/mcp) | [](https://github.com/MicrosoftDocs/mcp) | mcp, docs, grounding | 为代理接入微软文档知识提供的 MCP server 与 CLI。 |
| IBM MCP | [GitHub](https://github.com/IBM/mcp) | [](https://github.com/IBM/mcp) | mcp, clients, tooling | IBM 提供的 MCP server、client 与开发工具集合。 |
-| AGENT.md | [GitHub](https://github.com/agentmd/agent.md) | [](https://github.com/agentmd/agent.md) | standard, agent-file, interoperability | 面向代理编码工具的标准化机器可读文件格式。 |
+| AGENT.md | [GitHub](https://github.com/agentmd/agent.md) | [](https://github.com/agentmd/agent.md) | standard, agent-file, interoperability | 面向代理编码工具的标准化机器可读文件格式。 |
### Evaluation Harnesses & Benchmarks
| 项目 | 链接 | Stars | 标签 | 简介 |
| --- | --- | --- | --- | --- |
-| Promptfoo | [GitHub](https://github.com/promptfoo/promptfoo) | [](https://github.com/promptfoo/promptfoo) | eval, red-team, ci | 配置驱动的 Prompt/Agent/RAG 测试、对比与红队评估工具。 |
-| DeepEval | [GitHub](https://github.com/confident-ai/deepeval) | [](https://github.com/confident-ai/deepeval) | evaluation, framework, testing | 支持代理与工作流质量测试的 LLM 评估框架。 |
-| RAGAS | [GitHub](https://github.com/vibrantlabsai/ragas) | [](https://github.com/vibrantlabsai/ragas) | rag, metrics, evaluation | 面向 LLM 与 RAG 质量指标的开源评测工具集。 |
-| lm-evaluation-harness | [GitHub](https://github.com/EleutherAI/lm-evaluation-harness) | [](https://github.com/EleutherAI/lm-evaluation-harness) | benchmark, harness, llm | 广泛使用的 LLM 基准 harness,用于跨任务一致评估。 |
-| SWE-bench | [GitHub](https://github.com/SWE-bench/SWE-bench) | [](https://github.com/SWE-bench/SWE-bench) | benchmark, swe, evaluation | 软件工程代理 issue 修复能力的标准评测基准。 |
-| verifiers | [GitHub](https://github.com/PrimeIntellect-ai/verifiers) | [](https://github.com/PrimeIntellect-ai/verifiers) | verifier, rl, evaluation | 面向 RL 环境与 verifier 评测回路的库。 |
-| AgentBench | [GitHub](https://github.com/THUDM/AgentBench) | [](https://github.com/THUDM/AgentBench) | benchmark, cross-domain, agent | 跨环境评测基准,用于衡量 LLM 代理的工具使用能力。 |
-| LangWatch | [GitHub](https://github.com/langwatch/langwatch) | [](https://github.com/langwatch/langwatch) | simulation, evaluation, testing | 面向代理模拟、评测闭环与生产测试的端到端平台。 |
-| EvalScope | [GitHub](https://github.com/modelscope/evalscope) | [](https://github.com/modelscope/evalscope) | benchmark, framework, llm | 可定制的大模型基准与性能评测框架。 |
-| Terminal-Bench | [GitHub](https://github.com/harbor-framework/terminal-bench) | [](https://github.com/harbor-framework/terminal-bench) | terminal, benchmark, long-horizon | 面向长时与重验证任务的终端原生代理基准套件。 |
-| Harbor | [GitHub](https://github.com/harbor-framework/harbor) | [](https://github.com/harbor-framework/harbor) | evaluation, harness, rl-env | 用于运行代理评测并构建类 RL 环境的框架。 |
-| tau2-bench | [GitHub](https://github.com/sierra-research/tau2-bench) | [](https://github.com/sierra-research/tau2-bench) | tool-use, interaction, benchmark | 强调多步执行质量的工具-代理-用户交互基准。 |
-| NeMo Gym | [GitHub](https://github.com/NVIDIA-NeMo/Gym) | [](https://github.com/NVIDIA-NeMo/Gym) | rl-env, training, evaluation | 用于构建 LLM/代理训练与评测 RL 环境的工具集。 |
+| Promptfoo | [GitHub](https://github.com/promptfoo/promptfoo) | [](https://github.com/promptfoo/promptfoo) | eval, red-team, ci | 配置驱动的 Prompt/Agent/RAG 测试、对比与红队评估工具。 |
+| DeepEval | [GitHub](https://github.com/confident-ai/deepeval) | [](https://github.com/confident-ai/deepeval) | evaluation, framework, testing | 支持代理与工作流质量测试的 LLM 评估框架。 |
+| RAGAS | [GitHub](https://github.com/vibrantlabsai/ragas) | [](https://github.com/vibrantlabsai/ragas) | rag, metrics, evaluation | 面向 LLM 与 RAG 质量指标的开源评测工具集。 |
+| lm-evaluation-harness | [GitHub](https://github.com/EleutherAI/lm-evaluation-harness) | [](https://github.com/EleutherAI/lm-evaluation-harness) | benchmark, harness, llm | 广泛使用的 LLM 基准 harness,用于跨任务一致评估。 |
+| SWE-bench | [GitHub](https://github.com/SWE-bench/SWE-bench) | [](https://github.com/SWE-bench/SWE-bench) | benchmark, swe, evaluation | 软件工程代理 issue 修复能力的标准评测基准。 |
+| verifiers | [GitHub](https://github.com/PrimeIntellect-ai/verifiers) | [](https://github.com/PrimeIntellect-ai/verifiers) | verifier, rl, evaluation | 面向 RL 环境与 verifier 评测回路的库。 |
+| AgentBench | [GitHub](https://github.com/THUDM/AgentBench) | [](https://github.com/THUDM/AgentBench) | benchmark, cross-domain, agent | 跨环境评测基准,用于衡量 LLM 代理的工具使用能力。 |
+| LangWatch | [GitHub](https://github.com/langwatch/langwatch) | [](https://github.com/langwatch/langwatch) | simulation, evaluation, testing | 面向代理模拟、评测闭环与生产测试的端到端平台。 |
+| EvalScope | [GitHub](https://github.com/modelscope/evalscope) | [](https://github.com/modelscope/evalscope) | benchmark, framework, llm | 可定制的大模型基准与性能评测框架。 |
+| Terminal-Bench | [GitHub](https://github.com/harbor-framework/terminal-bench) | [](https://github.com/harbor-framework/terminal-bench) | terminal, benchmark, long-horizon | 面向长时与重验证任务的终端原生代理基准套件。 |
+| Harbor | [GitHub](https://github.com/harbor-framework/harbor) | [](https://github.com/harbor-framework/harbor) | evaluation, harness, rl-env | 用于运行代理评测并构建类 RL 环境的框架。 |
+| tau2-bench | [GitHub](https://github.com/sierra-research/tau2-bench) | [](https://github.com/sierra-research/tau2-bench) | tool-use, interaction, benchmark | 强调多步执行质量的工具-代理-用户交互基准。 |
+| NeMo Gym | [GitHub](https://github.com/NVIDIA-NeMo/Gym) | [](https://github.com/NVIDIA-NeMo/Gym) | rl-env, training, evaluation | 用于构建 LLM/代理训练与评测 RL 环境的工具集。 |
| TheAgentCompany | [GitHub](https://github.com/TheAgentCompany/TheAgentCompany) | [](https://github.com/TheAgentCompany/TheAgentCompany) | benchmark, workplace, multi-step | 以模拟软件公司任务评测多步工作场景自治能力的 agent 基准。 |
-| auto-harness | [GitHub](https://github.com/neosigmaai/auto-harness) | [](https://github.com/neosigmaai/auto-harness) | optimization, regression, evals | 以基准门控的优化闭环,可自动挖掘失败样例、修改 agent 代码,并在夜间持续防回归。 |
-| Inspect Evals | [GitHub](https://github.com/UKGovernmentBEIS/inspect_evals) | [](https://github.com/UKGovernmentBEIS/inspect_evals) | inspect, eval-suite, reproducibility | 面向 Inspect AI 工作流的评测套件集合。 |
-| SWE-Bench Pro | [GitHub](https://github.com/scaleapi/SWE-bench_Pro-os) | [](https://github.com/scaleapi/SWE-bench_Pro-os) | swe, benchmark, long-horizon | 面向 issue 驱动编码代理的长时软件工程基准,提供可复现的 Docker 化评测流程。 |
+| auto-harness | [GitHub](https://github.com/neosigmaai/auto-harness) | [](https://github.com/neosigmaai/auto-harness) | optimization, regression, evals | 以基准门控的优化闭环,可自动挖掘失败样例、修改 agent 代码,并在夜间持续防回归。 |
+| Inspect Evals | [GitHub](https://github.com/UKGovernmentBEIS/inspect_evals) | [](https://github.com/UKGovernmentBEIS/inspect_evals) | inspect, eval-suite, reproducibility | 面向 Inspect AI 工作流的评测套件集合。 |
+| SWE-Bench Pro | [GitHub](https://github.com/scaleapi/SWE-bench_Pro-os) | [](https://github.com/scaleapi/SWE-bench_Pro-os) | swe, benchmark, long-horizon | 面向 issue 驱动编码代理的长时软件工程基准,提供可复现的 Docker 化评测流程。 |
| Agent Evaluation | [GitHub](https://github.com/awslabs/agent-evaluation) | [](https://github.com/awslabs/agent-evaluation) | evaluation, testing, ci | AWS 的虚拟代理测试框架,支持评估器驱动的多轮对话、钩子扩展与 CI 友好工作流。 |
-| WorkArena | [GitHub](https://github.com/ServiceNow/WorkArena) | [](https://github.com/ServiceNow/WorkArena) | browser, benchmark, enterprise | 面向企业知识工作任务的浏览器代理基准。 |
-| OpenHands Benchmarks | [GitHub](https://github.com/OpenHands/benchmarks) | [](https://github.com/OpenHands/benchmarks) | openhands, eval, harness | OpenHands 体系的评测 harness 与基准定义。 |
+| WorkArena | [GitHub](https://github.com/ServiceNow/WorkArena) | [](https://github.com/ServiceNow/WorkArena) | browser, benchmark, enterprise | 面向企业知识工作任务的浏览器代理基准。 |
+| OpenHands Benchmarks | [GitHub](https://github.com/OpenHands/benchmarks) | [](https://github.com/OpenHands/benchmarks) | openhands, eval, harness | OpenHands 体系的评测 harness 与基准定义。 |
| WebArena-Verified | [GitHub](https://github.com/ServiceNow/webarena-verified) | [](https://github.com/ServiceNow/webarena-verified) | web-agent, benchmark, deterministic | 带确定性评测器的已验证 Web 代理基准。 |
@@ -176,90 +176,91 @@
| 项目 | 链接 | Stars | 标签 | 简介 |
| --- | --- | --- | --- | --- |
-| Langfuse | [GitHub](https://github.com/langfuse/langfuse) | [](https://github.com/langfuse/langfuse) | llmops, tracing, metrics | 开源 LLM 工程平台,覆盖链路追踪、指标、提示词与评测。 |
-| MLflow | [GitHub](https://github.com/mlflow/mlflow) | [](https://github.com/mlflow/mlflow) | platform, monitoring, evaluation | 通用 AI 工程平台,支持代理系统的监控与评测。 |
-| Opik | [GitHub](https://github.com/comet-ml/opik) | [](https://github.com/comet-ml/opik) | monitoring, eval, tracing | 面向 LLM 应用与代理流程的端到端调试、评测与监控平台。 |
-| RagaAI Catalyst | [GitHub](https://github.com/raga-ai-hub/RagaAI-Catalyst) | [](https://github.com/raga-ai-hub/RagaAI-Catalyst) | agentops, analytics, monitoring | 带时间线与执行图分析的代理可观测性监控框架。 |
-| TensorZero | [GitHub](https://github.com/tensorzero/tensorzero) | [](https://github.com/tensorzero/tensorzero) | llmops, gateway, optimization | 开源 LLMOps 栈,统一网关、可观测性、评测与优化。 |
-| Arize Phoenix | [GitHub](https://github.com/Arize-ai/phoenix) | [](https://github.com/Arize-ai/phoenix) | observability, tracing, evaluation | 开放的 AI 可观测性平台,支持追踪与评测分析。 |
-| OpenLLMetry | [GitHub](https://github.com/traceloop/openllmetry) | [](https://github.com/traceloop/openllmetry) | opentelemetry, instrumentation, tracing | 基于 OpenTelemetry 的 GenAI/LLM 应用可观测性埋点方案。 |
-| Helicone | [GitHub](https://github.com/Helicone/helicone) | [](https://github.com/Helicone/helicone) | monitoring, traffic, production | 轻量平台,用于生产环境 LLM 流量监控与评估。 |
-| AgentOps SDK | [GitHub](https://github.com/AgentOps-AI/agentops) | [](https://github.com/AgentOps-AI/agentops) | agentops, monitoring, cost | 面向代理工作流的监控与基准 SDK,支持成本与链路追踪。 |
-| Latitude | [GitHub](https://github.com/latitude-dev/latitude-llm) | [](https://github.com/latitude-dev/latitude-llm) | platform, eval, observability | 开源 agent 工程平台,集成评测与可观测性能力。 |
-| Laminar | [GitHub](https://github.com/lmnr-ai/lmnr) | [](https://github.com/lmnr-ai/lmnr) | observability, tracing, evals | 面向代理系统的可观测平台,覆盖追踪、评测运行、监控与仪表盘。 |
-| claude-code-reverse | [GitHub](https://github.com/Yuyz0112/claude-code-reverse) | [](https://github.com/Yuyz0112/claude-code-reverse) | trace, visualization, debugging | 可视化并分析 Claude Code 大模型交互链路的工具。 |
-| OpenInference | [GitHub](https://github.com/Arize-ai/openinference) | [](https://github.com/Arize-ai/openinference) | spec, instrumentation, observability | 面向 AI 可观测性的开放埋点规范与工具。 |
-| Future AGI | [GitHub](https://github.com/future-agi/future-agi) | [](https://github.com/future-agi/future-agi) | observability, evaluation, guardrails | 可自托管的平台,将代理追踪、评测、模拟、护栏与网关运维闭环整合在一起。 |
+| Langfuse | [GitHub](https://github.com/langfuse/langfuse) | [](https://github.com/langfuse/langfuse) | llmops, tracing, metrics | 开源 LLM 工程平台,覆盖链路追踪、指标、提示词与评测。 |
+| MLflow | [GitHub](https://github.com/mlflow/mlflow) | [](https://github.com/mlflow/mlflow) | platform, monitoring, evaluation | 通用 AI 工程平台,支持代理系统的监控与评测。 |
+| Opik | [GitHub](https://github.com/comet-ml/opik) | [](https://github.com/comet-ml/opik) | monitoring, eval, tracing | 面向 LLM 应用与代理流程的端到端调试、评测与监控平台。 |
+| RagaAI Catalyst | [GitHub](https://github.com/raga-ai-hub/RagaAI-Catalyst) | [](https://github.com/raga-ai-hub/RagaAI-Catalyst) | agentops, analytics, monitoring | 带时间线与执行图分析的代理可观测性监控框架。 |
+| TensorZero | [GitHub](https://github.com/tensorzero/tensorzero) | [](https://github.com/tensorzero/tensorzero) | llmops, gateway, optimization | 开源 LLMOps 栈,统一网关、可观测性、评测与优化。 |
+| Arize Phoenix | [GitHub](https://github.com/Arize-ai/phoenix) | [](https://github.com/Arize-ai/phoenix) | observability, tracing, evaluation | 开放的 AI 可观测性平台,支持追踪与评测分析。 |
+| OpenLLMetry | [GitHub](https://github.com/traceloop/openllmetry) | [](https://github.com/traceloop/openllmetry) | opentelemetry, instrumentation, tracing | 基于 OpenTelemetry 的 GenAI/LLM 应用可观测性埋点方案。 |
+| Helicone | [GitHub](https://github.com/Helicone/helicone) | [](https://github.com/Helicone/helicone) | monitoring, traffic, production | 轻量平台,用于生产环境 LLM 流量监控与评估。 |
+| AgentOps SDK | [GitHub](https://github.com/AgentOps-AI/agentops) | [](https://github.com/AgentOps-AI/agentops) | agentops, monitoring, cost | 面向代理工作流的监控与基准 SDK,支持成本与链路追踪。 |
+| Latitude | [GitHub](https://github.com/latitude-dev/latitude-llm) | [](https://github.com/latitude-dev/latitude-llm) | platform, eval, observability | 开源 agent 工程平台,集成评测与可观测性能力。 |
+| Laminar | [GitHub](https://github.com/lmnr-ai/lmnr) | [](https://github.com/lmnr-ai/lmnr) | observability, tracing, evals | 面向代理系统的可观测平台,覆盖追踪、评测运行、监控与仪表盘。 |
+| claude-code-reverse | [GitHub](https://github.com/Yuyz0112/claude-code-reverse) | [](https://github.com/Yuyz0112/claude-code-reverse) | trace, visualization, debugging | 可视化并分析 Claude Code 大模型交互链路的工具。 |
+| OpenInference | [GitHub](https://github.com/Arize-ai/openinference) | [](https://github.com/Arize-ai/openinference) | spec, instrumentation, observability | 面向 AI 可观测性的开放埋点规范与工具。 |
+| Future AGI | [GitHub](https://github.com/future-agi/future-agi) | [](https://github.com/future-agi/future-agi) | observability, evaluation, guardrails | 可自托管的平台,将代理追踪、评测、模拟、护栏与网关运维闭环整合在一起。 |
### Guardrails, Security & Governance
| 项目 | 链接 | Stars | 标签 | 简介 |
| --- | --- | --- | --- | --- |
-| LiteLLM | [GitHub](https://github.com/BerriAI/litellm) | [](https://github.com/BerriAI/litellm) | gateway, proxy, guardrails | 统一 LLM 网关/代理,支持成本追踪、负载均衡与护栏。 |
-| Kong | [GitHub](https://github.com/Kong/kong) | [](https://github.com/Kong/kong) | gateway, policy, infra | API 与 AI 网关基础设施,可用于代理系统的策略执行。 |
-| Portkey Gateway | [GitHub](https://github.com/Portkey-AI/gateway) | [](https://github.com/Portkey-AI/gateway) | gateway, guardrails, routing | 支持多模型路由与护栏控制的 AI 网关。 |
-| CAI (Cybersecurity AI) | [GitHub](https://github.com/aliasrobotics/cai) | [](https://github.com/aliasrobotics/cai) | security, governance, framework | 面向攻防场景的安全型代理框架。 |
-| OpenAI Realtime Agents | [GitHub](https://github.com/openai/openai-realtime-agents) | [](https://github.com/openai/openai-realtime-agents) | realtime, orchestration, control | 展示高级实时代理模式,强调结构化控制与交互回路。 |
-| Plano | [GitHub](https://github.com/katanemo/plano) | [](https://github.com/katanemo/plano) | proxy, safety, data-plane | 内置编排、安全与可观测性的 AI 原生代理与数据平面。 |
-| OpenAI CS Agents Demo | [GitHub](https://github.com/openai/openai-cs-agents-demo) | [](https://github.com/openai/openai-cs-agents-demo) | demo, handoffs, governance | 客服多代理示例,展示交接流程与类似护栏的控制节点。 |
-| ContextForge | [GitHub](https://github.com/IBM/mcp-context-forge) | [](https://github.com/IBM/mcp-context-forge) | gateway, governance, observability | 统一 MCP、A2A 与 REST/gRPC 端点的注册与代理层,提供集中治理与可观测能力。 |
-| Archestra | [GitHub](https://github.com/archestra-ai/archestra) | [](https://github.com/archestra-ai/archestra) | enterprise, guardrails, governance | 企业级 AI 平台,提供护栏、MCP 注册中心与编排能力。 |
-| Tracecat | [GitHub](https://github.com/TracecatHQ/tracecat) | [](https://github.com/TracecatHQ/tracecat) | security, automation, policy | 面向安全团队的 AI 自动化平台,提供策略与工作流控制。 |
-| AgentGateway | [GitHub](https://github.com/agentgateway/agentgateway) | [](https://github.com/agentgateway/agentgateway) | gateway, mcp, proxy | 面向 AI 代理与 MCP 生态的代理网关。 |
-| Haft | [GitHub](https://github.com/m0n0x41d/haft) | [](https://github.com/m0n0x41d/haft) | governance, decisions, mcp | 面向决策治理的 harness,在代理执行前沉淀可证伪契约、证据与 commission 生命周期。 |
+| LiteLLM | [GitHub](https://github.com/BerriAI/litellm) | [](https://github.com/BerriAI/litellm) | gateway, proxy, guardrails | 统一 LLM 网关/代理,支持成本追踪、负载均衡与护栏。 |
+| Kong | [GitHub](https://github.com/Kong/kong) | [](https://github.com/Kong/kong) | gateway, policy, infra | API 与 AI 网关基础设施,可用于代理系统的策略执行。 |
+| Portkey Gateway | [GitHub](https://github.com/Portkey-AI/gateway) | [](https://github.com/Portkey-AI/gateway) | gateway, guardrails, routing | 支持多模型路由与护栏控制的 AI 网关。 |
+| CAI (Cybersecurity AI) | [GitHub](https://github.com/aliasrobotics/cai) | [](https://github.com/aliasrobotics/cai) | security, governance, framework | 面向攻防场景的安全型代理框架。 |
+| OpenAI Realtime Agents | [GitHub](https://github.com/openai/openai-realtime-agents) | [](https://github.com/openai/openai-realtime-agents) | realtime, orchestration, control | 展示高级实时代理模式,强调结构化控制与交互回路。 |
+| Plano | [GitHub](https://github.com/katanemo/plano) | [](https://github.com/katanemo/plano) | proxy, safety, data-plane | 内置编排、安全与可观测性的 AI 原生代理与数据平面。 |
+| OpenAI CS Agents Demo | [GitHub](https://github.com/openai/openai-cs-agents-demo) | [](https://github.com/openai/openai-cs-agents-demo) | demo, handoffs, governance | 客服多代理示例,展示交接流程与类似护栏的控制节点。 |
+| ContextForge | [GitHub](https://github.com/IBM/mcp-context-forge) | [](https://github.com/IBM/mcp-context-forge) | gateway, governance, observability | 统一 MCP、A2A 与 REST/gRPC 端点的注册与代理层,提供集中治理与可观测能力。 |
+| Archestra | [GitHub](https://github.com/archestra-ai/archestra) | [](https://github.com/archestra-ai/archestra) | enterprise, guardrails, governance | 企业级 AI 平台,提供护栏、MCP 注册中心与编排能力。 |
+| Tracecat | [GitHub](https://github.com/TracecatHQ/tracecat) | [](https://github.com/TracecatHQ/tracecat) | security, automation, policy | 面向安全团队的 AI 自动化平台,提供策略与工作流控制。 |
+| AgentGateway | [GitHub](https://github.com/agentgateway/agentgateway) | [](https://github.com/agentgateway/agentgateway) | gateway, mcp, proxy | 面向 AI 代理与 MCP 生态的代理网关。 |
+| Haft | [GitHub](https://github.com/m0n0x41d/haft) | [](https://github.com/m0n0x41d/haft) | governance, decisions, mcp | 面向决策治理的 harness,在代理执行前沉淀可证伪契约、证据与 commission 生命周期。 |
### Reference Harness Implementations
| 项目 | 链接 | Stars | 标签 | 简介 |
| --- | --- | --- | --- | --- |
-| OpenCode | [GitHub](https://github.com/anomalyco/opencode) | [](https://github.com/anomalyco/opencode) | terminal, coding-agent, subagents | 开源编码代理,提供内置 plan/build 角色、子代理、LSP 支持与客户端-服务端运行时。 |
-| Claude Code | [GitHub](https://github.com/anthropics/claude-code) | [](https://github.com/anthropics/claude-code) | terminal, coding-agent, git-workflows | 官方终端编码代理,可理解代码库并通过自然语言执行编辑、调试与 Git 工作流。 |
-| Gemini CLI | [GitHub](https://github.com/google-gemini/gemini-cli) | [](https://github.com/google-gemini/gemini-cli) | terminal, coding-agent, mcp | 开源终端代理,提供内置工具、MCP 支持、会话检查点与沙箱控制能力。 |
-| Codex CLI | [GitHub](https://github.com/openai/codex) | [](https://github.com/openai/codex) | terminal, coding-agent, local-execution | 终端原生的本地编码代理,提供面向软件任务的实用 agent 工作流。 |
-| OpenHands | [GitHub](https://github.com/OpenHands/OpenHands) | [](https://github.com/OpenHands/OpenHands) | coding-agent, software-engineering, repo | 开源 AI 软件工程代理,聚焦仓库级编码任务执行。 |
-| learn-claude-code | [GitHub](https://github.com/shareAI-lab/learn-claude-code) | [](https://github.com/shareAI-lab/learn-claude-code) | tutorial, harness, claude-code | 从 0 到 1 构建 Claude Code 类系统的实战 harness 教程。 |
-| OpenManus | [GitHub](https://github.com/FoundationAgents/OpenManus) | [](https://github.com/FoundationAgents/OpenManus) | general-agent, autonomy, workflows | 面向广义自治任务的开放基础系统,覆盖编码等复杂场景。 |
-| pi | [GitHub](https://github.com/earendil-works/pi) | [](https://github.com/earendil-works/pi) | coding-agent, runtime, monorepo | 将编码代理 CLI、共享运行时与多模型 LLM 栈整合在一起的 agent harness monorepo。 |
-| aider | [GitHub](https://github.com/Aider-AI/aider) | [](https://github.com/Aider-AI/aider) | terminal, repo-map, testing | 终端编码助手,提供仓库映射、Git 感知编辑与内置 lint/test 反馈回路。 |
-| Claude Code Plugins: Orchestration and Automation | [GitHub](https://github.com/wshobson/agents) | [](https://github.com/wshobson/agents) | claude-code, plugins, orchestration | 面向 Claude Code 的生产级插件仓库,整合 agents、skills、tools 与多代理工作流编排器。 |
-| CLI-Anything | [GitHub](https://github.com/HKUDS/CLI-Anything) | [](https://github.com/HKUDS/CLI-Anything) | cli, tool-use, automation | 在代理回路中统一命令行工具使用的 CLI agent 系统。 |
-| NanoClaw | [GitHub](https://github.com/qwibitai/nanoclaw) | [](https://github.com/qwibitai/nanoclaw) | containers, claude-sdk, scheduling | 基于容器隔离的 Claude 代理 harness,提供多通道路由、定时任务、按群组隔离的记忆,以及小代码库定制能力。 |
-| Qwen Code | [GitHub](https://github.com/QwenLM/qwen-code) | [](https://github.com/QwenLM/qwen-code) | terminal, coding-agent, cli | 终端原生开源编码代理,面向实际开发循环优化。 |
-| SuperClaude Framework | [GitHub](https://github.com/SuperClaude-Org/SuperClaude_Framework) | [](https://github.com/SuperClaude-Org/SuperClaude_Framework) | config, personas, workflow | 为编码代理增强命令、角色与方法模板的配置框架。 |
-| Devika | [GitHub](https://github.com/stitionai/devika) | [](https://github.com/stitionai/devika) | assistant, planning, coding | 开源编码助手系统,支持任务规划与实现。 |
-| SWE-agent | [GitHub](https://github.com/SWE-agent/SWE-agent) | [](https://github.com/SWE-agent/SWE-agent) | swe, issue-fixing, tooling | 研究级编码代理,通过明确的工具回路自动修复 GitHub issue。 |
-| cmux | [GitHub](https://github.com/manaflow-ai/cmux) | [](https://github.com/manaflow-ai/cmux) | macos, workspace, browser | 面向 AI 编码代理的原生 macOS 终端与浏览器工作区,提供通知、分屏与可脚本化控制。 |
-| Aperant | [GitHub](https://github.com/AndyMik90/Aperant) | [](https://github.com/AndyMik90/Aperant) | coding-agent, parallel, memory | 自治多代理编码框架,提供并行执行、隔离工作区、质量校验回路与持久记忆。 |
-| Eigent | [GitHub](https://github.com/eigent-ai/eigent) | [](https://github.com/eigent-ai/eigent) | desktop, cowork, productivity | 开源桌面协作代理,可执行自治任务并提升开发生产力。 |
-| IronClaw | [GitHub](https://github.com/nearai/ironclaw) | [](https://github.com/nearai/ironclaw) | security, wasm, routines | 安全优先的个人 agent harness,集成 WASM 沙箱、例程调度、工具插件与持久记忆。 |
-| OpenHarness | [GitHub](https://github.com/HKUDS/OpenHarness) | [](https://github.com/HKUDS/OpenHarness) | tool-use, memory, multi-agent | 开放式 agent harness 实现,覆盖工具调用、技能、记忆、权限与多代理协作。 |
-| Superset | [GitHub](https://github.com/superset-sh/superset) | [](https://github.com/superset-sh/superset) | worktrees, desktop, parallel | 基于 worktree 的桌面编排器,可在统一工作区中并行运行并审阅多个 CLI 编码代理。 |
-| GitHub Copilot CLI | [GitHub](https://github.com/github/copilot-cli) | [](https://github.com/github/copilot-cli) | terminal, coding-agent, mcp | 官方终端编码代理,基于 GitHub Copilot harness,提供 MCP 扩展、审批控制与 GitHub 原生上下文。 |
-| Open SWE | [GitHub](https://github.com/langchain-ai/open-swe) | [](https://github.com/langchain-ai/open-swe) | async, coding-agent, swe | 面向软件问题流的异步开源编码代理。 |
-| Paseo | [GitHub](https://github.com/getpaseo/paseo) | [](https://github.com/getpaseo/paseo) | coding-agent, daemon, multi-device | 面向多设备的编码代理守护进程与客户端栈,用于编排本地代理、并行运行与跨模型工作流。 |
-| 1Code | [GitHub](https://github.com/21st-dev/1code) | [](https://github.com/21st-dev/1code) | coding-agent, orchestration, worktrees | 桌面优先的编码代理编排器,提供 worktree 隔离、后台沙箱、MCP 工具管理与自动化触发。 |
-| OSAURUS | [GitHub](https://github.com/osaurus-ai/osaurus) | [](https://github.com/osaurus-ai/osaurus) | macos, local-first, memory | 面向 macOS 的本地自治编码代理 harness,支持持久记忆。 |
-| holaOS | [GitHub](https://github.com/holaboss-ai/holaOS) | [](https://github.com/holaboss-ai/holaOS) | long-horizon, desktop, durable-state | 面向长时任务的桌面优先 agent environment,整合运行时、记忆、工具、应用与持久状态。 |
-| HiClaw | [GitHub](https://github.com/agentscope-ai/HiClaw) | [](https://github.com/agentscope-ai/HiClaw) | multi-agent, human-in-the-loop, shared-state | 协作式多代理操作系统,通过 Matrix 房间提供管理者-工作者协同、共享状态与人在回路监督。 |
-| mini-swe-agent | [GitHub](https://github.com/SWE-agent/mini-swe-agent) | [](https://github.com/SWE-agent/mini-swe-agent) | minimal, swe, coding-agent | 极简编码代理实现,同时具备较强基准表现。 |
-| oh-my-pi | [GitHub](https://github.com/can1357/oh-my-pi) | [](https://github.com/can1357/oh-my-pi) | terminal, lsp, subagents | 终端 AI 编码代理,包含编辑安全、LSP 集成与子代理支持。 |
-| TinyAGI | [GitHub](https://github.com/TinyAGI/tinyagi) | [](https://github.com/TinyAGI/tinyagi) | team-orchestration, autonomous, workflows | 面向“一人公司”场景的团队化代理编排器。 |
+| OpenCode | [GitHub](https://github.com/anomalyco/opencode) | [](https://github.com/anomalyco/opencode) | terminal, coding-agent, subagents | 开源编码代理,提供内置 plan/build 角色、子代理、LSP 支持与客户端-服务端运行时。 |
+| Claude Code | [GitHub](https://github.com/anthropics/claude-code) | [](https://github.com/anthropics/claude-code) | terminal, coding-agent, git-workflows | 官方终端编码代理,可理解代码库并通过自然语言执行编辑、调试与 Git 工作流。 |
+| Gemini CLI | [GitHub](https://github.com/google-gemini/gemini-cli) | [](https://github.com/google-gemini/gemini-cli) | terminal, coding-agent, mcp | 开源终端代理,提供内置工具、MCP 支持、会话检查点与沙箱控制能力。 |
+| Codex CLI | [GitHub](https://github.com/openai/codex) | [](https://github.com/openai/codex) | terminal, coding-agent, local-execution | 终端原生的本地编码代理,提供面向软件任务的实用 agent 工作流。 |
+| OpenHands | [GitHub](https://github.com/OpenHands/OpenHands) | [](https://github.com/OpenHands/OpenHands) | coding-agent, software-engineering, repo | 开源 AI 软件工程代理,聚焦仓库级编码任务执行。 |
+| learn-claude-code | [GitHub](https://github.com/shareAI-lab/learn-claude-code) | [](https://github.com/shareAI-lab/learn-claude-code) | tutorial, harness, claude-code | 从 0 到 1 构建 Claude Code 类系统的实战 harness 教程。 |
+| OpenManus | [GitHub](https://github.com/FoundationAgents/OpenManus) | [](https://github.com/FoundationAgents/OpenManus) | general-agent, autonomy, workflows | 面向广义自治任务的开放基础系统,覆盖编码等复杂场景。 |
+| pi | [GitHub](https://github.com/earendil-works/pi) | [](https://github.com/earendil-works/pi) | coding-agent, runtime, monorepo | 将编码代理 CLI、共享运行时与多模型 LLM 栈整合在一起的 agent harness monorepo。 |
+| aider | [GitHub](https://github.com/Aider-AI/aider) | [](https://github.com/Aider-AI/aider) | terminal, repo-map, testing | 终端编码助手,提供仓库映射、Git 感知编辑与内置 lint/test 反馈回路。 |
+| Claude Code Plugins: Orchestration and Automation | [GitHub](https://github.com/wshobson/agents) | [](https://github.com/wshobson/agents) | claude-code, plugins, orchestration | 面向 Claude Code 的生产级插件仓库,整合 agents、skills、tools 与多代理工作流编排器。 |
+| CLI-Anything | [GitHub](https://github.com/HKUDS/CLI-Anything) | [](https://github.com/HKUDS/CLI-Anything) | cli, tool-use, automation | 在代理回路中统一命令行工具使用的 CLI agent 系统。 |
+| NanoClaw | [GitHub](https://github.com/qwibitai/nanoclaw) | [](https://github.com/qwibitai/nanoclaw) | containers, claude-sdk, scheduling | 基于容器隔离的 Claude 代理 harness,提供多通道路由、定时任务、按群组隔离的记忆,以及小代码库定制能力。 |
+| Qwen Code | [GitHub](https://github.com/QwenLM/qwen-code) | [](https://github.com/QwenLM/qwen-code) | terminal, coding-agent, cli | 终端原生开源编码代理,面向实际开发循环优化。 |
+| SuperClaude Framework | [GitHub](https://github.com/SuperClaude-Org/SuperClaude_Framework) | [](https://github.com/SuperClaude-Org/SuperClaude_Framework) | config, personas, workflow | 为编码代理增强命令、角色与方法模板的配置框架。 |
+| Devika | [GitHub](https://github.com/stitionai/devika) | [](https://github.com/stitionai/devika) | assistant, planning, coding | 开源编码助手系统,支持任务规划与实现。 |
+| SWE-agent | [GitHub](https://github.com/SWE-agent/SWE-agent) | [](https://github.com/SWE-agent/SWE-agent) | swe, issue-fixing, tooling | 研究级编码代理,通过明确的工具回路自动修复 GitHub issue。 |
+| cmux | [GitHub](https://github.com/manaflow-ai/cmux) | [](https://github.com/manaflow-ai/cmux) | macos, workspace, browser | 面向 AI 编码代理的原生 macOS 终端与浏览器工作区,提供通知、分屏与可脚本化控制。 |
+| Aperant | [GitHub](https://github.com/AndyMik90/Aperant) | [](https://github.com/AndyMik90/Aperant) | coding-agent, parallel, memory | 自治多代理编码框架,提供并行执行、隔离工作区、质量校验回路与持久记忆。 |
+| Eigent | [GitHub](https://github.com/eigent-ai/eigent) | [](https://github.com/eigent-ai/eigent) | desktop, cowork, productivity | 开源桌面协作代理,可执行自治任务并提升开发生产力。 |
+| OpenHarness | [GitHub](https://github.com/HKUDS/OpenHarness) | [](https://github.com/HKUDS/OpenHarness) | tool-use, memory, multi-agent | 开放式 agent harness 实现,覆盖工具调用、技能、记忆、权限与多代理协作。 |
+| IronClaw | [GitHub](https://github.com/nearai/ironclaw) | [](https://github.com/nearai/ironclaw) | security, wasm, routines | 安全优先的个人 agent harness,集成 WASM 沙箱、例程调度、工具插件与持久记忆。 |
+| Superset | [GitHub](https://github.com/superset-sh/superset) | [](https://github.com/superset-sh/superset) | worktrees, desktop, parallel | 基于 worktree 的桌面编排器,可在统一工作区中并行运行并审阅多个 CLI 编码代理。 |
+| GitHub Copilot CLI | [GitHub](https://github.com/github/copilot-cli) | [](https://github.com/github/copilot-cli) | terminal, coding-agent, mcp | 官方终端编码代理,基于 GitHub Copilot harness,提供 MCP 扩展、审批控制与 GitHub 原生上下文。 |
+| Open SWE | [GitHub](https://github.com/langchain-ai/open-swe) | [](https://github.com/langchain-ai/open-swe) | async, coding-agent, swe | 面向软件问题流的异步开源编码代理。 |
+| Paseo | [GitHub](https://github.com/getpaseo/paseo) | [](https://github.com/getpaseo/paseo) | coding-agent, daemon, multi-device | 面向多设备的编码代理守护进程与客户端栈,用于编排本地代理、并行运行与跨模型工作流。 |
+| 1Code | [GitHub](https://github.com/21st-dev/1code) | [](https://github.com/21st-dev/1code) | coding-agent, orchestration, worktrees | 桌面优先的编码代理编排器,提供 worktree 隔离、后台沙箱、MCP 工具管理与自动化触发。 |
+| holaOS | [GitHub](https://github.com/holaboss-ai/holaOS) | [](https://github.com/holaboss-ai/holaOS) | long-horizon, desktop, durable-state | 面向长时任务的桌面优先 agent environment,整合运行时、记忆、工具、应用与持久状态。 |
+| OSAURUS | [GitHub](https://github.com/osaurus-ai/osaurus) | [](https://github.com/osaurus-ai/osaurus) | macos, local-first, memory | 面向 macOS 的本地自治编码代理 harness,支持持久记忆。 |
+| HiClaw | [GitHub](https://github.com/agentscope-ai/HiClaw) | [](https://github.com/agentscope-ai/HiClaw) | multi-agent, human-in-the-loop, shared-state | 协作式多代理操作系统,通过 Matrix 房间提供管理者-工作者协同、共享状态与人在回路监督。 |
+| mini-swe-agent | [GitHub](https://github.com/SWE-agent/mini-swe-agent) | [](https://github.com/SWE-agent/mini-swe-agent) | minimal, swe, coding-agent | 极简编码代理实现,同时具备较强基准表现。 |
+| oh-my-pi | [GitHub](https://github.com/can1357/oh-my-pi) | [](https://github.com/can1357/oh-my-pi) | terminal, lsp, subagents | 终端 AI 编码代理,包含编辑安全、LSP 集成与子代理支持。 |
+| TinyAGI | [GitHub](https://github.com/TinyAGI/tinyagi) | [](https://github.com/TinyAGI/tinyagi) | team-orchestration, autonomous, workflows | 面向“一人公司”场景的团队化代理编排器。 |
| Devon | [GitHub](https://github.com/entropy-research/Devon) | [](https://github.com/entropy-research/Devon) | pair-programming, coding-agent, autonomous | 开源结对编程代理,提供自治编码执行模式。 |
-| Open Claude Cowork | [GitHub](https://github.com/DevAgentForge/Open-Claude-Cowork) | [](https://github.com/DevAgentForge/Open-Claude-Cowork) | desktop, ui, orchestration | 桌面化协作编码助手,将代理编排能力图形化。 |
-| Amazon Bedrock AgentCore Samples | [GitHub](https://github.com/awslabs/agentcore-samples) | [](https://github.com/awslabs/agentcore-samples) | aws, runtime, operations | 官方示例套件,覆盖基于 Runtime、Gateway、Memory、可观测、评测与策略层的代理部署与运维。 |
-| mini-coding-agent | [GitHub](https://github.com/rasbt/mini-coding-agent) | [](https://github.com/rasbt/mini-coding-agent) | coding-agent, minimal, approvals | 极简编码 agent harness,实现了审批、记忆、受限委派与持久化转录等核心机制。 |
+| Open Claude Cowork | [GitHub](https://github.com/DevAgentForge/Open-Claude-Cowork) | [](https://github.com/DevAgentForge/Open-Claude-Cowork) | desktop, ui, orchestration | 桌面化协作编码助手,将代理编排能力图形化。 |
+| Amazon Bedrock AgentCore Samples | [GitHub](https://github.com/awslabs/agentcore-samples) | [](https://github.com/awslabs/agentcore-samples) | aws, runtime, operations | 官方示例套件,覆盖基于 Runtime、Gateway、Memory、可观测、评测与策略层的代理部署与运维。 |
+| mini-coding-agent | [GitHub](https://github.com/rasbt/mini-coding-agent) | [](https://github.com/rasbt/mini-coding-agent) | coding-agent, minimal, approvals | 极简编码 agent harness,实现了审批、记忆、受限委派与持久化转录等核心机制。 |
+| AgentPlane | [GitHub](https://github.com/basilisk-labs/agentplane) | [](https://github.com/basilisk-labs/agentplane) | coding-agent, git-native, workflow-control | 本地优先、Git 原生的编码代理 harness,将任务、计划、验证与收尾记录保存在仓库内。 |
### Essential Readings & Ecosystem Maps
| 项目 | 链接 | Stars | 标签 | 简介 |
| --- | --- | --- | --- | --- |
-| awesome-claude-code | [GitHub](https://github.com/hesreallyhim/awesome-claude-code) | [](https://github.com/hesreallyhim/awesome-claude-code) | awesome-list, claude-code, skills | Claude Code 技能、钩子与编排工具的社区清单。 |
-| awesome-agentic-patterns | [GitHub](https://github.com/nibzard/awesome-agentic-patterns) | [](https://github.com/nibzard/awesome-agentic-patterns) | awesome-list, patterns, design | 可复用的 agentic 设计模式与实现范式目录。 |
-| awesome-mcp-servers | [GitHub](https://github.com/wong2/awesome-mcp-servers) | [](https://github.com/wong2/awesome-mcp-servers) | awesome-list, mcp, tools | MCP server 精选索引,便于代理系统进行工具互操作。 |
-| awesome-harness-engineering | [GitHub](https://github.com/walkinglabs/awesome-harness-engineering) | [](https://github.com/walkinglabs/awesome-harness-engineering) | awesome-list, curation, harness | 聚焦 harness engineering 的精选清单,覆盖文章、基准与实现。 |
+| awesome-claude-code | [GitHub](https://github.com/hesreallyhim/awesome-claude-code) | [](https://github.com/hesreallyhim/awesome-claude-code) | awesome-list, claude-code, skills | Claude Code 技能、钩子与编排工具的社区清单。 |
+| awesome-agentic-patterns | [GitHub](https://github.com/nibzard/awesome-agentic-patterns) | [](https://github.com/nibzard/awesome-agentic-patterns) | awesome-list, patterns, design | 可复用的 agentic 设计模式与实现范式目录。 |
+| awesome-mcp-servers | [GitHub](https://github.com/wong2/awesome-mcp-servers) | [](https://github.com/wong2/awesome-mcp-servers) | awesome-list, mcp, tools | MCP server 精选索引,便于代理系统进行工具互操作。 |
+| awesome-harness-engineering | [GitHub](https://github.com/walkinglabs/awesome-harness-engineering) | [](https://github.com/walkinglabs/awesome-harness-engineering) | awesome-list, curation, harness | 聚焦 harness engineering 的精选清单,覆盖文章、基准与实现。 |
| 12 Factor Agents | [Reference](https://www.humanlayer.dev/blog/12-factor-agents) | - | reading, operations, principles | 面向生产代理可维护性的运维原则总结。 |
| Agent Frameworks, Runtimes, and Harnesses, oh my! | [Reference](https://blog.langchain.com/agent-frameworks-runtimes-and-harnesses-oh-my/) | - | reading, langchain, architecture | 清晰拆解 framework、runtime 与 harness 的职责边界。 |
| An open-source spec for Codex orchestration: Symphony. | [Reference](https://openai.com/index/open-source-codex-orchestration-symphony/) | - | reading, openai, orchestration | OpenAI 对编排层的实践拆解,介绍如何把 issue 跟踪器变成面向编码代理的常驻控制平面。 |
diff --git a/data/projects.yaml b/data/projects.yaml
index cea6367..8535880 100644
--- a/data/projects.yaml
+++ b/data/projects.yaml
@@ -1917,6 +1917,21 @@ entries:
updated_at: '2026-04-07'
license: Apache-2.0
why_included: Readable end-to-end reference for core coding-agent harness components.
+- name: AgentPlane
+ repo_url: https://github.com/basilisk-labs/agentplane
+ category: Reference Harness Implementations
+ summary_en: Local-first Git-native CLI harness for auditable coding-agent work with task, plan, verification, and finish
+ records.
+ summary_zh: 本地优先、Git 原生的编码代理 harness,将任务、计划、验证与收尾记录保存在仓库内。
+ tags:
+ - coding-agent
+ - git-native
+ - workflow-control
+ stars_snapshot: 46
+ updated_at: '2026-05-10'
+ license: MIT
+ why_included: Shows repo-local workflow control and verification records around Claude Code, Codex, Cursor, Aider, and similar
+ coding-agent workflows.
- name: awesome-claude-code
repo_url: https://github.com/hesreallyhim/awesome-claude-code
category: Essential Readings & Ecosystem Maps
diff --git a/reports/verification/2026-05-10.md b/reports/verification/2026-05-10.md
new file mode 100644
index 0000000..98153d7
--- /dev/null
+++ b/reports/verification/2026-05-10.md
@@ -0,0 +1,67 @@
+# Verification Report
+
+- Generated at: `2026-05-10T07:47:33.961613+00:00`
+- Total entries: `172`
+- GitHub entries: `147` (85.5%)
+- GitHub in project categories (excluding `Essential Readings & Ecosystem Maps`): `143/143` (100.0%)
+- Categories: `9`
+- URL checks: `173` total, `173` reachable, `0` broken
+
+## Category Counts
+
+| Category | Entries |
+| --- | ---: |
+| Harness Architecture & Orchestration | 21 |
+| Context & Working-State Engineering | 9 |
+| Execution Substrates & Sandboxing | 18 |
+| Protocols, Tool Interfaces & Agent Contracts | 11 |
+| Evaluation Harnesses & Benchmarks | 21 |
+| Observability & Reliability Operations | 14 |
+| Guardrails, Security & Governance | 12 |
+| Reference Harness Implementations | 37 |
+| Essential Readings & Ecosystem Maps | 29 |
+
+## Structural Errors
+
+- None
+
+## Warnings
+
+- None
+
+## Broken URLs
+
+- None
+
+## Reachable URL Sample
+
+- `HEAD 200` https://blog.langchain.com/agent-frameworks-runtimes-and-harnesses-oh-my/
+- `HEAD 200` https://blog.langchain.com/evaluating-deep-agents-our-learnings/
+- `HEAD 200` https://blog.langchain.com/improving-deep-agents-with-harness-engineering/
+- `HEAD 200` https://blog.langchain.com/the-anatomy-of-an-agent-harness/
+- `HEAD 200` https://claude.com/blog/building-agents-with-the-claude-agent-sdk
+- `HEAD 200` https://developers.openai.com/blog/eval-skills
+- `HEAD 200` https://github.com/21st-dev/1code
+- `HEAD 200` https://github.com/AgentOps-AI/agentops
+- `HEAD 200` https://github.com/Aider-AI/aider
+- `HEAD 200` https://github.com/AndyMik90/Aperant
+- `HEAD 200` https://github.com/Arize-ai/openinference
+- `HEAD 200` https://github.com/Arize-ai/phoenix
+- `HEAD 200` https://github.com/BerriAI/litellm
+- `HEAD 200` https://github.com/DevAgentForge/Open-Claude-Cowork
+- `HEAD 200` https://github.com/EleutherAI/lm-evaluation-harness
+- `HEAD 200` https://github.com/FoundationAgents/OpenManus
+- `HEAD 200` https://github.com/GoogleCloudPlatform/scion
+- `HEAD 200` https://github.com/HKUDS/CLI-Anything
+- `HEAD 200` https://github.com/HKUDS/OpenHarness
+- `HEAD 200` https://github.com/Helicone/helicone
+- `HEAD 200` https://github.com/IBM/mcp
+- `HEAD 200` https://github.com/IBM/mcp-context-forge
+- `HEAD 200` https://github.com/Kong/kong
+- `HEAD 200` https://github.com/Meirtz/Awesome-Context-Engineering
+- `HEAD 200` https://github.com/MicrosoftDocs/mcp
+- `HEAD 200` https://github.com/NVIDIA-NeMo/Gym
+- `HEAD 200` https://github.com/NVIDIA/NeMo-Agent-Toolkit
+- `HEAD 200` https://github.com/OpenHands/OpenHands
+- `HEAD 200` https://github.com/OpenHands/benchmarks
+- `HEAD 200` https://github.com/OthmanAdi/planning-with-files
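
The headline numbers in this report are plain counts over the catalog. For example, 147 GitHub entries out of 172 gives 85.5%, and the 29 readings are excluded to get the 143 project entries. The sketch below recomputes that arithmetic. It assumes each entry has been reduced to a hypothetical `(category, url)` pair and that "GitHub entry" means a URL under `https://github.com/`. It is not the project's actual report generator.

```python
from collections import Counter

READINGS = "Essential Readings & Ecosystem Maps"
GITHUB_PREFIX = "https://github.com/"


def summarize(entries):
    """Recompute report-style headline stats from (category, url) pairs.

    The (category, url) input shape is a hypothetical simplification of
    data/projects.yaml, not the real loader's output.
    """
    total = len(entries)
    on_github = sum(1 for _, url in entries if url.startswith(GITHUB_PREFIX))
    # "Project categories" = every category except the readings list.
    projects = [(c, u) for c, u in entries if c != READINGS]
    projects_on_github = sum(1 for _, u in projects if u.startswith(GITHUB_PREFIX))
    return {
        "total": total,
        "github": on_github,
        "github_pct": f"{100 * on_github / total:.1f}",
        "projects_on_github": f"{projects_on_github}/{len(projects)}",
        "category_counts": Counter(c for c, _ in entries),
    }
```

Formatting the percentage to one decimal place matches the `85.5%` style used in both the README header and this report.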