Commit f3cd625

update:readme
1 parent b753ed0 commit f3cd625

2 files changed

Lines changed: 31 additions & 90 deletions

README.md

Lines changed: 16 additions & 44 deletions
@@ -11,9 +11,9 @@
 📄 arXiv Coming Soon
 </p>
 
-Bayesian-Agent is a Bayesian self-evolving layer for turning agent failures into reusable, evidence-weighted Skills and SOPs across agent frameworks and execution harnesses.
+Bayesian-Agent is a Bayesian self-evolving layer for turning verified agent trajectories into reusable, evidence-weighted Skills and SOPs across agent frameworks and execution harnesses.
 
-It is designed to stand out from monolithic agent frameworks in three ways:
+It supports three usage patterns:
 
 - **Run from scratch**: start with no prior traces and evolve Skills during full benchmark or production runs.
 - **Repair incrementally**: attach to an existing agent, read its failed trajectories, and rerun only the tasks that need repair.
@@ -26,29 +26,23 @@ It is designed to stand out from monolithic agent frameworks in three ways:
 - **2026-06-05:** Added full-sample native-harness results for SOP-Bench, Lifelong AgentBench, and RealFin-Bench with `deepseek-v4-flash` and `deepseek-v4-pro`; see [Experimental Results](#-experimental-results).
 - **2026-06-05:** Added the first-party Bayesian-Agent native harness. It runs its own LLM loop, workspace tools, three-layer memory, and trajectory capture; GenericAgent, mini-swe-agent, and Claude Code remain optional compatibility backends. See the [Native Harness design note](docs/native-harness.md).
 - **2026-05-31:** Added the Bayesian Evidence Model as the default Skill belief backend, with a categorical likelihood implementation and a legacy Beta-Bernoulli backend for ablations.
-- **2026-05-09:** Released Bayesian-Agent v0.4 as a standalone cross-harness Bayesian Skill Evolution package with schemas, CLI utilities, and experiment artifacts.
+- **2026-05-09:** Released Bayesian-Agent as a standalone cross-harness Bayesian Skill Evolution package with schemas, CLI utilities, and experiment artifacts.
 - **2026-05-09:** Added the optional GenericAgent adapter boundary without copying or vendoring GenericAgent.
 - **2026-05-09:** Published bilingual project documentation and the Bayesian-Agent framework diagram.
 
 ## 🌟 Overview
 
-Agent engineering is moving through three layers:
+Prompt Engineering improves task instructions. Context Engineering controls what evidence the model sees at inference time. Harness Engineering puts the model inside an observable, executable, recoverable system with tools, files, tests, memory, logs, and failure recovery.
 
-1. **Prompt Engineering**: write better task instructions.
-2. **Context Engineering**: decide what evidence the model can see at inference time.
-3. **Harness Engineering**: put the model inside an observable, executable, recoverable system.
-
-Prompting can improve one answer. Context can improve one decision. Harness Engineering is what lets an agent work across tools, files, tests, memory, logs, and failure recovery.
-
-In that setting, **Skills** and **SOPs** become first-class engineering assets. A good Skill is not just a longer prompt. It is compressed operational knowledge:
+In this setting, **Skills** and **SOPs** become first-class engineering assets. A good Skill is compressed operational knowledge:
 
 - what to inspect first
 - which tools to use
 - how to verify progress
 - which failure modes to avoid
 - when to stop, retry, or rewrite the procedure
 
-Bayesian-Agent asks a simple question: if Skills are hypotheses about how to solve tasks, why should they evolve by anecdote instead of evidence? The answer is a framework-agnostic evolution layer that can bootstrap Skills from scratch, repair existing agents incrementally, and move across harnesses as long as they emit verified trajectories.
+Bayesian-Agent treats Skills as hypotheses about how to solve tasks. Verified trajectories become evidence for updating, ranking, rewriting, compressing, or retiring those Skills. The same evolution layer can bootstrap Skills from scratch, repair existing agents incrementally, and move across harnesses that emit compatible trajectories.
 
 <div align="center">
 <img src="assets/bayesian_agent_overview.png" width="900" alt="Bayesian-Agent overview"/>
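
The changelog in this hunk mentions a legacy Beta-Bernoulli Skill belief backend kept for ablations, and the new Overview text says verified trajectories drive updating, ranking, and retiring Skills. A minimal Python sketch of those three operations under a Beta-Bernoulli model follows. It is an illustration, not the repository's API: `SkillBelief`, `rank_skills`, `should_retire`, the uniform Beta(1, 1) prior, and the retirement thresholds are all hypothetical, and the default categorical Evidence Model is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SkillBelief:
    """Hypothetical Beta-Bernoulli belief over one Skill's success rate."""

    name: str
    prior_alpha: float = 1.0  # assumed uniform Beta(1, 1) prior
    prior_beta: float = 1.0
    successes: int = 0  # verified successful trajectories
    failures: int = 0   # verified failed trajectories

    def update(self, verified_success: bool) -> None:
        # Conjugate update: each verified outcome adds one count.
        if verified_success:
            self.successes += 1
        else:
            self.failures += 1

    @property
    def observations(self) -> int:
        return self.successes + self.failures

    @property
    def mean(self) -> float:
        # Posterior mean of Beta(prior_alpha + successes, prior_beta + failures).
        alpha = self.prior_alpha + self.successes
        beta = self.prior_beta + self.failures
        return alpha / (alpha + beta)


def rank_skills(skills: list[SkillBelief]) -> list[SkillBelief]:
    # Highest posterior mean first.
    return sorted(skills, key=lambda s: s.mean, reverse=True)


def should_retire(skill: SkillBelief, min_observations: int = 5, floor: float = 0.3) -> bool:
    # Hypothetical rule: retire only after enough evidence of a low success rate.
    return skill.observations >= min_observations and skill.mean < floor


inspect_first = SkillBelief("inspect-schema-first")
for outcome in [True, True, False, True]:
    inspect_first.update(outcome)

flaky = SkillBelief("retry-blindly")
for _ in range(6):
    flaky.update(False)

print(round(inspect_first.mean, 3))  # 0.667, i.e. (1 + 3) / (2 + 4)
print([s.name for s in rank_skills([flaky, inspect_first])])
print(should_retire(flaky), should_retire(inspect_first))  # True False
```

The `min_observations` gate keeps a Skill from being retired on a handful of unlucky runs, while the prior keeps a Skill with a short winning streak from being ranked as certain.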
@@ -93,7 +87,7 @@ The surface behavior of Bayesian-Agent may look like failure-driven Skill repair
 
 Agent runs are expensive: tokens are expensive, latency is high, benchmark cases are limited, and real production failures are even rarer. When samples are scarce, each sample is costly, and we cannot wait for large-sample statistics to stabilize, Bayesian modeling lets Bayesian-Agent combine prior belief, uncertainty, and new verified evidence into more stable decisions.
 
-This is why Bayesian-Agent is especially useful for sample-scarce, cost-sensitive, online Skill/SOP evolution. Read the full explanation in [Why Bayesian for Skill Evolution](docs/articles/why-bayesian-for-skill-evolution.md).
+Bayesian-Agent is most useful for sample-scarce, cost-sensitive, online Skill/SOP evolution. Read the full explanation in [Why Bayesian for Skill Evolution](docs/articles/why-bayesian-for-skill-evolution.md).
 
 ### What "Bayesian" Means in v0.5
 
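
The context paragraph in this hunk argues that a prior plus a few verified outcomes yields steadier decisions than raw frequencies. A worked example under a Dirichlet-categorical model (the standard conjugate pair for a categorical likelihood, which the changelog names as the default backend) makes this concrete. The three outcome categories (success, tool_error, wrong_answer), the uniform prior $\boldsymbol{\alpha} = (1, 1, 1)$, and the observed counts $\mathbf{n} = (2, 0, 1)$ after $N = 3$ runs are illustrative assumptions, not the Evidence Model's actual categories.

$$
\boldsymbol{\theta} \mid \mathbf{n} \sim \operatorname{Dir}(\boldsymbol{\alpha} + \mathbf{n}) = \operatorname{Dir}(3, 1, 2),
\qquad
\mathbb{E}[\theta_k \mid \mathbf{n}] = \frac{\alpha_k + n_k}{\sum_j \alpha_j + N} = \left(\tfrac{3}{6}, \tfrac{1}{6}, \tfrac{2}{6}\right)
$$

The raw frequencies would be $(2/3, 0, 1/3)$, which declares `tool_error` impossible after three runs; the posterior keeps it at $1/6$ until more evidence arrives.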
@@ -299,34 +293,6 @@ print(skill_context)
 
 `SkillContextBuilder` renders a compact posterior audit view. The built-in SOP/Lifelong runners convert recurring posterior-backed failure modes into executable patches and guardrails before adding them to model prompts.
 
-## 🔁 Three Operating Patterns
-
-### 🌱 Full Self-Evolving Mode
-
-Bayesian-Agent starts from scratch, runs benchmark tasks, collects verified evidence, and evolves Skills during the run.
-
-This mode tests whether Bayesian Skill Evolution can improve an agent without relying on prior traces.
-
-### 🛠️ Incremental Repair Mode
-
-Bayesian-Agent can also attach to an existing agent. The base agent runs first. Bayesian-Agent reads its success and failure traces, updates posterior Skill beliefs, then reruns only the failed tasks.
-
-```text
-Base Agent -> Failure Traces -> Bayesian Skill Evolution -> Rerun Failures -> Higher Accuracy
-```
-
-This is the recommended production path because it improves an existing agent without retraining the model or replacing the original harness.
-
-### 🔌 Cross-Harness Adaptation Mode
-
-Bayesian-Agent is not tied to a single agent runtime. Any agent framework can become a backend if it emits the common trajectory schema and accepts model-facing Skill/SOP text through an adapter.
-
-```text
-Any Agent Harness -> Trajectory Schema -> Bayesian Skill Registry -> Adapter -> Next Harness Run
-```
-
-This makes Bayesian-Agent a portable Skill/SOP evolution layer rather than another closed agent framework.
-
 ## 📊 Experimental Results
 
 Bayesian-Agent now has its own native harness. The results below are full-sample runs with no `--limit`: SOP-Bench and Lifelong AgentBench use 20 tasks each, and RealFin-Bench uses 40 tasks.
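
The removed Cross-Harness Adaptation Mode text in this hunk describes the contract for a backend: emit the common trajectory schema and accept model-facing Skill/SOP text through an adapter. The sketch below models that contract with a `typing.Protocol`. `Trajectory`, `HarnessAdapter`, `InMemoryHarness`, `hand_off`, and every field name are hypothetical stand-ins, since the actual schema is not shown in this diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Trajectory:
    """Hypothetical common trajectory record; field names are illustrative."""

    task_id: str
    harness: str
    verified_success: bool
    steps: list[str] = field(default_factory=list)


class HarnessAdapter(Protocol):
    """Hypothetical adapter boundary between any harness and the Skill registry."""

    def export_trajectories(self) -> list[Trajectory]: ...

    def set_skill_context(self, skill_text: str) -> None: ...


class InMemoryHarness:
    """Toy harness that satisfies HarnessAdapter structurally (no inheritance)."""

    def __init__(self, name: str, outcomes: dict[str, bool]) -> None:
        self.name = name
        self.outcomes = outcomes
        self.skill_context = ""

    def export_trajectories(self) -> list[Trajectory]:
        return [
            Trajectory(task_id=task_id, harness=self.name, verified_success=ok)
            for task_id, ok in self.outcomes.items()
        ]

    def set_skill_context(self, skill_text: str) -> None:
        # A real adapter would place this text in the model-facing prompt.
        self.skill_context = skill_text


def hand_off(
    source: HarnessAdapter,
    target: HarnessAdapter,
    render: Callable[[list[Trajectory]], str],
) -> None:
    # Evidence exported by one harness shapes the Skill text given to the next run.
    target.set_skill_context(render(source.export_trajectories()))


first = InMemoryHarness("harness-a", {"t1": True, "t2": False})
second = InMemoryHarness("harness-b", {})
hand_off(first, second, lambda ts: f"{sum(t.verified_success for t in ts)}/{len(ts)} verified")
print(second.skill_context)  # 1/2 verified
```

A structural `Protocol` lets an existing harness qualify by shape alone, without inheriting from a project class; whether the real adapter boundary works this way is not stated in the diff.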
@@ -357,7 +323,7 @@ Bayesian-Agent now has its own native harness. The results below are full-sample
 | deepseek-v4-pro | bayesian_full | 28/40 (70.0%) | 9.91M | `results/native_harness_deepseek_v4_pro_full/realfin_retry` |
 | deepseek-v4-pro | bayesian_incremental | 31/40 final, 5/14 repaired | 4.59M incremental | `results/native_harness_deepseek_v4_pro_full/realfin_retry` |
 
-Compared with the earlier GA-backed artifacts, BA native improves the full RealFin final score on `deepseek-v4-pro` from 68% to 77.5%, but it spends more tokens because the first-party harness deliberately keeps the runtime minimal and lets the model inspect cached market data directly. On SOP/Lifelong, BA native reaches 95-100% full-sample accuracy while using less token budget than the historical GA-backed full runs.
+Compared with earlier GA-backed artifacts, BA native improves the full RealFin final score on `deepseek-v4-pro` from 68% to 77.5%. It spends more tokens because the first-party harness keeps the runtime minimal and lets the model inspect cached market data directly. On SOP/Lifelong, BA native reaches 95-100% full-sample accuracy with lower token use than the historical GA-backed full runs.
 
 ### 🧱 Published GA Validation: GenericAgent + deepseek-v4-flash
 
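
The `bayesian_incremental` row in this hunk reports `31/40 final, 5/14 repaired`. Assuming the 14 counts the tasks that entered the repair pass (the table does not spell this out), the final score decomposes as follows, consistent with the 77.5% RealFin figure quoted after the table:

$$
\underbrace{(40 - 14)}_{\text{solved before repair}} + \underbrace{5}_{\text{repaired}} = 26 + 5 = 31,
\qquad \frac{31}{40} = 77.5\%
$$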
@@ -399,7 +365,7 @@ The earlier RealFin validation used GenericAgent as the execution backend with `
 | RealFin-Bench | GA+Bayesian | deepseek-v4-pro | 65% | 3.70M | `results/realfin_deepseek_v4_pro_20260602` |
 | RealFin-Bench | GA+BayesianIncremental | deepseek-v4-pro | 68% | 1.72M incremental | `results/realfin_deepseek_v4_pro_20260602` |
 
-The result shows that Bayesian-Agent can work as a plug-in repair layer: it can take an existing agent below 100% accuracy and improve it with a small amount of incremental inference. This is the practical advantage over one-off benchmark agents: Bayesian-Agent can sit beside a harness, learn from its failures, and improve it without replacing it.
+In incremental mode, Bayesian-Agent uses an existing agent's failed trajectories, updates Skill beliefs, and reruns only the failed tasks. The repair-only token columns report the additional inference cost.
 
 Experiment artifacts are stored under [`artifacts/`](artifacts/) and [`results/`](results/), and the method note is in [`docs/method.md`](docs/method.md). The native harness design note is in [`docs/native-harness.md`](docs/native-harness.md).
 
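
The incremental-mode sentence added in this hunk describes a loop: take the base agent's trajectories, update Skill beliefs, rerun only the failed tasks, and report the extra tokens separately. A hypothetical Python sketch of that control flow follows; `RunResult`, `incremental_repair`, and the callback parameters are illustrative names, and the belief update itself is left as a callback. Following the Incremental Repair Mode text removed earlier in this diff, it feeds both success and failure traces into the belief update.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    """Hypothetical per-task outcome from one harness run."""

    task_id: str
    success: bool
    tokens: int


def incremental_repair(
    base_results: list[RunResult],
    update_beliefs: Callable[[list[RunResult]], None],
    rerun: Callable[[str], RunResult],
) -> tuple[int, int, int]:
    """Return (final_successes, repaired, repair_only_tokens) for one repair pass."""
    # Both successful and failed base traces are treated as evidence.
    update_beliefs(base_results)
    failed_ids = [r.task_id for r in base_results if not r.success]
    # Only tasks that failed in the base run are executed again.
    reruns = [rerun(task_id) for task_id in failed_ids]
    repaired = sum(r.success for r in reruns)
    # Base-run tokens are excluded: this is the repair-only cost.
    repair_only_tokens = sum(r.tokens for r in reruns)
    final_successes = sum(r.success for r in base_results) + repaired
    return final_successes, repaired, repair_only_tokens


base = [RunResult("a", True, 100), RunResult("b", False, 120), RunResult("c", False, 90)]
evidence: list[RunResult] = []
final, repaired, cost = incremental_repair(
    base,
    update_beliefs=evidence.extend,
    rerun=lambda task_id: RunResult(task_id, task_id == "b", 50),
)
print(final, repaired, cost)  # 2 1 100
```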
@@ -420,7 +386,7 @@ The script runs three phases by default: selected-harness baseline, Bayesian ful
 
 ## 🔌 Native Harness and Cross-Harness Adaptation
 
-The first prototype was validated inside GenericAgent, but Bayesian-Agent now has its own execution harness. It is not a GenericAgent fork and not just a GenericAgent add-on.
+Bayesian-Agent ships a native harness plus adapter boundaries for external agent runtimes. The first prototype was validated inside GenericAgent; v0.5 keeps GenericAgent as an optional compatibility backend.
 
 The open-source structure is:
 
@@ -473,6 +439,12 @@ tests/ # Standard-library unittest suite
 - [ ] Add adapters for more agent harnesses after the current boundaries stabilize.
 - [ ] Move beyond the current per-Skill evidence backend toward richer Bayesian reasoning, including Skill hypothesis inference, Bayesian Networks for context/failure structure, uncertainty-aware Skill selection, Bayesian decision policies, and online adaptation.
 
+## 📝 Articles
+
+- [Bayesian Agent (1): Making the Self-Evolving Process of Harness and Skills No Longer a Directionless Black Box, Toward Bayesian Beliefs](https://zhuanlan.zhihu.com/p/2036275199008565089)
+- [Bayesian-Agent (2): Not Just Another Agent Framework, but a Cross-Harness Bayesian Evolution Layer](https://zhuanlan.zhihu.com/p/2036315473344714645)
+- [Bayesian-Agent (3): From the Monty Hall Problem to Bayesian-Agent: the Evidence Model, Learning from Experience, and Skill Self-Evolution](https://zhuanlan.zhihu.com/p/2044881314734768900?share_code=10y84pyoZtQ15&utm_psn=2045452102202307791)
+- [Bayesian-Agent (4): Why Does Skill Evolving Need Bayes? Couldn't I Just Evolve Skills Directly?](https://zhuanlan.zhihu.com/p/2046259943565686690)
 
 ## 📈 Star History
 
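
The roadmap in this hunk lists uncertainty-aware Skill selection as future work. One standard technique for it, shown here only as an illustration and not as the project's planned design, is Thompson sampling over Beta posteriors: draw one plausible success rate per Skill and pick the highest draw, so a rarely tried Skill still gets chosen sometimes. The function name and the example posteriors are hypothetical.

```python
from __future__ import annotations

import random


def thompson_select(posteriors: dict[str, tuple[float, float]], rng: random.Random) -> str:
    """Pick one Skill by drawing once from each Beta(alpha, beta) posterior."""
    draws = {name: rng.betavariate(a, b) for name, (a, b) in posteriors.items()}
    # The Skill with the highest sampled success rate wins this round.
    return max(draws, key=draws.get)


rng = random.Random(0)  # fixed seed for a reproducible illustration
posteriors = {
    "well-tested": (40.0, 10.0),  # many observations, mean 0.8
    "new-skill": (1.0, 1.0),      # no observations yet, uniform prior
}
picks = [thompson_select(posteriors, rng) for _ in range(1000)]
print(picks.count("well-tested"), picks.count("new-skill"))
```

In expectation the untested Skill wins about one round in five, since a uniform draw exceeds a success rate near 0.8 with probability roughly 0.2; as evidence accumulates, its posterior narrows and selection follows the data.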