Avoid single-failure active skill patches

wxj630 · wxj630 · commit 80b45938d4c7 · 2026-06-02T19:12:40.000+08:00
diff --git a/README.md b/README.md
@@ -113,6 +113,7 @@ Both backends feed the same Skill ranking, posterior audit rendering, and rewrit
 - **Evidence-weighted Skill evolution**: update Skill beliefs from verified success and failure trajectories.
 - **Bayesian Skill registry**: maintain Bayesian Evidence Model beliefs, optional Beta-Bernoulli posteriors, failure modes, token cost, latency, turns, and context distribution.
 - **Failure-mode-aware repair**: identify recurring errors and generate focused repair plans.
+- **Overfitting-resistant patch activation**: keep single failures as audit evidence, and promote a failure-mode patch into the benchmark prompt only after at least two verified occurrences.
 - **Token-aware context building**: select concise, evidence-backed Skill/SOP text; benchmark prompts receive executable patches and guardrails, while posterior numbers stay in artifacts.
 - **Full self-evolution from scratch**: run all tasks, collect evidence online, and evolve Skills without prior traces.
 - **Incremental repair for existing agents**: consume failed trajectories from a baseline agent and rerun only the failed tasks.
@@ -262,7 +263,7 @@ skill_context = SkillContextBuilder(registry).render(task_context="sop_bench")
 print(skill_context)
 ```
 
-`SkillContextBuilder` renders a compact posterior audit view. The built-in SOP/Lifelong runners convert posterior decisions into executable failure-mode patches and guardrails before adding them to model prompts.
+`SkillContextBuilder` renders a compact posterior audit view. The built-in SOP/Lifelong runners convert recurring posterior-backed failure modes into executable patches and guardrails before adding them to model prompts.
 
 ## 🔁 Three Operating Patterns
 
diff --git a/README_ZH.md b/README_ZH.md
@@ -113,6 +113,7 @@ E[p_k | D_k] = (alpha_0 + s_k) / (alpha_0 + beta_0 + s_k + f_k)
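 - **Evidence-weighted Skill evolution**: update Skill beliefs from verified success/failure trajectories.
 - **Bayesian Skill Registry**: maintain Bayesian Evidence Model beliefs, optional Beta-Bernoulli posteriors, failure modes, token cost, latency, turns, and context distribution.
 - **Failure-mode-aware repair**: identify recurring errors and generate a focused repair plan.
+- **Overfitting-resistant patch activation**: a single failure is kept only as audit evidence; a patch is promoted into the benchmark prompt only after the same failure mode has at least two verified failures.
 - **Token-aware context building**: select concise, evidence-backed Skill/SOP text; benchmark prompts receive executable patches and guardrails, while posterior numbers stay in artifacts.
 - **Full self-evolution from scratch**: run all tasks, collect evidence online, and evolve Skills without historical traces.
 - **Incremental repair layer for existing agents**: read failed trajectories from a baseline agent and rerun only the failed tasks.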
@@ -262,7 +263,7 @@ skill_context = SkillContextBuilder(registry).render(task_context="sop_bench")
 print(skill_context)
 ```
 
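-`SkillContextBuilder` renders a compact posterior audit view. The built-in SOP/Lifelong runners first convert posterior decisions into executable failure-mode patches and guardrails, then add them to the model prompt.
+`SkillContextBuilder` renders a compact posterior audit view. The built-in SOP/Lifelong runners first convert recurring, posterior-backed failure modes into executable patches and guardrails, then add them to the model prompt.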
 
 ## 🔁 Three Operating Patterns
 
diff --git a/bayesian_agent/benchmarks/evolution.py b/bayesian_agent/benchmarks/evolution.py
@@ -12,6 +12,9 @@
 from bayesian_agent.core.registry import BayesianSkillRegistry
 
 
+ACTIVE_PATCH_MIN_SUPPORT = 2
+
+
 def classify_failure(benchmark: str, run: Mapping[str, Any]) -> str:
     """Classify common benchmark failures into reusable evidence labels."""
 
@@ -161,7 +164,7 @@ def _failure_mode_patch_rules(benchmark: str, registry: BayesianSkillRegistry):
         if belief.skill_id != f"benchmark/{benchmark}" and benchmark not in belief.contexts:
             continue
         for failure_mode, count in belief.failure_modes.items():
-            if count > 0:
+            if count >= ACTIVE_PATCH_MIN_SUPPORT:
                 counts[failure_mode] = counts.get(failure_mode, 0) + int(count)
 
     patches = []
diff --git a/docs/articles/bayesian-evidence-acquired-learning.md b/docs/articles/bayesian-evidence-acquired-learning.md
@@ -716,7 +716,7 @@ rewrite = patch
 reason = failures cluster around left_expected_output_blank
 ```
 
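-The rewritten Skill context does not vaguely say "be more careful"; it turns failure modes into executable constraints. The current v0.x implementation injects a patch section like the following into the next round's prompt:
+The rewritten Skill context does not vaguely say "be more careful"; it turns recurring failure modes into executable constraints. The current v0.x implementation first keeps a single failure as candidate evidence in the audit artifact; only after the same failure mode has occurred at least twice does it inject an active patch section like the following into the next round's prompt: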
 
 ```text
 ### Bayesian Failure-Mode Patches: sop_bench
diff --git a/docs/articles/complex-bayesian-rewrite-example.md b/docs/articles/complex-bayesian-rewrite-example.md
@@ -434,7 +434,7 @@ P_h(failure | x_risk) ≈ 0.997
 2. Failure clusters such as left_expected_output_blank still need to be constrained by guardrails.
 ```
 
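-So in the current v0.x, even if a later repair succeeds, the `failure_modes` counts remain in the registry. As long as this recurring failure mode is still present, the context keeps the related patch. This is conservative, but safer for both benchmark repair and production environments.
+So in the current v0.x, even if a later repair succeeds, the `failure_modes` counts remain in the registry. A failure mode that appears for the first time is kept only as candidate evidence in the audit artifact; only after the same failure mode has occurred at least twice does the context keep the related active patch. This is more stable than "rewrite the skill on every error" and lowers the risk of overfitting to a single anomalous sample.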
 
 ## 10. What This Example Shows
 
@@ -457,7 +457,7 @@ rewrite trigger:
   the current RewritePolicy sees the same failure mode occur 2 times and triggers patch
 
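 context rewrite:
-  benchmark-specific patch rules are injected into the next round's prompt
+  benchmark-specific patch rules are injected into the next round's prompt; a single failure goes only into the audit, not into the active prompt patch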
 
 repair succeeds:
   success evidence is written back to the registry, and posterior_success of the healthy trajectory rises
diff --git a/docs/articles/zhihu-bayesian-agent.md b/docs/articles/zhihu-bayesian-agent.md
@@ -32,7 +32,7 @@ P(success | theta, C, skill)
 - `C` is the inference environment, including prompt, context, tools, memory, and harness feedback
 - `skill` is a reusable task procedure or SOP
 
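-After each Agent task execution, Bayesian-Agent reads verified trajectory evidence, updates the Skill's posterior belief, and on the next run generates posterior-driven Skill patches, guardrails, or compressed SOP text. Raw posterior numbers are kept in artifacts for auditing rather than being put directly into the benchmark prompt by default.
+After each Agent task execution, Bayesian-Agent reads verified trajectory evidence, updates the Skill's posterior belief, and on the next run generates posterior-driven Skill patches, guardrails, or compressed SOP text. Raw posterior numbers are kept in artifacts for auditing rather than being put directly into the benchmark prompt by default. To avoid overfitting, a single failure counts only as candidate evidence; a patch that enters the benchmark prompt is activated only after the same failure mode has occurred at least twice.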
 
 In other words, it does not "stuff every failure experience into memory"; instead it asks:
 
diff --git a/docs/core-concepts.md b/docs/core-concepts.md
@@ -108,4 +108,4 @@ The default policy maps posterior state to small, inspectable actions:
 
 These actions are recommendations. External harnesses decide how to rewrite, rerun, or retire Skills.
 
-The bundled SOP-Bench and Lifelong runners implement one concrete `patch` behavior: known failure modes are converted into short failure-mode-specific guardrails in the next prompt. This keeps the current v0.x implementation honest: it patches the inference context for the same Skill belief, rather than silently creating a separate child Skill hypothesis.
+The bundled SOP-Bench and Lifelong runners implement one concrete `patch` behavior: recurring known failure modes are converted into short failure-mode-specific guardrails in the next prompt. A single failure is recorded in `belief_*.json` and `posterior_context_*.md` as candidate evidence, but it is not promoted into model-facing patch text until the same failure mode has at least two verified occurrences. This keeps the current v0.x implementation honest: it patches the inference context for the same Skill belief, rather than silently creating a separate child Skill hypothesis.
diff --git a/docs/experiments.md b/docs/experiments.md
@@ -45,7 +45,7 @@ Bayesian modes now persist per-task Skill evolution artifacts under:
       snapshot_after.json
 ```
 
-`skill_context_before.md` is the exact model-facing Skill/SOP text injected into that task. For the built-in benchmarks, it contains executable `Bayesian Failure-Mode Patches` plus stable benchmark guardrails, not raw posterior numbers. `skill_context_after.md` is the next model-facing Skill/SOP text after verifier feedback is recorded.
+`skill_context_before.md` is the exact model-facing Skill/SOP text injected into that task. For the built-in benchmarks, it contains stable benchmark guardrails and any active `Bayesian Failure-Mode Patches`. A patch becomes active only after the same failure mode has at least two verified occurrences, so single failures stay audit-only. `skill_context_after.md` is the next model-facing Skill/SOP text after verifier feedback is recorded.
 
 `posterior_context_before.md` and `posterior_context_after.md` are audit artifacts for the Bayesian belief state. They may include posterior summaries such as `posterior_success`, `alpha`, `beta`, observations, and rewrite decisions, but those numeric summaries are not injected into the benchmark prompt.
 
diff --git a/docs/experiments/index.md b/docs/experiments/index.md
@@ -38,7 +38,7 @@ Bayesian runs also write a per-task Skill evolution trail:
     snapshot_after.json
 ```
 
-The `before` Skill context is the exact model-facing Skill/SOP text injected into the model for that task. For the built-in benchmarks, it contains executable `Bayesian Failure-Mode Patches` plus stable benchmark guardrails, not raw posterior numbers.
+The `before` Skill context is the exact model-facing Skill/SOP text injected into the model for that task. For the built-in benchmarks, it contains stable benchmark guardrails and any active `Bayesian Failure-Mode Patches`, not raw posterior numbers. A patch becomes active only after the same failure mode has at least two verified occurrences.
 
 The `after` Skill context is rendered after the verifier result is recorded, so it represents the next model-facing Skill version produced by the Bayesian update. The paired `posterior_context_before.md` and `posterior_context_after.md` files keep the posterior summaries for audit/debugging.
 
diff --git a/docs/method.md b/docs/method.md
@@ -81,7 +81,7 @@ The default policy maps posterior state to actions:
 
 The policy is intentionally small in v0.4. It is designed to be replaced by project-specific policies.
 
-For the built-in SOP-Bench and Lifelong AgentBench runners, `patch` is not only a label in the posterior audit context. Observed benchmark failure modes are mapped to concrete patch rules and injected into the next prompt under `Bayesian Failure-Mode Patches`. The prompt does not include raw posterior numbers such as `posterior_success`, `alpha`, or `beta`; those stay in `belief_*.json` and `posterior_context_*.md` artifacts. For example, `left_expected_output_blank` adds a CSV writeback verification rule, and `invented_unrequested_column` adds SQL column-use constraints. v0.x records post-patch evidence back to the same benchmark Skill; later releases may split recurring patches into separate child Skill hypotheses.
+For the built-in SOP-Bench and Lifelong AgentBench runners, `patch` is not only a label in the posterior audit context. Observed benchmark failure modes are mapped to concrete patch rules, but they are injected into the next prompt under `Bayesian Failure-Mode Patches` only after the same failure mode has at least two verified occurrences. A single failure remains candidate evidence in `belief_*.json` and `posterior_context_*.md`, which reduces overfitting to one-off mistakes. The prompt does not include raw posterior numbers such as `posterior_success`, `alpha`, or `beta`. For example, repeated `left_expected_output_blank` failures add a CSV writeback verification rule, and repeated `invented_unrequested_column` failures add SQL column-use constraints. v0.x records post-patch evidence back to the same benchmark Skill; later releases may split recurring patches into separate child Skill hypotheses.
 
 ## Full Mode
 
diff --git a/tests/test_benchmark_evolution.py b/tests/test_benchmark_evolution.py
@@ -15,7 +15,7 @@
 
 
 class BenchmarkEvolutionTests(unittest.TestCase):
-    def test_sop_failure_context_is_owned_by_bayesian_agent(self):
+    def test_single_failure_mode_is_audit_only_not_active_patch(self):
         registry = BayesianSkillRegistry.in_memory()
         registry.record(
             TrajectoryEvidence(
@@ -30,10 +30,11 @@ def test_sop_failure_context_is_owned_by_bayesian_agent(self):
 
         context = build_benchmark_skill_context("sop_bench", registry)
 
-        self.assertIn("Bayesian Failure-Mode Patches", context)
         self.assertIn("Benchmark SOP Guardrails", context)
         self.assertIn("rows[row_index - 1]", context)
         self.assertIn("raw category string", context)
+        self.assertNotIn("Bayesian Failure-Mode Patches", context)
+        self.assertNotIn("failure_mode=left_expected_output_blank", context)
         self.assertNotIn("Bayesian Skill Context", context)
         self.assertNotIn("Bayesian Posterior Audit", context)
         self.assertNotIn("posterior_success=", context)
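
The gating introduced in `_failure_mode_patch_rules` can be illustrated with a minimal, self-contained sketch. It is not the project's code: the `Belief` dataclass and the `active_failure_modes` helper are hypothetical stand-ins that model only the `failure_modes` mapping the hunk reads, and the context filter on `skill_id` / `contexts` is omitted. One detail the sketch makes visible, assuming the hunk is read literally, is that the threshold is checked per belief before the counts are summed, so two beliefs that each saw a failure mode once would not activate a patch.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable

# Same value as the constant added in this commit.
ACTIVE_PATCH_MIN_SUPPORT = 2


@dataclass
class Belief:
    """Hypothetical stand-in for a registry belief (illustration only)."""

    skill_id: str
    failure_modes: Dict[str, int] = field(default_factory=dict)


def active_failure_modes(
    beliefs: Iterable[Belief],
    min_support: int = ACTIVE_PATCH_MIN_SUPPORT,
) -> Dict[str, int]:
    """Return failure modes with enough support to become active patches.

    Failure modes below the threshold are skipped here; in this sketch they
    would remain visible only in the audit data (the beliefs themselves).
    """
    counts: Dict[str, int] = {}
    for belief in beliefs:
        for failure_mode, count in belief.failure_modes.items():
            # Per-belief check, applied before summing, mirroring the hunk.
            if count >= min_support:
                counts[failure_mode] = counts.get(failure_mode, 0) + int(count)
    return counts
```

Under these assumptions, a belief with `{"left_expected_output_blank": 1}` yields no active failure modes, while a count of 2 yields one. Whether the per-belief check or a check on the aggregated count is the intended behavior is a design choice the diff does not state.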
