Changes from 4 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -175,6 +175,7 @@ Gradata/scripts/*
!Gradata/scripts/publish-npm.sh
!Gradata/scripts/cloud/
!Gradata/scripts/migrate_legacy_scopes.py
!Gradata/scripts/weekly_correction_snapshot.py

# npm sub-package build outputs (source tracked, outputs ignored)
Gradata/packages/npm/node_modules/
36 changes: 36 additions & 0 deletions Gradata/docs/weekly-correction-snapshot.md
@@ -0,0 +1,36 @@
# Weekly Correction Snapshot

`scripts/weekly_correction_snapshot.py` builds a deterministic JSON summary from newline-delimited JSON (NDJSON) events. This is intended for weekly correction-outcome trend reporting.

## Usage

From file:

```bash
python scripts/weekly_correction_snapshot.py --input /path/to/events.jsonl
```

From stdin:

```bash
cat /path/to/events.jsonl | python scripts/weekly_correction_snapshot.py
```

## Output schema

The script always emits one compact JSON object with stable key ordering:

- `total_corrections` (int): count of correction events (`event=correction.created` or `kind=correction`)
- `accepted_graduations` (int): count of accepted graduation outcomes
- `rejection_count` (int): count of rejected graduation outcomes
- `acceptance_rate` (float): `accepted_graduations / (accepted_graduations + rejection_count)`, rounded to 6 decimals, or `0.0` if denominator is zero
- `top_rule_categories` (list): up to 5 entries sorted by descending count, then category name
- `skipped_rows` (int): malformed or non-object rows ignored during parsing

`top_rule_categories` entries use:

```json
{"category":"tone","count":12}
```

Category normalization is lowercase + trimmed whitespace. Empty/missing categories normalize to `"unknown"`.
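
The acceptance-rate and category rules above can be sketched in a few lines. This is an illustrative, standalone approximation of the documented behavior, not the script itself; the helper names `acceptance_rate` and `top_categories` are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def acceptance_rate(accepted: int, rejected: int) -> float:
    """accepted / (accepted + rejected), rounded to 6 decimals; 0.0 when empty."""
    denominator = accepted + rejected
    return round(accepted / denominator, 6) if denominator else 0.0


def top_categories(raw_categories: Iterable[Any], limit: int = 5) -> list:
    """Normalize (trim + lowercase), count, then sort by count desc, name asc."""
    counts: Counter = Counter()
    for value in raw_categories:
        normalized = "" if value is None else str(value).strip().lower()
        counts[normalized or "unknown"] += 1
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"category": name, "count": count} for name, count in ordered[:limit]]
```

Sorting on `(-count, name)` is what makes ties deterministic: two categories with the same count always appear in alphabetical order.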
2 changes: 1 addition & 1 deletion Gradata/pyproject.toml
@@ -165,7 +165,7 @@ skips = [
# --- Pytest ---
[tool.pytest.ini_options]
testpaths = ["tests"]
-pythonpath = ["src"]
+pythonpath = ["src", "scripts"]
markers = [
"integration: tests that hit external LLM APIs (cost money, skip in CI)",
"dualwrite: dual-write crash recovery and reconciliation tests",
130 changes: 130 additions & 0 deletions Gradata/scripts/weekly_correction_snapshot.py
@@ -0,0 +1,130 @@
#!/usr/bin/env python3
"""Compute weekly correction/graduation aggregates from NDJSON events."""

from __future__ import annotations

import argparse
import json
import sys
from collections import Counter
from typing import Any


def _normalize_category(value: Any) -> str:
    if value is None:
        return "unknown"
    normalized = str(value).strip().lower()
    return normalized or "unknown"


def _is_correction(row: dict[str, Any]) -> bool:
    event = str(row.get("event", "")).strip().lower()
    kind = str(row.get("kind", "")).strip().lower()
    return event == "correction.created" or kind == "correction"


def _is_graduation_accepted(row: dict[str, Any]) -> bool:
    event = str(row.get("event", "")).strip().lower()
    outcome = str(row.get("outcome", "")).strip().lower()
    accepted_flag = row.get("accepted")
    status = str(row.get("status", "")).strip().lower()
    return (
        event in {"lesson.graduated", "graduation.accepted"}
        or outcome == "accepted"
        or accepted_flag is True
        or status in {"accepted", "graduated"}
    )
Comment on lines +26 to +36
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Graduation metrics are currently over-counted by non-graduation rows.

Lines 31–36 and 44–49 treat generic outcome/status/accepted fields as graduation outcomes without verifying the row is graduation-related, so unrelated events can skew accepted_graduations/rejection_count.

Proposed fix
+def _is_graduation_row(row: dict[str, Any]) -> bool:
+    event = str(row.get("event", "")).strip().lower()
+    kind = str(row.get("kind", "")).strip().lower()
+    return "graduation" in event or event.startswith("lesson.") or kind == "graduation"
+
 def _is_graduation_accepted(row: dict[str, Any]) -> bool:
     event = str(row.get("event", "")).strip().lower()
     outcome = str(row.get("outcome", "")).strip().lower()
     accepted_flag = row.get("accepted")
     status = str(row.get("status", "")).strip().lower()
     return (
         event in {"lesson.graduated", "graduation.accepted"}
-        or outcome == "accepted"
-        or accepted_flag is True
-        or status in {"accepted", "graduated"}
+        or (_is_graduation_row(row) and outcome == "accepted")
+        or (_is_graduation_row(row) and accepted_flag is True)
+        or (_is_graduation_row(row) and status in {"accepted", "graduated"})
     )

 def _is_rejection(row: dict[str, Any]) -> bool:
     event = str(row.get("event", "")).strip().lower()
     outcome = str(row.get("outcome", "")).strip().lower()
     accepted_flag = row.get("accepted")
     status = str(row.get("status", "")).strip().lower()
     return (
         event in {"graduation.rejected", "lesson.rejected"}
-        or outcome == "rejected"
-        or accepted_flag is False
-        or status == "rejected"
+        or (_is_graduation_row(row) and outcome == "rejected")
+        or (_is_graduation_row(row) and accepted_flag is False)
+        or (_is_graduation_row(row) and status == "rejected")
     )

Also applies to: 39-49

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/scripts/weekly_correction_snapshot.py` around lines 26 - 36, The
function _is_graduation_accepted is treating generic outcome/status/accepted
fields as graduation results even when the event isn't graduation-related;
update it to first detect that the row is a graduation event (e.g., event
contains "graduation" or is in the graduation event set like
{"lesson.graduated","graduation.accepted","graduation.rejected"}) and only then
evaluate outcome/status/accepted for accepted logic. Make the same change for
the corresponding rejection logic (the similar block at lines referenced 39–49)
so outcome/status/accepted are only considered when the row is confirmed to be a
graduation event.
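
To make the over-counting concrete, here is a minimal standalone sketch that mirrors the ungated predicate from the diff and a gated variant along the lines of the proposed fix. The helper names (`_text`, `is_accepted_ungated`, `is_graduation_row`, `is_accepted_gated`) and the sample rows are assumptions for illustration, not code from the PR.

```python
from typing import Any, Dict


def _text(row: Dict[str, Any], key: str) -> str:
    # Same normalization the script applies to string fields.
    return str(row.get(key, "")).strip().lower()


def is_accepted_ungated(row: Dict[str, Any]) -> bool:
    # Mirrors the predicate in the diff: generic fields count on any row.
    return (
        _text(row, "event") in {"lesson.graduated", "graduation.accepted"}
        or _text(row, "outcome") == "accepted"
        or row.get("accepted") is True
        or _text(row, "status") in {"accepted", "graduated"}
    )


def is_graduation_row(row: Dict[str, Any]) -> bool:
    event = _text(row, "event")
    return (
        "graduation" in event
        or event.startswith("lesson.")
        or _text(row, "kind") == "graduation"
    )


def is_accepted_gated(row: Dict[str, Any]) -> bool:
    # Generic fields only count once the row is known to be graduation-related.
    if not is_graduation_row(row):
        return False
    return (
        _text(row, "event") in {"lesson.graduated", "graduation.accepted"}
        or _text(row, "outcome") == "accepted"
        or row.get("accepted") is True
        or _text(row, "status") in {"accepted", "graduated"}
    )
```

With these definitions, a correction row such as `{"kind": "correction", "accepted": True}` is counted as an accepted graduation by the ungated predicate but not by the gated one.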



def _is_rejection(row: dict[str, Any]) -> bool:
    event = str(row.get("event", "")).strip().lower()
    outcome = str(row.get("outcome", "")).strip().lower()
    accepted_flag = row.get("accepted")
    status = str(row.get("status", "")).strip().lower()
    return (
        event in {"graduation.rejected", "lesson.rejected"}
        or outcome == "rejected"
        or accepted_flag is False
        or status == "rejected"
    )


def parse_rows(lines: list[str]) -> tuple[list[dict[str, Any]], int]:
    rows: list[dict[str, Any]] = []
    skipped = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if not isinstance(row, dict):
            skipped += 1
            continue
        rows.append(row)
    return rows, skipped


def aggregate(rows: list[dict[str, Any]]) -> dict[str, Any]:
    total_corrections = 0
    accepted_graduations = 0
    rejection_count = 0
    categories: Counter[str] = Counter()

    for row in rows:
        if _is_correction(row):
            total_corrections += 1
            categories[_normalize_category(row.get("category"))] += 1
        is_accepted = _is_graduation_accepted(row)
        is_rejected = _is_rejection(row)
        if is_accepted and not is_rejected:
            accepted_graduations += 1
        elif is_rejected and not is_accepted:
            rejection_count += 1

    denominator = accepted_graduations + rejection_count
    acceptance_rate = round(accepted_graduations / denominator, 6) if denominator else 0.0

    top_categories = [
        {"category": name, "count": count}
        for name, count in sorted(categories.items(), key=lambda item: (-item[1], item[0]))[:5]
    ]

    return {
        "total_corrections": total_corrections,
        "accepted_graduations": accepted_graduations,
        "rejection_count": rejection_count,
        "acceptance_rate": acceptance_rate,
        "top_rule_categories": top_categories,
    }


def _read_lines(path: str | None) -> list[str]:
    if path:
        with open(path, encoding="utf-8") as handle:
            return handle.readlines()
    return sys.stdin.readlines()


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(
        description="Compute correction-outcome aggregates for weekly trend snapshots."
    )
    parser.add_argument("--input", help="Path to newline-delimited JSON input file")
    args = parser.parse_args(argv)

    lines = _read_lines(args.input)
    rows, skipped_rows = parse_rows(lines)
    snapshot = aggregate(rows)
    snapshot["skipped_rows"] = skipped_rows

    json.dump(snapshot, sys.stdout, sort_keys=True, separators=(",", ":"))
    sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
50 changes: 47 additions & 3 deletions Gradata/src/gradata/_core.py
@@ -10,6 +10,7 @@
import contextlib
import logging
import math
import os
import re # used by export functions for slug sanitization
import statistics
from datetime import UTC
@@ -30,6 +31,11 @@
_STATE_RANK = {"INSTINCT": 0, "PATTERN": 1, "RULE": 2}
# Severity ordering for min_severity gating
_SEV_RANK = {"as-is": 0, "minor": 1, "moderate": 2, "major": 3, "discarded": 4}
_LOW_SIGNAL_EDIT_DISTANCE_FLOOR = 0.04
# FORMAT/DRAFTING synonym swaps carry minimal signal; require a larger edit
# before recording a lesson so we don't learn from synonym-level noise.
_FORMAT_DRAFTING_EDIT_DISTANCE_FLOOR = 0.07
_FORMAT_DRAFTING_CATEGORIES = frozenset({"FORMAT", "DRAFTING"})

# Map evaluator dimension names to correction categories
_DIMENSION_CATEGORY_MAP = {
@@ -54,6 +60,18 @@ def _filter_lessons_by_state(lessons, min_state: str = "PATTERN"):
]


def _is_meaningful_low_signal_change(draft: str, final: str, category: str) -> bool:
    """Allow known-meaningful tiny edits to pass the low-signal floor."""
    cat = (category or "UNKNOWN").upper()
    if cat in {"ACCURACY", "SECURITY"}:
        return True
    # Proper-noun/acronym capitalization fixes can carry meaning even when
    # edit distance is tiny.
    if draft != final and draft.lower() == final.lower():
        return bool(re.search(r"\b[A-Z]{2,}\b|\b[A-Z][a-z]{2,}\b", final))
    return False
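
For reviewers who want to probe the capitalization heuristic in isolation, the following standalone sketch copies the logic out of the hunk above. The function name without the leading underscore and the sample strings are assumptions for illustration; behavior inside `brain_correct` also depends on the edit-distance floors, which are not reproduced here.

```python
import re
from typing import Optional

# Same pattern as in the diff: an all-caps token of 2+ letters, or a
# capitalized word with at least two lowercase letters after the capital.
_CAPITALIZED_TOKEN = re.compile(r"\b[A-Z]{2,}\b|\b[A-Z][a-z]{2,}\b")


def is_meaningful_low_signal_change(draft: str, final: str, category: Optional[str]) -> bool:
    cat = (category or "UNKNOWN").upper()
    if cat in {"ACCURACY", "SECURITY"}:
        return True
    # Case-only edits pass when the final text contains a capitalized token.
    if draft != final and draft.lower() == final.lower():
        return bool(_CAPITALIZED_TOKEN.search(final))
    return False
```

One thing worth noting when trying it out: the pattern is searched across the whole final string, so a case-only edit such as `"thanks."` to `"Thanks."` also passes, because any capitalized word of three or more letters matches. Whether sentence-initial capitalization should count as meaningful is a judgment call for the PR author.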


# ── correct() ──────────────────────────────────────────────────────────


@@ -99,7 +117,7 @@ def brain_correct(
agent_type: str | None = None,
approval_required: bool = False,
dry_run: bool = False,
-min_severity: str = "as-is",
+min_severity: str = "minor",
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Revert the default min_severity to avoid an unintentional behavior change.

Line 120 changes default gating to "minor", which suppresses "as-is" corrections for all existing callers and changes learning behavior by default.

Proposed fix
-    min_severity: str = "minor",
+    min_severity: str = "as-is",
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    min_severity: str = "minor",
+    min_severity: str = "as-is",
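
The effect of the default change can be checked with a small standalone sketch of the severity gate. `_SEV_RANK` is copied from the diff; the helper `passes_severity_gate` is a hypothetical name that isolates the comparison used in `brain_correct`.

```python
# Severity ordering copied from _core.py in this PR.
_SEV_RANK = {"as-is": 0, "minor": 1, "moderate": 2, "major": 3, "discarded": 4}


def passes_severity_gate(severity: str, min_severity: str) -> bool:
    """True when the correction's severity is at least min_severity.

    Unknown labels fall back to rank 0, matching the .get(..., 0) calls
    in the diff.
    """
    return _SEV_RANK.get(severity, 0) >= _SEV_RANK.get(min_severity, 0)
```

Under the old default (`"as-is"`) every severity passes the gate; under the new default (`"minor"`) an `"as-is"` correction, and any unrecognized severity label, is filtered out unless the caller passes `min_severity` explicitly.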
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/_core.py` at line 120, Revert the default value of the
min_severity parameter back to its prior setting to avoid changing default
gating behavior—restore min_severity from "minor" to the original "as-is" in the
function or class signature where min_severity is declared (look for the
min_severity parameter in _core.py) so existing callers continue to receive
"as-is" corrections by default.

scope: str | None = None,
applies_to: str | None = None,
auto_heal: bool = False,
@@ -353,8 +371,22 @@ def brain_correct(
 update_confidence,
 )

-if not is_observation_dup and _SEV_RANK.get(diff.severity, 0) >= _SEV_RANK.get(
-    min_severity, 0
+_cat_upper = (category or "UNKNOWN").upper()
+_ed_floor = (
+    _FORMAT_DRAFTING_EDIT_DISTANCE_FLOOR
+    if _cat_upper in _FORMAT_DRAFTING_CATEGORIES
+    else _LOW_SIGNAL_EDIT_DISTANCE_FLOOR
+)
+low_signal_filtered = (
+    diff.severity in {"as-is", "minor"}
+    and diff.edit_distance < _ed_floor
+    and not _is_meaningful_low_signal_change(draft, final, category or "UNKNOWN")
+)
+event["low_signal_filtered"] = low_signal_filtered
+if (
+    not is_observation_dup
+    and not low_signal_filtered
+    and _SEV_RANK.get(diff.severity, 0) >= _SEV_RANK.get(min_severity, 0)
 ):
     lessons_path = brain._find_lessons_path(create=True)
     if lessons_path:
@@ -1014,6 +1046,18 @@ def _lesson_key(lesson):
if all_lessons: # guard against wiping lessons file when all lessons are killed
write_lessons_safe(lessons_path, format_lessons(all_lessons))

# Auto-export AGENTS.md by default so post-graduation rules are
# available to AGENTS.md-aware tools without requiring a manual CLI step.
auto_export_agents = os.environ.get("GRADATA_AUTO_EXPORT_AGENTS", "1").strip().lower()
if auto_export_agents not in {"0", "false", "off", "no"}:
    try:
        from gradata.enhancements.rule_export import export_rules

        agents_text = export_rules(brain.dir, target="agents", lessons_path=lessons_path)
        (brain.dir / "AGENTS.md").write_text(agents_text, encoding="utf-8")
    except Exception as e:
        _log.debug("AGENTS.md auto-export skipped: %s", e)

# Archive graduated RULE lessons
new_rules = [
l
14 changes: 4 additions & 10 deletions Gradata/src/gradata/enhancements/self_improvement/_graduation.py
@@ -28,6 +28,9 @@
graduation_thresholds,
is_hook_enforced,
)
from gradata.enhancements.self_improvement._graduation_flags import (
read_beta_lb_threshold,
)

_log = logging.getLogger(__name__)

@@ -105,7 +108,6 @@ def _read_beta_lb_config() -> tuple[bool, float, int]:
Called once per ``graduate()`` invocation so per-lesson gate checks can
skip repeated ``os.environ.get`` lookups inside the graduation loop.
"""
-import math
import os

enabled = os.environ.get("GRADATA_BETA_LB_GATE", "1").lower() not in (
@@ -115,15 +117,7 @@ def _read_beta_lb_config() -> tuple[bool, float, int]:
"off",
)
defaults = graduation_thresholds()
-try:
-    threshold = float(
-        os.environ.get("GRADATA_BETA_LB_THRESHOLD", str(defaults.beta_lb_threshold))
-    )
-    if not math.isfinite(threshold):
-        threshold = defaults.beta_lb_threshold
-    threshold = min(max(threshold, 0.0), 1.0)
-except (TypeError, ValueError):
-    threshold = defaults.beta_lb_threshold
+threshold = read_beta_lb_threshold(defaults.beta_lb_threshold)
try:
min_fires = max(
0,
37 changes: 37 additions & 0 deletions Gradata/src/gradata/enhancements/self_improvement/_graduation_flags.py
@@ -0,0 +1,37 @@
"""Experiment knobs for Beta-LB graduation gating.

These settings belong to GRA-210 and intentionally keep runtime behavior
backwards-compatible by default. Production default remains 0.75, while
`GRADATA_BETA_LB_THRESHOLD` can be set to `0.55` for the staged experiment.
"""

from __future__ import annotations

import math
import os

# GRA-210: graduation_threshold experiment parameter for Beta-LB lower-bound checks.
GRA_210_EXPERIMENT = "GRA-210"
GRA_210_GRADUATION_THRESHOLD_ENV = "GRADATA_BETA_LB_THRESHOLD"
GRA_210_GRADUATION_THRESHOLD_DEFAULT = 0.75


def read_beta_lb_threshold(default: float = GRA_210_GRADUATION_THRESHOLD_DEFAULT) -> float:
    """Read the Beta-LB threshold override from env.

    Returns a float clipped to [0.0, 1.0], or ``default`` when parsing fails.
    """

    raw_value = os.environ.get(GRA_210_GRADUATION_THRESHOLD_ENV)
    if raw_value is None:
        return default

    try:
        threshold = float(raw_value)
    except (TypeError, ValueError):
        return default

    if not math.isfinite(threshold):
        return default

    return min(max(threshold, 0.0), 1.0)
Comment on lines +19 to +37
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Normalize default before returning to honor the function contract.

When env parsing fails (or env is unset), an out-of-range default can currently bypass clipping even though the docstring promises a [0.0, 1.0] result.

Proposed patch
 def read_beta_lb_threshold(default: float = GRA_210_GRADUATION_THRESHOLD_DEFAULT) -> float:
@@
-    raw_value = os.environ.get(GRA_210_GRADUATION_THRESHOLD_ENV)
+    if not math.isfinite(default):
+        default = GRA_210_GRADUATION_THRESHOLD_DEFAULT
+    default = min(max(default, 0.0), 1.0)
+
+    raw_value = os.environ.get(GRA_210_GRADUATION_THRESHOLD_ENV)
     if raw_value is None:
         return default
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/self_improvement/_graduation_flags.py`
around lines 19 - 37, The function read_beta_lb_threshold can return an
out-of-range default without clipping; update it so every return path returns a
normalized value in [0.0, 1.0] by clipping the default before returning when
raw_value is None, parsing fails, or threshold is non-finite. Modify
read_beta_lb_threshold to compute a clipped_default = min(max(default, 0.0),
1.0) (or reuse the same min/max logic used for threshold) and return
clipped_default instead of default in the branches that currently return
default; keep the same behavior for successfully parsed finite thresholds.
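
One way to satisfy "every return path is in [0.0, 1.0]" is sketched below as a standalone function. The name `read_threshold`, the constants `ENV_NAME` and `FALLBACK`, and the injectable `environ` parameter are assumptions added here for easy testing; they are not part of the PR.

```python
import math
import os
from typing import Mapping, Optional

ENV_NAME = "GRADATA_BETA_LB_THRESHOLD"  # env var name taken from the diff
FALLBACK = 0.75  # production default stated in the module docstring


def read_threshold(
    default: float = FALLBACK,
    environ: Optional[Mapping[str, str]] = None,
) -> float:
    """Return the env override, or a normalized default, always in [0.0, 1.0]."""
    env = os.environ if environ is None else environ

    # Normalize the default first so every fallback path is already safe.
    if not math.isfinite(default):
        default = FALLBACK
    default = min(max(default, 0.0), 1.0)

    raw_value = env.get(ENV_NAME)
    if raw_value is None:
        return default
    try:
        threshold = float(raw_value)
    except (TypeError, ValueError):
        return default
    if not math.isfinite(threshold):
        return default
    return min(max(threshold, 0.0), 1.0)
```

Clipping the default once at the top keeps the three fallback branches unchanged, which is roughly the shape of the proposed patch.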
