1 change: 1 addition & 0 deletions .gitignore
@@ -175,6 +175,7 @@ Gradata/scripts/*
!Gradata/scripts/publish-npm.sh
!Gradata/scripts/cloud/
!Gradata/scripts/migrate_legacy_scopes.py
!Gradata/scripts/weekly_correction_snapshot.py

# npm sub-package build outputs (source tracked, outputs ignored)
Gradata/packages/npm/node_modules/
36 changes: 36 additions & 0 deletions Gradata/docs/weekly-correction-snapshot.md
@@ -0,0 +1,36 @@
# Weekly Correction Snapshot

`scripts/weekly_correction_snapshot.py` builds a deterministic JSON summary from newline-delimited JSON (NDJSON) events. This is intended for weekly correction-outcome trend reporting.

## Usage

From file:

```bash
python scripts/weekly_correction_snapshot.py --input /path/to/events.jsonl
```

From stdin:

```bash
cat /path/to/events.jsonl | python scripts/weekly_correction_snapshot.py
```

## Output schema

The script always emits one compact JSON object with stable key ordering:

- `total_corrections` (int): count of correction events (`event=correction.created` or `kind=correction`)
- `accepted_graduations` (int): count of accepted graduation outcomes
- `rejection_count` (int): count of rejected graduation outcomes
- `acceptance_rate` (float): `accepted_graduations / (accepted_graduations + rejection_count)`, rounded to 6 decimals, or `0.0` if denominator is zero
- `top_rule_categories` (list): up to 5 entries sorted by descending count, then category name
- `skipped_rows` (int): malformed or non-object rows ignored during parsing

`top_rule_categories` entries use:

```json
{"category":"tone","count":12}
```

Category normalization is lowercase + trimmed whitespace. Empty/missing categories normalize to `"unknown"`.
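
The schema rules above (acceptance-rate rounding, the zero-denominator fallback, category normalization, and top-5 ordering) can be sketched in isolation as follows. This is a minimal illustration, not the script itself; the helper names `normalize_category`, `acceptance_rate`, and `top_categories` are hypothetical and chosen only for this example.

```python
from collections import Counter


def normalize_category(value):
    # Documented rule: lowercase + trimmed; empty or missing becomes "unknown".
    if value is None:
        return "unknown"
    return str(value).strip().lower() or "unknown"


def acceptance_rate(accepted, rejected):
    # accepted / (accepted + rejected), rounded to 6 decimals, 0.0 when empty.
    denominator = accepted + rejected
    return round(accepted / denominator, 6) if denominator else 0.0


def top_categories(raw_categories, limit=5):
    # Sort by descending count, then by category name, and keep `limit` entries.
    counts = Counter(normalize_category(c) for c in raw_categories)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"category": name, "count": count} for name, count in ranked[:limit]]


print(acceptance_rate(2, 1))
print(acceptance_rate(0, 0))
print(top_categories([" Tone", "tone", None, "", "FORMAT"]))
```

Ties in count fall back to alphabetical order, which is what keeps the output deterministic across runs.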
2 changes: 1 addition & 1 deletion Gradata/pyproject.toml
@@ -165,7 +165,7 @@ skips = [
# --- Pytest ---
[tool.pytest.ini_options]
testpaths = ["tests"]
-pythonpath = ["src"]
+pythonpath = ["src", "scripts"]
markers = [
"integration: tests that hit external LLM APIs (cost money, skip in CI)",
"dualwrite: dual-write crash recovery and reconciliation tests",
130 changes: 130 additions & 0 deletions Gradata/scripts/weekly_correction_snapshot.py
@@ -0,0 +1,130 @@
#!/usr/bin/env python3
"""Compute weekly correction/graduation aggregates from NDJSON events."""

from __future__ import annotations

import argparse
import json
import sys
from collections import Counter
from typing import Any


def _normalize_category(value: Any) -> str:
    if value is None:
        return "unknown"
    normalized = str(value).strip().lower()
    return normalized or "unknown"


def _is_correction(row: dict[str, Any]) -> bool:
    event = str(row.get("event", "")).strip().lower()
    kind = str(row.get("kind", "")).strip().lower()
    return event == "correction.created" or kind == "correction"


def _is_graduation_accepted(row: dict[str, Any]) -> bool:
    event = str(row.get("event", "")).strip().lower()
    outcome = str(row.get("outcome", "")).strip().lower()
    accepted_flag = row.get("accepted")
    status = str(row.get("status", "")).strip().lower()
    return (
        event in {"lesson.graduated", "graduation.accepted"}
        or outcome == "accepted"
        or accepted_flag is True
        or status in {"accepted", "graduated"}
    )
Comment on lines +26 to +36

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Graduation metrics are currently over-counted by non-graduation rows.

Lines 31–36 and 44–49 treat generic outcome/status/accepted fields as graduation outcomes without verifying the row is graduation-related, so unrelated events can skew accepted_graduations/rejection_count.

Proposed fix
+def _is_graduation_row(row: dict[str, Any]) -> bool:
+    event = str(row.get("event", "")).strip().lower()
+    kind = str(row.get("kind", "")).strip().lower()
+    return "graduation" in event or event.startswith("lesson.") or kind == "graduation"
+
 def _is_graduation_accepted(row: dict[str, Any]) -> bool:
     event = str(row.get("event", "")).strip().lower()
     outcome = str(row.get("outcome", "")).strip().lower()
     accepted_flag = row.get("accepted")
     status = str(row.get("status", "")).strip().lower()
     return (
         event in {"lesson.graduated", "graduation.accepted"}
-        or outcome == "accepted"
-        or accepted_flag is True
-        or status in {"accepted", "graduated"}
+        or (_is_graduation_row(row) and outcome == "accepted")
+        or (_is_graduation_row(row) and accepted_flag is True)
+        or (_is_graduation_row(row) and status in {"accepted", "graduated"})
     )

 def _is_rejection(row: dict[str, Any]) -> bool:
     event = str(row.get("event", "")).strip().lower()
     outcome = str(row.get("outcome", "")).strip().lower()
     accepted_flag = row.get("accepted")
     status = str(row.get("status", "")).strip().lower()
     return (
         event in {"graduation.rejected", "lesson.rejected"}
-        or outcome == "rejected"
-        or accepted_flag is False
-        or status == "rejected"
+        or (_is_graduation_row(row) and outcome == "rejected")
+        or (_is_graduation_row(row) and accepted_flag is False)
+        or (_is_graduation_row(row) and status == "rejected")
     )

Also applies to: 39-49

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/scripts/weekly_correction_snapshot.py` around lines 26 - 36, The
function _is_graduation_accepted is treating generic outcome/status/accepted
fields as graduation results even when the event isn't graduation-related;
update it to first detect that the row is a graduation event (e.g., event
contains "graduation" or is in the graduation event set like
{"lesson.graduated","graduation.accepted","graduation.rejected"}) and only then
evaluate outcome/status/accepted for accepted logic. Make the same change for
the corresponding rejection logic (the similar block at lines referenced 39–49)
so outcome/status/accepted are only considered when the row is confirmed to be a
graduation event.
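
To make the over-counting concrete, here is a small self-contained sketch comparing the predicate as submitted with a gated variant along the lines of the proposed fix. The function names and the `invoice.paid` sample row are hypothetical and exist only for illustration; the gate's matching rules are the reviewer's suggestion, not merged code.

```python
from typing import Any


def is_graduation_row(row: dict[str, Any]) -> bool:
    # Gate from the proposed fix: only graduation-related rows qualify.
    event = str(row.get("event", "")).strip().lower()
    kind = str(row.get("kind", "")).strip().lower()
    return "graduation" in event or event.startswith("lesson.") or kind == "graduation"


def accepted_ungated(row: dict[str, Any]) -> bool:
    # Behavior as submitted: generic fields count regardless of event type.
    event = str(row.get("event", "")).strip().lower()
    outcome = str(row.get("outcome", "")).strip().lower()
    status = str(row.get("status", "")).strip().lower()
    return (
        event in {"lesson.graduated", "graduation.accepted"}
        or outcome == "accepted"
        or row.get("accepted") is True
        or status in {"accepted", "graduated"}
    )


def accepted_gated(row: dict[str, Any]) -> bool:
    # Generic fields are consulted only after the row passes the gate.
    event = str(row.get("event", "")).strip().lower()
    if event in {"lesson.graduated", "graduation.accepted"}:
        return True
    if not is_graduation_row(row):
        return False
    outcome = str(row.get("outcome", "")).strip().lower()
    status = str(row.get("status", "")).strip().lower()
    return (
        outcome == "accepted"
        or row.get("accepted") is True
        or status in {"accepted", "graduated"}
    )


rows = [
    {"event": "graduation.accepted"},
    {"kind": "graduation", "status": "graduated"},
    {"event": "invoice.paid", "status": "accepted"},  # hypothetical unrelated event
]
print(sum(accepted_ungated(r) for r in rows))  # counts the unrelated row too
print(sum(accepted_gated(r) for r in rows))    # counts graduation rows only
```

The third row is the case the comment describes: it carries `status=accepted` but has nothing to do with graduation, so only the ungated predicate counts it.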



def _is_rejection(row: dict[str, Any]) -> bool:
    event = str(row.get("event", "")).strip().lower()
    outcome = str(row.get("outcome", "")).strip().lower()
    accepted_flag = row.get("accepted")
    status = str(row.get("status", "")).strip().lower()
    return (
        event in {"graduation.rejected", "lesson.rejected"}
        or outcome == "rejected"
        or accepted_flag is False
        or status == "rejected"
    )


def parse_rows(lines: list[str]) -> tuple[list[dict[str, Any]], int]:
    rows: list[dict[str, Any]] = []
    skipped = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if not isinstance(row, dict):
            skipped += 1
            continue
        rows.append(row)
    return rows, skipped


def aggregate(rows: list[dict[str, Any]]) -> dict[str, Any]:
    total_corrections = 0
    accepted_graduations = 0
    rejection_count = 0
    categories: Counter[str] = Counter()

    for row in rows:
        if _is_correction(row):
            total_corrections += 1
            categories[_normalize_category(row.get("category"))] += 1
        is_accepted = _is_graduation_accepted(row)
        is_rejected = _is_rejection(row)
        if is_accepted and not is_rejected:
            accepted_graduations += 1
        elif is_rejected and not is_accepted:
            rejection_count += 1

    denominator = accepted_graduations + rejection_count
    acceptance_rate = round(accepted_graduations / denominator, 6) if denominator else 0.0

    top_categories = [
        {"category": name, "count": count}
        for name, count in sorted(categories.items(), key=lambda item: (-item[1], item[0]))[:5]
    ]

    return {
        "total_corrections": total_corrections,
        "accepted_graduations": accepted_graduations,
        "rejection_count": rejection_count,
        "acceptance_rate": acceptance_rate,
        "top_rule_categories": top_categories,
    }


def _read_lines(path: str | None) -> list[str]:
    if path:
        with open(path, encoding="utf-8") as handle:
            return handle.readlines()
    return sys.stdin.readlines()


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(
        description="Compute correction-outcome aggregates for weekly trend snapshots."
    )
    parser.add_argument("--input", help="Path to newline-delimited JSON input file")
    args = parser.parse_args(argv)

    lines = _read_lines(args.input)
    rows, skipped_rows = parse_rows(lines)
    snapshot = aggregate(rows)
    snapshot["skipped_rows"] = skipped_rows

    json.dump(snapshot, sys.stdout, sort_keys=True, separators=(",", ":"))
    sys.stdout.write("\n")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
50 changes: 47 additions & 3 deletions Gradata/src/gradata/_core.py
@@ -10,6 +10,7 @@
import contextlib
import logging
import math
import os
import re # used by export functions for slug sanitization
import statistics
from datetime import UTC
@@ -30,6 +31,11 @@
_STATE_RANK = {"INSTINCT": 0, "PATTERN": 1, "RULE": 2}
# Severity ordering for min_severity gating
_SEV_RANK = {"as-is": 0, "minor": 1, "moderate": 2, "major": 3, "discarded": 4}
_LOW_SIGNAL_EDIT_DISTANCE_FLOOR = 0.04
# FORMAT/DRAFTING synonym swaps carry minimal signal; require a larger edit
# before recording a lesson so we don't learn from synonym-level noise.
_FORMAT_DRAFTING_EDIT_DISTANCE_FLOOR = 0.07
_FORMAT_DRAFTING_CATEGORIES = frozenset({"FORMAT", "DRAFTING"})

# Map evaluator dimension names to correction categories
_DIMENSION_CATEGORY_MAP = {
@@ -54,6 +60,18 @@ def _filter_lessons_by_state(lessons, min_state: str = "PATTERN"):
]


def _is_meaningful_low_signal_change(draft: str, final: str, category: str) -> bool:
    """Allow known-meaningful tiny edits to pass the low-signal floor."""
    cat = (category or "UNKNOWN").upper()
    if cat in {"ACCURACY", "SECURITY"}:
        return True
    # Proper-noun/acronym capitalization fixes can carry meaning even when
    # edit distance is tiny.
    if draft != final and draft.lower() == final.lower():
        return bool(re.search(r"\b[A-Z]{2,}\b|\b[A-Z][a-z]{2,}\b", final))
    return False


# ── correct() ──────────────────────────────────────────────────────────


@@ -99,7 +117,7 @@ def brain_correct(
agent_type: str | None = None,
approval_required: bool = False,
dry_run: bool = False,
-min_severity: str = "as-is",
+min_severity: str = "minor",
Copy link

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Revert the default min_severity to avoid an unintentional behavior change.

Line 120 changes default gating to "minor", which suppresses "as-is" corrections for all existing callers and changes learning behavior by default.

Proposed fix
-    min_severity: str = "minor",
+    min_severity: str = "as-is",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/_core.py` at line 120, Revert the default value of the
min_severity parameter back to its prior setting to avoid changing default
gating behavior—restore min_severity from "minor" to the original "as-is" in the
function or class signature where min_severity is declared (look for the
min_severity parameter in _core.py) so existing callers continue to receive
"as-is" corrections by default.

scope: str | None = None,
applies_to: str | None = None,
auto_heal: bool = False,
@@ -353,8 +371,22 @@ def brain_correct(
update_confidence,
)

-if not is_observation_dup and _SEV_RANK.get(diff.severity, 0) >= _SEV_RANK.get(
-    min_severity, 0
+_cat_upper = (category or "UNKNOWN").upper()
+_ed_floor = (
+    _FORMAT_DRAFTING_EDIT_DISTANCE_FLOOR
+    if _cat_upper in _FORMAT_DRAFTING_CATEGORIES
+    else _LOW_SIGNAL_EDIT_DISTANCE_FLOOR
+)
+low_signal_filtered = (
+    diff.severity in {"as-is", "minor"}
+    and diff.edit_distance < _ed_floor
+    and not _is_meaningful_low_signal_change(draft, final, category or "UNKNOWN")
+)
+event["low_signal_filtered"] = low_signal_filtered
+if (
+    not is_observation_dup
+    and not low_signal_filtered
+    and _SEV_RANK.get(diff.severity, 0) >= _SEV_RANK.get(min_severity, 0)
 ):
lessons_path = brain._find_lessons_path(create=True)
if lessons_path:
@@ -1014,6 +1046,18 @@ def _lesson_key(lesson):
if all_lessons: # guard against wiping lessons file when all lessons are killed
write_lessons_safe(lessons_path, format_lessons(all_lessons))

# Auto-export AGENTS.md by default so post-graduation rules are
# available to AGENTS.md-aware tools without requiring a manual CLI step.
auto_export_agents = os.environ.get("GRADATA_AUTO_EXPORT_AGENTS", "1").strip().lower()
if auto_export_agents not in {"0", "false", "off", "no"}:
    try:
        from gradata.enhancements.rule_export import export_rules

        agents_text = export_rules(brain.dir, target="agents", lessons_path=lessons_path)
        (brain.dir / "AGENTS.md").write_text(agents_text, encoding="utf-8")
    except Exception as e:
        _log.debug("AGENTS.md auto-export skipped: %s", e)

# Archive graduated RULE lessons
new_rules = [
l
14 changes: 4 additions & 10 deletions Gradata/src/gradata/enhancements/self_improvement/_graduation.py
@@ -28,6 +28,9 @@
graduation_thresholds,
is_hook_enforced,
)
from gradata.enhancements.self_improvement._graduation_flags import (
read_beta_lb_threshold,
)

_log = logging.getLogger(__name__)

@@ -105,7 +108,6 @@ def _read_beta_lb_config() -> tuple[bool, float, int]:
Called once per ``graduate()`` invocation so per-lesson gate checks can
skip repeated ``os.environ.get`` lookups inside the graduation loop.
"""
-import math
import os

enabled = os.environ.get("GRADATA_BETA_LB_GATE", "1").lower() not in (
@@ -115,15 +117,7 @@ def _read_beta_lb_config() -> tuple[bool, float, int]:
"off",
)
defaults = graduation_thresholds()
-try:
-    threshold = float(
-        os.environ.get("GRADATA_BETA_LB_THRESHOLD", str(defaults.beta_lb_threshold))
-    )
-    if not math.isfinite(threshold):
-        threshold = defaults.beta_lb_threshold
-    threshold = min(max(threshold, 0.0), 1.0)
-except (TypeError, ValueError):
-    threshold = defaults.beta_lb_threshold
+threshold = read_beta_lb_threshold(defaults.beta_lb_threshold)
try:
min_fires = max(
0,
37 changes: 37 additions & 0 deletions Gradata/src/gradata/enhancements/self_improvement/_graduation_flags.py
@@ -0,0 +1,37 @@
"""Experiment knobs for Beta-LB graduation gating.

These settings belong to GRA-210 and intentionally keep runtime behavior
backwards-compatible by default. Production default remains 0.75, while
`GRADATA_BETA_LB_THRESHOLD` can be set to `0.55` for the staged experiment.
"""

from __future__ import annotations

import math
import os

# GRA-210: graduation_threshold experiment parameter for Beta-LB lower-bound checks.
GRA_210_EXPERIMENT = "GRA-210"
GRA_210_GRADUATION_THRESHOLD_ENV = "GRADATA_BETA_LB_THRESHOLD"
GRA_210_GRADUATION_THRESHOLD_DEFAULT = 0.75


def read_beta_lb_threshold(default: float = GRA_210_GRADUATION_THRESHOLD_DEFAULT) -> float:
    """Read the Beta-LB threshold override from env.

    Returns a float clipped to [0.0, 1.0], or ``default`` when parsing fails.
    """

    raw_value = os.environ.get(GRA_210_GRADUATION_THRESHOLD_ENV)
    if raw_value is None:
        return default

    try:
        threshold = float(raw_value)
    except (TypeError, ValueError):
        return default

    if not math.isfinite(threshold):
        return default

    return min(max(threshold, 0.0), 1.0)
Comment on lines +19 to +37

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Normalize default before returning to honor the function contract.

When env parsing fails (or env is unset), an out-of-range default can currently bypass clipping even though the docstring promises a [0.0, 1.0] result.

Proposed patch
 def read_beta_lb_threshold(default: float = GRA_210_GRADUATION_THRESHOLD_DEFAULT) -> float:
@@
-    raw_value = os.environ.get(GRA_210_GRADUATION_THRESHOLD_ENV)
+    if not math.isfinite(default):
+        default = GRA_210_GRADUATION_THRESHOLD_DEFAULT
+    default = min(max(default, 0.0), 1.0)
+
+    raw_value = os.environ.get(GRA_210_GRADUATION_THRESHOLD_ENV)
     if raw_value is None:
         return default
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/self_improvement/_graduation_flags.py`
around lines 19 - 37, The function read_beta_lb_threshold can return an
out-of-range default without clipping; update it so every return path returns a
normalized value in [0.0, 1.0] by clipping the default before returning when
raw_value is None, parsing fails, or threshold is non-finite. Modify
read_beta_lb_threshold to compute a clipped_default = min(max(default, 0.0),
1.0) (or reuse the same min/max logic used for threshold) and return
clipped_default instead of default in the branches that currently return
default; keep the same behavior for successfully parsed finite thresholds.
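
One way the normalized-default behavior could look is sketched below. It is an illustration under stated assumptions rather than the merged implementation: the names `read_threshold`, `ENV_NAME`, and `FALLBACK` are hypothetical, and the `environ` parameter is added here only so the example can run without touching the real process environment.

```python
import math
import os

ENV_NAME = "GRADATA_BETA_LB_THRESHOLD"  # env var name taken from the diff
FALLBACK = 0.75  # production default stated in the module docstring


def _clip(value: float) -> float:
    # Clamp into the documented [0.0, 1.0] range.
    return min(max(value, 0.0), 1.0)


def read_threshold(default: float = FALLBACK, environ=None) -> float:
    # `environ` is an assumption for testability; it defaults to os.environ.
    env = os.environ if environ is None else environ
    # Normalize the default first so every return path honors the contract.
    safe_default = _clip(default) if math.isfinite(default) else FALLBACK

    raw_value = env.get(ENV_NAME)
    if raw_value is None:
        return safe_default
    try:
        threshold = float(raw_value)
    except (TypeError, ValueError):
        return safe_default
    if not math.isfinite(threshold):
        return safe_default
    return _clip(threshold)


print(read_threshold(1.5, environ={}))
print(read_threshold(0.75, environ={ENV_NAME: "0.55"}))
print(read_threshold(0.75, environ={ENV_NAME: "nan"}))
```

Normalizing once at the top keeps the three fallback branches identical, so there is no path on which an out-of-range or non-finite default can escape.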

4 changes: 4 additions & 0 deletions Gradata/tests/test_byo_key_provider.py
@@ -1,7 +1,11 @@
from __future__ import annotations

import pytest

from gradata.llm.byo_key import BYOKeyProvider

pytest.importorskip("httpx")


class _Response:
    def __init__(self, payload: dict):