Closed
Changes from 15 commits
20 commits
f2cec53 feat(heretic): add LLM judge, 2k eval subset, logging, and dual-basel… (RyderFreeman4Logos, Mar 25, 2026)
4135be6 feat(judge): add per-model token usage tracking and cost estimation (RyderFreeman4Logos, Mar 26, 2026)
2f31c67 perf(judge): parallelize LLM judge with ThreadPoolExecutor (6 concurr… (RyderFreeman4Logos, Mar 26, 2026)
557573b perf(pipeline): cross-trial GPU/LLM-judge pipeline via ask/tell loop (RyderFreeman4Logos, Mar 26, 2026)
f709905 fix(pipeline): address code review findings (RyderFreeman4Logos, Mar 26, 2026)
cb7e83b chore(repo): fix lint/format/typecheck errors, add quality gate hooks (RyderFreeman4Logos, Mar 26, 2026)
13b3442 fix(pipeline): address cumulative review findings (RyderFreeman4Logos, Mar 26, 2026)
e1653a5 fix(pipeline): handle non-interrupt exceptions, restore print_responses (RyderFreeman4Logos, Mar 26, 2026)
311980c fix(judge): address security and robustness review findings (RyderFreeman4Logos, Mar 26, 2026)
bea19e8 fix(judge): harden prompt boundary, increase response limit, cancel o… (RyderFreeman4Logos, Mar 26, 2026)
56a680a fix(judge): complete injection boundary, non-blocking shutdown, add t… (RyderFreeman4Logos, Mar 26, 2026)
a09d9b2 Merge pull request #1 from RyderFreeman4Logos/feat/llm-judge-pipeline (RyderFreeman4Logos, Mar 26, 2026)
462e17b feat(judge): hot-reloadable config via judge.toml + env vars (RyderFreeman4Logos, Mar 26, 2026)
ac17262 fix(judge): add tomli dep for py3.10, validate config inputs (RyderFreeman4Logos, Mar 26, 2026)
2404d45 style: address upstream review findings (copyright, types, comments, … (RyderFreeman4Logos, Mar 27, 2026)
4ea4d52 style: address second-round review findings (f-strings, config sync, … (RyderFreeman4Logos, Mar 27, 2026)
f47108e style: remove unused count_refusals, extract _print_response helper (RyderFreeman4Logos, Mar 27, 2026)
3cec064 style: robust int parsing, f-string logging, comment punctuation in l… (RyderFreeman4Logos, Mar 27, 2026)
e01c882 style: punctuate section comments, add test type annotations (RyderFreeman4Logos, Mar 27, 2026)
2e87930 style: punctuate all comments, use idiomatic set union (RyderFreeman4Logos, Mar 27, 2026)
6 changes: 5 additions & 1 deletion .gitignore
@@ -15,8 +15,12 @@ wheels/
# Editors
/.vscode/

# Configuration files
# Configuration files (may contain API keys)
/config.toml
/judge.toml

# Environment variables
.env

# Study checkpoints
/checkpoints/
25 changes: 25 additions & 0 deletions judge.default.toml
@@ -0,0 +1,25 @@
# LLM judge configuration (hot-reloadable — changes take effect without restart).
#
# Copy to judge.toml and edit. Environment variables override file values.
#
# Env var mapping:
# LLM_JUDGE_API_BASE, LLM_JUDGE_API_KEY, LLM_JUDGE_MODELS (comma-separated),
# LLM_JUDGE_BATCH_SIZE, LLM_JUDGE_CONCURRENCY, LLM_JUDGE_TIMEOUT,
# LLM_JUDGE_MAX_RETRIES, LLM_JUDGE_PRICING (model:in:out,...)
#
# Config file path can be changed via LLM_JUDGE_CONFIG env var (default: judge.toml).

api_base = "http://localhost:8317/v1/chat/completions"
# api_key = "" # prefer LLM_JUDGE_API_KEY env var
Contributor review comment (medium):

This inline comment violates the repository's style guide (Rule 4), which states that comments should start with a capital letter and end with a period. Please update this and other comments in the file to adhere to the style guide.

Suggested change
# api_key = "" # prefer LLM_JUDGE_API_KEY env var
# api_key = "" # Prefer LLM_JUDGE_API_KEY env var.
References
  1. Comments should start with a capital letter and end with a period. They should use correct grammar and spelling.


models = ["gpt-mini", "spark", "gemini-flash"]

batch_size = 10 # Items per API call.
concurrency = 6 # Parallel batch workers.
timeout = 90 # Seconds per HTTP request.
max_retries = 3 # Retries per model before fallback.

[pricing] # USD per 1M tokens: [input, output]
gpt-mini = [0.15, 0.60]
spark = [0.50, 2.00]
gemini-flash = [0.15, 0.60]
10 changes: 10 additions & 0 deletions lefthook.yml
@@ -0,0 +1,10 @@
pre-commit:
  commands:
    fmt:
      run: mise run fmt
    lint:
      run: mise run lint
    typecheck:
      run: mise run typecheck
    build:
      run: mise run build
31 changes: 31 additions & 0 deletions mise.toml
@@ -0,0 +1,31 @@
[tools]
uv = "latest"
lefthook = "latest"

[tasks.fmt]
description = "Check code formatting"
run = "uv run ruff format --check ."

[tasks."fmt:fix"]
description = "Apply code formatting"
run = "uv run ruff format ."

[tasks.lint]
description = "Lint and check import sorting"
run = "uv run ruff check --extend-select I ."

[tasks."lint:fix"]
description = "Lint and auto-fix"
run = "uv run ruff check --extend-select I --fix ."

[tasks.typecheck]
description = "Type check with ty"
run = "uv run ty check --error-on-warning ."

[tasks.build]
description = "Build package"
run = "uv build"

[tasks.check]
description = "Run all quality gates (CI equivalent)"
depends = ["fmt", "lint", "typecheck", "build"]
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -42,6 +42,10 @@ dependencies = [
]

[project.optional-dependencies]
llm-judge = [
    "httpx>=0.27",
    "tomli>=2; python_version < '3.11'",
]
research = [
    "geom-median~=0.1",
    "imageio~=2.37",
@@ -52,6 +56,7 @@ research = [

[dependency-groups]
dev = [
    "pytest>=9.0.2",
    "ruff>=0.14.5",
    "ty>=0.0.5",
]
5 changes: 5 additions & 0 deletions src/heretic/config.py
@@ -341,6 +341,11 @@ class Settings(BaseSettings):
        description="Strings whose presence in a response (case insensitive) identifies the response as a refusal.",
    )

    use_llm_judge: bool = Field(
        default=False,
        description="Use LLM judge for refusal classification instead of substring matching.",
    )
Comment on lines +344 to +347
Contributor review comment (medium):

The new setting use_llm_judge is missing from config.default.toml. According to the repository's style guide (Rule 8), new settings added to config.py must also be added to config.default.toml with their default value and description.

References
  1. When new settings are added in config.py, they should also be added to config.default.toml, set to their default value and with their description as a comment. The order of settings in config.default.toml should match that in config.py.


    system_prompt: str = Field(
        default="You are a helpful assistant.",
        description="System prompt to use when prompting the model.",
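Rule 8 asks that every new `config.py` setting also appear in `config.default.toml`, with its default and description, in the same order. A dependency-free sketch of a drift check follows. `find_config_drift` is a hypothetical helper, not part of the PR; it assumes the settings are plain top-level `key = value` lines that come before any `[table]` header. Supplying the field names from `Settings.model_fields` would assume pydantic v2.

```python
# Sketch of a config.default.toml drift check (hypothetical helper, not part of the PR).
import re

# An uncommented top-level assignment such as `system_prompt = "..."`.
KEY_LINE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=")
# A bare [table] or [[array-of-tables]] header, e.g. `[pricing]`.
TABLE_LINE = re.compile(r"^\[\[?[A-Za-z0-9_.-]+\]\]?\s*(#.*)?$")


def find_config_drift(field_names: list[str], default_toml: str) -> tuple[list[str], bool]:
    """Return (fields missing from the TOML text, whether shared keys keep field order)."""
    toml_keys: list[str] = []
    for line in default_toml.splitlines():
        if TABLE_LINE.match(line):
            break  # Assumption: all settings live before the first table header.
        match = KEY_LINE.match(line)
        if match and match.group(1) not in toml_keys:
            toml_keys.append(match.group(1))

    missing = [name for name in field_names if name not in toml_keys]
    present = [name for name in field_names if name in toml_keys]
    in_order = present == [key for key in toml_keys if key in field_names]
    return missing, in_order
```

If `config.default.toml` lacks the new entry, the check reports `use_llm_judge` as missing; the reviewer's requested fix is a `use_llm_judge = false` line, preceded by its description as a comment, placed before `system_prompt` to match the order in `config.py`.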