https://github.com/p-e-w/heretic/compare/master...RyderFreeman4Logos:heretic:feat/llm-judge-pipeline?expand=1 #255
Closed. RyderFreeman4Logos wants to merge 20 commits into `p-e-w:master` from `RyderFreeman4Logos:feat/llm-judge-pipeline`.
Commits (20; the diff view below shows changes from 19 of them):
All commits are by RyderFreeman4Logos.

- `f2cec53` feat(heretic): add LLM judge, 2k eval subset, logging, and dual-basel…
- `4135be6` feat(judge): add per-model token usage tracking and cost estimation
- `2f31c67` perf(judge): parallelize LLM judge with ThreadPoolExecutor (6 concurr…
- `557573b` perf(pipeline): cross-trial GPU/LLM-judge pipeline via ask/tell loop
- `f709905` fix(pipeline): address code review findings
- `cb7e83b` chore(repo): fix lint/format/typecheck errors, add quality gate hooks
- `13b3442` fix(pipeline): address cumulative review findings
- `e1653a5` fix(pipeline): handle non-interrupt exceptions, restore print_responses
- `311980c` fix(judge): address security and robustness review findings
- `bea19e8` fix(judge): harden prompt boundary, increase response limit, cancel o…
- `56a680a` fix(judge): complete injection boundary, non-blocking shutdown, add t…
- `a09d9b2` Merge pull request #1 from RyderFreeman4Logos/feat/llm-judge-pipeline
- `462e17b` feat(judge): hot-reloadable config via judge.toml + env vars
- `ac17262` fix(judge): add tomli dep for py3.10, validate config inputs
- `2404d45` style: address upstream review findings (copyright, types, comments, …
- `4ea4d52` style: address second-round review findings (f-strings, config sync, …
- `f47108e` style: remove unused count_refusals, extract _print_response helper
- `3cec064` style: robust int parsing, f-string logging, comment punctuation in l…
- `e01c882` style: punctuate section comments, add test type annotations
- `2e87930` style: punctuate all comments, use idiomatic set union
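The commit messages mention batching, a `ThreadPoolExecutor` with six concurrent workers, and per-model retries before falling back to another model. The sketch below shows one way those pieces could fit together; it is not the PR's code. `judge_batch`, `judge_all`, and the `call_model` callback are hypothetical names, the verdict format (one `bool` per response, `True` meaning refused) is assumed, and `max_retries` is treated as the number of attempts per model.

```python
from collections.abc import Callable, Sequence
from concurrent.futures import ThreadPoolExecutor

# Hypothetical judge call: sends one batch to one model and returns one
# refusal verdict (True = refused) per response, raising on any failure.
JudgeFn = Callable[[str, Sequence[str]], list[bool]]


def judge_batch(
    batch: Sequence[str],
    models: Sequence[str],
    call_model: JudgeFn,
    max_retries: int,
) -> list[bool]:
    """Try each model in order; fall back to the next after max_retries failures."""
    last_error: Exception | None = None
    for model in models:
        # max_retries is treated as attempts per model here (an assumption).
        for _ in range(max_retries):
            try:
                verdicts = call_model(model, batch)
            except Exception as error:  # Any failure counts as a failed attempt.
                last_error = error
                continue
            if len(verdicts) == len(batch):
                return verdicts
            last_error = ValueError(
                f"{model} returned {len(verdicts)} verdicts for {len(batch)} items"
            )
    raise RuntimeError("all judge models failed for this batch") from last_error


def judge_all(
    responses: Sequence[str],
    models: Sequence[str],
    call_model: JudgeFn,
    batch_size: int = 10,
    concurrency: int = 6,
    max_retries: int = 3,
) -> list[bool]:
    """Split responses into batches and judge them on a thread pool."""
    batches = [
        responses[start : start + batch_size]
        for start in range(0, len(responses), batch_size)
    ]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() yields results in submission order, so verdicts stay aligned
        # with the input responses even though batches may finish out of order.
        results = pool.map(
            lambda batch: judge_batch(batch, models, call_model, max_retries),
            batches,
        )
        return [verdict for verdicts in results for verdict in verdicts]
```

Because `ThreadPoolExecutor.map` returns results in input order, the flattened verdicts line up with `responses` without extra bookkeeping; an exception from any batch surfaces when the results are consumed.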
New file, example LLM judge configuration (`@@ -0,0 +1,25 @@`):

```toml
# LLM judge configuration (hot-reloadable: changes take effect without restart).
#
# Copy to judge.toml and edit. Environment variables override file values.
#
# Env var mapping:
# LLM_JUDGE_API_BASE, LLM_JUDGE_API_KEY, LLM_JUDGE_MODELS (comma-separated),
# LLM_JUDGE_BATCH_SIZE, LLM_JUDGE_CONCURRENCY, LLM_JUDGE_TIMEOUT,
# LLM_JUDGE_MAX_RETRIES, LLM_JUDGE_PRICING (model:in:out,...)
#
# Config file path can be changed via LLM_JUDGE_CONFIG env var (default: judge.toml).

api_base = "http://localhost:8317/v1/chat/completions"
# api_key = "" # Prefer LLM_JUDGE_API_KEY env var.

models = ["gpt-mini", "spark", "gemini-flash"]

batch_size = 10 # Items per API call.
concurrency = 6 # Parallel batch workers.
timeout = 90 # Seconds per HTTP request.
max_retries = 3 # Retries per model before fallback.

[pricing] # USD per 1M tokens: [input, output].
gpt-mini = [0.15, 0.60]
spark = [0.50, 2.00]
gemini-flash = [0.15, 0.60]
```
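The header comment describes a layered lookup: a TOML file (path from `LLM_JUDGE_CONFIG`, default `judge.toml`), environment variables on top of it, and a pricing table in USD per 1M tokens. Below is a minimal sketch of that lookup and of the cost arithmetic, assuming plain dictionaries rather than whatever structure the PR uses. `load_judge_config`, `parse_pricing`, `estimate_cost_usd`, and `ENV_OVERRIDES` are hypothetical names; treating unknown models as free and letting `LLM_JUDGE_PRICING` replace (not merge with) the file's table are assumptions.

```python
import os
from collections.abc import Mapping
from typing import Any

try:
    import tomllib  # Standard library on Python 3.11+.
except ModuleNotFoundError:  # Python 3.10 needs the tomli backport.
    import tomli as tomllib


def parse_pricing(spec: str) -> dict[str, tuple[float, float]]:
    """Parse 'model:in:out,...' into {model: (input, output)} USD per 1M tokens."""
    pricing: dict[str, tuple[float, float]] = {}
    for entry in spec.split(","):
        if not entry.strip():
            continue
        model, price_in, price_out = entry.strip().rsplit(":", 2)
        pricing[model] = (float(price_in), float(price_out))
    return pricing


def _split_models(value: str) -> list[str]:
    return [name.strip() for name in value.split(",") if name.strip()]


# Env var -> (config key, converter), following the mapping in the header comment.
ENV_OVERRIDES: dict[str, tuple[str, Any]] = {
    "LLM_JUDGE_API_BASE": ("api_base", str),
    "LLM_JUDGE_API_KEY": ("api_key", str),
    "LLM_JUDGE_MODELS": ("models", _split_models),
    "LLM_JUDGE_BATCH_SIZE": ("batch_size", int),
    "LLM_JUDGE_CONCURRENCY": ("concurrency", int),
    "LLM_JUDGE_TIMEOUT": ("timeout", float),
    "LLM_JUDGE_MAX_RETRIES": ("max_retries", int),
    "LLM_JUDGE_PRICING": ("pricing", parse_pricing),
}


def load_judge_config(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Read the TOML file (if present), then let environment variables override it."""
    path = env.get("LLM_JUDGE_CONFIG", "judge.toml")
    config: dict[str, Any] = {}
    if os.path.isfile(path):
        with open(path, "rb") as file:
            config = tomllib.load(file)
    # Normalize file pricing ([input, output] lists) to tuples.
    config["pricing"] = {
        model: (float(prices[0]), float(prices[1]))
        for model, prices in config.get("pricing", {}).items()
    }
    for var, (key, convert) in ENV_OVERRIDES.items():
        if var in env:
            config[key] = convert(env[var])
    # Basic input validation: zero workers or empty batches make no sense.
    for key in ("batch_size", "concurrency"):
        if key in config and config[key] < 1:
            raise ValueError(f"{key} must be >= 1, got {config[key]}")
    return config


def estimate_cost_usd(
    pricing: Mapping[str, tuple[float, float]],
    model: str,
    input_tokens: int,
    output_tokens: int,
) -> float:
    """Cost = tokens * USD-per-1M-tokens / 1M, summed over input and output."""
    price_in, price_out = pricing.get(model, (0.0, 0.0))
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000
```

Calling `load_judge_config()` before each batch is one simple way to get the hot-reload behavior the comment promises, since a file this small is cheap to re-read; how the PR actually detects changes is not shown here. With the example prices, 200,000 input and 50,000 output tokens on `gpt-mini` cost (200,000 × 0.15 + 50,000 × 0.60) / 1,000,000 = $0.06.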
New file, lefthook pre-commit hooks (`@@ -0,0 +1,10 @@`):

```yaml
pre-commit:
  commands:
    fmt:
      run: mise run fmt
    lint:
      run: mise run lint
    typecheck:
      run: mise run typecheck
    build:
      run: mise run build
```
New file, mise tools and tasks (`@@ -0,0 +1,31 @@`):

```toml
[tools]
uv = "latest"
lefthook = "latest"

[tasks.fmt]
description = "Check code formatting"
run = "uv run ruff format --check ."

[tasks."fmt:fix"]
description = "Apply code formatting"
run = "uv run ruff format ."

[tasks.lint]
description = "Lint and check import sorting"
run = "uv run ruff check --extend-select I ."

[tasks."lint:fix"]
description = "Lint and auto-fix"
run = "uv run ruff check --extend-select I --fix ."

[tasks.typecheck]
description = "Type check with ty"
run = "uv run ty check --error-on-warning ."

[tasks.build]
description = "Build package"
run = "uv build"

[tasks.check]
description = "Run all quality gates (CI equivalent)"
depends = ["fmt", "lint", "typecheck", "build"]
```
Review comment: The new setting `use_llm_judge` is missing from `config.default.toml`. According to the repository's style guide (Rule 8), new settings added to `config.py` must also be added to `config.default.toml` with their default value and description.
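An omission like this could be caught mechanically. The sketch below is a hypothetical helper, `missing_from_defaults`, that reports setting names with no key line in the defaults file. It is a text heuristic that assumes each setting appears as a `name = value` line, commented out or not; where the list of names comes from (for example, the fields of the settings class in `config.py`) is left open, since the structure of `config.py` is not shown here.

```python
import re
from collections.abc import Iterable

# Heuristic: a line that assigns a key, optionally commented out
# (e.g. `model = "x"` or `# n_trials = 200`). Quoted keys are not handled.
KEY_LINE = re.compile(r"^[ \t]*#?[ \t]*([A-Za-z0-9_-]+)[ \t]*=", re.MULTILINE)


def missing_from_defaults(setting_names: Iterable[str], default_toml: str) -> list[str]:
    """Return the setting names that never appear as a key line in the defaults text."""
    documented = set(KEY_LINE.findall(default_toml))
    return sorted(set(setting_names) - documented)
```

Run in a test against the text of `config.default.toml`, a non-empty result would flag a setting such as `use_llm_judge` before review does.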