feat: generic plugin system #53
Open
red40maxxer wants to merge 209 commits into p-e-w:master from red40maxxer:refusal-plugins
450dacf
feat: refusal detector plugin scaffold
red40maxxer 7ac5c7d
chore: add logging
red40maxxer fcd54e0
style: satisfy ruff
red40maxxer 83fdf07
wip: implement tagging/scoring plugin system and scaffold response me…
red40maxxer 7c8174d
wip: minor fixes and avoiding AttributeError
red40maxxer a4a97c7
style: ruff
red40maxxer 7f0562b
feat(wip): populate metadata fields and allow plugins to declare what…
red40maxxer d1f1428
refactor: extract metadata logic to separate module
red40maxxer 9d9530a
style: placate ruff
red40maxxer d7964e9
chore: use eos token for inferring finish reason with fallback
red40maxxer 81c0913
fix: handle empty responses better
red40maxxer db1a3b6
style: ruff
red40maxxer 8b62b58
refactor: combine response text and metadata into single object
red40maxxer ee3090e
refactor: clean up tagger and scorer usage
red40maxxer ad69824
style: ruff
red40maxxer cfe7939
Merge branch 'master' into refusal-plugins
red40maxxer 99ab9ae
chore: remove is_refusal
red40maxxer 79fab69
style: ruff import ordering
red40maxxer 52b7b12
feat: remove embeddings and generation traces
red40maxxer 7bb0f1b
feat: return all hidden states instead of just last ones
red40maxxer d8f8928
chore: remove testing changes
red40maxxer f0b57c1
style: ruff format
red40maxxer cc32c48
fix: mismatching stop reason identifier
red40maxxer aaf94c2
Merge remote-tracking branch 'origin/master' into refusal-plugins
red40maxxer 1ef8a38
chore: update default config ordering
red40maxxer 571a66e
chore: fix merge
red40maxxer 200b957
feat: allow external plugin imports
red40maxxer 16792be
feat: add good_residuals and bad_residuals to context metadata
red40maxxer 996878b
style: ruff
red40maxxer dfc89db
chore: remove unnecessary allow extra
red40maxxer 878b6f9
chore: remove unnecessary system prompt and model name
red40maxxer 558c32d
style: ruff
red40maxxer b670d5f
perf: clear residuals from memory if plugin doesn't need them
red40maxxer 831a60b
feat: support external filepaths and clean up import logic
red40maxxer e65b730
style: ruff
red40maxxer 9f78c46
refactor: consolidate tagger and scorer functionality into a single s…
red40maxxer 8ccb56a
refactor: parent Plugin class for all plugins
red40maxxer e06de46
feat: support multiple scorer plugins
red40maxxer 77c74d9
Merge remote-tracking branch 'origin/master' into refusal-plugins
red40maxxer 66fe24a
refactor: type fixes
red40maxxer e826ef2
Merge remote-tracking branch 'origin/master' into refusal-plugins
red40maxxer 3c4265d
style: satisfy ruff
red40maxxer 79446d0
refactor: centralize scorer dataclasses
red40maxxer a5a0475
refactor: rename MetricResult to Score
red40maxxer b287bf4
feat: simplify plugin loading
red40maxxer 75f12ad
feat: split response metadata objects and access in evaluationContext
red40maxxer 8f6b46b
style: ruff
red40maxxer b83fd74
Merge remote-tracking branch 'origin/master' into refusal-plugins
red40maxxer 0ccdc88
style: ruff
red40maxxer ad3d77f
chore: remove old tagger code
red40maxxer c2973ae
refactor: scorer settings inherit directly from Pydantic
red40maxxer cf0bb6f
refactor: move eval prompts and settings to CountRefusals and KLDiver…
red40maxxer f8eeda6
feat: move scorer config to top level and add support for scale factor
red40maxxer 6db14b8
fix: missing config for scorers
red40maxxer 60c6593
style: ruff
red40maxxer 0f9bde3
fix: scale type error
red40maxxer d4fe63e
docs: fix misleading docstring
red40maxxer f64a520
fix: clean up old fields
red40maxxer 4f18de6
refactor: use BaseModel for scorer settings
red40maxxer f299dcb
chore: make scale default to 1 for safety
red40maxxer 162dd95
refactor: get metadata dynamically through EvaluationContext
red40maxxer 3e42ea8
refactor: rename CountRefusals to RefusalRate
red40maxxer e246555
chore: remove unused kl_divergence config fields
red40maxxer 1a66b2f
docs: restore missing comment
red40maxxer 1310d72
refactor: remove unused code
red40maxxer 8ef5c30
chore: specify settings and model field types
red40maxxer 17332bb
refactor: rename to prompts
red40maxxer 23bf147
refactor: move load_plugin to plugin
red40maxxer 54c723e
style: ruff
red40maxxer 924e54f
refactor: update optimization direction config to use StudyDirection …
red40maxxer 5cf9688
fix: missing TypeVar
red40maxxer 3008c07
fix: missing imports
red40maxxer 03ae02a
fix: use OptimizationDirection properly
red40maxxer b970f20
chore: remove names
red40maxxer 85b217f
chore: remove unnecessary future import
red40maxxer 1b9b248
chore: remove unused scorer imports
red40maxxer f512ef9
refactor: objective should only return tuple of floats
red40maxxer cfd47f7
refactor: use dataclass for scorer config
red40maxxer 5d054cc
feat: support multiple instances of the same scorer
red40maxxer d40b408
style: ruff
red40maxxer 26d4912
fix: nonexistent name attribute in scorer
red40maxxer 92e8d1c
refactor: clear residuals and analyser
red40maxxer 014298a
docs: MetricResult -> Score
red40maxxer d2a789e
fix: clean up default toml
red40maxxer 39ae171
fix: missed renaming to RefusalRate
red40maxxer 9139df5
chore: missing return ModuleType
red40maxxer 6fb8c38
docs: add SPDX header
red40maxxer 29be299
docs: add SPDX header
red40maxxer 3f9fc26
docs: add SPDX header
red40maxxer 1723a5f
chore: fix misleading field description leftover from old code
red40maxxer 2aa2bb5
chore: add newline
red40maxxer b0134e0
chore: unused settings class
red40maxxer 4a31547
fix: bad import
red40maxxer b8af138
refactor: rename ResponseText -> TextCompletion
red40maxxer e706df3
feat: simplify api
red40maxxer 1ecac64
refactor: rename to get_score
red40maxxer a72c184
feat: namespace scorer configs
red40maxxer 3645450
style: ruff
red40maxxer 5ffee0c
fix: genericize readme intro
red40maxxer 9c87a7b
chore: move init to scorer base class
red40maxxer f542c39
refactor: handle direction and scale outside scorer
red40maxxer a4c0b7d
chore: use underscore for instance names
red40maxxer 5765e9c
fix: add scorer instance name to scores
red40maxxer d5ce30b
refactor: create structured api for scorers to access model
red40maxxer 7ec3729
refactor: rename plugin-specific Settings to PluginSettings
red40maxxer 49d91b2
feat: add instance name to plugin load logging
red40maxxer 291e033
style: ruff
red40maxxer cfbab04
Merge remote-tracking branch 'origin/master' into refusal-plugins
red40maxxer d8a24da
chore: allow extra fields for plugins
red40maxxer e1c0eaf
fix: improve plugin loading logic
red40maxxer 72899d0
chore: undo change fixed in master
red40maxxer 4c3693a
chore: remove old code
red40maxxer 6d968ce
docs: adjust docstring
red40maxxer 94a5638
Merge remote-tracking branch 'origin/master' into refusal-plugins
red40maxxer d6b0b01
chore: cleanup import
red40maxxer e608424
refactor: unnest plugin settings class and detect from type annotation
red40maxxer 084c904
refactor: use plain str instead of Response object with metadata
red40maxxer 389d501
refactor: move non evaluator-specific methods out
red40maxxer e40ef29
refactor: use enum for StudyDirection
red40maxxer 9f43e40
refactor: no strings as type annotations
red40maxxer a0c2b5a
chore: let evaluator blow up on error
red40maxxer 3d79f02
refactor: rename metrics to scores globally
red40maxxer 8256336
feat: separate cli and hf score displays and clean up readme logic
red40maxxer 04ef601
fix: direction serialization ValidationError when restoring from save
red40maxxer 3da97a9
refactor: rename scorer start() to setup()
red40maxxer 5cde2e6
style: ruff
red40maxxer f1f5faf
fix: remove external plugin test
red40maxxer 4b5668b
refactor: rename setup to init
red40maxxer 790570d
docs: formatting
red40maxxer c1c53ba
refactor: move scorers location in config
red40maxxer d31025d
docs: add comment describing return tensor shape
red40maxxer 53788b8
style: ruff
red40maxxer e172b4b
refactor: simplify scorer setting logic
red40maxxer a5ac47c
refactor: clarify plugin loading logic
red40maxxer 2409a59
refactor: remove unnecessary hashing and inline import_module
red40maxxer 7e29bc2
style: ruff
red40maxxer e9bbbdd
fix: don't use classnames for readme
red40maxxer a12501e
refactor: don't expose heretic settings to scorer
red40maxxer 2df14e7
fix: adjust print responses logic and move to scorer config level
red40maxxer 216f77b
refactor: separate baseline score computation
red40maxxer 21ea9f6
refactor: rename hf_display to md_display
red40maxxer 315cfe0
style: ruff
red40maxxer fbb60ca
Update src/heretic/scorer.py
red40maxxer 4f37ded
Update src/heretic/scorer.py
red40maxxer f1498c6
style: ruff
red40maxxer 4225acd
Merge branch 'refusal-plugins' of https://github.com/red40maxxer/here…
red40maxxer cdade6f
fix: ty error
red40maxxer 7d17a32
refactor: bind Score names to parent Scorers as class property
red40maxxer ca26fb7
docs: fix doc
red40maxxer 96d8da8
Merge branch 'master' into refusal-plugins
red40maxxer d39ddd7
Merge remote-tracking branch 'origin/master' into refusal-plugins
red40maxxer 9d4528a
docs: update comment
red40maxxer c6e8fc0
style: remove changes
red40maxxer 9e8cc6b
chore: define default refusal markers
red40maxxer 4164125
style: ruff
red40maxxer ee43328
style: remove whitespace changes
red40maxxer eacb3e2
docs: tweak docs
red40maxxer eae3bed
Merge remote-tracking branch 'origin/master' into refusal-plugins
red40maxxer e383e31
chore: cleanup from merge
red40maxxer 49dd9ab
style: ruff
red40maxxer 747db56
fix: handle negative floating point kld
red40maxxer e6ec4d5
style: formatting
red40maxxer 544d431
chore: remove unused code
red40maxxer 39a29cb
chore: ruff
red40maxxer 61976d1
style: undo line removal
red40maxxer 89ac29e
style: update formatting and remove old comment
red40maxxer 93eb30f
docs: undo style change
red40maxxer d31faf2
docs: update field description
red40maxxer c44cc5b
docs: tweak docstring
red40maxxer c0ba229
chore: revert kld absolute value forcing
red40maxxer 1e22fe3
style: ruff
red40maxxer f00049c
Merge branch 'p-e-w:master' into refusal-plugins
red40maxxer b907c0b
chore: cleanup
red40maxxer d782a0e
docs: update header
red40maxxer 37741d7
docs: update header
red40maxxer 7733984
refactor: remove unnecessary conditional imports
red40maxxer 2980ffa
fix: apply review comments on refusalrate
red40maxxer 371406b
refactor: move contract validation to plugin
red40maxxer a98469b
refactor: move Context to Plugin
red40maxxer 42a5825
refactor: move init to plugin level
red40maxxer 0375b3a
refactor: move init() to plugin
red40maxxer 83495a5
style: ruff
red40maxxer 36a396e
docs: update SPDX header
red40maxxer f4ea641
refactor: derive score name from scorer.score_name
red40maxxer b8722bd
chore: no None option for baseline_score_displays
red40maxxer 88edd13
fix: show CLI formatted metrics in trial selection
red40maxxer e2dcd56
fix: sort trials by scores
red40maxxer b9caa59
chore: remove unnecessary from future import
red40maxxer b3c4d14
chore: remove scorer scale field
red40maxxer 8ef6556
refactor: import Context from plugin
red40maxxer 3a0d5b1
docs: add quote to direction
red40maxxer 3b67f74
refactor: move model_config to the end of the class
red40maxxer 9fc3b1d
refactor: use dataclass for consistency
red40maxxer 0cd7d8f
refactor: use BaseModel and store study direction as str
red40maxxer 129550f
docs: move docstring location
red40maxxer 36d042e
refactor: combine scorer load and init
red40maxxer 682ae81
refactor: use best_trials for single and multi-objective
red40maxxer 5eace95
refactor: remove all .get()
red40maxxer c1a43ae
refactor: remove unused dataclass
red40maxxer ffe24d7
refactor: use ScorerEntry dataclass for improved code quality
red40maxxer c3f330f
Merge branch 'master' into refusal-plugins
red40maxxer f8bb147
Merge branch 'master' into refusal-plugins
red40maxxer bf0c283
Merge remote-tracking branch 'upstream/master' into refusal-plugins
red40maxxer 744e968
style: ruff
red40maxxer 025c42e
Merge remote-tracking branch 'upstream/master' into refusal-plugins
red40maxxer e54769f
chore: adapt reproducibility to plugin architecture
red40maxxer b90ed2f
chore: address PR comments
red40maxxer 84d5d47
chore: make `ScorerConfig` fields full `Field()`
red40maxxer 43a674a
chore: address pr comments
red40maxxer