Save rankings feature by warreveys · Pull Request #49 · techwolf-ai/workrb

warreveys · 2026-05-06T10:07:38Z

Description

Add a save_rankings: bool = False flag to workrb.evaluate() that persists per-target ranking score arrays for each ranking-task dataset under <output_folder>/rankings/<model_name>/<task>__<dataset_id>.json. Each artifact also records model_name in its payload so files remain self-describing if moved.
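For reviewers, the artifact layout described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper name `save_ranking_artifact`, the `scores` payload key, and the row layout of the scores are assumptions; only the path pattern and the `model_name` field come from the description.

```python
import json
import tempfile
from pathlib import Path


def save_ranking_artifact(output_folder, model_name, task, dataset_id, scores):
    """Write one ranking artifact and return its path (hypothetical helper)."""
    # Path pattern from the description:
    # <output_folder>/rankings/<model_name>/<task>__<dataset_id>.json
    path = Path(output_folder) / "rankings" / model_name / f"{task}__{dataset_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        # Stored in the payload so the file stays self-describing if moved.
        "model_name": model_name,
        # Assumed key and layout: one list of float scores per row.
        "scores": [[float(s) for s in row] for row in scores],
    }
    path.write_text(json.dumps(payload))
    return path


# Usage with made-up model, task, and dataset names.
with tempfile.TemporaryDirectory() as tmp:
    artifact = save_ranking_artifact(
        tmp, "my-model", "my_task", "en", [[0.9, 0.1], [0.2, 0.8]]
    )
    relative = artifact.relative_to(tmp).as_posix()
    loaded = json.loads(artifact.read_text())
```

Keeping `model_name` in both the directory name and the payload means a file copied out of its folder can still be traced back to the model that produced it.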

To enable this without recomputing the prediction matrix, RankingTask.evaluate is split into compute_prediction_matrix + compute_metrics_from_prediction_matrix; the default behaviour is unchanged.
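The idea behind the split can be shown with a toy class. This is a minimal sketch under stated assumptions, not the actual `RankingTask`: `ToyRankingTask`, its constructor, the `score_fn` and `n_targets` parameters, and the `accuracy@1` metric are all hypothetical; only the two method names and the role of `evaluate` are taken from the description.

```python
class ToyRankingTask:
    """Hypothetical stand-in for a ranking task; not the workrb class."""

    def __init__(self, relevant):
        # relevant[i] is the index of the correct target for query i.
        self.relevant = list(relevant)

    def compute_prediction_matrix(self, score_fn, n_targets):
        # Expensive step: one score per (query, target) pair.
        return [
            [float(score_fn(q, t)) for t in range(n_targets)]
            for q in range(len(self.relevant))
        ]

    def compute_metrics_from_prediction_matrix(self, matrix):
        # Cheap step: top-1 accuracy from already-computed scores.
        hits = sum(
            1
            for row, rel in zip(matrix, self.relevant)
            if max(range(len(row)), key=row.__getitem__) == rel
        )
        return {"accuracy@1": hits / len(self.relevant)}

    def evaluate(self, score_fn, n_targets):
        # Default path: same result as before, composed from the two steps.
        matrix = self.compute_prediction_matrix(score_fn, n_targets)
        return self.compute_metrics_from_prediction_matrix(matrix)


# Usage: compute the matrix once, then reuse it for metrics (and for saving).
task = ToyRankingTask(relevant=[0, 1, 0])
score_fn = lambda q, t: 1.0 if q == t else 0.0
matrix = task.compute_prediction_matrix(score_fn, n_targets=3)
metrics = task.compute_metrics_from_prediction_matrix(matrix)
```

Because the caller holds `matrix` between the two calls, it can be persisted without a second scoring pass, while `evaluate` still returns the same metrics as before.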

Checklist

Added new tests for new functionality
Tested locally with example tasks
Code follows project style guidelines
Documentation updated
No new warnings introduced

warreveys added 2 commits May 6, 2026 11:40

ruff fixes

c603baf

warreveys marked this pull request as draft May 6, 2026 10:07

warreveys marked this pull request as ready for review May 6, 2026 10:24

warreveys requested a review from Mattdl May 7, 2026 06:44

warreveys wants to merge 2 commits into techwolf-ai:main from warreveys:save-rankings-feature
