feat: add live eval progress tracking and GET /api/eval-progress. #55
Merged
Introduce ephemeral eval_progress.json written by the CPU and GPU during each round, mirrored from S3 on the orchestrator so the API can expose phase, per-challenger status, and a bounded step timeline without touching state.json.
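As a rough illustration of the best-effort, bounded-timeline write described above, a minimal sketch might look like the following. The function name `write_progress`, the field names (`phase`, `updated_at`, `challengers`, `steps`), and the cap `MAX_STEPS` are assumptions made for this example, not the actual contents of `eval_progress.py`.

```python
import json
import os
import tempfile
import time

MAX_STEPS = 50  # hypothetical cap on the step timeline


def write_progress(path, phase, step, challengers=None):
    """Best-effort update of a progress file; returns False instead of raising."""
    try:
        # Load the previous snapshot if one exists and is valid JSON.
        try:
            with open(path) as f:
                data = json.load(f)
        except (OSError, ValueError):
            data = {}

        data["phase"] = phase
        data["updated_at"] = time.time()
        if challengers is not None:
            data["challengers"] = challengers

        # Append the new step and keep only the most recent MAX_STEPS entries.
        steps = data.get("steps", [])
        steps.append({"t": data["updated_at"], "msg": step})
        data["steps"] = steps[-MAX_STEPS:]

        # Write to a temp file in the same directory, then rename atomically
        # so readers never observe a half-written file.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)
        return True
    except Exception:
        # Progress reporting must never break the eval round itself.
        return False
```

Swallowing every exception is a deliberate trade-off here: a lost progress update is preferable to a failed round, which matches the "best-effort writes" noted in the risk summary.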
@BugBot review
- Removed the `purge_old_logs` function from `cpu_validator.py` and relocated it to `eval_progress.py` for better organization.
- Updated imports in relevant files to reflect the new location of the `purge_old_logs` function.
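For readers unfamiliar with the log pruning this commit moves around, here is a hedged sketch of what retention-based pruning could look like. The name `purge_old_logs_sketch`, its signature, the `*.log` glob, and the 7-day default are assumptions for illustration; only the `LOG_RETENTION_DAYS` setting is taken from the PR description.

```python
import os
import time
from pathlib import Path


def purge_old_logs_sketch(log_dir, retention_days=None, now=None):
    """Delete *.log files older than the retention window; return removed names."""
    if retention_days is None:
        # Assumed default of 7 days when LOG_RETENTION_DAYS is unset.
        retention_days = float(os.environ.get("LOG_RETENTION_DAYS", "7"))
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400

    removed = []
    for path in Path(log_dir).glob("*.log"):
        try:
            if path.is_file() and path.stat().st_mtime < cutoff:
                path.unlink()
                removed.append(path.name)
        except OSError:
            # Skip files that vanish or cannot be removed; pruning is housekeeping.
            continue
    return sorted(removed)
```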
- Added functionality to clear progress in CPU validator upon interruption.
- Updated GPU evaluation to exclude `eval_progress.json` during S3 downloads, ensuring CPU maintains progress tracking.
- Enhanced S3 upload process to push progress updates from GPU evaluations, preserving CPU steps.
- Modified the download function to support excluding specified files, improving flexibility in file management.
- Added signal handling for SIGTERM in CPU validator to gracefully raise KeyboardInterrupt.
- Updated GPU evaluation to simplify S3 download process by removing exclusion of `eval_progress.json`.
- Introduced a new function to upload only the `eval_progress.json` file to S3, enhancing progress tracking efficiency.
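The SIGTERM handling in this commit can be pictured roughly as below: the handler converts SIGTERM into `KeyboardInterrupt`, so the same cleanup path serves both Ctrl-C and an orchestrated shutdown. The helper names `install_sigterm_handler` and `run_round`, and the `clear_progress` callback, are hypothetical and not taken from `cpu_validator.py`.

```python
import signal


def _raise_keyboard_interrupt(signum, frame):
    # Reuse the Ctrl-C code path for SIGTERM so one except-block handles both.
    raise KeyboardInterrupt(f"received signal {signum}")


def install_sigterm_handler():
    # Must be called from the main thread; signal.signal raises ValueError otherwise.
    signal.signal(signal.SIGTERM, _raise_keyboard_interrupt)


def run_round(work, clear_progress):
    """Run one round; clear the progress file if the round is interrupted."""
    try:
        work()
    except KeyboardInterrupt:
        clear_progress()
        raise
```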
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 876f23f.

Note
Medium Risk
Touches CPU/GPU validator orchestration and S3 sync behavior (new state file writes, uploads, and deletions), so regressions could affect eval round handoff and housekeeping. Changes are additive and guarded (best-effort writes, exception swallowing), reducing blast radius.
Overview
Adds live eval progress reporting by introducing an ephemeral `eval_progress.json` that records phase transitions, GPU metadata, per-challenger status, and a step timeline, plus a new `GET /api/eval-progress` API route (with a 30-min staleness flag and short caching).

Wires progress updates into the CPU validator (on challenger selection, round cleanup, SIGTERM shutdown) and GPU pipeline (baseline/prompt phases, per-challenger status, eval completion), including progress-only S3 uploads and an orchestrator thread that mirrors `eval_progress.json` from S3 while the remote GPU job runs.

Adds housekeeping: configurable log retention pruning (`LOG_RETENTION_DAYS`), deletion of stale `eval_job.json` when all challengers are already known, and a new S3 helper `delete_remote_keys`; includes new unit tests for progress writing/API, stale job cleanup, and log pruning.

Reviewed by Cursor Bugbot for commit 876f23f. Bugbot is set up for automated code reviews on this repo.
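To make the 30-minute staleness flag on `GET /api/eval-progress` concrete, here is a small sketch of how a response payload might be derived from the mirrored file. The function `build_progress_response` and the fields `updated_at`, `active`, and `stale` are assumptions for this example; the real route's response shape is not shown in this PR page.

```python
import time

STALE_AFTER_SECONDS = 30 * 60  # 30-minute staleness window from the PR description


def build_progress_response(progress, now=None):
    """Turn a parsed progress dict (or None) into an API payload with a stale flag."""
    if not progress:
        # No progress file mirrored yet, or the round was cleaned up.
        return {"active": False, "stale": False}

    now = time.time() if now is None else now
    updated_at = progress.get("updated_at")
    # Treat a missing timestamp as stale rather than guessing freshness.
    stale = updated_at is None or (now - updated_at) > STALE_AFTER_SECONDS
    return {**progress, "active": True, "stale": stale}
```

A client polling the endpoint could then grey out the timeline when `stale` is true instead of showing an old phase as if it were live.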