
feat: add live eval progress tracking and GET /api/eval-progress. #55

Merged: xavierlyu merged 6 commits into main from feat/eval-progress-api on May 17, 2026


@xavierlyu commented May 16, 2026

Introduce ephemeral eval_progress.json written by the CPU and GPU during each round, mirrored from S3 on the orchestrator so the API can expose phase, per-challenger status, and a bounded step timeline without touching state.json.


Note

Medium Risk
Touches CPU/GPU validator orchestration and S3 sync behavior (new state file writes, uploads, and deletions), so regressions could affect eval round handoff and housekeeping. Changes are additive and guarded (best-effort writes, exception swallowing), reducing blast radius.

Overview
Adds live eval progress reporting by introducing an ephemeral eval_progress.json that records phase transitions, GPU metadata, per-challenger status, and a step timeline, plus a new GET /api/eval-progress API route (with a 30-min staleness flag and short caching).
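To make the shape of this concrete, here is a minimal sketch of a best-effort progress writer and a reader that applies the 30-minute staleness flag. It is not the code from this PR: the field names (`phase`, `updated_at`, `steps`, `active`, `stale`), the `MAX_STEPS` cap, and the function names `record_step` and `read_progress` are assumptions chosen for illustration.

```python
import json
import os
import tempfile
import time
from pathlib import Path

MAX_STEPS = 50                 # assumed cap on the step timeline
STALE_AFTER_SECONDS = 30 * 60  # 30-minute staleness window from the PR summary


def record_step(path, phase, message, now=None):
    """Best-effort update: set the phase and append a step, keeping the newest MAX_STEPS."""
    now = time.time() if now is None else now
    try:
        path = Path(path)
        try:
            progress = json.loads(path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            progress = {"steps": []}
        progress["phase"] = phase
        progress["updated_at"] = now
        steps = progress.get("steps", [])
        steps.append({"ts": now, "phase": phase, "message": message})
        progress["steps"] = steps[-MAX_STEPS:]  # bound the timeline
        # Write to a temp file in the same directory, then rename over the target
        # so a reader never sees a half-written file.
        fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        with os.fdopen(fd, "w") as fh:
            json.dump(progress, fh)
        os.replace(tmp, path)
        return True
    except Exception:
        # Progress reporting must never break the eval round itself.
        return False


def read_progress(path, now=None):
    """Return the progress payload plus a staleness flag, or an inactive marker."""
    now = time.time() if now is None else now
    try:
        progress = json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {"active": False}
    age = now - progress.get("updated_at", 0)
    return {"active": True, "stale": age > STALE_AFTER_SECONDS, **progress}
```

Keeping the file separate from state.json means a failed or partial progress write can be dropped without affecting round state, which is why the sketch returns `False` instead of raising.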

Wires progress updates into the CPU validator (on challenger selection, round cleanup, SIGTERM shutdown) and GPU pipeline (baseline/prompt phases, per-challenger status, eval completion), including progress-only S3 uploads and an orchestrator thread that mirrors eval_progress.json from S3 while the remote GPU job runs.
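The orchestrator-side mirroring could look roughly like the following sketch, which runs a background thread that repeatedly pulls the progress file while the remote job is in flight. The S3 download is abstracted behind a caller-supplied `fetch` callable, and the names `mirror_progress` and `run_with_mirror`, the polling interval, and the final pull after the job ends are all assumptions and are not taken from validator/gpu_orchestrator.py.

```python
import threading


def mirror_progress(fetch, stop, interval=5.0):
    """Call fetch() until stop is set; errors are swallowed so mirroring never kills the job."""
    while not stop.is_set():
        try:
            fetch()  # hypothetical: download eval_progress.json from S3 to local disk
        except Exception:
            pass
        stop.wait(interval)  # sleeps, but wakes immediately when stop is set


def run_with_mirror(job, fetch, interval=5.0):
    """Run job() while a daemon thread mirrors progress, then stop the mirror."""
    stop = threading.Event()
    worker = threading.Thread(
        target=mirror_progress, args=(fetch, stop, interval), daemon=True
    )
    worker.start()
    try:
        return job()
    finally:
        stop.set()
        worker.join(timeout=interval + 1)
        try:
            fetch()  # assumed final pull so the last GPU update is not missed
        except Exception:
            pass
```

Using `Event.wait` as the sleep lets the thread exit promptly when the job finishes instead of blocking for a full polling interval.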

Adds housekeeping: configurable log retention pruning (LOG_RETENTION_DAYS), deletion of stale eval_job.json when all challengers are already known, and a new S3 helper delete_remote_keys; includes new unit tests for progress writing/API, stale job cleanup, and log pruning.
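As an illustration of the log retention pruning, here is a minimal sketch. The function name `purge_old_logs` and the `LOG_RETENTION_DAYS` setting appear in this PR, but the signature, the 14-day default, the `*.log` pattern, and the rule that a non-positive value disables pruning are assumptions.

```python
import os
import time
from pathlib import Path


def purge_old_logs(log_dir, retention_days=None, now=None):
    """Delete *.log files older than retention_days; return the removed file names."""
    if retention_days is None:
        # Assumed default of 14 days when the variable is unset.
        retention_days = int(os.environ.get("LOG_RETENTION_DAYS", "14"))
    if retention_days <= 0:
        return []  # assumption: a non-positive value disables pruning
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return []
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    removed = []
    for entry in log_dir.glob("*.log"):
        try:
            if entry.is_file() and entry.stat().st_mtime < cutoff:
                entry.unlink()
                removed.append(entry.name)
        except OSError:
            # Housekeeping is best-effort; skip files that cannot be removed.
            continue
    return sorted(removed)
```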

Reviewed by Cursor Bugbot for commit 876f23f. Bugbot is set up for automated code reviews on this repo.

@xavierlyu

@BugBot review

Comment thread validator/eval_progress.py
@xavierlyu marked this pull request as ready for review May 16, 2026 12:23
Comment thread validator/gpu_orchestrator.py
Comment thread validator/gpu_eval.py Outdated
- Removed the purge_old_logs function from cpu_validator.py and relocated it to eval_progress.py for better organization.
- Updated imports in relevant files to reflect the new location of the purge_old_logs function.
Comment thread validator/gpu_eval.py
- Added functionality to clear progress in CPU validator upon interruption.
- Updated GPU evaluation to exclude `eval_progress.json` during S3 downloads, ensuring CPU maintains progress tracking.
- Enhanced S3 upload process to push progress updates from GPU evaluations, preserving CPU steps.
- Modified the download function to support excluding specified files, improving flexibility in file management.
Comment thread validator/gpu_eval.py Outdated
- Added signal handling for SIGTERM in CPU validator to gracefully raise KeyboardInterrupt.
- Updated GPU evaluation to simplify S3 download process by removing exclusion of `eval_progress.json`.
- Introduced a new function to upload only the `eval_progress.json` file to S3, enhancing progress tracking efficiency.

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 876f23f.

Comment thread validator/gpu_orchestrator.py
@xavierlyu merged commit 1280213 into main on May 17, 2026 (8 checks passed)