
Repin vllm and inspect_evals so the Dockerfile builds #36

Open

surelyMersad wants to merge 1 commit into aisa-group:add_harbor_support from surelyMersad:harbor-dockerfile-fix

Conversation

@surelyMersad

Two upstream-shifted dependencies were preventing the harbor_adapter Dockerfile from building on Modal:

- `vllm==0.11.0` requires `xformers==0.0.32.post1`, which is no longer on PyPI for manylinux_x86_64. Repinned vllm to 0.19.1, which builds and runs end-to-end on Modal.

- inspect_evals was cloned `--depth=1` from main, but main HEAD now requires Python>=3.11 while the image installs python3.10. Switched to `uv pip install "inspect_evals @ git+...@<sha>"` pinned to commit 03cb4bc2 (2026-03-15), the last commit on main still declaring `requires-python = ">=3.10"`. This also removes the manual git clone step.

Also adds local-only artifacts to .gitignore so they don't sneak in.
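
The two repins above might look roughly like this in Dockerfile form. This is a sketch only: the base image, the `--system` flag, and the repo path placeholder are assumptions, not the PR's actual Dockerfile (the source elides the full `git+` URL).

```dockerfile
# Sketch, not the actual harbor_adapter Dockerfile. Base image is an assumption.
FROM python:3.10-slim

RUN pip install uv

# Repin vllm: 0.11.0 pulls xformers==0.0.32.post1, whose manylinux_x86_64
# wheel is no longer on PyPI.
RUN uv pip install --system "vllm==0.19.1"

# Install inspect_evals at the last commit still declaring
# requires-python = ">=3.10", replacing the old `git clone --depth=1` of main.
# <org> is a placeholder; the real repo URL is elided in the PR text.
RUN uv pip install --system \
    "inspect_evals @ git+https://github.com/<org>/inspect_evals@03cb4bc2"
```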

Test plan

- `python run_adapter.py --benchmark gsm8k --model qwen3-1.7b --output ./tasks` generates tasks without error.
- `harbor run -c <task>/job.yaml --agent nop --env modal --yes` builds the image successfully on Modal (verified twice on this branch: ~8 min build with warm cache, ~15 min cold), with no resolver errors on either the vllm or inspect_evals install steps. The verifier writes reward.txt and metrics.json cleanly.

@hrdkbhatnagar (Collaborator)

Thanks for catching the build break, but we can't take the vllm bump as-is. We used vllm 0.11.0 for the original PTB leaderboard runs, and any change to the inference version risks decoding-level divergence from those baselines, which is a parity claim we need to defend in the paper. The same logic applies to the rest of the ML stack.

I've already pushed an alternate fix to add_harbor_support that mirrors `containers/opus_4_6_1m.def` from the upstream repo (the def file we used to generate the leaderboard):

- vllm pinned to 0.11.0
- ML deps from `requirements-direct.txt`
- flash-attn 2.8.3
- `inspect_ai_vllm_stdout` fork
- `--torch-backend=cu128` (Modal's build VM has no nvidia-smi, so CUDA autodetect resolves to CPU torch and breaks vllm's xformers requirement)

This builds on Modal end to end, so the inspect_evals switch to a pinned commit is also not needed.
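
The pin set above could be sketched as an install step like the following. This is a guess at the shape, not the contents of `opus_4_6_1m.def`: the fork URL is a placeholder, and whether flash-attn needs `--no-build-isolation` in this image is not stated in the thread.

```dockerfile
# Sketch of the alternate fix; fork URL and surrounding steps are placeholders.
# --torch-backend=cu128 forces the CUDA 12.8 torch wheels explicitly, since
# Modal's build VM has no nvidia-smi and autodetection would otherwise
# resolve to CPU torch, breaking vllm's xformers requirement.
RUN uv pip install --system --torch-backend=cu128 \
    "vllm==0.11.0" \
    -r requirements-direct.txt \
    "flash-attn==2.8.3" \
    "inspect_ai @ git+https://github.com/<org>/inspect_ai_vllm_stdout"
```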
