feat: add openenv.yaml manifests (repo + per-Space) #19
Merged
Closes the "valid openenv.yaml manifest" engineering-quality criterion in
the OpenEnv hackathon judging guidelines.
- Repo-root openenv.yaml: package-level manifest pointing at all 3 task
Spaces, listing the shared package primitives (FastAPI app, port, tools,
base classes) and confirming none of our MCP tools collide with reserved
names (reset/step/state/close).
- Per-Space manifests at spaces/{notebook,postgres,type-checker}/openenv.yaml:
one task per file, populated from the corresponding TaskConfig defaults
(workspace_dir, episode_timeout_s, l1_score_mode, etc.) plus a
composite-rubric description (gate_checks, l1_tests, l2_code_review,
l3_plan_review, episode_aggregator).
- prepare_hf_space.py: lift openenv.yaml to the Space root alongside
Dockerfile + README.md so judges pulling the live Space see a valid
manifest at the URL root. Missing manifest is non-fatal for backward
compatibility.
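
The lift step described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the actual code in prepare_hf_space.py: the function name `lift_manifest`, the directory layout used in `demo`, and the one-line manifest content are all hypothetical. It only shows the "copy if present, skip quietly if absent" behaviour.

```python
import shutil
import tempfile
from pathlib import Path


def lift_manifest(space_dir: Path, payload_root: Path) -> bool:
    """Copy space_dir/openenv.yaml into payload_root.

    Returns True if a manifest was copied, False if none existed.
    (Hypothetical helper; the real script may be structured differently.)
    """
    src = space_dir / "openenv.yaml"
    if not src.is_file():
        # Missing manifest is non-fatal: Spaces without one still deploy.
        return False
    payload_root.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, payload_root / "openenv.yaml")
    return True


def demo() -> tuple[bool, bool, str]:
    """Exercise both paths in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        with_manifest = root / "spaces" / "notebook"
        without_manifest = root / "spaces" / "legacy"  # assumed example dir
        with_manifest.mkdir(parents=True)
        without_manifest.mkdir(parents=True)
        # Placeholder manifest content, not the real schema.
        (with_manifest / "openenv.yaml").write_text("name: notebook\n")

        lifted = lift_manifest(with_manifest, root / "payload_a")
        skipped = lift_manifest(without_manifest, root / "payload_b")
        content = (root / "payload_a" / "openenv.yaml").read_text()
        return lifted, skipped, content
```

Returning a boolean instead of raising keeps the missing-manifest case non-fatal while still letting the caller log which Spaces shipped without one.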
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rycerzes
approved these changes
Apr 26, 2026
Member
rycerzes
left a comment
LGTM, but waiting on the merge of the later 3 tasks so that we can bundle them together in this PR.
Member
Merging this now rather than waiting for the remaining 2 tasks to complete, since the deadline is imminent.
Closes the "valid openenv.yaml manifest" engineering-quality criterion in the OpenEnv hackathon judging guidelines (Round 2 minimum requirements).

Why

The judging rubric explicitly calls out a valid openenv.yaml manifest as table-stakes engineering quality. An audit of main showed every other env-scope criterion was already met (Spaces healthy, no reserved tool names, Gym-style API, client/server separation, multi-layer rubric); only this manifest was missing.

What changed

- Repo-root openenv.yaml: package-level manifest pointing at all 3 task Spaces and listing the shared package primitives (FastAPI app, port, tools, base classes). Our MCP tools (submit_plan, submit_subtask, get_status, advance) avoid the reserved set (reset, step, state, close).
- Per-Space manifests at spaces/{notebook,postgres,type-checker}/openenv.yaml: one task per file, populated from the corresponding TaskConfig defaults (workspace_dir, episode_timeout_s, l1_score_mode, etc.) plus a composite-rubric description (gate_checks → l1_tests → l2_code_review → l3_plan_review → episode_aggregator).
- prepare_hf_space.py: lift openenv.yaml to the Space root alongside Dockerfile + README.md so judges pulling the live Space see a valid manifest at the URL root. Missing manifest is non-fatal for backward compatibility.

Verified

- yaml.safe_load
- prepare_hf_space.py for each task lifts openenv.yaml to payload root
- spaces/<other>/ subtree stripped from each payload
- /health on all 3 deployed Spaces
- /reset 200, /state 200 (sampled on type-checker)
- submit_plan, submit_subtask, get_status, advance

Followups

- reward.json hard-fail coverage: to be verified in a separate PR (PG uses ratio mode so naturally safe).
- […] {diff} and {plan} in <agent_*> delimiters): separate PR; small change but shifts judge calibration slightly.

Test plan

- sync-hf-spaces redeploys all 3 Spaces; validate-spaces probe still 200
- curl https://rycerzes-frontier-swe-{notebook,postgres,type-checker}.hf.space/openenv.yaml post-deploy returns the per-task manifest

🤖 Generated with Claude Code
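
The reserved-name check listed under Verified amounts to a set intersection. A minimal sketch, assuming the tool names are available as a plain list; the helper name `find_reserved_collisions` is hypothetical and not part of the package:

```python
# Reserved names and tool names are taken from the PR description.
RESERVED_TOOL_NAMES = frozenset({"reset", "step", "state", "close"})
MCP_TOOLS = ["submit_plan", "submit_subtask", "get_status", "advance"]


def find_reserved_collisions(tool_names):
    """Return the sorted tool names that clash with the reserved set."""
    return sorted(set(tool_names) & RESERVED_TOOL_NAMES)


if __name__ == "__main__":
    collisions = find_reserved_collisions(MCP_TOOLS)
    if collisions:
        raise SystemExit(f"reserved tool names in use: {collisions}")
    print("no reserved-name collisions")
```

A check like this could run in CI next to the manifest parse so that a newly added tool cannot silently shadow a reserved name.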