diff --git a/README.md b/README.md
index e69de29..83bda3b 100644
--- a/README.md
+++ b/README.md
@@ -0,0 +1,278 @@
+# Frontier SWE OpenEnv
+
+A family of long-horizon software-engineering environments for [OpenEnv](https://github.com/rycerzes/OpenEnv), packaged as Docker images and mirrored to Hugging Face Spaces. Each task exposes the same OpenEnv-shaped **FastAPI** surface (Gym-style `/reset`, `/step`, `/state`, `/health`) plus **MCP** tools for planning and submission. A **composite rubric** (workspace gates, structured or regex-based L1 scores, optional LLM code and plan review) aggregates into a normalised episode reward.
+
+This repository is organised like a small monorepo: shared Python server and client live under `frontier_swe_env/`, task assets under `tasks/<task-id>/`, and each deployable Space under `spaces/<space-name>/` (Dockerfile, README with HF card front matter, and `openenv.yaml`).
+
+These environments are **adapted from the [FrontierSWE](https://www.frontierswe.com/) benchmark** ([`proximal-labs/frontier-swe`](https://github.com/proximal-labs/frontier-swe) on GitHub): long-horizon systems and performance problems repackaged as OpenEnv-shaped services with a shared rubric and MCP tooling. The **Tasks** table below links each OpenEnv task **one-to-one** to its official FrontierSWE write-up.
+
+## Features
+
+- **Shared runtime**: One FastMCP/OpenEnv stack per image; task-specific workspace, verifier, and instructions are baked into the image.
+- **Gym-style control**: `POST /reset`, `POST /step`, `GET /state`, `GET /health` for training and evaluation harnesses.
+- **MCP for agents**: OpenEnv JSON-RPC at `POST /mcp`, and Streamable HTTP for adapters at `/tools/mcp` (POST and GET/SSE).
+- **Episode tools**: `submit_plan`, `submit_subtask`, `get_status`, `advance` (see `openenv.yaml` and each Space manifest).
+- **Multi-layer scoring**: Gate scripts, L1 (tests, `reward.json`, or regex ratio), L2/L3 LLM judges when grader API env vars are set, then a weighted episode blend.
+
+## Tasks
+
+| Task ID | Domain | FrontierSWE write-up | OpenEnv manifest | Hugging Face Space | GHCR image |
+| --- | --- | --- | --- | --- | --- |
+| `notebook-compression` | Systems / compression | [Notebook compression](https://www.frontierswe.com/notebook-compression) | [`spaces/notebook/openenv.yaml`](spaces/notebook/openenv.yaml) | [rycerzes/frontier-swe-notebook](https://huggingface.co/spaces/rycerzes/frontier-swe-notebook) | `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-notebook:latest` |
+| `postgres-sqlite-wire-adapter` | Systems / databases / Zig | [PostgreSQL on SQLite](https://www.frontierswe.com/postgres-sqlite-wire-adapter) | [`spaces/postgres/openenv.yaml`](spaces/postgres/openenv.yaml) | [rycerzes/frontier-swe-postgres](https://huggingface.co/spaces/rycerzes/frontier-swe-postgres) | `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-postgres:latest` |
+| `dependent-type-checker` | PL / type theory | [Dependent type checker](https://www.frontierswe.com/dependent-type-checker) | [`spaces/type-checker/openenv.yaml`](spaces/type-checker/openenv.yaml) | [rycerzes/frontier-swe-type-checker](https://huggingface.co/spaces/rycerzes/frontier-swe-type-checker) | `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-dependent-type-checker:latest` |
+| `libexpat-to-x86asm` | Systems / x86-64 assembly / XML | [libexpat to assembly](https://www.frontierswe.com/libexpat-to-x86asm) | [`spaces/libexpat-to-x86asm/openenv.yaml`](spaces/libexpat-to-x86asm/openenv.yaml) | [rycerzes/frontier-swe-libexpat-to-x86asm](https://huggingface.co/spaces/rycerzes/frontier-swe-libexpat-to-x86asm) | `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-libexpat-to-x86asm:latest` |
+
+Authoritative package metadata for tooling (for example `openenv pull`) lives in the root [`openenv.yaml`](openenv.yaml).
+
+## Task assets and runtime configuration
+
+The repo splits responsibilities in two places that sound similar but are **not** duplicates of each other:
+
+| Location | Role |
+| --- | --- |
+| [`tasks/<task-id>/`](tasks/) | **Problem pack** checked into git: human-facing `instruction.md`, verifier shell scripts, Python helpers such as `compute_reward.py`, hidden tests, datasets, and anything the Dockerfile `COPY`s into the image. This is where each task’s **reward semantics** are actually implemented (what gets run, what gets written to disk, what counts as a hard fail). |
+| [`frontier_swe_env/tasks/`](frontier_swe_env/tasks/) | **Python registry** of [`TaskConfig`](frontier_swe_env/task_config.py) factories (`pg.py`, `notebook_compression.py`, …). Each module describes how the **running server** should drive scoring: paths **inside the container**, the L1 command string, `l1_score_mode`, JSON paths and anchors, timeouts, episode limits, and text used for L2/L3 LLM prompts. |
+
+**Build time.** Per-task Dockerfiles under [`docker/`](docker/) copy a slice of `tasks/<task-id>/` into fixed locations (for example verifier assets under `/opt/verifier/`, full instructions at `/app/instruction.md` or `/opt/task/instruction.md`, workspaces under `/app/...`). Those paths are what the verifier scripts assume.
+
+**Run time.** [`FrontierSweEnvironment`](frontier_swe_env/server/frontier_swe_env_environment.py) loads a `TaskConfig` via [`get_task_config`](frontier_swe_env/tasks/__init__.py). The task is selected with environment variables (defaults match the image):
+
+- `FSWE_TASK_NAME` — logical name (`postgres`, `notebook-compression`, `dependent-type-checker`, `libexpat-to-x86asm`, …); aliases like `pg` or `type-checker` map to the same factories.
+- `FSWE_TASK_MODE` — `training` vs `demo` (different budgets, attempts, and sometimes instruction source).
+
+From that single config object the environment wires **shared** rubric classes to **task-specific** commands and parsers:
+
+1. **Gate checks** — shell script from `TaskConfig.gate_script_path` (baked from `tasks/...` into the image).
+2. **L1** — [`TestOutputRubric`](frontier_swe_env/rubrics/l1_tests.py) runs `TaskConfig.visible_test_command`. Depending on `l1_score_mode`, it either parses **stdout** with a regex (`ratio` and similar) or reads a structured **`reward.json`** after the verifier finishes (`reward_json` vs `reward_json_score`). Each task’s verifier under `tasks/<id>/tests/` is responsible for producing the format its mode expects.
+3. **L2 / L3** — LLM judges use `task_description`, `task_domain`, and `scoring_context` from `TaskConfig` so prompts stay aligned with that task even though the judge code is shared.
+4. **Episode reward** — [`EpisodeRubric`](frontier_swe_env/rubrics/episode_rubric.py) blends plan quality, mean frozen subtask scores, completion, and tool usage using weights from the same `TaskConfig`.
+
+So: **`tasks/` defines what “correct” means operationally**; **`frontier_swe_env/tasks/` tells the server how to invoke and normalise that signal** inside the shared OpenEnv stack.
+
+**`spaces/*/openenv.yaml`.** These manifests document the Space for judges and tooling (rubric layers, metrics, HF metadata). They should stay **consistent** with the Python `TaskConfig` and Docker layout for the same task. The live server inside the image is driven by **`TaskConfig` + env vars**, not by parsing `openenv.yaml` at runtime.
+
+```mermaid
+flowchart LR
+  subgraph repo["Git repo"]
+    TPACK["tasks/task-id/"]
+    TPY["frontier_swe_env/tasks/*.py"]
+    DOCK["docker/Dockerfile.*"]
+  end
+  subgraph image["Task Docker image"]
+    WS["Workspace /app/..."]
+    VER["Verifier /opt/verifier/"]
+    RJ["/logs/verifier/reward.json optional"]
+  end
+  subgraph runtime["Python server"]
+    CFG["TaskConfig"]
+    ENV["FrontierSweEnvironment"]
+    R1["Gates + L1 + L2 + L3 + EpisodeRubric"]
+  end
+  TPACK --> DOCK
+  DOCK --> WS
+  DOCK --> VER
+  TPY --> CFG
+  CFG --> ENV
+  VER --> RJ
+  ENV --> R1
+  VER -.->|"subprocess"| R1
+```
+
+### L1 score modes (per-task flavour)
+
+[`TaskConfig.l1_score_mode`](frontier_swe_env/task_config.py) selects how L1 turns verifier output into a number in \([0, 1]\):
+
+| Mode | Typical task | Meaning |
+| --- | --- | --- |
+| `ratio` | Postgres wire adapter | Regex on test runner stdout (`Total: N/M passed`). |
+| `reward_json` | Notebook compression | Verifier writes JSON (e.g. `geom_mean_ratio`, `status`); normalisation is mode-specific in `TestOutputRubric`. |
+| `reward_json_score` | Dependent type checker, libexpat assembly | Verifier writes a numeric `score` (field configurable); linear map between `reward_json_score_anchors`, optional hard-fail handling. |
+
+Adding a new task usually means: add `tasks/new-task/`, extend a Dockerfile to copy it, add `frontier_swe_env/tasks/new_task.py` plus [`register_task`](frontier_swe_env/tasks/__init__.py), and add a Space manifest under `spaces/`.
+
+## Task catalog
+
+Short descriptions of what each episode asks for and how **L1** is determined. (Gates, L2 code review, L3 plan review, and episode blending behave the same way structurally; only L1 and task copy differ.)
+
+### Notebook compression (`notebook-compression`)
+
+Agents implement a **lossless** Jupyter `.ipynb` codec as `/app/run` with `fit` / `compress` / `decompress` stages. The hidden verifier under [`tasks/notebook-compression/tests/`](tasks/notebook-compression/tests/) runs the full pipeline and writes [`reward.json`](tasks/notebook-compression/tests/compute_reward.py) with corpus-driven metrics; **byte-exact round-trip** failures are hard fails. Python config in [`notebook_compression.py`](frontier_swe_env/tasks/notebook_compression.py) sets `l1_score_mode="reward_json"`, long `l1_timeout_s`, and `scoring_context` for judges. Benchmark write-up: [FrontierSWE — Notebook compression](https://www.frontierswe.com/notebook-compression).
+
+### Postgres / SQLite wire adapter (`postgres-sqlite-wire-adapter`)
+
+Agents implement a **Zig** binary that speaks enough of the **PostgreSQL wire protocol** to satisfy a tiered compat suite while using **SQLite** for storage. L1 is primarily **`ratio`** mode: the configured command runs [`pg_compat_test.sh`](tasks/postgres-sqlite-wire-adapter/tests/pg_compat_test.sh)-style output and the rubric parses pass counts from stdout. Config and copy live in [`pg.py`](frontier_swe_env/tasks/pg.py) and [`tasks/postgres-sqlite-wire-adapter/`](tasks/postgres-sqlite-wire-adapter/). Benchmark write-up: [FrontierSWE — PostgreSQL on SQLite](https://www.frontierswe.com/postgres-sqlite-wire-adapter).
+
+### Dependent type checker (`dependent-type-checker`)
+
+Agents implement a **Rust** type checker for a small dependently typed surface language; the release binary is exercised by a large accept/reject corpus plus latency benchmarks vs a reference. The verifier emits **`reward_json_score`** with gates on accept/reject rates and anti-cheat signals in JSON. Anchors and timeouts are set in [`dependent_type_checker.py`](frontier_swe_env/tasks/dependent_type_checker.py); the heavy spec and tests live under [`tasks/dependent-type-checker/`](tasks/dependent-type-checker/). Benchmark write-up: [FrontierSWE — Dependent type checker](https://www.frontierswe.com/dependent-type-checker).
+
+### libexpat to x86-64 assembly (`libexpat-to-x86asm`)
+
+Agents produce **`/app/asm-port/libexpat.so`** implementing the **libexpat C ABI** in assembly (no vendored C core). The verifier builds reference C libexpat, runs upstream tests and benchmarks, and writes **`reward_json_score`** (correctness plus performance, with hard fails for missing `.so` or anti-cheat). See [`libexpat_to_x86asm.py`](frontier_swe_env/tasks/libexpat_to_x86asm.py) and [`tasks/libexpat-to-x86asm/`](tasks/libexpat-to-x86asm/). Benchmark write-up: [FrontierSWE — libexpat to x86-64 assembly](https://www.frontierswe.com/libexpat-to-x86asm).
+
+## Quick start
+
+### Install (Python 3.13)
+
+```bash
+uv sync
+```
+
+Optional extras:
+
+```bash
+uv sync --extra test
+```
+For training on local
+```bash
+uv sync --extra training
+```
+
+### Run the API locally (development)
+
+The full task workspace and verifiers are intended to run inside the published Docker images. For a minimal local smoke test of the HTTP app only:
+
+```bash
+uv run uvicorn frontier_swe_env.server.app:app --host 127.0.0.1 --port 8000 --reload
+```
+
+Then open `http://127.0.0.1:8000/health`.
+
+### Run a task image
+
+Replace the image tag with the task you need (see table above). Grader-related env vars are optional unless you want LLM rubric layers to run inside the container.
+
+```bash
+docker run --rm -p 8000:8000 \
+  -e FSWE_GRADER_MODEL=... \
+  -e FSWE_GRADER_API_URL=... \
+  -e FSWE_GRADER_API_KEY=... \
+  ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-postgres:latest
+```
+
+For an end-to-end baseline over WebSocket (connect, `reset`, repeated `step`), see [`scripts/run_baseline.py`](scripts/run_baseline.py).
+
+## Python client
+
+```python
+import asyncio
+from frontier_swe_env.client import FrontierSweEnv
+from frontier_swe_env.models import FrontierSweAction
+
+
+async def main():
+    client = FrontierSweEnv(base_url="http://localhost:8000")
+    await client.connect()
+    try:
+        result = await client.reset()
+        print(result.observation.phase)
+        result = await client.step(FrontierSweAction(message="Your turn"))
+        print(result.observation.response)
+    finally:
+        await client.close()
+
+
+asyncio.run(main())
+```
+
+The client maintains a WebSocket session to the server; see `FrontierSweEnv` in [`frontier_swe_env/client.py`](frontier_swe_env/client.py) for `from_docker_image` and timeout options.
+
+## MCP tools (all tasks)
+
+| Tool | Purpose |
+| --- | --- |
+| `submit_plan` | Propose subtasks (`id`, `description`, `acceptance_criteria`); moves PLANNING → EXECUTING. |
+| `submit_subtask` | Run L1 + L2 scoring for the given `subtask_id`. |
+| `get_status` | Snapshot of phase, scores, time remaining, feedback. |
+| `advance` | Freeze the current subtask score and advance to the next. |
+
+Implementations are registered in [`frontier_swe_env/server/mcp_tools.py`](frontier_swe_env/server/mcp_tools.py).
+
+## Environment variables
+
+Typical deployment sets **agent** variables (for the in-container coding harness) and **grader** variables (for LLM rubric layers):
+
+| Prefix | Role |
+| --- | --- |
+| `FSWE_AGENT_MODEL`, `FSWE_AGENT_API_URL`, `FSWE_AGENT_API_KEY` | Agent LLM (also used to generate `/root/.pi/agent/models.json` in the entrypoint when `FSWE_AGENT_API_URL` is set). |
+| `FSWE_GRADER_MODEL`, `FSWE_GRADER_API_URL`, `FSWE_GRADER_API_KEY` | LLM judges for L2/L3 layers in the rubric. |
+
+Exact behaviour is defined per task in each Space `openenv.yaml` under `rubric.layers`.
+
+## Hugging Face Spaces
+
+CI assembles a minimal Space directory (root `Dockerfile`, `README.md`, `openenv.yaml`) from `spaces/<task>/` via [`scripts/prepare_hf_space.py`](scripts/prepare_hf_space.py). The **HF — Sync** workflow pushes to `spaces/{HF_OWNER}/frontier-swe-{notebook|postgres|type-checker|libexpat-to-x86asm}` after images build on `main`.
+
+## Training (offline RL)
+
+A single Frontier SWE episode often runs on the order of **45 minutes to about 90 minutes**, depending on the task, verifier cost, and agent behaviour. That makes dense **online** RL on live environments impractical at scale, so this project uses **offline RL**: collect fixed trajectories, post-process rewards and hindsight signals, build a static training set, then fine-tune on Hugging Face with **Trackio** for metrics.
+
+For **why not GRPO/DPO alone**, **paper vs code** differences, and **equations** mapped to [`scripts/compute_hindsight_scores.py`](scripts/compute_hindsight_scores.py), [`scripts/build_hcapo_dataset.py`](scripts/build_hcapo_dataset.py), and [`training/train_hcapo.py`](training/train_hcapo.py), see [`training/README.md`](training/README.md).
+
+The walk-through below uses the **`postgres-sqlite-wire-adapter`** task as the reference pipeline.
+
+### Data collection and post-processing
+
+1. **Rollouts** — [`scripts/collect_trajectories.py`](scripts/collect_trajectories.py) was used to gather **20 episodes** on a **2× NVIDIA A100** host running **sglang**, with the agent powered by **[`Qwen/Qwen3.6-27B`](https://huggingface.co/Qwen/Qwen3.6-27B)** (Qwen 3.6 27B). Run id **pg-01** labels this batch in tooling and dataset names.
+2. **Backfill** — Some episodes finished without a persisted **`episode_reward`** because of a server-side bug; [`scripts/backfill_rewards.py`](scripts/backfill_rewards.py) was run to fill those fields from episode metadata.
+3. **Hindsight** — [`scripts/compute_hindsight_scores.py`](scripts/compute_hindsight_scores.py) was run with the same **Qwen 3.6 27B** stack to attach per-step hindsight quantities (HCAPO-style) for training. For how that differs from the original HCAPO formulation (paper [2603.08754](https://arxiv.org/abs/2603.08754)), formulae, and design rationale, see [`training/README.md`](training/README.md).
+
+The **raw trajectory bundle** (per-episode `result.json`, `pi_session.jsonl`, `container_logs.txt`, optional `hindsight_scores.json`) is published on Hugging Face as **[`rycerzes/fswe-pg-01-traj-q36-27b`](https://huggingface.co/datasets/rycerzes/fswe-pg-01-traj-q36-27b)**.
+
+### HCAPO dataset build
+
+From a local `trajectories/` tree, the JSONL used for fine-tuning was produced with:
+
+```bash
+uv run python scripts/build_hcapo_dataset.py \
+  --input-dir trajectories \
+  --output-dir datasets \
+  --min-reward 0.05 \
+  --omega 1.0
+```
+
+The resulting **HCAPO training set** is **[`rycerzes/fswe-hcapo-pg-01-trajectories`](https://huggingface.co/datasets/rycerzes/fswe-hcapo-pg-01-trajectories)** (messages + step advantages derived from the pg-01 trajectories).
+
+### Fine-tuning run
+
+Training was launched with:
+
+```bash
+./scripts/launch_hf_space.sh --with-dataset-upload
+```
+
+That configuration runs **3 epochs** over **18 optimizer steps** on the Space-backed trainer (dataset upload + run as implemented in [`scripts/launch_hf_space.sh`](scripts/launch_hf_space.sh)).
+
+**Metrics dashboard (Trackio on Hugging Face):** [`rycerzes/trackio`](https://huggingface.co/spaces/rycerzes/trackio) — run name **`fswe-hcapo-pg-01-qwen36-27b`**.
+
+![Trackio dashboard: loss, epoch, learning rate, gradient norm, and global step for fswe-hcapo-pg-01-qwen36-27b](assets/training-trackio-dashboard.png)
+
+The screenshot above (smoothing ≈ 20 on the step axis) shows a **post-training** phase on the HCAPO dataset:
+
+- **Loss** decreases from roughly **1.0** at the start of the plotted window to about **0.75** by the end (**~25%** relative drop), with noisy raw traces but a clear downward trend in the smoothed curve.
+- **Epoch** advances linearly to approximately **2.7** over the **18** logged steps, consistent with targeting **3 epochs** in a short run.
+- **Learning rate** follows a **warmup then decay**: it rises toward a peak near the middle of the run (on the order of **3.5×10⁻⁶**) and falls toward roughly **1.5×10⁻⁶** by the final steps.
+- **Gradient norm** stays in a moderate band (mostly about **1.0–1.5**, ending near **1.2**), which suggests optimization without obvious gradient blow-ups for this snapshot.
+- **Global step** in the sidebar advances in line with the trainer (e.g. into the low tens over the same window)q
+
+Together, these curves read as a **successful small-scale sanity fine-tune**: loss improves steadily, the LR schedule behaves as expected, and gradients remain bounded.
+
+## Repository layout
+
+- **`frontier_swe_env/`** — FastAPI app, [`FrontierSweEnvironment`](frontier_swe_env/server/frontier_swe_env_environment.py), shared rubrics, MCP tools, [`TaskConfig`](frontier_swe_env/task_config.py), task registry under [`frontier_swe_env/tasks/`](frontier_swe_env/tasks/), models, client.
+- **`tasks/<task-id>/`** — Instructions, verifier scripts, rewards, and data **consumed at image build** (see [Task assets and runtime configuration](#task-assets-and-runtime-configuration)).
+- **`docker/`** — Shared base image, per-task Dockerfiles, [`openenv_entrypoint.sh`](docker/openenv_entrypoint.sh) (uvicorn + optional pi models).
+- **`spaces/`** — Thin HF Space wrappers: Dockerfile pin, README (HF card), `openenv.yaml` for external metadata.
+
+Each Space README under `spaces/*/README.md` is the human-facing description for that Hugging Face Space (including YAML front matter for the Space card).
+
+## Testing
+
+Task-specific verifiers and reward scripts live under `tasks/<task-id>/tests/`. There is no single top-level pytest suite yet; run task-local scripts as documented in each task directory when you change a verifier.
+
+## About
+
+**frontier-swe-openenv** packages Frontier-style long-horizon tasks for [OpenEnv](https://github.com/rycerzes/OpenEnv), adapted from **[FrontierSWE](https://www.frontierswe.com/)** ([`proximal-labs/frontier-swe`](https://github.com/proximal-labs/frontier-swe)). Official benchmark task pages for the four environments here: [postgres-sqlite-wire-adapter](https://www.frontierswe.com/postgres-sqlite-wire-adapter), [libexpat-to-x86asm](https://www.frontierswe.com/libexpat-to-x86asm), [dependent-type-checker](https://www.frontierswe.com/dependent-type-checker), [notebook-compression](https://www.frontierswe.com/notebook-compression).
+
+The OpenEnv runtime dependency is pinned in [`pyproject.toml`](pyproject.toml) (`openenv-core` git source).
diff --git a/assets/blog.md b/assets/blog.md
new file mode 100644
index 0000000..94a8e5b
--- /dev/null
+++ b/assets/blog.md
@@ -0,0 +1,98 @@
+# Building long-horizon SWE environments on Hugging Face: Frontier SWE × OpenEnv
+
+**By the-thing**: we packaged and adapted 4 [FrontierSWE](https://www.frontierswe.com/) tasks as [OpenEnv](https://github.com/rycerzes/OpenEnv)-shaped services, pushed them to **Hugging Face Spaces**, and ran an **offline RL-style** training loop with public **datasets**, **Trackio** metrics, and a trainer Space.
+
+---
+
+## TL;DR
+
+- **Four Dockerized environments** (notebook compression, Postgres wire adapter on SQLite, dependent type checker, libexpat → x86-64 asm) with a **shared Gym-style API** and **MCP** tools for planning and submission.
+- **Custom harness adapter** built on top of OpenEnv harness work ([meta-pytorch/OpenEnv PR #389](https://github.com/meta-pytorch/OpenEnv/pull/389) and RFC005), then forked and extended in [`rycerzes/OpenEnv` on `feature/pi-harness-adapter`](https://github.com/rycerzes/OpenEnv/commits/feature/pi-harness-adapter/).
+- **Composite rubric**: gates → L1 (tests / `reward.json` / regex ratios) → optional LLM layers → **episode reward** you can log and filter on for training.
+- **Offline pipeline**: trajectories on the Hub → hindsight scoring (SGLang) → HCAPO-style dataset → **LoRA fine-tune** on a GPU Space, with **Trackio** curves for loss, LR, and gradient norms.
+
+**Try it:** [frontier-swe-postgres](https://huggingface.co/spaces/rycerzes/frontier-swe-postgres) · [frontier-swe-notebook](https://huggingface.co/spaces/rycerzes/frontier-swe-notebook) · [frontier-swe-type-checker](https://huggingface.co/spaces/rycerzes/frontier-swe-type-checker) · [frontier-swe-libexpat-to-x86asm](https://huggingface.co/spaces/rycerzes/frontier-swe-libexpat-to-x86asm) · [source on GitHub](https://github.com/3xcaffeine/frontier-swe-openenv)
+
+---
+
+## 1. Environment innovation - why this setup is hard (and worth it)
+
+Classic coding benchmarks often score a single patch. **Long-horizon software engineering** is different: the agent has to **plan**, **edit a real workspace**, **call tools**, and **submit** work over many steps-closer to how people ship systems than to a one-shot fix.
+
+**What we built on top of that idea**
+
+We did not reinvent the underlying FrontierSWE task specs; we **re-homed** them inside a **uniform environment contract**:
+
+That includes a **custom harness adapter** layer we built on top of [meta-pytorch/OpenEnv PR #389](https://github.com/meta-pytorch/OpenEnv/pull/389) and RFC005, then maintained and updated in our fork: [`rycerzes/OpenEnv` `feature/pi-harness-adapter`](https://github.com/rycerzes/OpenEnv/tree/feature/pi-harness-adapter/).
+
+| Piece | What it does for the agent |
+| --- | --- |
+| **HTTP control** | `reset` / `step` / `state` / `health` - same shape every task, so harnesses and demos do not fork per domain. Maintaining the `openenv` specs |
+| **MCP tools** | `submit_plan`, `submit_subtask`, `get_status`, `advance` - forces **explicit decomposition** and **scored subtasks**, not a single anonymous blob of edits. |
+| **Multi-layer rubric** | **Gates** catch broken builds or missing artifacts early; **L1** is task-native (wire compat tests, notebook round-trips, type-checker scores, assembly benchmarks); **L2/L3** optionally add LLM code and plan review when grader env vars are set; **episode reward** blends plan quality, frozen subtask scores, completion, and tool usage. |
+
+That combination is deliberately **stressful** in a good way: the agent must **coordinate** (plan → execute → advance), **respect verifier reality** (hidden tests, anti-cheat), and **earn** a dense scalar at the end of an episode that can run on the order of **45–90+ minutes** per run-so the environment is **challenging**, **creative** in how it composes rubrics, and **meaningful** for measuring behavior beyond single-turn chat.
+
+---
+
+## 2. The problem, the box, and what the agent actually does
+
+**Problem.** Training or evaluating agents on real long-horizon SWE needs a **repeatable service**: same ports, same instructions, same scoring, same tool surface-whether you run locally, in CI, or on the Hub.
+
+**Our box.** **frontier-swe-openenv** is a small monorepo: `tasks/<task-id>/` holds instructions and verifiers (what “correct” means operationally); `frontier_swe_env/` holds the **FastAPI** server, shared rubrics, and **TaskConfig** (how to invoke those verifiers inside the image); `spaces/` holds thin **Space** definitions synced from `main` after images build.
+
+**Agent behavior (easy to follow for a demo).**
+
+1. Connect (WebSocket client or baseline script).
+2. `reset` → read observation / phase.
+3. Loop: natural language or tool use → `step` → optional MCP calls to **submit a plan**, run **L1+L2** on a **subtask**, **advance** when satisfied.
+4. Episode ends with a **terminal episode reward** and subtask history you can log.
+
+For a **concrete walkthrough without writing your own client**, the repo ships [`scripts/run_baseline.py`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/scripts/run_baseline.py): point it at `http://localhost:8000` with a task container running, and you get a full **reset → step** episode over the wire-good for recordings and “here is one turn of the loop” explanations.
+
+---
+
+## 3. Observable training progress - rewards, curves
+
+Long episodes make **online** RL on the live env impractical at scale, so we invested in **offline** learning: **collect once**, **score offline**, **fine-tune**, **log everything**.
+
+**Public artifacts (HF-native story)**
+
+| Artifact | Link | Role in the demo |
+| --- | --- | --- |
+| Raw trajectories (pg-01, Qwen 3.6 27B) | [`rycerzes/fswe-pg-01-traj-q36-27b`](https://huggingface.co/datasets/rycerzes/fswe-pg-01-traj-q36-27b) | Shows **what** we logged per episode (`result.json`, sessions, logs, hindsight when present). |
+| HCAPO training JSONL | [`rycerzes/fswe-hcapo-pg-01-trajectories`](https://huggingface.co/datasets/rycerzes/fswe-hcapo-pg-01-trajectories) | **Step-level advantages** paired with messages for supervised fine-tuning. |
+| Trackio dashboard | [`rycerzes/trackio`](https://huggingface.co/spaces/rycerzes/trackio) | **Observable** loss, epoch, learning rate, gradient norm, global step. |
+
+On a **3 epoch / ~18 optimizer step** reference run (Space-backed trainer), the root README documents what we see in Trackio: **loss** trending down on the order of **~25%** over the plotted window (smoothed), **epoch** progressing toward **~2.7**, **LR** warmup-then-decay, **gradient norms** staying in a moderate band-i.e. a **sanity fine-tune** where optimization looks stable, not a mystery box.
+
+We also ship a **static dashboard figure** in-repo for slides and blog embeds: [`assets/training-trackio-dashboard.png`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/assets/training-trackio-dashboard.png).
+
+**Before / after.** The cleanest **before/after** we surface in tooling today is **training loss and optimization metrics** on the HCAPO dataset, plus **episode-level rewards inside collected trajectories** for analysis. A live **A/B rollout score** on the full Docker env after LoRA is the natural next chapter for the demo-and the pipeline is set up so you can **regenerate trajectories** with the adapted policy and compare distributions. For hackathon judging, the **curves + public datasets + reproducible launch script** are the evidence chain we stand behind *right now*.
+
+---
+
+## 4. Reward logic and training pipeline - coherent signal end to end
+
+**Episode reward (macro).** The scalar \(R\) matches [`EpisodeRubric`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/frontier_swe_env/rubrics/episode_rubric.py): weighted **plan score**, mean **frozen subtask** scores, **completion**, and **tool density**-clipped into **[0, 1]** for filtering (e.g. `--min-reward 0.05` in the dataset builder).
+
+**L1 (micro, task-specific).** Each task implements its own verifier output: **regex ratio** on test totals (Postgres), **`reward_json`** fields (notebook), or **`reward_json_score`** with anchors (type checker, libexpat). Same server code paths; different physics.
+
+**Training path (why it should move policy behavior).**
+
+1. [`collect_trajectories.py`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/scripts/collect_trajectories.py) - rollouts into `trajectories/episode_NNN/`.
+2. [`backfill_rewards.py`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/scripts/backfill_rewards.py) - repair missing `episode_reward` when needed.
+3. [`compute_hindsight_scores.py`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/scripts/compute_hindsight_scores.py) - SGLang `/generate` with bounded logprob windows (memory-safe), MCP-aware **step → subtask** mapping, hindsight \(Q^H\) and smoothing.
+4. [`build_hcapo_dataset.py`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/scripts/build_hcapo_dataset.py) - GRPO-style macro advantages + normalized hindsight micro advantages → **JSONL** with **per-step weights**.
+5. [`train_hcapo.py`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/training/train_hcapo.py) + [`launch_hf_space.sh`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/scripts/launch_hf_space.sh) - **weighted CE on assistant tokens** (chunked forward for large models), Trackio reporting.
+
+Coherent design is means that environment reward defines **which episodes matter**; hindsight defines **which tokens inside those episodes** get gradient; the trainer respects **assistant masks** and **step weights** so the update is not “one scalar smeared across the whole transcript.” Details and equations live in [`training/README.md`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/training/README.md)
+
+---
+
+## Where to go next
+
+- **Run a Space** from the TL;DR links and narrate **one** subtask submission end to end.
+- **Open Trackio** to the named run and zoom the **loss / LR** panel while you talk through the pipeline slide.
+- **Clone the repo**, `uv sync`, and use **`./scripts/launch_hf_space.sh`** when you want the full HF training path on your own account.
+
diff --git a/assets/training-trackio-dashboard.png b/assets/training-trackio-dashboard.png
new file mode 100644
index 0000000..f62961c
Binary files /dev/null and b/assets/training-trackio-dashboard.png differ
diff --git a/spaces/libexpat-to-x86asm/README.md b/spaces/libexpat-to-x86asm/README.md
index a13bd4e..f249168 100644
--- a/spaces/libexpat-to-x86asm/README.md
+++ b/spaces/libexpat-to-x86asm/README.md
@@ -10,11 +10,82 @@ pinned: false
 
 # Frontier SWE — libexpat to x86-64 Assembly
 
-OpenEnv-shaped FastAPI service hosting the libexpat-to-x86asm task.
+OpenEnv-shaped **FastAPI** service for the **libexpat-to-x86asm** task: reimplement **libexpat 2.6.4** in **x86-64 assembly**, producing `/app/asm-port/libexpat.so` with the **expat C ABI**. The verifier compares against reference C libexpat, runs upstream tests and benchmarks, and writes `/logs/verifier/reward.json` (correctness and performance blend; hard fail to `0.0` on anti-cheat or missing `.so`).
 
-- Source repo: <https://github.com/3xcaffeine/frontier-swe-openenv>
-- Container image: `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-libexpat-to-x86asm:latest`
-- Health: `/health`
-- MCP JSON-RPC: `/mcp`
+## The task in depth
 
-Deployed automatically from `main` via the `sync-hf-spaces` workflow.
+The agent’s deliverable is a **shared library** built from **`.s` / `.asm`** sources under **`/app/asm-port/`**, exporting symbols such as **`XML_ParserCreate`** so the upstream **expat** test suite can link against it. There is **no C compiler** in the agent environment; the verifier may compile reference C code for comparison. Scoring combines **weighted test pass rates** with **benchmark timing ratios** (reference time vs agent time) into a single **`score`** in **`reward.json`**, with explicit anti-cheat checks (no `dlopen` of system libexpat, no smuggled C core files, etc.). The server treats that file in **`reward_json_score`** mode with anchors **`(0.0, 1.0)`**.
+
+## How this maps to the monorepo
+
+- **`tasks/libexpat-to-x86asm/`** — Instructions, encrypted or staged toolchain bundles as designed, **`tests/`** with **`test.sh`**, **`compute_reward.py`**, and benchmark XML generators.
+- **`frontier_swe_env/tasks/libexpat_to_x86asm.py`** — **`TaskConfig`**: workspace **`/app/asm-port`**, gate script, verifier command, JSON path and anchors, CPU/memory hints, and judge context strings.
+- **`spaces/libexpat-to-x86asm/`** — This Space and manifest.
+
+See [**Task assets and runtime configuration**](https://github.com/3xcaffeine/frontier-swe-openenv#task-assets-and-runtime-configuration) in the root README.
+
+## Features
+
+- **Assembly port workspace**: `/app/asm-port` with staged toolchain and bundles (see gate checks in manifest).
+- **Structured L1**: Normalised score from `reward.json`; gates for writable workspace, headers, `nasm` / `as` / `ld`, and staged artifacts.
+- **LLM rubric layers**: L2 code review and L3 plan review when grader env vars are set.
+- **MCP tools**: `submit_plan`, `submit_subtask`, `get_status`, `advance`.
+
+## HTTP API
+
+| Endpoint | Notes |
+| --- | --- |
+| `GET /health` | Liveness. |
+| `POST /reset`, `POST /step`, `GET /state` | OpenEnv Gym-style control. |
+| `POST /mcp` | OpenEnv JSON-RPC MCP. |
+| `/tools/mcp` | FastMCP Streamable HTTP. |
+
+## Quick start (Docker)
+
+```bash
+docker run --rm -p 8000:8000 \
+  ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-libexpat-to-x86asm:latest
+```
+
+This task is CPU- and memory-sensitive; the manifest requests **4 CPUs** and **8192 MiB** where the platform allows.
+
+```bash
+docker run --rm -p 8000:8000 \
+  -e FSWE_GRADER_MODEL=... \
+  -e FSWE_GRADER_API_URL=... \
+  -e FSWE_GRADER_API_KEY=... \
+  ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-libexpat-to-x86asm:latest
+```
+
+## Python client (host)
+
+```python
+import asyncio
+from frontier_swe_env.client import FrontierSweEnv
+from frontier_swe_env.models import FrontierSweAction
+
+
+async def main():
+    client = FrontierSweEnv(base_url="http://localhost:8000")
+    await client.connect()
+    try:
+        await client.reset()
+        await client.step(FrontierSweAction(message="Continue the assembly port."))
+    finally:
+        await client.close()
+
+
+asyncio.run(main())
+```
+
+## Task manifest
+
+[`openenv.yaml`](openenv.yaml) — episode timeout, L1 timeout, reward field anchors, rubric layers, metrics. Task sources: `tasks/libexpat-to-x86asm/`.
+
+## Deployment
+
+- **Image**: `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-libexpat-to-x86asm:latest`
+- **Source**: [3xcaffeine/frontier-swe-openenv](https://github.com/3xcaffeine/frontier-swe-openenv)
+- **Sync**: HF Space updated from `main` after successful GHCR build.
+
+Benchmark context: [FrontierSWE — libexpat to x86-64 assembly](https://www.frontierswe.com/libexpat-to-x86asm).
diff --git a/spaces/notebook/README.md b/spaces/notebook/README.md
index 9979b1f..2256687 100644
--- a/spaces/notebook/README.md
+++ b/spaces/notebook/README.md
@@ -10,12 +10,84 @@ pinned: false
 
 # Frontier SWE — Notebook Compression
 
-OpenEnv-shaped FastAPI service hosting the notebook-compression task.
+OpenEnv-shaped **FastAPI** service for the **notebook-compression** task: build a fit / compress / decompress pipeline for Jupyter notebooks inside a Linux workspace, with multi-layer rubric scoring and a structured `reward.json` written by the verifier.
 
-- Source repo: <https://github.com/3xcaffeine/frontier-swe-openenv>
-- Container image: `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-notebook:latest`
-- Health: `/health`
-- MCP JSON-RPC: `/mcp`
+## The task in depth
 
-Deployed automatically from `main` via the `sync-hf-spaces` workflow.
+The agent needs to ship an executable **`/app/run`** with three subcommands: **`fit`** (train or build artifacts from a **visible** corpus only), **`compress`**, and **`decompress`**. At scoring time the agent does not see the hidden corpus: the verifier checks **byte-for-byte** recovery of every notebook file. Compression quality is summarised as a geometric mean of size ratios; hard failures (round-trip mismatch, crashes, invalid `reward.json` status) collapse the L1 signal to zero. That logic lives in the repo under [`tasks/notebook-compression/tests/`](https://github.com/3xcaffeine/frontier-swe-openenv/tree/main/tasks/notebook-compression/tests) (shell driver plus [`compute_reward.py`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/tasks/notebook-compression/tests/compute_reward.py)), which writes **`/logs/verifier/reward.json`** for the server to read.
 
+## How this maps to the monorepo
+
+- **`tasks/notebook-compression/`** — Authoritative instructions, verifier, and reward computation; copied into the image (for example **`/opt/verifier/test.sh`** and data mounts).
+- **`frontier_swe_env/tasks/notebook_compression.py`** — Registers **`TaskConfig`** with `l1_score_mode="reward_json"`, the container test command, long L1 timeouts, gate path, and prose for L2/L3 judges. The running server selects it when `FSWE_TASK_NAME` is `notebook` or `notebook-compression` (see [`__init__.py`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/frontier_swe_env/tasks/__init__.py)).
+- **`spaces/notebook/`** — This Space: thin Dockerfile, this README, and **`openenv.yaml`** describing the same episode for Hugging Face and external tooling.
+
+For the full picture of how task directories and Python configs interact, see the root README section [**Task assets and runtime configuration**](https://github.com/3xcaffeine/frontier-swe-openenv#task-assets-and-runtime-configuration).
+
+## Features
+
+- **Long-horizon SWE**: Plan subtasks, edit code under the configured workspace, submit for scoring.
+- **Composite rubric**: Shell gate checks → structured L1 from `/logs/verifier/reward.json` → optional LLM code review (L2) and plan review (L3) → weighted episode reward.
+- **MCP tools**: `submit_plan`, `submit_subtask`, `get_status`, `advance` (same contract as other Frontier SWE Spaces).
+- **Dual MCP transports**: OpenEnv `POST /mcp` and Streamable HTTP `/tools/mcp` for adapters.
+
+## HTTP API
+
+| Endpoint | Notes |
+| --- | --- |
+| `GET /health` | Liveness for orchestration and HF health checks. |
+| `POST /reset`, `POST /step`, `GET /state` | OpenEnv Gym-style control. |
+| `POST /mcp` | OpenEnv JSON-RPC MCP. |
+| `/tools/mcp` | FastMCP Streamable HTTP (POST + GET/SSE). |
+
+## Quick start (Docker)
+
+```bash
+docker run --rm -p 8000:8000 \
+  ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-notebook:latest
+```
+
+Optional grader configuration for LLM rubric layers:
+
+```bash
+docker run --rm -p 8000:8000 \
+  -e FSWE_GRADER_MODEL=... \
+  -e FSWE_GRADER_API_URL=... \
+  -e FSWE_GRADER_API_KEY=... \
+  ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-notebook:latest
+```
+
+## Python client (host)
+
+From the [source repository](https://github.com/3xcaffeine/frontier-swe-openenv), with dependencies installed:
+
+```python
+import asyncio
+from frontier_swe_env.client import FrontierSweEnv
+from frontier_swe_env.models import FrontierSweAction
+
+
+async def main():
+    client = FrontierSweEnv(base_url="http://localhost:8000")
+    await client.connect()
+    try:
+        await client.reset()
+        await client.step(FrontierSweAction(message="Continue the task."))
+    finally:
+        await client.close()
+
+
+asyncio.run(main())
+```
+
+## Task manifest
+
+OpenEnv metadata for judges and tooling: [`openenv.yaml`](openenv.yaml) in this Space (mirrors `spaces/notebook/openenv.yaml` in the GitHub repo). Task sources: `tasks/notebook-compression/`.
+
+## Deployment
+
+- **Image**: `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-notebook:latest`
+- **Source**: [3xcaffeine/frontier-swe-openenv](https://github.com/3xcaffeine/frontier-swe-openenv)
+- **Sync**: Pushed from `main` by the repository’s HF Spaces sync workflow after GHCR builds succeed.
+
+Benchmark context: [FrontierSWE — Notebook compression](https://www.frontierswe.com/notebook-compression).
diff --git a/spaces/postgres/README.md b/spaces/postgres/README.md
index 944cbad..89705fb 100644
--- a/spaces/postgres/README.md
+++ b/spaces/postgres/README.md
@@ -8,13 +8,88 @@ app_port: 8000
 pinned: false
 ---
 
-# Frontier SWE — Postgres SQLite Wire Adapter
+# Frontier SWE — Postgres / SQLite Wire Adapter
 
-OpenEnv-shaped FastAPI service hosting the postgres-sqlite-wire-adapter task.
+OpenEnv-shaped **FastAPI** service for the **postgres-sqlite-wire-adapter** task: implement a PostgreSQL wire-protocol-compatible server in **Zig** backed by **SQLite**, with gate checks, a graded test runner, and composite rubric scoring.
 
-- Source repo: <https://github.com/3xcaffeine/frontier-swe-openenv>
-- Container image: `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-postgres:latest`
-- Health: `/health`
-- MCP JSON-RPC: `/mcp`
+## The task in depth
 
-Deployed automatically from `main` via the `sync-hf-spaces` workflow.
+The workspace is **`/app/postgres-sqlite`**. The agent grows a Zig project that mimics enough **`postgres` / `pg_ctl` / `initdb`** behaviour and the **Frontend/Backend protocol** so that real PostgreSQL clients can connect and run a large scripted compatibility matrix. **L1** is driven by a visible test script whose stdout looks like **`Total: N/M passed`**; the shared rubric parses that as a pass ratio (see `l1_score_mode="ratio"`). Hidden or stronger checks can live alongside the same task pack under [`tasks/postgres-sqlite-wire-adapter/tests/`](https://github.com/3xcaffeine/frontier-swe-openenv/tree/main/tasks/postgres-sqlite-wire-adapter/tests). Unlike the JSON-heavy tasks, there is no requirement for `reward.json` unless you extend the verifier that way.
+
+## How this maps to the monorepo
+
+- **`tasks/postgres-sqlite-wire-adapter/`** — Stubs, instructions, **`pg_compat_test.sh`**, smoke tests, and hidden verifier assets copied into the image.
+- **`frontier_swe_env/tasks/pg.py`** — **`TaskConfig`** for this task: Zig workspace path, **`bash /app/gate_checks.sh`**, **`PG_PORT=55432 bash /app/pg_compat_test.sh`** as the L1 command, regex pattern for totals, timeouts, and judge-facing descriptions.
+- **`spaces/postgres/`** — Space wrapper and **`openenv.yaml`** aligned with the same episode.
+
+More detail: [**Task assets and runtime configuration**](https://github.com/3xcaffeine/frontier-swe-openenv#task-assets-and-runtime-configuration) in the root README.
+
+## Features
+
+- **Systems programming focus**: Zig workspace under `/app/postgres-sqlite`, verifier and hidden tests shipped in the image.
+- **L1 scoring**: Regex ratio over test runner output (`Total: N/M passed`) plus gate script.
+- **LLM-assisted layers**: L2 code review and L3 plan review when grader env vars are set.
+- **MCP tools**: `submit_plan`, `submit_subtask`, `get_status`, `advance`.
+
+## HTTP API
+
+| Endpoint | Notes |
+| --- | --- |
+| `GET /health` | Liveness. |
+| `POST /reset`, `POST /step`, `GET /state` | OpenEnv Gym-style control. |
+| `POST /mcp` | OpenEnv JSON-RPC MCP. |
+| `/tools/mcp` | FastMCP Streamable HTTP. |
+
+## Quick start (Docker)
+
+```bash
+docker run --rm -p 8000:8000 \
+  ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-postgres:latest
+```
+
+With grader API for full rubric:
+
+```bash
+docker run --rm -p 8000:8000 \
+  -e FSWE_GRADER_MODEL=... \
+  -e FSWE_GRADER_API_URL=... \
+  -e FSWE_GRADER_API_KEY=... \
+  ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-postgres:latest
+```
+
+## Baseline script
+
+The repo ships [`scripts/run_baseline.py`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/scripts/run_baseline.py) for a full WebSocket episode against a running container (defaults to `http://localhost:8000`).
+
+## Python client (host)
+
+```python
+import asyncio
+from frontier_swe_env.client import FrontierSweEnv
+from frontier_swe_env.models import FrontierSweAction
+
+
+async def main():
+    client = FrontierSweEnv(base_url="http://localhost:8000")
+    await client.connect()
+    try:
+        await client.reset()
+        await client.step(FrontierSweAction(message="Implement the next milestone."))
+    finally:
+        await client.close()
+
+
+asyncio.run(main())
+```
+
+## Task manifest
+
+[`openenv.yaml`](openenv.yaml) — workspace, timeouts, rubric layers, and metrics. Task sources: `tasks/postgres-sqlite-wire-adapter/`.
+
+## Deployment
+
+- **Image**: `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-postgres:latest`
+- **Source**: [3xcaffeine/frontier-swe-openenv](https://github.com/3xcaffeine/frontier-swe-openenv)
+- **Sync**: HF Space payload is assembled from this directory on `main` after GHCR builds.
+
+Benchmark context: [FrontierSWE — PostgreSQL on SQLite](https://www.frontierswe.com/postgres-sqlite-wire-adapter).
diff --git a/spaces/type-checker/README.md b/spaces/type-checker/README.md
index 2e228d6..7c86439 100644
--- a/spaces/type-checker/README.md
+++ b/spaces/type-checker/README.md
@@ -10,11 +10,84 @@ pinned: false
 
 # Frontier SWE — Dependent Type Checker
 
-OpenEnv-shaped FastAPI service hosting the dependent-type-checker task.
+OpenEnv-shaped **FastAPI** service for the **dependent-type-checker** task: implement a Martin-Löf-style dependently typed language **type checker** in **Rust** (`cargo build --release`), scored on correctness gates and speedup versus a reference implementation via `/logs/verifier/reward.json`.
 
-- Source repo: <https://github.com/3xcaffeine/frontier-swe-openenv>
-- Container image: `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-dependent-type-checker:latest`
-- Health: `/health`
-- MCP JSON-RPC: `/mcp`
+## The task in depth
 
-Deployed automatically from `main` via the `sync-hf-spaces` workflow.
+The agent edits **`/app/type-checker/`** (Cargo project) and must produce a release binary that type-checks `.sexp` programs for a language with dependent functions, inductive families, cumulativity, and related features spelled out in **`instruction.md`**. The verifier (**`bash /opt/verifier/test.sh`**) enforces anti-cheat rules, checks accept/reject corpus rates, then measures speedups vs a reference implementation on fixed workloads. It writes **`/logs/verifier/reward.json`** with a numeric **`score`** and optional **`additional_data.reason`** on hard fail. Python config uses **`l1_score_mode="reward_json_score"`** with anchors **`(0.0, 2.0)`** so the server normalises that scalar into the shared \([0,1]\) L1 channel.
+
+## How this maps to the monorepo
+
+- **`tasks/dependent-type-checker/`** — Full formal spec, corpora, reference implementation pieces, and verifier scripts under **`tests/`**.
+- **`frontier_swe_env/tasks/dependent_type_checker.py`** — Registers **`TaskConfig`** (`dependent-type-checker` / alias `type-checker`), build command, verifier timeout, JSON field names, and training vs demo instruction loading (demo can pull [`instruction.md`](https://github.com/3xcaffeine/frontier-swe-openenv/blob/main/tasks/dependent-type-checker/instruction.md) from the repo when present on the host).
+- **`spaces/type-checker/`** — This Space; GHCR image name uses **`frontier-swe-dependent-type-checker`**.
+
+Architecture overview: [**Task assets and runtime configuration**](https://github.com/3xcaffeine/frontier-swe-openenv#task-assets-and-runtime-configuration).
+
+## Features
+
+- **Rust workspace**: `/app/type-checker` with release binary expected by the verifier.
+- **Structured L1**: Score from `reward.json` (normalised with configured anchors, hard-fail signals documented in manifest).
+- **Gate checks**: Workspace, `Cargo.toml`, toolchain, and successful release build.
+- **MCP tools**: `submit_plan`, `submit_subtask`, `get_status`, `advance`.
+
+## HTTP API
+
+| Endpoint | Notes |
+| --- | --- |
+| `GET /health` | Liveness. |
+| `POST /reset`, `POST /step`, `GET /state` | OpenEnv Gym-style control. |
+| `POST /mcp` | OpenEnv JSON-RPC MCP. |
+| `/tools/mcp` | FastMCP Streamable HTTP. |
+
+## Quick start (Docker)
+
+The GHCR image name uses `dependent-type-checker` (the workflow task id), while this Hugging Face Space repo id uses `type-checker`.
+
+```bash
+docker run --rm -p 8000:8000 \
+  ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-dependent-type-checker:latest
+```
+
+With grader API:
+
+```bash
+docker run --rm -p 8000:8000 \
+  -e FSWE_GRADER_MODEL=... \
+  -e FSWE_GRADER_API_URL=... \
+  -e FSWE_GRADER_API_KEY=... \
+  ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-dependent-type-checker:latest
+```
+
+## Python client (host)
+
+```python
+import asyncio
+from frontier_swe_env.client import FrontierSweEnv
+from frontier_swe_env.models import FrontierSweAction
+
+
+async def main():
+    client = FrontierSweEnv(base_url="http://localhost:8000")
+    await client.connect()
+    try:
+        await client.reset()
+        await client.step(FrontierSweAction(message="Work on the type checker."))
+    finally:
+        await client.close()
+
+
+asyncio.run(main())
+```
+
+## Task manifest
+
+[`openenv.yaml`](openenv.yaml) — build command, L1 timeouts, reward anchors, rubric. Task sources: `tasks/dependent-type-checker/`.
+
+## Deployment
+
+- **Image**: `ghcr.io/3xcaffeine/frontier-swe-openenv/frontier-swe-dependent-type-checker:latest`
+- **Source**: [3xcaffeine/frontier-swe-openenv](https://github.com/3xcaffeine/frontier-swe-openenv)
+- **Sync**: Deployed from `main` via the repository HF Spaces workflow.
+
+Benchmark context: [FrontierSWE — Dependent type checker](https://www.frontierswe.com/dependent-type-checker).
diff --git a/training/README.md b/training/README.md
new file mode 100644
index 0000000..5572ba8
--- /dev/null
+++ b/training/README.md
@@ -0,0 +1,453 @@
+# HCAPO training pipeline
+
+This document describes the **HCAPO-inspired** training flow used for Frontier SWE trajectory fine-tuning: how **episode rewards** are defined, how **hindsight** scores become **step advantages**, what the **training dataset** contains, and what **training / runtime** adjustments were made for **Qwen** models and **Hugging Face GPU** Spaces.
+
+For a short end-to-end recipe (datasets on the Hub, Trackio, launch commands), see the **Training** section in the [root README](../README.md).
+
+---
+
+## Design rationale
+
+### Why not online RL (e.g. GRPO on the live environment)?
+
+Episodes often last on the order of **45–90+ minutes**. Online methods that need **many fresh rollouts per policy update** are **impractical**: orchestration, verifier time, and failures dominate before the optimiser sees enough data. We **collect trajectories once**, score them **offline**, build a **static** dataset, then fine-tune.
+
+### Why not plain DPO or scalar reward-weighted SFT?
+
+- **DPO** wants preference-style contrasts; our logs are **single** multi-turn trajectories with tools, not natural pairs per step.
+- **Scalar reward-weighted SFT** applies **one weight per episode** and does not say **which assistant turns** helped. **HCAPO-style** credit assigns **macro** (trajectory) and **micro** (hindsight) signals per step.
+
+### Relation to the [HCAPO paper](https://arxiv.org/abs/2603.08754) (2603.08754)
+
+There is **no official end-to-end** public repo for the full paper stack (ALFWorld + WebShop + Search QA + multi-GPU online GRPO + generative verification). **Appendix B** of the [HTML version](https://arxiv.org/html/2603.08754v1) is essentially runnable pseudocode (rollouts, \(\pi_{\text{hind}}\), \(\rho_t\), composite advantage, PPO-style update). Helpful forks: [Awesome-GRPO](https://github.com/GITrans/Awesome-GRPO), [direct-preference-optimization](https://github.com/eric-mitchell/direct-preference-optimization) (PPO/GRPO helpers).
+
+| Paper (conceptual) | This repo |
+| --- | --- |
+| Online GRPO-style RL | **Offline** pipeline: [`collect_trajectories.py`](../scripts/collect_trajectories.py) → hindsight → [`build_hcapo_dataset.py`](../scripts/build_hcapo_dataset.py) → [`train_hcapo.py`](train_hcapo.py) |
+| Terminal reward emphasis | **Dense** `plan_score` + `frozen_scores` in prompts and in \(Q^H\) when dense mode is on ([`compute_hindsight_scores.py`](../scripts/compute_hindsight_scores.py)) |
+| Generic step alignment | **MCP tool boundaries**: [`map_steps_to_subtasks()`](../scripts/compute_hindsight_scores.py) unwraps outer `mcp` calls, parses `submit_plan` / `advance`, assigns **phase** and **subtask_id** |
+| PPO-clipped policy gradient | **Step-weighted SFT**: combined advantages → JSONL → weighted CE in `HCAPOTrainer` |
+| Generic logprob API | **SGLang** native `/generate`, `logprob_start_len`, bounded action scoring, retries ([`score_step_logprobs()`](../scripts/compute_hindsight_scores.py)) |
+
+---
+
+## Pipeline overview
+
+1. **Collect trajectories** — [`scripts/collect_trajectories.py`](../scripts/collect_trajectories.py). Each `trajectories/episode_NNN/` holds `result.json`, `pi_session.jsonl`, logs, and later `hindsight_scores.json`.
+
+2. **Backfill or read episode reward** — `result.json` stores final reward and subtask scores. If an episode does not reach `DONE`, [`scripts/backfill_rewards.py`](../scripts/backfill_rewards.py) (and collection-time logic in `collect_trajectories.py`) can fill **`episode_reward`** from captured state.
+
+3. **Compute hindsight scores** — [`scripts/compute_hindsight_scores.py`](../scripts/compute_hindsight_scores.py) calls SGLang’s native **`/generate`** (via `httpx`) to score original assistant actions under hindsight context; writes **`hindsight_scores.json`**.
+
+4. **Build and train** — [`scripts/build_hcapo_dataset.py`](../scripts/build_hcapo_dataset.py) merges trajectory-level advantages with step-level hindsight and writes `datasets/hcapo_train.jsonl`. [`train_hcapo.py`](train_hcapo.py) runs weighted SFT (Unsloth + TRL). [`launch_hf_space.sh`](../scripts/launch_hf_space.sh) wraps HF Space / dataset upload flows.
+
+---
+
+## Episode reward
+
+The scalar **\(R\)** stored in trajectories and used by the dataset builder matches the **episode rubric** in code ([`EpisodeRubric.compute`](../frontier_swe_env/rubrics/episode_rubric.py)):
+
+```text
+R = plan_weight   * plan_score
+  + subtask_weight * subtask_mean
+  + completion_weight * completion
+  + tool_weight   * tool_density
+```
+
+With default weights (`TaskConfig`): **0.25 / 0.60 / 0.10 / 0.05**:
+
+```text
+plan_count      = max(len(plan), 1)
+subtask_mean    = mean(frozen subtask scores, padded with 0.0 to plan_count)
+completion      = min(number_of_frozen_scores / plan_count, 1.0)
+tool_density      = min(tool_call_count / (5 * plan_count), 1.0)
+```
+
+**\(R\)** is treated as lying in **[0, 1]** for reporting (and filtering with `--min-reward`).
+
+Planning-only episodes can still get a small **\(R\)** via **`tool_density`**. Under **dense** hindsight scoring, steps often still carry **\(r_t = 0\)** until there is a nonzero **`plan_score`** or **`frozen_scores[subtask_id]`**, so they contribute little after advantage clipping.
+
+---
+
+## Step-to-subtask mapping
+
+[`map_steps_to_subtasks()`](../scripts/compute_hindsight_scores.py) assigns each **assistant** message:
+
+- **Planning** — until a **`submit_plan`** tool call succeeds (JSON tool response, no error prefix).
+- **Executing** — after a successful plan; **`advance`** (on success) moves the current subtask index.
+
+Per-step metadata includes:
+
+```json
+{
+  "phase": "executing",
+  "subtask_id": "S2",
+  "subtask_reward": 0.13
+}
+```
+
+**`subtask_reward`** is **`plan_score`** in planning, else **`frozen_scores[subtask_id]`** in executing.
+
+**Outer `mcp` wrapper:** Pi/OpenEnv may emit tool calls under an outer function name `mcp` with nested JSON naming the real tool (e.g. `openenv_submit_plan`). [`_extract_effective_tool_names()`](../scripts/compute_hindsight_scores.py) unwraps that so transitions key off **`submit_plan`**, **`advance`**, etc.
+
+---
+
+## Hindsight prompt
+
+For each assistant action, the scorer appends a block (see `HINDSIGHT_TEMPLATE` in [`compute_hindsight_scores.py`](../scripts/compute_hindsight_scores.py)) including:
+
+```text
+Final reward
+Phase reached
+Plan score
+Subtask scores (summary)
+Subtasks completed / plan count
+Current subtask
+Current subtask score
+```
+
+That text is **post-hoc** (not visible during the original rollout). The scoring model then receives a forward request whose labels are used only to read **input-token logprobs** for the **original** assistant tokens.
+
+---
+
+## Hindsight scoring via SGLang (`/generate`)
+
+The script uses SGLang’s native **`POST .../generate`** with **`httpx.AsyncClient`**, not the OpenAI-compatible chat-completions path with `echo` + `logprobs` on the **full** prompt (which can force huge logits tensors and **OOM the server**).
+
+Payload highlights:
+
+```text
+return_logprob     = true
+logprob_start_len  = prefix_len + skipped_action_tokens
+```
+
+Here **`skipped_action_tokens`** trims the start of the **action** so only the last **`min(action_len, max_logprob_tokens)`** action tokens are scored—reducing work from roughly **`seq_len × vocab`** to **`max_logprob_tokens × vocab`** for the logprob slice.
+
+**CLI defaults** (see argparse in [`compute_hindsight_scores.py`](../scripts/compute_hindsight_scores.py)):
+
+```text
+--concurrency        1
+--max-context        32768
+--max-logprob-tokens 2048    # increase (e.g. 4096) for longer actions if the server allows
+--batch-size         4
+```
+
+**Retries:** exponential backoff on 500 / 502 / 503 / 504 / 204 and OOM-like error strings (`_MAX_RETRIES`, `_RETRY_BASE_DELAY`).
+
+---
+
+## Hindsight scoring formulae
+
+Let **`mean_logprob_t`** be the mean log-probability of the **scored** action token suffix under the hindsight-augmented prefix.
+
+```text
+pi_hind_t = exp(mean_logprob_t / T_temp)     # default T_temp = 5.0
+pi_mean   = mean_t(pi_hind_t)
+rho_raw_t = pi_hind_t / pi_mean
+rho_t     = clip(rho_raw_t, c_min, c_max)   # defaults 0.8, 1.2
+```
+
+**Dense rewards (default):**
+
+```text
+Q_H_t = rho_t * gamma^(group_end(t) - t) * r_t
+```
+
+- **`r_t`**: dense step reward (`subtask_reward` above).
+- **`group_end(t)`**: last step index in the same **subtask id** (or planning phase bucket).
+
+**Terminal fallback** (`--no-dense-rewards`):
+
+```text
+Q_H_t = rho_t * gamma^(T - 1 - t) * R
+```
+
+**Temporal smoothing** (`--alpha`, default `0.5`):
+
+```text
+Q_smooth_(T-1) = Q_H_(T-1)
+Q_smooth_t       = alpha * Q_H_t + (1 - alpha) * Q_smooth_(t+1)   # backward pass
+```
+
+[`build_hcapo_dataset.py`](../scripts/build_hcapo_dataset.py) uses **`q_h_smoothed`** unless **`--no-smooth`**.
+
+---
+
+## HCAPO advantage construction
+
+Episodes must pass **`--min-reward`** and contain **`hindsight_scores.json`**.
+
+### Trajectory (macro) advantage
+
+```text
+A_grpo_i = (R_i - mean(R)) / std(R)
+```
+
+If **`std(R) == 0`**, the code uses **`1.0`** instead ([`compute_grpo_advantages()`](../scripts/build_hcapo_dataset.py)).
+
+### Hindsight (micro) advantage
+
+Over **all kept steps** in the batch:
+
+```text
+mu_h    = mean(q_h_smoothed_t)
+sigma_h = std(q_h_smoothed_t)
+A_micro_t = (q_h_smoothed_t - mu_h) / sigma_h
+```
+
+**Do-no-harm:** if **`A_grpo_i > 0`**, then **`A_micro_t ← max(A_micro_t, 0)`**.
+
+### Combined advantage and JSONL weights
+
+```text
+A_hcapo_t = A_grpo_i + omega * A_micro_t          # default omega = 1.0
+w_t_raw   = max(A_hcapo_t, 0)
+w_t       = w_t_raw / mean(w_t_raw | w_t_raw > 0)
+```
+
+Rows where **all** **`w_t`** are zero are dropped.
+
+---
+
+## Dataset format
+
+`datasets/hcapo_train.jsonl` — one JSON object per episode (example shape):
+
+```json
+{
+  "messages": [...],
+  "step_advantages": [1.23, 0.87, 1.45],
+  "step_message_indices": [1, 4, 7],
+  "_episode_id": 12,
+  "_reward": 0.4058,
+  "_grpo_advantage": 0.91,
+  "_num_steps": 67
+}
+```
+
+Example summary from a **pg-01** run (`hcapo_summary.json` after build):
+
+```text
+total_episodes_loaded = 20
+episodes_in_dataset   = 14
+total_steps           = 1414
+nonzero_steps         = 1391
+min_reward            = 0.05
+omega                 = 1.0
+use_smoothed          = true
+```
+
+(Exact counts depend on your local `trajectories/` and flags.)
+
+---
+
+## Training loss
+
+**HCAPOTrainer** ([`train_hcapo.py`](train_hcapo.py)) applies **step-weighted** cross-entropy on **assistant** tokens only. Conceptually, for token position **`j`** belonging to assistant step **`t`**:
+
+```text
+CE_j            = cross_entropy(logits_j, label_j)
+weighted_loss   = sum_j w_t(j) * CE_j / sum_j w_t(j) * mask_j
+```
+
+Only labels with supervision (and assistant spans) contribute; **`ignore_index = -100`** drops non-target positions. Long sequences are summed in **chunks** (e.g. 256 positions) inside **`compute_loss`** to cap peak memory.
+
+---
+
+## Training adjustments (Qwen, Unsloth, HF)
+
+### Qwen 3.5 / 3.6 architecture and wrappers
+
+Many Qwen 3.x checkpoints use **`Qwen3_5ForConditionalGeneration`**: a multimodal module tree that still includes **`language_model`** + **`lm_head`** for text. With **PEFT / Unsloth**, you often get:
+
+```text
+PeftModelForCausalLM
+  └── LoraModel
+        └── Qwen3_5ForConditionalGeneration
+              ├── model (Qwen3_5Model)
+              │     └── language_model  ← text backbone for loss
+              └── lm_head
+```
+
+[`_get_backbone_and_lm_head()`](train_hcapo.py) unwraps **PeftModel → LoraModel → inner CausalLM**, then uses **`.model`** as the transformer backbone and follows **`.language_model`** when present so **`lm_head.in_features`** matches **hidden states**.
+
+Reported sizes (for sanity checks):
+
+```text
+Qwen3.5-4B:   hidden_size = 2560,  vocab_size = 248320
+Qwen3.6-27B: hidden_size = 5120,  vocab_size = 248320
+```
+
+[`_remove_qwen_vision_mappings()`](train_hcapo.py) strips vision-related **`auto_map`** entries so Unsloth does not treat a text-only checkpoint as a vision pipeline.
+
+### Chat template and `assistant_masks`
+
+Transformers only fills **`assistant_masks`** when the Jinja template wraps assistant generations with:
+
+```jinja
+{% generation %}
+...
+{% endgeneration %}
+```
+
+Qwen templates may omit this. The trainer **patches the tokenizer chat template in memory** (see [`_ensure_generation_chat_template()`](train_hcapo.py)) so **`apply_chat_template(..., return_assistant_tokens_mask=True)`** works in one pass—important for long Pi sessions.
+
+### Pre-tokenization vs `formatting_func`
+
+Unsloth’s SFT path often wants a **`formatting_func`** when there is no plain **`text`** column. We **pre-tokenize** rows to **`input_ids`** + **`assistant_masks`** + **`step_advantages`** so Unsloth can skip conversational re-formatting at train time. After that, **`assistant_only_loss`** is set **`False`** in **`SFTConfig`**; the **HCAPO collator** enforces assistant-only regions via masks.
+
+### HCAPO data collator
+
+[`_build_hcapo_data_collator()`](train_hcapo.py):
+
+1. Strips metadata columns before the base collator runs.
+2. Uses **`assistant_masks`** so non-assistant positions are **`ignore_index`**.
+3. Finds contiguous **assistant label spans** in **`labels`**.
+4. Assigns each span the corresponding **`step_advantages`** entry.
+5. Adds **`step_weights`** to the batch for **`HCAPOTrainer`**.
+
+If Unsloth swaps the collator during init, the trainer **re-applies** the HCAPO collator so **`step_weights`** are not dropped.
+
+### Chunked backbone + `lm_head` projection
+
+For **27B × long context**, a single **`model(**inputs)`** that returns full **`[batch, seq, vocab]`** logits can exceed **A100 80GB**. The custom **`compute_loss`** path:
+
+1. Runs the **text backbone** with **`use_cache=False`**.
+2. Drops the large activations that are not needed for the next chunk.
+3. Applies **`lm_head`** in **chunks** (default width **256** tokens).
+4. Accumulates weighted CE numerator and denominator across chunks.
+
+Peak logits memory scales like **`O(chunk × vocab)`** instead of **`O(seq × vocab)`**.
+
+### Liger
+
+**`liger-kernel>=0.7.0`** is a project dependency. Fused kernels can still help **inside** transformer blocks during the backbone forward. The **custom** loss path does **not** call Liger’s fused CE for the final weighted loss (we need arbitrary **`step_weights`** per position).
+
+### Adapter vs merged weights
+
+Prefer saving the **LoRA adapter** (`save_merged_16bit: false` in config) to avoid multi‑tens‑of‑GB merged checkpoints. Load **base + adapter** at inference.
+
+### No QLoRA for the A100 Qwen 3.6 recipe
+
+The reference HF config keeps **`load_in_4bit: false`** for the 27B Space run so training stays on the **bf16 LoRA** path without 4-bit quant quirks on this stack.
+
+---
+
+## Configurations
+
+Paths are wired in [`launch_hf_space.sh`](../scripts/launch_hf_space.sh) and copied in [`Dockerfile.train`](Dockerfile.train):
+
+| File | Role |
+| --- | --- |
+| [`hcapo_config_4090_q35_4b.json`](hcapo_config_4090_q35_4b.json) | Local **4090** smoke: **`Qwen/Qwen3.5-4B`**, **`max_seq_length` 1024**, **`num_train_epochs` 1**, **`per_device_train_batch_size` 1**, **`gradient_accumulation_steps` 8**, **`warmup_steps` 5**, **`load_in_4bit` false**. |
+| [`hcapo_config_a100_q36_27b.json`](hcapo_config_a100_q36_27b.json) | **A100** HF recipe: **`Qwen/Qwen3.6-27B`**, **`max_seq_length` 16384**, **`num_train_epochs` 3**, **`per_device_train_batch_size` 1**, **`gradient_accumulation_steps` 4**, **`warmup_steps` 2**, **`load_in_4bit` false**, **`save_merged_16bit` false**. |
+
+**Step budget:** with **`per_device_train_batch_size = 1`** and **`gradient_accumulation_steps = 4`**, Hugging Face / TRL advance the optimiser roughly **`len(train_dataloader) // 4`** times per epoch (exact rounding depends on version and **`drop_last`**). For **~14** JSONL rows that is on the order of **three** updates per epoch, so **three epochs → ~nine** global steps unless **`--max-steps`** or a larger dataset changes the schedule. If Trackio shows a different total (e.g. **18**), compare the **`max_steps`** / dataset size / launch overrides for that run.
+
+---
+
+## HF Spaces behaviour
+
+### Health check (port **7860**)
+
+Spaces expect HTTP on **7860** within the startup window. [`Dockerfile.train`](Dockerfile.train) starts a tiny background server before training:
+
+```bash
+uv run python -m http.server 7860 &>/dev/null &
+```
+
+### Container lifecycle
+
+Training should **not** `exec` into the trainer as **PID 1**: when the process exits, the container dies and the Space may restart. The image keeps **bash** as PID **1**, runs training, then **`sleep infinity`** so the Space stays up until you pause or delete it.
+
+```bash
+huggingface-cli space pause <user>/<space-name>
+```
+
+### Dependencies
+
+Training extras live under **`[project.optional-dependencies] training`** in [`pyproject.toml`](../pyproject.toml). The training image installs with:
+
+```text
+uv sync --frozen --no-dev --extra training
+```
+
+### Naming (example)
+
+| Artefact | Example id |
+| --- | --- |
+| Dataset repo | `fswe-hcapo-pg-01-trajectories` |
+| Adapter output repo | `fswe-hcapo-pg-01-qwen36-27b` |
+| Trackio Space | `<user>/fswe-hcapo-pg-01-monitor` |
+| Trackio project | `fswe-hcapo-pg-01` |
+| Run name | `fswe-hcapo-pg-01-qwen36-27b` |
+
+Set **`report_to = trackio`**, **`TRACKIO_SPACE_ID`**, **`TRACKIO_PROJECT_NAME`**, and optionally the compatibility aliases **`TRACKIO_SPACE`**, **`TRACKIO_PROJECT`** (see [`train_hcapo.py`](train_hcapo.py) argparse / env handling).
+
+---
+
+## Typical commands
+
+```bash
+uv run python scripts/build_hcapo_dataset.py \
+  --input-dir trajectories \
+  --output-dir datasets \
+  --min-reward 0.05 \
+  --omega 1.0
+```
+
+```bash
+./scripts/launch_hf_space.sh --upload-dataset
+./scripts/launch_hf_space.sh --max-steps 1
+./scripts/launch_hf_space.sh --with-dataset-upload --max-steps 1
+./scripts/launch_hf_space.sh
+./scripts/launch_hf_space.sh --delete
+```
+
+---
+
+## Troubleshooting
+
+### Planning-only episodes with reward **0.05**
+
+Backfill / rubric can assign a small **\(R\)** via **`tool_density`**, but dense **`r_t`** on steps may stay **0** until a plan and subtask scores exist—little HCAPO signal after clipping.
+
+### OOM on first training step
+
+If failure is inside **`cross_entropy`** on full logits, ensure the **chunked backbone + `lm_head`** path is active (see **`HCAPOTrainer.compute_loss`**). Fallback: lower **`max_seq_length`**.
+
+### `RuntimeError` … `lm_head` / hidden mismatch
+
+Usually means the resolved “backbone” was still a **full CausalLM** instead of **`Qwen3_5TextModel`**. Check [`_get_backbone_and_lm_head()`](train_hcapo.py) unwrapping.
+
+### SGLang OOM during hindsight
+
+Avoid full-prompt logprob modes; keep **`/generate`** + **`logprob_start_len`** + a modest **`--max-logprob-tokens`**.
+
+### Space killed before training finishes
+
+Ensure the **7860** stub server is running and the main process is not **`exec`**’d as the only PID without a follow-up **`sleep`**.
+
+### Wrong Trackio project
+
+Verify **`REPORT_TO`**, **`TRACKIO_SPACE_ID`**, **`TRACKIO_PROJECT_NAME`**, **`RUN_NAME`**, and the **`TRACKIO_*`** aliases.
+
+---
+
+## File map
+
+| Stage | Script / artefact |
+| --- | --- |
+| Collect | [`scripts/collect_trajectories.py`](../scripts/collect_trajectories.py) |
+| Backfill reward | [`scripts/backfill_rewards.py`](../scripts/backfill_rewards.py) |
+| Hindsight | [`scripts/compute_hindsight_scores.py`](../scripts/compute_hindsight_scores.py) |
+| Build JSONL | [`scripts/build_hcapo_dataset.py`](../scripts/build_hcapo_dataset.py) |
+| Train | [`training/train_hcapo.py`](train_hcapo.py) |
+| HF Space | [`scripts/launch_hf_space.sh`](../scripts/launch_hf_space.sh), [`Dockerfile.train`](Dockerfile.train) |
+
+---
+
+## References
+
+- HCAPO paper: [arXiv:2603.08754](https://arxiv.org/abs/2603.08754), [HTML + Appendix B](https://arxiv.org/html/2603.08754v1).
+- Root README: [Training (offline RL)](../README.md#training-offline-rl).