Draft: Lecture 5 — Reasoning Training & Inference-Time Scaling #327
natolambert wants to merge 2 commits into main
New lecture covering the reasoning model landscape and RLVR implementation details that differ from standard RLHF RL.

- ~53 slides: intro/RLVR recap, model landscape (grouped by lesson), recipe changes (difficulty filtering, no KL, async infra, etc.), looking ahead
- Add 23 bib entries to teach/course/refs.bib for reasoning papers
- Add lecture-label metadata to chapter 7

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Major restructure based on reviewer feedback:

- Reorder to method-first: RLVR foundations → recipe changes → model landscape (was landscape → recipe)
- Replace goldfish poem with math thinking-tokens example
- Tighten claims: "often no RM needed", "stability is much more tractable", "same policy-gradient family"
- Add glossary slide (pass@K, DAPO, CISPO, MTP, IFEval, GPQA)
- Add failure-modes slide (6 common failure patterns)
- Move pre-o1 research before model landscape
- Rename "model table" → "landscape"
- Compress landscape: cut 4 standalone model slides, mention inline

Replace all duplicated-slide reveals with colloquium PR #25 animations:

- <!-- animate: bullets --> for incremental list reveals
- <!-- step --> for punchlines and progressive content

Point colloquium dep at PR #25 branch for testing animations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c5cb43f063
    # TODO: pin back to a PyPI release when colloquium development slows down.
    "colloquium @ git+https://github.com/natolambert/colloquium.git",
    # Testing PR #25 (animations) — revert to HEAD after merge
    "colloquium @ git+https://github.com/natolambert/colloquium.git@refs/pull/25/head",
Pin colloquium dependency to immutable revision
The teach extra now points to refs/pull/25/head, which is a mutable GitHub PR ref; if that PR branch is force-pushed, closed, or garbage-collected, pip/uv install .[teach] can fail or silently pull different code over time. This makes slide builds non-reproducible and can break onboarding/CI unexpectedly, so this should be pinned to a stable tag or commit SHA instead of a moving PR ref.
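One way to catch this class of problem before it reaches CI is a small lint over the dependency strings. The sketch below is illustrative only and is not part of this PR: the helper names `git_ref` and `is_immutable_pin` are hypothetical, the check assumes requirements written in the `name @ git+URL@ref` form used above, and it treats only a full 40-character commit SHA as immutable (tags are not accepted here because they can be moved). The SHA in the usage lines is a placeholder, not a real colloquium commit.

```python
import re

# A full Git commit SHA-1: exactly 40 lowercase hex characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def git_ref(requirement: str) -> str:
    """Return the ref of a 'name @ git+URL@ref' requirement, or '' if none."""
    _, _, url = requirement.partition(" @ ")
    url = url.split("#", 1)[0]           # drop any '#egg=...' style fragment
    _, _, rest = url.partition("://")    # drop the scheme, e.g. 'git+https'
    _, _, path = rest.partition("/")     # drop the host (and any 'user@' prefix)
    _, _, ref = path.partition("@")      # whatever follows '@' in the path is the ref
    return ref


def is_immutable_pin(requirement: str) -> bool:
    """True only when the requirement is pinned to a full commit SHA."""
    return FULL_SHA.fullmatch(git_ref(requirement)) is not None


BASE = "colloquium @ git+https://github.com/natolambert/colloquium.git"
PLACEHOLDER_SHA = "0" * 40  # placeholder only, NOT a real commit

print(is_immutable_pin(BASE))                          # unpinned default branch
print(is_immutable_pin(BASE + "@refs/pull/25/head"))   # mutable PR ref
print(is_immutable_pin(BASE + "@" + PLACEHOLDER_SHA))  # full commit SHA
```

A check like this could run over the `teach` extra in a pre-commit hook or CI step, so a temporary PR-ref pin is flagged before it is merged.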
Summary
New lecture 5 covering Chapter 7 (Reasoning Training & Inference-Time Scaling), built with colloquium.
Changes
- teach/course/lec5-chap7.md: ~53 slides
- teach/course/refs.bib: 23 new bib entries for reasoning papers
- book/chapters/07-reasoning.md: Added lecture-label metadata

🤖 Generated with Claude Code