Add PUMA: Semantic-Preserving Early Exit for Reasoning Models #10

Open: ZhishanQ wants to merge 1 commit into testtimescaling:main from ZhishanQ:add-puma
@ZhishanQ

Adding PUMA as a new row in the main taxonomy table in `index.html`.

PUMA fits the inference-time / "How Well: Token Cost + Speedup" corner of the test-time-scaling landscape — it studies when to stop scaling rather than how to scale up. It uses reasoning-level semantic redundancy (via a lightweight fine-tuned Qwen3-Embedding-0.6B detector) as the candidate-exit signal, with an answer-verification window confirming exits.
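
To make the two-stage idea concrete (a redundancy signal proposes an exit, an answer check confirms it), here is a minimal illustrative sketch. It is not PUMA's actual implementation: the cosine-similarity rule, the `sim_threshold` and `window` parameters, and the `find_exit_step` helper are all assumptions made for illustration, and the step embeddings are plain vectors standing in for the output of an embedding model.

```python
from math import sqrt
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_exit_step(
    step_embeddings: List[Vector],
    trial_answers: List[str],
    sim_threshold: float = 0.9,  # hypothetical redundancy threshold
    window: int = 2,             # hypothetical verification window size
) -> Optional[int]:
    """Return the index of the first reasoning step where exiting looks safe.

    Stage 1 (candidate exit): step i is flagged if its embedding is close
    to the embedding of any earlier step, i.e. it looks semantically redundant.
    Stage 2 (verification): the exit is confirmed only if the trial answers
    over the last `window` steps ending at i all agree.
    """
    for i in range(1, len(step_embeddings)):
        redundant = any(
            cosine(step_embeddings[i], prev) >= sim_threshold
            for prev in step_embeddings[:i]
        )
        if not redundant:
            continue
        recent = trial_answers[max(0, i - window + 1): i + 1]
        if len(recent) == window and len(set(recent)) == 1:
            return i
    return None  # no confirmed exit; decode to the end
```

In this sketch, a flagged step whose trial answer still differs from the previous one is not treated as an exit; decoding continues until the answers stabilize within the window.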

Taxonomy classification (please feel free to adjust)

| Field | Value | Reasoning |
|---|---|---|
| What | Internal | Modifies the reasoning trajectory in place (early-exit) rather than parallel/sequential sampling |
| SFT | ✗ | The LRM is frozen; only a small auxiliary embedding model is trained |
| RL | | |
| STI | Redundancy-Aware Early Exit | The detector intervenes mid-decoding to flag candidate exit points |
| SEA | | No tree/graph search |
| VER | Trial-Answer Verifier | The answer-verification window confirms whether a flagged candidate exit is safe |
| AGG | | |
| Where | Math, Code, General | MATH-500, AIME24/25, OlympiadBench, GPQA-Diamond + LiveCodeBench + MathVista, MathVision |
| How Well | Pass@1, Token Cost, Speedup | 26.2% average token reduction; 1.40× / 1.28× speedup on DS-7B/14B |

The PUMA detector is contrastively trained, so SFT could arguably be marked ✓ — but I went with ✗ because the LRM itself is untouched and the framework is plug-and-play. Happy to flip if you'd prefer.

Links

I did not modify `papers.json` (which currently lists only the survey itself) or `arxiv_citations.json` (bot-managed). Let me know if I should add to papers.json too.

Thanks!
