💡 Key Findings | 📈 Scaling Results | 🔥 Models (infer & SFT) | 📝 Open Source List
- [2025-11-01] 🎉 Released the parathinker-math-6K dataset and training scripts.
- [2025-10-02] 🚀 Updated the inference engine and released the improved ParaThinker-1.5B model.
- Recent advances in Large Language Models (LLMs) have been driven by test-time compute scaling - a strategy that improves reasoning by generating longer, sequential thought processes.
- However, this approach hits a bottleneck where further computation yields only marginal gains, a consequence of "Tunnel Vision": imperfect early steps lock the model into a suboptimal reasoning path.
- We introduce ParaThinker, an end-to-end framework that trains LLMs to generate multiple, diverse reasoning paths in parallel and synthesize them into a superior final answer.
- Scaling compute in parallel (width) proves more effective and efficient than scaling it sequentially (depth).
Here are the core insights from our analysis and evaluations:
📈 Superior Accuracy Gains: On challenging reasoning benchmarks (AIME 2024/2025, AMC 2023, MATH-500), ParaThinker achieves an average accuracy improvement of 12.3% for 1.5B models and 7.5% for 7B models with 8 parallel reasoning paths.
✅ Overcomes Tunnel Vision: The bottleneck in sequential reasoning arises from early token choices committing to flawed paths; parallelism enables diverse exploration to break through.
🧠 Native Parallelism in a Single Pass: Using specialized control tokens, thought-specific positional embeddings, and two-phase attention, ParaThinker generates and integrates reasoning paths end-to-end without external verifiers (see the illustrative sketch after this list).
⚡ Minimal Latency Overhead: Adds only 7.1% latency on average by batching parallel paths for hardware efficiency; generating 16 paths takes less than 2x the time of a single path.
🧱 Scalable SFT Training: Supervised fine-tuning with paths from a teacher model enables generalization to more paths at inference.
🔁 Smaller Models Outperform Larger Ones: ParaThinker-equipped small LLMs surpass larger sequential counterparts, offering a new scaling dimension.
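The two-phase attention mentioned above can be pictured as a block-structured attention mask. The snippet below is a minimal illustrative sketch, not the repository's implementation: it assumes a `[prompt | path_1 | ... | path_P | summary]` token layout and shows how path tokens stay isolated from each other during thinking, while summary tokens attend to everything when integrating the final answer.

```python
# Illustrative sketch of a two-phase attention mask for P parallel reasoning paths.
# Assumed layout: [prompt | path_1 | ... | path_P | summary]. Not the official code.
import torch

def two_phase_attention_mask(prompt_len: int, path_lens: list[int], summary_len: int) -> torch.Tensor:
    total = prompt_len + sum(path_lens) + summary_len
    mask = torch.zeros(total, total, dtype=torch.bool)  # True = attention allowed

    # Prompt: standard causal attention within the prompt.
    mask[:prompt_len, :prompt_len] = torch.tril(torch.ones(prompt_len, prompt_len, dtype=torch.bool))

    # Phase 1 (thinking): each path sees the prompt plus its own tokens (causally),
    # but never the other paths -- this is what enables diverse, independent exploration.
    offset = prompt_len
    for plen in path_lens:
        mask[offset:offset + plen, :prompt_len] = True
        mask[offset:offset + plen, offset:offset + plen] = torch.tril(torch.ones(plen, plen, dtype=torch.bool))
        offset += plen

    # Phase 2 (summarization): summary tokens attend to the prompt, every path,
    # and earlier summary tokens, so the final answer can integrate all paths.
    s0 = offset
    mask[s0:, :s0] = True
    mask[s0:, s0:] = torch.tril(torch.ones(summary_len, summary_len, dtype=torch.bool))
    return mask

# Example: a 4-token prompt, two 3-token paths, and a 2-token summary.
print(two_phase_attention_mask(4, [3, 3], 2).int())
```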
We will release the full code for training and inference, along with evaluation scripts. Checkpoints for ParaThinker-1.5B are available on 🤗 HuggingFace.
Evaluated on math reasoning benchmarks, scaling the number of parallel reasoning paths P from 1 to 8.
ParaThinker models are based on the DeepSeek-R1-Distill-Qwen series:
| Model | Description | Download |
|---|---|---|
| ParaThinker-1.5B | Fine-tuned for parallel reasoning | 🤗 Leslie04/ParaThinker-1.5B |
| ParaThinker-7B | Higher-capacity for complex tasks | 🤗 Leslie04/ParaThinker-7B (coming soon) |
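To pull a released checkpoint locally, a standard `huggingface_hub` download works; the snippet below is a minimal sketch using the repo ID from the table above (the target directory is arbitrary).

```python
# Minimal sketch: download the ParaThinker-1.5B checkpoint from the Hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Leslie04/ParaThinker-1.5B",
    local_dir="./checkpoints/ParaThinker-1.5B",  # arbitrary local path
)
print(f"Checkpoint downloaded to {local_dir}")
```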
For efficient parallel inference using our customized vLLM engine, refer to the Inference Submodule README. This submodule implements the native parallel thinking inference engine, leveraging PagedAttention for KV cache reuse. Also see the quick start example in inference/examples/parathinker/example.py for usage.
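For orientation, the sketch below shows roughly what inference looks like with a vLLM-style API. It is a hedged illustration, not the submodule's actual interface: the customized engine's options for requesting P parallel paths are not shown here, and the real entry point is inference/examples/parathinker/example.py.

```python
# Hedged sketch of serving ParaThinker with a vLLM-style API. This is NOT the
# submodule's actual interface: the customized engine adds native parallel
# thinking (requesting P parallel paths and reusing the prompt's KV cache via
# PagedAttention); see inference/examples/parathinker/example.py for the real usage.
from vllm import LLM, SamplingParams

llm = LLM(model="Leslie04/ParaThinker-1.5B")
sampling = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)

prompt = "Find the remainder when 7^2024 is divided by 100. Think step by step."
outputs = llm.generate([prompt], sampling)
print(outputs[0].outputs[0].text)
```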
We use a customized LLaMA-Factory to train the native parallel thinking model.
Build Conda Environment: The following is a minimal script to build a conda environment for ParaThinker training:
set -e
eval "$(conda shell.bash hook)"
if ! conda env list | grep -q "parathinker-sft"; then
conda create -y -n parathinker-sft python=3.11
fi
conda activate parathinker-sft
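# Install the customized LLaMA-Factory and the local transformers copy in editable mode.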
cd ./train/LLaMA-Factory
pip install -e ".[torch,metrics]"
cd ../transformers
pip install -e .

Dataset Installation and SFT Running: Install the parathinker-math-6K dataset, then use the example training script to quickly start SFT on deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.
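If the dataset is pulled from the Hugging Face Hub, a quick sanity check with the `datasets` library looks like the sketch below. The repo ID `Leslie04/parathinker-math-6K` is an assumption for illustration; substitute the actual Hub ID or local path expected by the example training script.

```python
# Hedged sketch: inspect the SFT data with the `datasets` library.
# The repo ID below is hypothetical; use the dataset location referenced
# by the example training script.
from datasets import load_dataset

ds = load_dataset("Leslie04/parathinker-math-6K", split="train")  # hypothetical repo ID
print(len(ds), "examples")
print(ds[0])  # each example should contain the prompt and teacher-generated reasoning used for SFT
```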
- Inference Engine based on vLLM
- ParaThinker-1.5B Model
- ParaThinker-7B Model
- SFT dataset and training scripts based on LLaMA-Factory
- Evaluation script


