Repo for training OpenAI o3-style multi-step web search capability into open-source models
Any code that needs a GPU (model inference or training) is run using SkyPilot and RunPod
collect_sft.py can be run locally to collect trajectories from GPT models, using web search tools to answer queries from a selection of Q&A evals (BrowseComp, SimpleQA, HotpotQA and MS MARCO)
run_sft.py is the SkyPilot code that runs sft.py, the SFT training script that uses the trajectories collected earlier to clone the behaviour of the GPT model
- in practice this step improves tool call accuracy, and increases the number of times the model searches for information before answering
run_train.py is the SkyPilot code that runs train.py, the RL training script that uses GRPO on either a base model or an SFT checkpoint; the reward is binary, based on outcome correctness
run_eval.py is the SkyPilot code that runs eval.py, which loads a training checkpoint from the Hugging Face Hub and evaluates it on either BrowseComp or SimpleQA
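To make the binary outcome reward concrete, here is a minimal sketch of how it could feed into GRPO's group-relative advantages. It is not taken from train.py: the function names binary_reward and group_advantages are hypothetical, and the normalized exact-match check is an assumption (the actual correctness check may differ, for example by using a grader model).

```python
import statistics


def binary_reward(predicted: str, reference: str) -> float:
    """Return 1.0 if the answers match after normalization, else 0.0.

    Assumption: correctness is a case- and whitespace-insensitive exact match.
    """
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    return 1.0 if normalize(predicted) == normalize(reference) else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of rollouts for the same query.

    GRPO scores each rollout relative to its group: subtract the group mean
    and divide by the group standard deviation (eps avoids division by zero).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical rollouts answering the same query.
answers = ["Paris", "paris ", "Lyon", "Marseille"]
rewards = [binary_reward(a, "Paris") for a in answers]
advantages = group_advantages(rewards)
```

Note that when every rollout in a group gets the same reward (all correct or all wrong), the advantages are all zero and that group contributes no learning signal. This is one reason an SFT checkpoint that already answers some queries correctly can be a more useful starting point than a base model.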
Copy .env.example into .env and add your API keys
This repo uses uv. Follow the official uv installation instructions to set it up, then run uv sync to install the project dependencies