- Clone the repository
git clone --recurse-submodules --shallow-submodules --depth 50 https://github.com/MedARC-AI/med-lm-train.git
cd med-lm-train
Or, if you already have the repo cloned without submodules:
git submodule update --init --recursive --depth 50
- Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
- Install dependencies
uv sync
For flash attention support:
uv sync --extra flash-attn-2 # flash-attn 2
uv sync --extra flash-attn-3 # flash-attn 2 + 3 (use for H100s)
uv sync --extra flash-attn-4 # flash-attn 2, 3, & 4 (use for B200s)

medarc_slurm is a CLI tool that generates and submits single-node SLURM jobs for PRIME-RL SFT and RL training. It is based on PRIME-RL's built-in rl_slurm and sft_slurm commands but adapted for shared-node environments where jobs don't necessarily have exclusive access to the machine.
# SFT: single torchrun job
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2
# RL: splits GPUs between vLLM inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 1 --infer-gpus 2
# RL: share a single GPU between inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --single-gpu
# SFT: low-priority queue + email notifications + resume from latest checkpoint
medarc_slurm sft --config config.toml \
--output-dir runs/my-sft \
--gpus 2 \
--priority low \
--mail all \
--mail-user email@domain.com \
--slurm-resume
# Validate an RL submission (including dependency syntax) without creating a job
medarc_slurm rl --config config.toml \
--output-dir runs/my-rl \
--train-gpus 1 \
--infer-gpus 2 \
--dependency afterok:123456 \
--test-only

Generated artifacts are written to --output-dir:
- sft.sh or rl.sh: the SLURM batch script
- configs/: resolved TOML subconfigs passed to each component
You can pass PRIME-RL config overrides directly as extra flags (for example --wandb.project my-proj --wandb.name my-run). You may also insert -- before passthrough overrides for readability, but it is optional. To layer multiple PRIME-RL configs, repeat --config with later files overriding earlier ones.
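As a rough illustration of layering configs and passing overrides, the sketch below combines two `--config` files with the optional `--` separator and the wandb overrides mentioned above. The file names `base.toml` and `experiment.toml` are hypothetical, and `run` is a small helper invented here (not part of medarc_slurm) that prints the command instead of executing it unless `DRY_RUN=0` is set.

```shell
# Hypothetical helper: print the command by default, execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# base.toml and experiment.toml are placeholder names; values in
# experiment.toml override base.toml because it is listed later.
run medarc_slurm sft \
  --config base.toml \
  --config experiment.toml \
  --output-dir runs/my-sft \
  --gpus 2 \
  -- \
  --wandb.project my-proj \
  --wandb.name my-run
```

Printing the command first makes it easy to check the flag order before submitting; set `DRY_RUN=0` to actually run it.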
medarc_slurm now defaults --account to training. You can override it with --account <name>.
Email mode is --mail all or --mail begin_end (with --mail-user).
Use --dependency "<expr>" to pass SLURM dependencies and --test-only to run sbatch validation without submitting.
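One possible way to combine these flags is to validate a submission with `--test-only` and submit only if validation succeeds. The wrapper below is a sketch, not part of medarc_slurm: the name `validate_then_submit` is made up, and it assumes that medarc_slurm exits with a non-zero status when `--test-only` validation fails and that `--test-only` is accepted at the end of the argument list.

```shell
# Hypothetical wrapper: run the same medarc_slurm invocation twice,
# first with --test-only, then for real if validation succeeded.
# Assumes a failed validation produces a non-zero exit status.
validate_then_submit() {
  medarc_slurm "$@" --test-only && medarc_slurm "$@"
}

# Example usage (job ID 123456 is the placeholder from the example above):
# validate_then_submit rl --config config.toml \
#   --output-dir runs/my-rl \
#   --train-gpus 1 \
#   --infer-gpus 2 \
#   --dependency afterok:123456
```

If you use `--` before passthrough overrides, appending `--test-only` at the end may not be parsed as intended, so pass it explicitly in that case.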
Run medarc_slurm sft --help or medarc_slurm rl --help for more details on available options.
Each example has its own README with setup instructions, SFT/RL commands, and eval steps:
| Example | GPUs | Description |
|---|---|---|
| reverse_text | 1 (shared) | Single-GPU SFT + RL on a toy text reversal task |
| hendrycks_sanity | 4 | Multi-GPU RL on Hendrycks MATH (sanity subset) |
| alphabet_sort | 8 | Full-node RL on alphabet sorting |
All examples use medarc_slurm to generate and submit single-node SLURM jobs. Start with reverse_text to verify your setup.
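Before running reverse_text, a quick pre-flight check can confirm that the tools used in the setup steps are on your PATH. The function below is a minimal sketch; `check_tools` is a name invented for this example, and whether `medarc_slurm` itself is on PATH depends on how your environment is activated, so it is left out of the suggested usage.

```shell
# Hypothetical helper: report any commands from the argument list that are
# not found on PATH. Returns 0 when all are found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
# check_tools git uv sbatch
```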