med-lm-train

Setup + Installation

Clone the repository

git clone --recurse-submodules --shallow-submodules --depth 50 https://github.com/MedARC-AI/med-lm-train.git
cd med-lm-train

Or if you already have the repo cloned without submodules:

git submodule update --init --recursive --depth 50

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

Install dependencies

uv sync

For flash attention support:

uv sync --extra flash-attn-2    # flash-attn 2
uv sync --extra flash-attn-3    # flash-attn 2 + 3 (use for H100s)
uv sync --extra flash-attn-4    # flash-attn 2, 3, & 4 (use for B200s)
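
If you provision several machine types, the choice of extra can be scripted from the GPU model name. The following is a minimal sketch that assumes only the mapping in the comments above (H100 uses flash-attn-3, B200 uses flash-attn-4, anything else falls back to flash-attn-2). The function name pick_flash_attn_extra is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of med-lm-train): map a GPU model name
# to the uv extra suggested in the install comments above.
pick_flash_attn_extra() {
  case "$1" in
    *B200*) echo "flash-attn-4" ;;   # flash-attn 2, 3, & 4
    *H100*) echo "flash-attn-3" ;;   # flash-attn 2 + 3
    *)      echo "flash-attn-2" ;;   # fallback assumed for all other GPUs
  esac
}

# Usage on a machine with NVIDIA drivers installed:
#   gpu_name="$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)"
#   uv sync --extra "$(pick_flash_attn_extra "$gpu_name")"
```

The helper takes the name as an argument rather than calling nvidia-smi itself, so it can also be used on a login node without GPUs.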

medarc_slurm

medarc_slurm is a CLI tool that generates and submits single-node SLURM jobs for PRIME-RL SFT and RL training. It is based on PRIME-RL's built-in rl_slurm and sft_slurm commands but adapted for shared-node environments where jobs don't necessarily have exclusive access to the machine.

# SFT: single torchrun job
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2

# RL: splits GPUs between vLLM inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 1 --infer-gpus 2

# RL: share a single GPU between inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --single-gpu

# SFT: low-priority queue + email notifications + resume from latest checkpoint
medarc_slurm sft --config config.toml \
  --output-dir runs/my-sft \
  --gpus 2 \
  --priority low \
  --mail all \
  --mail-user email@domain.com \
  --slurm-resume

# Validate an RL submission (including dependency syntax) without creating a job
medarc_slurm rl --config config.toml \
  --output-dir runs/my-rl \
  --train-gpus 1 \
  --infer-gpus 2 \
  --dependency afterok:123456 \
  --test-only

Generated artifacts are written to --output-dir:

sft.sh or rl.sh — the SLURM batch script
configs/ — resolved TOML subconfigs passed to each component
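
To confirm that a run directory contains the artifacts listed above, a small check like the following may help. It is a sketch based only on the file names given here (sft.sh or rl.sh, plus configs/); the function name check_artifacts is hypothetical.

```shell
# Hypothetical helper: verify that an output directory holds the generated files.
check_artifacts() {
  out_dir="$1"
  # Either sft.sh or rl.sh should exist, depending on the subcommand used.
  if [ ! -f "$out_dir/sft.sh" ] && [ ! -f "$out_dir/rl.sh" ]; then
    echo "missing batch script in $out_dir" >&2
    return 1
  fi
  # The resolved TOML subconfigs live in configs/.
  if [ ! -d "$out_dir/configs" ]; then
    echo "missing configs/ in $out_dir" >&2
    return 1
  fi
  echo "ok: $out_dir"
}

# Usage:
#   check_artifacts runs/my-sft
```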

You can pass PRIME-RL config overrides directly as extra flags (for example --wandb.project my-proj --wandb.name my-run). You may also insert -- before passthrough overrides for readability, but it is optional. To layer multiple PRIME-RL configs, repeat --config with later files overriding earlier ones.
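
As an illustration of config layering, the sketch below turns an ordered list of TOML files into repeated --config flags, so that later files come last on the command line. The helper config_flags and the file names base.toml and override.toml are hypothetical; the sketch also assumes the paths contain no spaces.

```shell
# Hypothetical helper: emit one "--config <file>" pair per argument, in order.
config_flags() {
  flags=""
  for f in "$@"; do
    flags="$flags --config $f"
  done
  # Strip the leading space before printing.
  echo "${flags# }"
}

# Usage (left unquoted on purpose so the flags split into separate words):
#   medarc_slurm sft $(config_flags base.toml override.toml) \
#     --output-dir runs/my-sft --gpus 2 \
#     -- --wandb.project my-proj --wandb.name my-run
```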

medarc_slurm defaults --account to training; override it with --account <name>. For email notifications, pass --mail all or --mail begin_end together with --mail-user. Use --dependency "<expr>" to pass SLURM job dependencies and --test-only to run sbatch validation without submitting a job.
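
A dependency expression such as afterok:123456 can be assembled from one or more job IDs. The sketch below uses SLURM's afterok:job_id[:job_id...] form; the helper name afterok_expr is hypothetical, and the job IDs are placeholders.

```shell
# Hypothetical helper: build an "afterok" dependency expression from job IDs.
afterok_expr() {
  # At least one job ID is required for a valid expression.
  if [ "$#" -eq 0 ]; then
    echo "usage: afterok_expr JOB_ID [JOB_ID...]" >&2
    return 1
  fi
  dep="afterok"
  for job_id in "$@"; do
    dep="$dep:$job_id"
  done
  echo "$dep"
}

# Usage: validate an RL job that should start only after job 123456 succeeds.
#   medarc_slurm rl --config config.toml \
#     --output-dir runs/my-rl \
#     --train-gpus 1 \
#     --infer-gpus 2 \
#     --dependency "$(afterok_expr 123456)" \
#     --test-only
```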

Run medarc_slurm sft --help or medarc_slurm rl --help for more details on available options.

Examples

Each example has its own README with setup instructions, SFT/RL commands, and eval steps:

Example	GPUs	Description
reverse_text	1 (shared)	Single-GPU SFT + RL on a toy text reversal task
hendrycks_sanity	4	Multi-GPU RL on Hendrycks MATH (sanity subset)
alphabet_sort	8	Full-node RL on alphabet sorting

All examples use medarc_slurm to generate and submit single-node SLURM jobs. Start with reverse_text to verify your setup.
