MedARC-AI/med-lm-train

med-lm-train

Setup + Installation

  1. Clone the repository
git clone --recurse-submodules --shallow-submodules --depth 50 https://github.com/MedARC-AI/med-lm-train.git
cd med-lm-train

Or if you already have the repo cloned without submodules:

git submodule update --init --recursive --depth 50
  2. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
  3. Install dependencies
uv sync

For flash attention support:

uv sync --extra flash-attn-2    # flash-attn 2
uv sync --extra flash-attn-3    # flash-attn 2 + 3 (use for H100s)
uv sync --extra flash-attn-4    # flash-attn 2, 3, & 4 (use for B200s)
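
The comments above pair each extra with a GPU generation. As a minimal sketch, that choice could be scripted. flash_attn_extra is a hypothetical helper that is not part of this repo, and falling back to flash-attn-2 for any other GPU is an assumption.

```shell
# Hypothetical helper: map a GPU model name to the uv extra suggested above.
# Falling back to flash-attn-2 for other GPUs is an assumption.
flash_attn_extra() {
  case "$1" in
    *H100*) echo "flash-attn-3" ;;
    *B200*) echo "flash-attn-4" ;;
    *)      echo "flash-attn-2" ;;
  esac
}

# Usage (requires nvidia-smi on the node):
#   gpu_name=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -n 1)
#   uv sync --extra "$(flash_attn_extra "$gpu_name")"
```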

medarc_slurm

medarc_slurm is a CLI tool that generates and submits single-node SLURM jobs for PRIME-RL SFT and RL training. It is based on PRIME-RL's built-in rl_slurm and sft_slurm commands but adapted for shared-node environments where jobs don't necessarily have exclusive access to the machine.

# SFT: single torchrun job
medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2

# RL: splits GPUs between vLLM inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --train-gpus 1 --infer-gpus 2

# RL: share a single GPU between inference and training
medarc_slurm rl --config config.toml --output-dir runs/my-rl --single-gpu

# SFT: low-priority queue + email notifications + resume from latest checkpoint
medarc_slurm sft --config config.toml \
  --output-dir runs/my-sft \
  --gpus 2 \
  --priority low \
  --mail all \
  --mail-user email@domain.com \
  --slurm-resume

# Validate an RL submission (including dependency syntax) without creating a job
medarc_slurm rl --config config.toml \
  --output-dir runs/my-rl \
  --train-gpus 1 \
  --infer-gpus 2 \
  --dependency afterok:123456 \
  --test-only

Generated artifacts are written to --output-dir:

  • sft.sh or rl.sh — the SLURM batch script
  • configs/ — resolved TOML subconfigs passed to each component
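
A quick way to confirm that a run directory contains the artifacts listed above is a small check like the following. check_artifacts is a hypothetical helper, not something medarc_slurm provides; it only tests for the batch script and the configs/ directory.

```shell
# Hypothetical helper: check that an output dir holds the artifacts listed above.
# $1 = output dir, $2 = job kind (sft or rl)
check_artifacts() {
  dir=$1
  kind=$2
  [ -f "$dir/$kind.sh" ] || { echo "missing $dir/$kind.sh" >&2; return 1; }
  [ -d "$dir/configs" ] || { echo "missing $dir/configs/" >&2; return 1; }
  echo "ok: $dir/$kind.sh and $dir/configs/ present"
}

# Usage:
#   check_artifacts runs/my-sft sft
#   check_artifacts runs/my-rl rl
```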

You can pass PRIME-RL config overrides directly as extra flags (for example --wandb.project my-proj --wandb.name my-run). You may also insert -- before passthrough overrides for readability, but it is optional. To layer multiple PRIME-RL configs, repeat --config with later files overriding earlier ones.
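
As a sketch of how layered configs and passthrough overrides combine in one command, the snippet below prints the full invocation so it can be reviewed before submitting. base.toml and overrides.toml are placeholder file names, and build_sft_cmd is a hypothetical wrapper used only for illustration.

```shell
# base.toml and overrides.toml are placeholder file names.
# build_sft_cmd prints the command instead of running it, so it can be reviewed first.
build_sft_cmd() {
  echo medarc_slurm sft \
    --config base.toml \
    --config overrides.toml \
    --output-dir runs/my-sft \
    --gpus 2 \
    -- \
    --wandb.project my-proj \
    --wandb.name my-run
}

build_sft_cmd
```

Here overrides.toml takes precedence over base.toml because it appears later, and everything after the optional -- is passed through as PRIME-RL overrides. Dropping the echo runs the command itself.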

medarc_slurm defaults --account to training; override it with --account <name>. Email notifications are selected with --mail all or --mail begin_end (together with --mail-user). Use --dependency "<expr>" to pass SLURM job dependencies and --test-only to run sbatch validation without submitting a job.
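
One use for --dependency is starting an RL job only after an SFT job finishes successfully. sbatch reports a submission as "Submitted batch job <id>", so the ID can be extracted from that line. The sketch below assumes medarc_slurm forwards sbatch's output unchanged, which is not confirmed here; job_id_from_output is a hypothetical helper.

```shell
# Hypothetical helper: pull the job ID out of sbatch's
# "Submitted batch job <id>" line, read from stdin.
job_id_from_output() {
  sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage sketch (assumes medarc_slurm forwards sbatch's output):
#   sft_id=$(medarc_slurm sft --config config.toml --output-dir runs/my-sft --gpus 2 | job_id_from_output)
#   medarc_slurm rl --config config.toml --output-dir runs/my-rl \
#     --train-gpus 1 --infer-gpus 2 --dependency "afterok:$sft_id"
```

If the ID comes back empty, check the raw output before submitting the dependent job. Adding --test-only to the second command validates the dependency expression without creating a job.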

Run medarc_slurm sft --help or medarc_slurm rl --help for more details on available options.

Examples

Each example has its own README with setup instructions, SFT/RL commands, and eval steps:

Example            GPUs        Description
reverse_text       1 (shared)  Single-GPU SFT + RL on a toy text reversal task
hendrycks_sanity   4           Multi-GPU RL on Hendrycks MATH (sanity subset)
alphabet_sort      8           Full-node RL on alphabet sorting

All examples use medarc_slurm to generate and submit single-node SLURM jobs. Start with reverse_text to verify your setup.
