This guide will walk you through the steps to set up a local GPU environment for Marin. By "local", we mean a machine that you run jobs on directly, as opposed to using Ray's autoscaler to launch a cluster of GPU nodes. Similar steps will let you run Marin on a cloud GPU environment using Ray's autoscaler, but we defer that to a future guide.
Make sure you've followed the installation guide to complete the basic installation.
In addition to the prerequisites from the basic installation, you'll need GPU-specific dependencies:
- CUDA Toolkit (version 12.1 or higher)
- cuDNN (version 9.1 or higher)
We assume you are running Ubuntu 24.04.
Install CUDA 12.9.0:
```bash
wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda_12.9.0_575.51.03_linux.run
sudo sh cuda_12.9.0_575.51.03_linux.run
```

Install cuDNN 9.10.0 (instructions from NVIDIA's cuDNN download page):
```bash
wget https://developer.download.nvidia.com/compute/cudnn/9.10.0/local_installers/cudnn-local-repo-ubuntu2404-9.10.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2404-9.10.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2404-9.10.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn
sudo apt-get -y install cudnn-cuda-12
```
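To confirm that cuDNN is visible to the dynamic linker, you can query its version through `ctypes`. This is a quick sketch, not part of Marin; the library name `libcudnn.so.9` assumes the cuDNN 9.x packages installed above:

```python
import ctypes

try:
    # Load the cuDNN shared library and call its real version query,
    # cudnnGetVersion(), which returns the version as an integer.
    lib = ctypes.CDLL("libcudnn.so.9")
    lib.cudnnGetVersion.restype = ctypes.c_size_t
    cudnn_version = lib.cudnnGetVersion()
    print("cuDNN version:", cudnn_version)
except OSError:
    # The library isn't on the linker path; the install likely failed.
    cudnn_version = None
    print("libcudnn.so.9 not found; check your cuDNN installation")
```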
```bash
sudo apt-get -y install nvidia-cuda-toolkit
```

Verify your setup by checking the CUDA version:
```bash
nvcc --version
```

Marin uses JAX as a core library. Install the Python dependencies for CUDA 12.x via uv:
```bash
uv sync --extra=cuda12
```

See JAX's installation guide for more options.
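Once the environment is synced, a quick way to check that JAX can see the GPU is to list its devices. This sketch falls back gracefully when JAX or a GPU is absent:

```python
try:
    import jax
    # default_backend() reports the platform JAX selected,
    # e.g. "gpu" when CUDA is working, "cpu" otherwise.
    backend = jax.default_backend()
    print("backend:", backend)
    print("devices:", jax.devices())
except ImportError:
    backend = None
    print("jax is not installed; run `uv sync --extra=cuda12` first")
```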
Now you can run an experiment.
Let's start by running the tiny model training script (GPU version), `experiments/tutorials/train_tiny_model_gpu.py`:
```bash
export MARIN_PREFIX=local_store
export WANDB_ENTITY=...
python3 experiments/tutorials/train_tiny_model_gpu.py --prefix local_store
```

The prefix is the directory where the output will be saved. It can be a local directory or anything fsspec supports,
such as s3:// or gs://.
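To make the prefix idea concrete, here is a small stdlib-only sketch of how a local prefix differs from a remote one. The handling shown is illustrative, not Marin's actual output logic:

```python
import os

# Fall back to the tutorial's default if MARIN_PREFIX isn't set.
prefix = os.environ.get("MARIN_PREFIX", "local_store")

if "://" in prefix:
    # Remote prefixes like gs://bucket/path or s3://bucket/path
    # are resolved by fsspec; there is nothing to create locally.
    print("remote prefix:", prefix)
else:
    # A local prefix is just a directory on disk.
    os.makedirs(prefix, exist_ok=True)
    print("local prefix:", os.path.abspath(prefix))
```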
Let's take a look at the script. Whereas the CPU version requests `resources=CpuOnlyConfig(num_cpus=1)`, the GPU version requests `resources=GpuConfig(gpu_count=1)`:
```python
nano_train_config = SimpleTrainConfig(
    # Here we define the hardware resources we need.
    resources=GpuConfig(gpu_count=1),
    train_batch_size=128,
    num_train_steps=1000,
    learning_rate=6e-4,
    weight_decay=0.1,
)
```

To scale up, you can use Ray's autoscaler to launch a cluster of GPU nodes. We defer that to a future guide, but you can see Ray's autoscaler documentation for more information.
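Before reaching for a cluster, the same `resources` field can request more GPUs on a single node. The fragment below is a sketch: it assumes `GpuConfig` accepts larger `gpu_count` values, which you should verify against Marin's API:

```python
# Hypothetical scaled-up variant of the config above,
# for a single node with 4 GPUs.
nano_train_config = SimpleTrainConfig(
    resources=GpuConfig(gpu_count=4),  # assumption: gpu_count > 1 is supported
    train_batch_size=128,
    num_train_steps=1000,
    learning_rate=6e-4,
    weight_decay=0.1,
)
```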