This repo provides code for benchmarking Stable Diffusion using Streaming, Composer, and MosaicML Cloud. The benchmarking results are presented in this blog post, but the table is duplicated below.
| Number of A100s | Throughput (images / second) | Days to Train on MosaicML Cloud | A100-hours | Approx. Cost on MosaicML Cloud |
|---|---|---|---|---|
| 8 | 128.2 | 258.83 | 49,696 | $99,000 |
| 16 | 254.0 | 130.63 | 50,166 | $100,000 |
| 32 | 485.7 | 68.33 | 52,470 | $105,000 |
| 64 | 912.2 | 36.38 | 55,875 | $110,000 |
| 128 | 1,618.4 | 20.5 | 62,987 | $125,000 |
| 256 | 2,589.4 | 12.83 | 78,735 | $160,000 |
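
The columns of this table are related by simple arithmetic, which the sketch below makes explicit. It assumes that A100-hours equals GPUs × days × 24 and that scaling efficiency is measured against perfect linear scaling from the 8-GPU row; both are assumptions inferred from the numbers, not formulas stated in the blog post. The function names are illustrative and do not come from this repo. Recomputed A100-hours differ slightly from the published column, presumably because the days column is rounded.

```python
# Rows copied from the table above: (num_gpus, images_per_second, days_to_train)
ROWS = [
    (8, 128.2, 258.83),
    (16, 254.0, 130.63),
    (32, 485.7, 68.33),
    (64, 912.2, 36.38),
    (128, 1618.4, 20.5),
    (256, 2589.4, 12.83),
]


def a100_hours(num_gpus, days):
    """GPU-hours consumed if every GPU runs for the whole training period (assumed formula)."""
    return num_gpus * days * 24


def scaling_efficiency(num_gpus, throughput, base_gpus=8, base_throughput=128.2):
    """Measured throughput divided by ideal linear scaling from the 8-GPU baseline."""
    ideal = base_throughput * num_gpus / base_gpus
    return throughput / ideal


for gpus, throughput, days in ROWS:
    hours = a100_hours(gpus, days)
    eff = scaling_efficiency(gpus, throughput)
    print(f"{gpus:>4} GPUs: {hours:>9,.0f} A100-hours, {eff:6.1%} scaling efficiency")
```

Read this way, the table shows the trade-off directly: adding GPUs shortens wall-clock time, but the drop in scaling efficiency raises the total A100-hours and therefore the cost.
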
In this repo, you will find:
- `benchmark.py` - defines the Stable Diffusion `ComposerModel` and the Composer `Trainer`.
- `data.py` - defines the MosaicML Streaming LAION dataset and a synthetic dataset as an alternative to streaming data.
- `ema.py` - a memory-efficient version of Composer's EMA algorithm.
- `mcloud.yaml` - examples of how to use MosaicML Cloud to launch a training run.
If you are interested in using the MosaicML Cloud, sign up for a demo here!
Install the required dependencies using `pip install -r requirements.txt`.

If you would like to use xFormers, install it with the following command (we pin a commit that we know works):

`pip install -v -U git+https://github.com/facebookresearch/xformers.git@3df785ce54114630155621e2be1c2fa5037efa27#egg=xformers`

To benchmark without using a streaming dataset:

`composer benchmark.py --use_ema --use_synth_data --device_train_microbatch_size 4`

`device_train_microbatch_size` should be 4 when using NVIDIA 40GB A100 GPUs with xFormers. If you are not using xFormers, `device_train_microbatch_size` should be 2. If you are using a smaller GPU, adjust `device_train_microbatch_size` as needed.
To log benchmark results, set up a Weights and Biases account, then specify the --wandb_name and --wandb_project arguments.
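
If you launch benchmarks from a script, the flag choices in this README can be collected in one small helper. This is a minimal sketch, not part of the repo: `build_benchmark_command` and its parameters are hypothetical names, the microbatch sizes assume a 40GB A100, and passing the Weights and Biases values as `--wandb_name NAME --wandb_project PROJECT` is an assumption about the argument syntax. The `--remote` flag selects a streaming dataset, as described in the next paragraph.

```python
import shlex


def build_benchmark_command(use_xformers=True, remote=None,
                            wandb_name=None, wandb_project=None):
    """Assemble a `composer benchmark.py` command line from the flags in this README.

    Hypothetical helper: the defaults assume a 40GB A100.
    """
    # 4 with xFormers, 2 without (values recommended above for 40GB A100s).
    microbatch_size = 4 if use_xformers else 2

    args = ["composer", "benchmark.py", "--use_ema"]
    if remote is None:
        # No streaming location given, so fall back to the synthetic dataset.
        args.append("--use_synth_data")
    args += ["--device_train_microbatch_size", str(microbatch_size)]
    if remote is not None:
        args += ["--remote", remote]
    if wandb_name and wandb_project:
        # Assumed syntax: each flag takes its value as the next argument.
        args += ["--wandb_name", wandb_name, "--wandb_project", wandb_project]

    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(args)


print(build_benchmark_command())
print(build_benchmark_command(use_xformers=False, remote="s3://my-bucket/laion/mds"))
```

Building the argument list first and joining it with `shlex.join` keeps values containing spaces, such as a run name, safely quoted.
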
If you want to benchmark using a streaming dataset, specify the --remote argument:
`composer benchmark.py --use_ema --device_train_microbatch_size 4 --remote s3://my-bucket/laion/mds`

If you run into any problems with the code, please file GitHub issues directly in this repo.

If you want to train diffusion models on MosaicML Cloud, schedule a demo online or email us at [email protected]