
Commit 1961b00

Merge pull request #2081 from bghira/main

merge

2 parents db0d7ab + 5df36cb

142 files changed: 10,087 additions and 224 deletions


.github/workflows/publish-pypi.yaml

Lines changed: 2 additions & 2 deletions
@@ -16,7 +16,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.11", "3.12"]
+        python-version: ["3.12", "3.13"]

     steps:
       - uses: actions/checkout@v4
@@ -51,7 +51,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: "3.11"
+          python-version: "3.12"

       - name: Install build dependencies
         run: |
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+name: Push Cog Image
+
+on:
+  workflow_dispatch:
+    inputs:
+      image:
+        description: "Target Replicate image (e.g., r8.im/simpletuner/z-image)"
+        required: false
+        default: "r8.im/simpletuner/z-image"
+
+jobs:
+  push:
+    runs-on: ubuntu-latest
+    env:
+      REPLICATE_API_TOKEN: ${{ secrets.REPLICATE_API_KEY }}
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          submodules: recursive
+          fetch-depth: 0
+
+      - name: Install Cog CLI
+        run: |
+          set -euo pipefail
+          curl -fsSL https://raw.githubusercontent.com/replicate/cog/main/tools/install.sh -o /tmp/install-cog.sh
+          chmod +x /tmp/install-cog.sh
+          INSTALL_DIR="$HOME/.local/bin" /tmp/install-cog.sh
+          echo "$HOME/.local/bin" >> "$GITHUB_PATH"
+          cog --version
+
+      - name: Push image to Replicate
+        env:
+          TARGET_IMAGE: ${{ github.event.inputs.image }}
+        run: |
+          if [ -z "$REPLICATE_API_TOKEN" ]; then
+            echo "REPLICATE_API_KEY secret is missing" >&2
+            exit 1
+          fi
+          cog login --token "$REPLICATE_API_TOKEN"
+          cog push "${TARGET_IMAGE:-r8.im/simpletuner/z-image}"

.github/workflows/python-tests.yaml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v2
         with:
-          python-version: 3.11
+          python-version: 3.12

       - name: Install Dependencies
         run: python -m pip install --upgrade pip && pip install -e .[test]

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # SimpleTuner needs CU141
 FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04

-ARG PYTHON_VERSION=3.11
+ARG PYTHON_VERSION=3.12

 # Prevent commands from blocking for input during build
 ENV DEBIAN_FRONTEND=noninteractive

README.md

Lines changed: 3 additions & 0 deletions
@@ -54,6 +54,7 @@ SimpleTuner provides comprehensive training support across multiple diffusion mo
 - **Multi-GPU training** - Distributed training across multiple GPUs with automatic optimization
 - **Advanced caching** - Image, video, audio, and caption embeddings cached to disk for faster training
 - **Aspect bucketing** - Support for varied image/video sizes and aspect ratios
+- **Concept sliders** - Slider-friendly targeting for LoRA/LyCORIS/full (via LyCORIS `full`) with positive/negative/neutral sampling and per-prompt strength; see [Slider LoRA guide](/documentation/SLIDER_LORA.md)
 - **Memory optimization** - Most models trainable on 24G GPU, many on 16G with optimizations
 - **DeepSpeed & FSDP2 integration** - Train large models on smaller GPUs with optim/grad/parameter sharding, context parallel attention, gradient checkpointing, and optimizer state offload
 - **S3 training** - Train directly from cloud storage (Cloudflare R2, Wasabi S3)
@@ -127,6 +128,8 @@ Detailed quickstart guides are available for all supported models:
 - **[Sana Guide](/documentation/quickstart/SANA.md)** - Lightweight flow-matching model
 - **[Lumina2 Guide](/documentation/quickstart/LUMINA2.md)** - 2B parameter flow-matching model
 - **[Kwai Kolors Guide](/documentation/quickstart/KOLORS.md)** - SDXL-based with ChatGLM encoder
+- **[LongCat-Image Guide](/documentation/quickstart/LONGCAT_IMAGE.md)** - 6B bilingual flow-matching model with Qwen-2.5-VL encoder
+- **[LongCat-Image Edit Guide](/documentation/quickstart/LONGCAT_EDIT.md)** - Image editing flavour requiring reference latents
 - **[LTX Video Guide](/documentation/quickstart/LTXVIDEO.md)** - Video diffusion training
 - **[Hunyuan Video 1.5 Guide](/documentation/quickstart/HUNYUANVIDEO.md)** - 8.3B flow-matching T2V/I2V with SR stages
 - **[Wan Video Guide](/documentation/quickstart/WAN.md)** - Video flow-matching with TREAD support

documentation/DEEPSPEED.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ In v3.0, this support has been greatly improved, with a WebUI configuration buil
 | GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |       ID   ID                                                   Usage      |
 |=============================================================================|
-|   0   N/A  N/A     11500      C   ...uner/.venv/bin/python3.11     9232MiB |
+|   0   N/A  N/A     11500      C   ...uner/.venv/bin/python3.12     9232MiB |
 +-----------------------------------------------------------------------------+
 ```

documentation/DREAMBOOTH.md

Lines changed: 24 additions & 0 deletions
@@ -222,6 +222,30 @@ Alternatively, one might use the real name of their subject, or a 'similar enoug

 After a number of training experiments, it seems as though a 'similar enough' celebrity is the best choice, especially if prompting the model for the person's real name ends up looking dissimilar.

+# Scheduled Sampling (Rollout)
+
+When training on small datasets, as in Dreambooth, models can quickly overfit to the "perfect" noise added during training. This leads to **exposure bias**: the model learns to denoise perfect inputs but fails when faced with its own slightly imperfect outputs during inference.
+
+**Scheduled Sampling (Rollout)** addresses this by occasionally letting the model generate its own noisy latents for a few steps during the training loop. Instead of training on pure Gaussian noise + signal, it trains on "rollout" samples that contain the model's own previous errors. This teaches the model to correct itself, leading to more robust and stable subject generation.
+
+> 🟢 This feature is experimental but highly recommended for small datasets where overfitting or "frying" is common.
+> ⚠️ Enabling rollout increases compute requirements, as the model must perform extra inference steps during the training loop.
+
+To enable it, add these keys to your `config.json`:
+
+```json
+{
+  "scheduled_sampling_max_step_offset": 10,
+  "scheduled_sampling_probability": 1.0,
+  "scheduled_sampling_ramp_steps": 1000,
+  "scheduled_sampling_sampler": "unipc"
+}
+```
+
+* `scheduled_sampling_max_step_offset`: How many steps to generate. A small value (e.g., 5-10) is often enough.
+* `scheduled_sampling_probability`: How often to apply this technique (0.0 to 1.0).
+* `scheduled_sampling_ramp_steps`: Ramp up the probability over the first N steps to avoid destabilizing early training.
+
 # Exponential moving average (EMA)

 A second model can be trained in parallel to your checkpoint, nearly for free - only the resulting system memory (by default) is consumed, rather than more VRAM.
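
The rollout mechanic is easier to see in code. Below is a minimal sketch under a flow-matching formulation, not the trainer's actual implementation: `model` is a stand-in velocity predictor, the fixed-size Euler loop stands in for the configurable solver (`scheduled_sampling_sampler`: `unipc`, `euler`, `dpm`, or `rk4`), and the step size `dt` is a toy value.

```python
import random
import torch


def rollout_probability(step, prob_start=0.0, prob_end=0.5, ramp_steps=1000):
    # Linear ramp mirroring scheduled_sampling_ramp_steps: rollout is rare
    # early in training, then the probability rises to prob_end.
    if ramp_steps <= 0:
        return prob_end
    frac = min(step / ramp_steps, 1.0)
    return prob_start + (prob_end - prob_start) * frac


@torch.no_grad()
def make_rollout_input(model, latents, max_step_offset=10, dt=0.02):
    # Instead of handing the model a perfectly noised latent, start a few
    # solver steps "noisier" and let the model integrate its own velocity
    # predictions down to the training timestep, so the training input
    # carries the model's own errors.
    noise = torch.randn_like(latents)
    offset = random.randint(1, max_step_offset)   # the "uniform" strategy
    t = torch.rand(latents.shape[0], device=latents.device)
    t_start = (t + offset * dt).clamp(max=1.0)
    tv = t_start.view(-1, 1, 1, 1)
    x = (1.0 - tv) * latents + tv * noise         # x at the noisier time
    cur = t_start
    for _ in range(offset):                       # toy Euler rollout
        v = model(x, cur)                         # predicts noise - latents
        x = x - dt * v
        cur = (cur - dt).clamp(min=0.0)
    return x, t                                   # supervise at time t as usual
```

In the training loop, each batch would compare `random.random()` against `rollout_probability(global_step)` to decide between this path and the ordinary noising path; the `biased_early`/`biased_late` strategies would simply skew how `offset` is drawn.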

documentation/INSTALL.md

Lines changed: 2 additions & 2 deletions
@@ -35,8 +35,8 @@ git clone --branch=release https://github.com/bghira/SimpleTuner.git

 cd SimpleTuner

-# if python --version shows 3.11 you can just also use the 'python' command here.
-python3.11 -m venv .venv
+# if python --version shows 3.11, you will have to upgrade to 3.12.
+python3.12 -m venv .venv

 source .venv/bin/activate
 ```

documentation/OPTIONS.md

Lines changed: 80 additions & 4 deletions
@@ -619,6 +619,74 @@ See the [DATALOADER.md](DATALOADER.md#automatic-dataset-oversubscription) guide
 - **What**: Train a model using a more gradual weighting on the loss landscape.
 - **Why**: When training pixel diffusion models, they will simply degrade without using a specific loss weighting schedule. This is the case with DeepFloyd, where soft-min-snr-gamma was found to essentially be mandatory for good results. You may find success with latent diffusion model training, but in small experiments, it was found to potentially produce blurry results.

+### `--diff2flow_enabled`
+
+- **What**: Enable the Diffusion-to-Flow bridge for epsilon or v-prediction models.
+- **Why**: Allows models trained with standard diffusion objectives to use flow-matching targets (noise - latents) without changing the model architecture.
+- **Note**: Experimental feature.
+
+### `--diff2flow_loss`
+
+- **What**: Train with Flow Matching loss instead of the native prediction loss.
+- **Why**: When enabled alongside `--diff2flow_enabled`, this calculates the loss against the flow target (noise - latents) instead of the model's native target (epsilon or velocity).
+- **Note**: Requires `--diff2flow_enabled`.
+
+### `--scheduled_sampling_max_step_offset`
+
+- **What**: Maximum number of steps to "roll out" during training.
+- **Why**: Enables Scheduled Sampling (Rollout), where the model generates its own inputs for a few steps during training. This helps the model learn to correct its own errors and reduces exposure bias.
+- **Default**: 0 (disabled). Set to a positive integer (e.g., 5 or 10) to enable.
+
+### `--scheduled_sampling_strategy`
+
+- **What**: Strategy for choosing the rollout offset.
+- **Choices**: `uniform`, `biased_early`, `biased_late`.
+- **Default**: `uniform`.
+- **Why**: Controls the distribution of rollout lengths. `uniform` samples evenly; `biased_early` favors shorter rollouts; `biased_late` favors longer rollouts.
+
+### `--scheduled_sampling_probability`
+
+- **What**: Probability of applying a non-zero rollout offset for a given sample.
+- **Default**: 0.0.
+- **Why**: Controls how often scheduled sampling is applied. A value of 0.0 disables it even if `max_step_offset` is > 0. A value of 1.0 applies it to every sample.
+
+### `--scheduled_sampling_prob_start`
+
+- **What**: Initial probability for scheduled sampling at the start of the ramp.
+- **Default**: 0.0.
+
+### `--scheduled_sampling_prob_end`
+
+- **What**: Final probability for scheduled sampling at the end of the ramp.
+- **Default**: 0.5.
+
+### `--scheduled_sampling_ramp_steps`
+
+- **What**: Number of steps to ramp the probability from `prob_start` to `prob_end`.
+- **Default**: 0 (no ramp).
+
+### `--scheduled_sampling_start_step`
+
+- **What**: Global step to start the scheduled sampling ramp.
+- **Default**: 0.0.
+
+### `--scheduled_sampling_ramp_shape`
+
+- **What**: Shape of the probability ramp.
+- **Choices**: `linear`, `cosine`.
+- **Default**: `linear`.
+
+### `--scheduled_sampling_sampler`
+
+- **What**: The solver used for the rollout generation steps.
+- **Choices**: `unipc`, `euler`, `dpm`, `rk4`.
+- **Default**: `unipc`.
+
+### `--scheduled_sampling_order`
+
+- **What**: The order of the solver used for rollout.
+- **Default**: 2.
+
 ---

 ## 🔄 Checkpointing and Resumption
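
For intuition on the two `--diff2flow_*` flags above, here is a minimal sketch of the target change, assuming the rectified-flow interpolation x_t = (1 - t) * latents + t * noise. The `model` call and its epsilon-prediction signature are illustrative stand-ins rather than SimpleTuner internals, and the real bridge also remaps diffusion timesteps onto flow time.

```python
import torch
import torch.nn.functional as F


def diff2flow_loss(model, latents, timesteps):
    """Supervise an epsilon-prediction model with the flow-matching target
    (noise - latents). Assumes x_t = (1 - t) * latents + t * noise,
    with per-sample timesteps in [0, 1]."""
    noise = torch.randn_like(latents)
    t = timesteps.view(-1, 1, 1, 1)
    noisy = (1.0 - t) * latents + t * noise

    eps_pred = model(noisy, timesteps)         # model still predicts epsilon

    # Invert the interpolation to get the model's implied clean latents:
    # x_t = (1 - t) * x0 + t * eps  =>  x0 = (x_t - t * eps) / (1 - t)
    x0_pred = (noisy - t * eps_pred) / (1.0 - t).clamp(min=1e-4)
    v_pred = eps_pred - x0_pred                # implied (noise - latents)

    return F.mse_loss(v_pred, noise - latents)
```

With `--diff2flow_enabled` alone, this kind of reparameterization is active but the model's native loss is kept; adding `--diff2flow_loss` switches the objective to the flow target, as in the final line above.
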
@@ -745,6 +813,7 @@ usage: train.py [-h] --model_family
                 [--vae_cache_scan_behaviour {recreate,sync}]
                 [--vae_enable_slicing [VAE_ENABLE_SLICING]]
                 [--vae_enable_tiling [VAE_ENABLE_TILING]]
+                [--vae_enable_patch_conv [VAE_ENABLE_PATCH_CONV]]
                 [--vae_batch_size VAE_BATCH_SIZE]
                 [--caption_dropout_probability CAPTION_DROPOUT_PROBABILITY]
                 [--tokenizer_max_length TOKENIZER_MAX_LENGTH]
@@ -782,7 +851,7 @@ usage: train.py [-h] --model_family
                 [--validation_guidance_skip_layers_stop VALIDATION_GUIDANCE_SKIP_LAYERS_STOP]
                 [--validation_guidance_skip_scale VALIDATION_GUIDANCE_SKIP_SCALE]
                 [--validation_lycoris_strength VALIDATION_LYCORIS_STRENGTH]
-                [--validation_noise_scheduler {ddim,ddpm,euler,euler-a,unipc,dpm++}]
+                [--validation_noise_scheduler {ddim,ddpm,euler,euler-a,unipc,dpm++,perflow}]
                 [--validation_num_video_frames VALIDATION_NUM_VIDEO_FRAMES]
                 [--validation_resolution VALIDATION_RESOLUTION]
                 [--validation_seed_source {cpu,gpu}]
@@ -909,7 +978,7 @@ usage: train.py [-h] --model_family
                 [--rescale_betas_zero_snr [RESCALE_BETAS_ZERO_SNR]]
                 [--webhook_config WEBHOOK_CONFIG]
                 [--webhook_reporting_interval WEBHOOK_REPORTING_INTERVAL]
-                [--distillation_method {lcm,dcm}]
+                [--distillation_method {lcm,dcm,dmd,perflow}]
                 [--distillation_config DISTILLATION_CONFIG]
                 [--ema_validation {none,ema_only,comparison}]
                 [--local_rank LOCAL_RANK] [--ltx_train_mode {t2v,i2v}]
@@ -1083,6 +1152,10 @@ options:
                         PEFT LoRA training mode
   --singlora_ramp_up_steps SINGLORA_RAMP_UP_STEPS
                         Number of ramp-up steps for SingLoRA
+  --slider_lora_target [SLIDER_LORA_TARGET]
+                        Route LoRA training to slider-friendly targets
+                        (self-attn + conv/time embeddings). Only affects
+                        standard PEFT LoRA.
   --init_lora INIT_LORA
                         Specify an existing LoRA or LyCORIS safetensors file
                         to initialize the adapter
@@ -1118,6 +1191,9 @@ options:
                         Enable VAE attention slicing for memory efficiency
   --vae_enable_tiling [VAE_ENABLE_TILING]
                         Enable VAE tiling for large images
+  --vae_enable_patch_conv [VAE_ENABLE_PATCH_CONV]
+                        Enable patch-based 3D conv for HunyuanVideo VAE to
+                        reduce peak VRAM (slight slowdown)
   --vae_batch_size VAE_BATCH_SIZE
                         Batch size for VAE encoding during caching
   --caption_dropout_probability CAPTION_DROPOUT_PROBABILITY
@@ -1201,7 +1277,7 @@ options:
                         Scale guidance strength when applying layer skipping
   --validation_lycoris_strength VALIDATION_LYCORIS_STRENGTH
                         Strength multiplier for LyCORIS validation
-  --validation_noise_scheduler {ddim,ddpm,euler,euler-a,unipc,dpm++}
+  --validation_noise_scheduler {ddim,ddpm,euler,euler-a,unipc,dpm++,perflow}
                         Noise scheduler for validation
   --validation_num_video_frames VALIDATION_NUM_VIDEO_FRAMES
                         Number of frames for video validation
@@ -1585,7 +1661,7 @@ options:
                         Path to webhook configuration file
   --webhook_reporting_interval WEBHOOK_REPORTING_INTERVAL
                         Interval for webhook reports (seconds)
-  --distillation_method {lcm,dcm}
+  --distillation_method {lcm,dcm,dmd,perflow}
                         Method for model distillation
   --distillation_config DISTILLATION_CONFIG
                         Path to distillation configuration file
