Releases: bghira/SimpleTuner

v3.3.1

19 Dec 03:30
1d20509

What's Changed

  • flux2: do not bypass the special model loader by @bghira in #2170
  • (#2030) scheduled dataset sampling by @bghira in #2167
  • GLANCE: better code example by @bghira in #2171
  • TwinFlow: do not initialise neg time embed when disabled by @bghira in #2174
  • UI (datasets): remove ControlNet conditioning option from selections when CN is disabled; select reference_strict by default otherwise by @bghira in #2177
  • add missing LayerSync support to kandinsky5 video by @bghira in #2179
  • qwen-edit: fix text embed cache generation with image context; disable image embeddings for multi-conditioning input by @bghira in #2176
  • chroma 4d text embed fix by @bghira in #2181
  • ensure edit-v2 either uses 1:1 or 0 image embeds by @bghira in #2186
  • upload zip: preserve subdirs by @bghira in #2189
  • allow simpletuner server env=... to auto-start training after webUI launches by @bghira in #2191
  • add more indicators to dataset page when conditioning parameters are not set by @bghira in #2192
  • Git-based configuration sync across SimpleTuner nodes (wip) by @bghira in #2172
  • Z-Image-Omni with optional SigLIP conditioning support, TREAD, LayerSync, CFG layer skip, fp16 clamping, and TwinFlow by @bghira in #2183
  • (#2182) add --peft_lora_target_modules for arbitrary layer definition by @bghira in #2193
  • (#2190) add webUI onboarding config to "simpletuner configure" by @bghira in #2194
  • merge by @bghira in #2196
  • (#2173) remove early check for CREPA since we are using LayerSync features with certain configs by @bghira in #2195
  • (#2187) better image resizing for validation inputs when validation resolution != training resolution by @bghira in #2197
  • adjust default resolution on dataset page to equal --resolution, and ensure min/max/target down sample size are equal by @bghira in #2198
  • merge by @bghira in #2199
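The new --peft_lora_target_modules flag (#2193 above) lets you name arbitrary layers for LoRA injection. Under PEFT's conventions, module targeting works by suffix-matching dotted module names; the helper below is an illustrative sketch of that matching, not SimpleTuner code:

```python
# Illustrative sketch of PEFT-style target-module matching: a module is
# selected for LoRA when its dotted name equals, or ends with, one of the
# requested targets. The matches_target helper is hypothetical.

def matches_target(module_name: str, targets: list[str]) -> bool:
    return any(
        module_name == t or module_name.endswith("." + t) for t in targets
    )

targets = ["to_q", "to_k", "to_v"]
names = [
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.attn.to_out.0",
]
selected = [n for n in names if matches_target(n, targets)]
print(selected)  # ['transformer_blocks.0.attn.to_q']
```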

Full Changelog: v3.3.0...v3.3.1

v3.3.0 - TwinFlow, LayerSync, and Flux.2 edit training

16 Dec 21:51
2e018b5

Features

  • TwinFlow, a distillation method that works on most flow-matching architectures and converges in far less time than typical distillation
  • LayerSync, a self-regularisation method for practically all transformer models supported in SimpleTuner
  • CREPA can combine forces with LayerSync to self-regulate instead of using DINO features
  • Flux.2 can now accept conditioning datasets
  • Custom flow-matching timesteps can be provided for training, allowing configuration of "Glance" style training runs
  • WebUI: better path handling for datasets, sensible defaults will be set instead of requiring the user to figure it out
  • CLI: When configuring dataset cache directories, you can now use {id}, {output_dir} in addition to {model_family} to make dynamic paths that adjust automatically based on these attributes
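The {id}, {output_dir}, and {model_family} placeholders above can be illustrated with a small sketch. The real substitution happens inside SimpleTuner's dataloader configuration handling; the placeholder names below come from this release, but the expand_cache_dir helper itself is hypothetical:

```python
# Minimal sketch of the dynamic cache-path templating described above.
# Only the placeholder names ({id}, {output_dir}, {model_family}) come
# from the release notes; this helper is illustrative, not SimpleTuner's.

def expand_cache_dir(template: str, dataset_id: str, output_dir: str, model_family: str) -> str:
    """Expand {id}, {output_dir}, and {model_family} in a cache_dir template."""
    return (
        template
        .replace("{id}", dataset_id)
        .replace("{output_dir}", output_dir)
        .replace("{model_family}", model_family)
    )

# e.g. a per-dataset VAE cache nested under the training output directory:
path = expand_cache_dir(
    "{output_dir}/cache/vae/{model_family}/{id}",
    dataset_id="my-photos",
    output_dir="/workspace/output",
    model_family="flux",
)
print(path)  # /workspace/output/cache/vae/flux/my-photos
```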

Bugfixes

  • WebUI: resolved a search box race condition that prevented items from highlighting or subsections from expanding

What's Changed

  • TwinFlow self-directed distillation by @bghira in #2159
  • (#2136) add --flow_custom_timesteps with Glance "distillation" example by @bghira in #2160
  • flux2: adjust comfyUI lora export format to use their custom keys instead of generic LoRA layout by @bghira in #2162
  • [webUI] refactoring validation and default paths for text embed and VAE caches by @bghira in #2163
  • flux2: support conditioning datasets by @bghira in #2164
  • fix search box race condition that prevented expanding subsection or highlighting results by @bghira in #2165
  • LayerSync + CREPA adaptation by @bghira in #2161
  • merge by @bghira in #2166
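The --flow_custom_timesteps flag (#2160 above) enables "Glance"-style runs that concentrate training on a subset of flow-matching timesteps. The sketch below illustrates what such a biased schedule looks like; the helper and values are our own illustration, not SimpleTuner's parser or its flag syntax:

```python
# Sketch of a custom flow-matching timestep schedule of the kind
# --flow_custom_timesteps enables. The glance_schedule helper is
# hypothetical; it simply biases n timesteps toward one end of (0, 1].

def glance_schedule(n: int, power: float = 2.0) -> list[float]:
    """Return n timesteps in (0, 1], spaced more densely near t=1 for power > 1."""
    return [((i + 1) / n) ** (1.0 / power) for i in range(n)]

ts = glance_schedule(4)
print([round(t, 3) for t in ts])  # [0.5, 0.707, 0.866, 1.0]
```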

Full Changelog: v3.2.3...v3.3.0

v3.2.3

15 Dec 18:25
a1528dc

Features

  • --musubi_blocks_to_swap feature ported from musubi-tuner, adapted for SimpleTuner's Diffusers frankenstein build
  • LongCat Video 13.6B (needs a lot of system memory, block swapping, or ramtorch)
  • ROCm updated to torch 2.9.1 (still stuck on ROCm 6.4 though)
  • Exposed int4-torchao as a quantisation option, aimed primarily at NVIDIA cards, though if you go the distance and enable FBGEMM-GENAI on ROCm, it'll work there too
  • --quantize_via=pipeline, a new opt-in mode that quantises while loading straight from disk, which should eliminate the ballooning system memory consumption before weights move to the GPU
  • Load .gguf models with straight-through quantisation (reduced system memory use at startup) by pointing --pretrained_transformer_model_name_or_path straight at the .gguf file
  • ReflexFlow is now the default mode for scheduled sampling on flow-matching models
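The two loading paths above can be combined. A hedged sketch of the relevant flags (the model path is a placeholder, and these lines would be appended to your usual SimpleTuner launch command):

```
--quantize_via=pipeline \
--pretrained_transformer_model_name_or_path=/models/flux1-dev-Q8_0.gguf
```

Pointing the transformer path straight at a .gguf file triggers the straight-through quantised load, and the pipeline mode avoids materialising full-precision weights in system memory first.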

Bugfixes

  • Torch compile validation now works with Ramtorch and Validation LoRA adapter(s)
  • Ramtorch fixes for validation with PEFT LoRA training
  • Ramtorch fixes for full model training (no LoRA/LyCORIS)
  • Support python 3.13 for ROCm systems
  • Better error message instead of a crash when bitsandbytes is missing (ROCm, Apple)
  • Resolve stuck subprocess consuming VRAM when exit fails because the trainer is busy and cannot respond in time
  • Fix scrollbars on dataloader UI path fields consuming space
  • ReflexFlow ADR sign calculation fixed, no longer breaks model and pushes toward noise
  • Fix for --text_encoder_x_precision that was not correctly quantising any text encoders when launched via webUI
  • Fix for multigpu training with Ramtorch (DDP tensors)
  • Added ffmpeg as a system / container dependency in the webui/api tutorials
  • Flux2 text encoder OOM (related to text encoder precision) fixed
  • Minor QoL improvements to web interface

What's Changed

  • ROCm: torch 2.9.1 dependencies by @bghira in #2147
  • LongCat Video 13.6B by @bghira in #2083
  • ROCm: support py 3.13 by @bghira in #2148
  • musubi blocks to swap requires model to remain on CPU by @bghira in #2149
  • resolve error when using bitsandbytes quant level without it being installed by @bghira in #2150
  • ramtorch: do not move full model to accelerator by @bghira in #2151
  • ramtorch: enable use of validation adapters during full model training by @bghira in #2152
  • fix validation for compiled model with validation adapter LoRA by @bghira in #2153
  • honour request to stop training and terminate subprocess when accelerate is used to launch by @bghira in #2154
  • prevent scrollbars from consuming too much space by @bghira in #2155
  • ReflexFlow: fix sign of ADR calc, resolving extremely high loss and noise by @bghira in #2156
  • merge by @bghira in #2157

Full Changelog: v3.2.2...v3.2.3

v3.2.2 - Better video training, built-in doc links for UI, and TE quant fix

12 Dec 19:01
1d02264

Features

  • CREPA for better motion alignment when training on videos
  • Documentation links added to all options in the dataset page and elsewhere that lead to the online Github docs
  • Speed statistics now published to webhook & visible in webUI
  • ReflexFlow now enabled by default for flow-matching models when scheduled_sampling is enabled
  • grad_absmax is now visible via webhook and webUI

Bugfixes

  • Text encoder precision level is now honoured by the API / webUI launch
  • Flux2 quantised text encoder now loads without resorting to fp32
  • HunyuanVideo 1.5 fixes for LoRA training
  • lr_end is now correctly numeric after saving config
  • Better validation for most common dataset config errors on dataset UI page
  • Renaming datasets no longer glitches out in UI
  • Correctly write epoch statistics out to checkpoint when saving by epoch interval
  • No longer filling in pretrained_model_name_or_path incorrectly when bootstrapping an environment from a model config via UI

What's Changed

  • HunyuanVideo 1.5: refactor to use Diffusers v0.36.0 implementation by @bghira in #2115
  • (#2116) add iterationtracker for calculating throughput and publishing rate statistics via webhook by @bghira in #2118
  • emit grad absmax to webhook integration, display in UI by @bghira in #2120
  • add more dataloader options for configuration via UI, and docLinks for all options by @bghira in #2121
  • add doc links across the whole UI by @bghira in #2122
  • lr_end should be coerced into numeric by @bghira in #2126
  • dataset uploading via webui / api for local backend by @bghira in #2128
  • add better immediately-visible error state validation for common issues on dataset configurations by @bghira in #2129
  • (#2124) use prefixed temporary dataset name instead of duplicating by @bghira in #2130
  • (#2127) bump epoch inside checkpoint after writing when track-by-epoch is in use by @bghira in #2131
  • (#2123) CREPA: Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models (arXiv:2506.09229v2) by @bghira in #2132
  • CREPA: add docLinks to UI by @bghira in #2135
  • support running via Cog by @bghira in #2028
  • add ReflexFlow enhancements to scheduled sampling rollout for flow-matching models (2512.04904v1) by @bghira in #2133
  • ReflexFlow: enable by default when flow-matching scheduled sampling is enabled by @bghira in #2138
  • add ffmpeg to the deps for webui tutorial by @bghira in #2140
  • do not fill in pretrained_model_name_or_path with the model flavour default upon environment creation by @bghira in #2141
  • [UI] prevent quant from being used for full training; prevent LoRA combined with DeepSpeed by @bghira in #2142
  • fix text encoders not quantising when launched via API or WebUI by @bghira in #2144
  • merge by @bghira in #2145
  • Bump version from 3.2.1 to 3.2.2 by @bghira in #2146

Full Changelog: v3.2.1...v3.2.2

v3.2.1 - bugfix release

09 Dec 04:39
c85df6b

Bugfixes

  • UI update improvements behind Cloudflare tunnels (cache busting)
  • Text prompt cache lookup failure fixed when using instance prompts or captions with stray trailing whitespace
  • Hunyuanvideo 1.5 PEFT LoRA mixin no longer missing
  • Weights & Biases scatterplot log spam fixed
  • Hunyuanvideo 1.5 VAE optimisations for Conv3D patchifying and temporal roll
  • When validation was disabled, text encoders did not unload properly
  • Webhook spam when none configured

Multigpu fixes

  • Manual GPU selection via WebUI was not working
  • MultiGPU logging improved, duplicates removed, ANSI codes stripped
  • Batch-parallel multigpu validations no longer deadlock
  • Memory use on ranks > 0 reduced by 15% (on an A100 80G, about 10G of VRAM was wasted by T5, Qwen3, etc.)

What's Changed

  • break cache for more scripts by @bghira in #2084
  • fix prompt replacement when using scheduled sampling by @bghira in #2085
  • reduce noisy scatter plotting for wandb by @bghira in #2087
  • align computation strip() and retrieval which does not strip() by @bghira in #2088
  • merge by @bghira in #2089
  • hunyuanvideo-1.5: add Peft mixin (#2091) by @bghira in #2092
  • Update huggingface.py to respect HF_HOME by @StableLlama in #2096
  • add VAE rolling option for hunyuanvideo 3D VAE, ported from comfyUI by @bghira in #2093
  • remove s from default config path by @bghira in #2099
  • manual GPU selection assignment fix, ensuring correct GPU is used for job by @bghira in #2100
  • multigpu logging fixes: remove duplicates, strip ANSI codes by @bghira in #2101
  • send multigpu validation trigger through shared file by @bghira in #2102
  • multigpu validations: batch-parallel, assistant LoRA fix by @bghira in #2103
  • reload on subprocesses instead of returning empty buckets for hf dataset by @bghira in #2104
  • multigpu validation: improve schedule check to avoid hang by @bghira in #2109
  • memory use optimisations; disable grad calc for text encoder by @bghira in #2110
  • use official diffusers paths for hv 1.5 by @bghira in #2097
  • (#2106) kandinsky i2i scale factor should be the size scale factor, not shift + scale by @bghira in #2111
  • (#2105) reduce noise by not spamming the webhook when none is configured by @bghira in #2112
  • merge by @bghira in #2113
  • Bump version from 3.2.0 to 3.2.1 by @bghira in #2114

Full Changelog: v3.2.0...v3.2.1

v3.2.0 - py3.12 minimum, dataset page redesign, LongCat Image, diff2flow

05 Dec 17:39
1961b00

Bugfixes

  • CLIP evaluation datasets will preprocess correctly, useful for Qwen now
  • hunyuanvideo 1.5 VAE now more efficient, thanks to kohya-ss patch logic being ported
  • perflow has been redesigned and fully integrated, no longer partially unavailable
  • memory usage on crash should be reclaimed fully

Features

  • Longcat Image 6B, t2i and edit flavours. Quickstart is available in documentation/quickstart
  • MuonClip optimiser as an experimental option which uses a novel attention layer integration for stability
  • ModelSpec v1.0.1 now written to all saved model outputs (EMA, checkpoints, LyCORIS and LoRA)
  • diff2flow for DDPMs like SD1x, SD2x, DeepFloyd, SDXL, Stable Cascade (stage C) and PixArt Sigma (600M, 900M, MoE)
  • Ostris' de-turbo and turbo assistant lora v2 now easily selectable via webUI
  • Concept slider LoRA training across all model architectures (incl. Z-Image)
  • New Dataset page layout option in webUI, more intuitive layout for detailed view
  • Redesigned perflow distillation mechanism, now includes ODE endpoint pre-caching
  • Scheduled sampling for all models, massively improving training quality through reduction of exposure bias

What's Changed

  • add turbo-ostris-v2 flavour for zimage new assistant LoRA by @bghira in #2069
  • eval dataset type needs full pre-processing chain by @bghira in #2066
  • MuonClip for transformer models by @bghira in #2068
  • deprecate and remove python 3.11 support by @bghira in #2071
  • ModelSpec support for LoRA, checkpoints, EMA, and LyCORIS model metadata by @bghira in #2070
  • diff2flow and sequential training-time sampling by @bghira in #2053
  • hunyuanvideo-1.5: opt-in efficient patch-based Conv3D path for autoencoder w/ per-frame sliced attention and reduced causal masking by @bghira in #2073
  • PeRFlow: integrate segmented reflow distiller as backend option w/ ODE cache provider (arXiv:2405.20320) by @bghira in #2072
  • add ostris de-turbo model_flavour by @bghira in #2076
  • use detail blocks to clean up docs by @bghira in #2077
  • concept slider lycoris / lora by @bghira in #2075
  • dataloader builder redesign by @bghira in #2078
  • stop fetchers, unload model components, reclaim memory at multiple exit/crash points by @bghira in #2080
  • add Longcat Image 6B by @bghira in #2082
  • merge by @bghira in #2081

Full Changelog: v3.1.6...v3.2.0

v3.1.6 - comfyUI LoRA format support, FSDP2+LyCORIS compat

03 Dec 19:19
db0d7ab

Python deprecation notice

The next release (3.2) will no longer support Python 3.11-based installs; a minimum of Python 3.12 will be expected.

What's Changed

  • when checkpointing by epoch, set state BEFORE checkpoint by @bghira in #2041
  • pixart controlnet validation fix by @bghira in #2042
  • resolve background being white with white text in some browsers (Chrome) by @bghira in #2040
  • add --lora_format to allow switching to comfyUI style outputs by @bghira in #2032
  • fix indent issue that caused error when loading lora with comfyui format enabled by @bghira in #2044
  • log when we load assistant lora by @bghira in #2043
  • solve timeout / async DOM access issue when opt-in to create a new dataset config by @bghira in #2045
  • dataset builder: relocate save button to action bar, fix dropdown config selector sync by @bghira in #2046
  • add post_checkpoint_script entrypoint for external checkpoint renaming, cataloguing etc by @bghira in #2047
  • add log line for loading assistant lora at training time by @bghira in #2048
  • remove now-unnecessary check for comfyUI format etc by @bghira in #2049
  • fix issue where epoch-based validation tracking triggers on step 2 by @bghira in #2054
  • by default show quantize_via if lora mode enabled, and use a helper to set field visibility instead by @bghira in #2056
  • Z-Image: port fp16 clamping from ComfyUI in case RTX 20xx users need mixed_precision=fp16 by @bghira in #2058
  • evaluation needs mu calculation for dynamic shifting models, which we will delegate to the model implementation by @bghira in #2059
  • evaluation datasets should not require alignment with batch size or GPU count by @bghira in #2060
  • (#2052) add --eval_epoch_interval for fractional epoch tracking and periodic scheduled CLIP evaluations by @bghira in #2061
  • dataset can detect missing metadata entries and trigger a scan by @bghira in #2062
  • luminance values may be partially unavailable during collate by @bghira in #2065
  • fsdp2: compatibility with LyCORIS by @bghira in #2064
  • (#2024) add swanlab integration by @bghira in #2063
  • attempt stronger unload for torchao text encoder by @bghira in #2055
  • merge by @bghira in #2067
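The post_checkpoint_script entrypoint (#2047 above) allows an external script to rename or catalogue checkpoints. A minimal sketch of such a hook follows; the assumption that the checkpoint path arrives as the first CLI argument is ours, so check the SimpleTuner docs for the actual call contract:

```python
#!/usr/bin/env python
# Sketch of a post-checkpoint hook of the kind #2047 enables. The CLI
# argument convention below is an assumption, not SimpleTuner's
# documented interface.
import shutil
import sys
import time
from pathlib import Path

def catalogue_checkpoint(checkpoint_dir: str) -> str:
    """Rename a checkpoint directory with a timestamp suffix and return the new path."""
    src = Path(checkpoint_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}-{stamp}")
    shutil.move(str(src), str(dest))
    return str(dest)

if __name__ == "__main__" and len(sys.argv) > 1:
    print(catalogue_checkpoint(sys.argv[1]))
```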

Full Changelog: v3.1.5...v3.1.6

v3.1.5 - precision and loss masking fixes

01 Dec 00:25
d803db6

Important notice

This release is a required quality/precision upgrade for users of the following models:

  • auraflow
  • cosmos2
  • flux.1
  • kandinsky5 image

Technical details

Tensor dtype standardization and device placement:

  • All occurrences where tensors are created or cast (such as timesteps, latent image IDs, text IDs, and guidance values) are now explicitly set to torch.float32 and placed on the correct device, replacing previous usage of variable or inferred dtypes. This affects model and pipeline files for auraflow, cosmos, flux, and kandinsky5_image.

Masking fix

  • The logic for duplicating metadata between datasets in the data backend factory is updated to allow the "mask" conditioning type (in addition to "reference_strict") and to improve log messages for clarity.

v3.1.4 - Flux2, Z-Image Turbo, Flux Schnell, Hunyuan Video 1.5

29 Nov 23:34
7d37b51

Full Changelog: v3.1.3...v3.1.4

v3.1.3 - Kandinsky5, checkpoint providers (S3, Azure, Backblaze, Dropbox), background uploads

24 Nov 00:36
290df51

New Features

  • Kandinsky5 Video & Image training - t2i, t2v, i2v, i2i
  • FSDP2 grad norm clipping
  • Optional background uploads for checkpoints to avoid blocking training
  • Optional external script execution to run validation via cloud or 2nd GPU
  • Custom Accelerate tracker support via simpletuner/custom-trackers/ plugin dir

Bugfixes

  • Training config wizard now replaces the config correctly
  • Qwen Image validations should now work reliably again
  • Web UI issue with event lifecycle not showing / clearing prematurely

What's Changed

  • (#916) add --push_to_hub_background to publish models asynchronously by @bghira in #2003
  • support S3, Backblaze, Azure Blob and Dropbox checkpoint publishing targets by @bghira in #2004
  • clean up test outputs and fix minor/sporadic issues by @bghira in #2006
  • config wizard should overwrite entire environment it replaces by @bghira in #2007
  • add --validation_method which defaults to simpletuner-local; add --validation_external_script for user-provided path and arguments to run validation pipeline with by @bghira in #2008
  • (#1730) support custom accelerate trackers by @bghira in #2010
  • qwen edit: v2 should not instantiate embed processor; v1 should cache pixel grid and collect captions for later processing during ref-image embedding; ref-images should be embedded instead of target-images by @bghira in #2009
  • Revert "qwen edit: v2 should not instantiate embed processor; v1 should cache pixel grid and collect captions for later processing during ref-image embedding; ref-images should be embedded instead of target-images" by @bghira in #2012
  • merge by @bghira in #2011
  • ace-step: demo config by @bghira in #2014
  • qwen-image does not have image parameter on encode_prompt by @bghira in #2015
  • training event lifecycle was hidden when training state was running by @bghira in #2016
  • qwen-image: validation should pack tensors when input is not by @bghira in #2017
  • add problem-solving tips to AGENTS.md by @bghira in #2018
  • remove --cache_clear_validation_prompts as currently the prompts are replaced at every startup by @bghira in #2019
  • Kandinsky-5 Video and Image model training (T2V, I2V, T2I, I2I) by @bghira in #2013
  • FSDP2 support for clip max grad norm by @bghira in #2020
  • merge by @bghira in #2021

Full Changelog: v3.1.2...v3.1.3