Releases: bghira/SimpleTuner

v3.3.1

19 Dec 03:30
1d20509

What's Changed

  • flux2: do not bypass the special model loader by @bghira in #2170
  • (#2030) scheduled dataset sampling by @bghira in #2167
  • GLANCE: better code example by @bghira in #2171
  • TwinFlow: do not initialise neg time embed when disabled by @bghira in #2174
  • UI (datasets): remove ControlNet conditioning option from selections when CN is disabled; select reference_strict by default otherwise by @bghira in #2177
  • add missing LayerSync support to kandinsky5 video by @bghira in #2179
  • qwen-edit: fix text embed cache generation with image context; disable image embeddings for multi-conditioning input by @bghira in #2176
  • chroma 4d text embed fix by @bghira in #2181
  • ensure edit-v2 either uses 1:1 or 0 image embeds by @bghira in #2186
  • upload zip: preserve subdirs by @bghira in #2189
  • allow simpletuner server env=... to auto-start training after webUI launches by @bghira in #2191
  • add more indicators to dataset page when conditioning parameters are not set by @bghira in #2192
  • Git-based configuration sync across SimpleTuner nodes (wip) by @bghira in #2172
  • Z-Image-Omni with optional SigLIP conditioning support, TREAD, LayerSync, CFG layer skip, fp16 clamping, and TwinFlow by @bghira in #2183
  • (#2182) add --peft_lora_target_modules for arbitrary layer definition by @bghira in #2193
  • (#2190) add webUI onboarding config to "simpletuner configure" by @bghira in #2194
  • merge by @bghira in #2196
  • (#2173) remove early check for CREPA since we are using LayerSync features with certain configs by @bghira in #2195
  • (#2187) better image resizing for validation inputs when validation resolution != training resolution by @bghira in #2197
  • adjust default resolution on dataset page to equal --resolution, and ensure min/max/target down sample size are equal by @bghira in #2198
  • merge by @bghira in #2199
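The new --peft_lora_target_modules flag (#2193 above) lets you name arbitrary layers for LoRA injection. Under PEFT's conventions, module targeting works by suffix-matching dotted module names; the helper below is an illustrative sketch of that matching, not SimpleTuner code:

```python
# Illustrative sketch of PEFT-style target-module matching: a module is
# selected for LoRA when its dotted name equals, or ends with, one of the
# requested targets. The matches_target helper is hypothetical.

def matches_target(module_name: str, targets: list[str]) -> bool:
    return any(
        module_name == t or module_name.endswith("." + t) for t in targets
    )

targets = ["to_q", "to_k", "to_v"]
names = [
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.attn.to_out.0",
]
selected = [n for n in names if matches_target(n, targets)]
print(selected)  # ['transformer_blocks.0.attn.to_q']
```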

Full Changelog: v3.3.0...v3.3.1

v3.3.0 - TwinFlow, LayerSync, and Flux.2 edit training

16 Dec 21:51
2e018b5

Features

  • TwinFlow, a distillation method that works on most flow-matching architectures and converges in far less time than typical distillation
  • LayerSync, a self-regularisation method for practically all transformer models supported in SimpleTuner
  • CREPA can combine forces with LayerSync to self-regulate instead of using DINO features
  • Flux.2 can now accept conditioning datasets
  • Custom flow-matching timesteps can be provided for training, allowing configuration of "Glance" style training runs
  • WebUI: better path handling for datasets, sensible defaults will be set instead of requiring the user to figure it out
  • CLI: When configuring dataset cache directories, you can now use {id}, {output_dir} in addition to {model_family} to make dynamic paths that adjust automatically based on these attributes
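The {id}, {output_dir}, and {model_family} placeholders above can be illustrated with a small sketch. The real substitution happens inside SimpleTuner's dataloader configuration handling; the placeholder names below come from this release, but the expand_cache_dir helper itself is hypothetical:

```python
# Minimal sketch of the dynamic cache-path templating described above.
# Only the placeholder names ({id}, {output_dir}, {model_family}) come
# from the release notes; this helper is illustrative, not SimpleTuner's.

def expand_cache_dir(template: str, dataset_id: str, output_dir: str, model_family: str) -> str:
    """Expand {id}, {output_dir}, and {model_family} in a cache_dir template."""
    return (
        template
        .replace("{id}", dataset_id)
        .replace("{output_dir}", output_dir)
        .replace("{model_family}", model_family)
    )

# e.g. a per-dataset VAE cache nested under the training output directory:
path = expand_cache_dir(
    "{output_dir}/cache/vae/{model_family}/{id}",
    dataset_id="my-photos",
    output_dir="/workspace/output",
    model_family="flux",
)
print(path)  # /workspace/output/cache/vae/flux/my-photos
```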

Bugfixes

  • WebUI: resolved a search box race condition that prevented items from highlighting or subsections from expanding

What's Changed

  • TwinFlow self-directed distillation by @bghira in #2159
  • (#2136) add --flow_custom_timesteps with Glance "distillation" example by @bghira in #2160
  • flux2: adjust comfyUI lora export format to use their custom keys instead of generic LoRA layout by @bghira in #2162
  • [webUI] refactoring validation and default paths for text embed and VAE caches by @bghira in #2163
  • flux2: support conditioning datasets by @bghira in #2164
  • fix search box race condition that prevented expanding subsection or highlighting results by @bghira in #2165
  • LayerSync + CREPA adaptation by @bghira in #2161
  • merge by @bghira in #2166
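The --flow_custom_timesteps flag (#2160 above) enables "Glance"-style runs that concentrate training on a subset of flow-matching timesteps. The sketch below illustrates what such a biased schedule looks like; the helper and values are our own illustration, not SimpleTuner's parser or its flag syntax:

```python
# Sketch of a custom flow-matching timestep schedule of the kind
# --flow_custom_timesteps enables. The glance_schedule helper is
# hypothetical; it simply biases n timesteps toward one end of (0, 1].

def glance_schedule(n: int, power: float = 2.0) -> list[float]:
    """Return n timesteps in (0, 1], spaced more densely near t=1 for power > 1."""
    return [((i + 1) / n) ** (1.0 / power) for i in range(n)]

ts = glance_schedule(4)
print([round(t, 3) for t in ts])  # [0.5, 0.707, 0.866, 1.0]
```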

Full Changelog: v3.2.3...v3.3.0

v3.2.3

15 Dec 18:25
a1528dc

Features

  • --musubi_blocks_to_swap feature ported from musubi-tuner, adapted for SimpleTuner's Diffusers frankenstein build
  • LongCat Video 13.6B (needs a lot of system memory, block swapping, or ramtorch)
  • ROCm updated to torch 2.9.1 (still stuck on ROCm 6.4 though)
  • Exposed int4-torchao as a quantisation option, aimed primarily at NVIDIA cards, though if you go the distance and enable FBGEMM-GENAI on ROCm, it'll work there too
  • --quantize_via=pipeline, a new opt-in mode that quantises while loading straight from disk, which should eliminate the ballooning system memory consumption before weights move to the GPU
  • Load .gguf models with straight-through quantisation (reduced system memory use at startup) by pointing --pretrained_transformer_model_name_or_path straight at the .gguf file
  • ReflexFlow is now the default mode for scheduled sampling on flow-matching models
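The two loading paths above can be combined. A hedged sketch of the relevant flags (the model path is a placeholder, and these lines would be appended to your usual SimpleTuner launch command):

```
--quantize_via=pipeline \
--pretrained_transformer_model_name_or_path=/models/flux1-dev-Q8_0.gguf
```

Pointing the transformer path straight at a .gguf file triggers the straight-through quantised load, and the pipeline mode avoids materialising full-precision weights in system memory first.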

Bugfixes

  • Torch compile validation now works with Ramtorch and Validation LoRA adapter(s)
  • Ramtorch fixes for validation with PEFT LoRA training
  • Ramtorch fixes for full model training (no LoRA/LyCORIS)
  • Support python 3.13 for ROCm systems
  • Better error message instead of a crash when bitsandbytes is missing (ROCm, Apple)
  • Resolve stuck subprocess consuming VRAM when exit fails because the trainer is busy and cannot respond in time
  • Fix scrollbars on dataloader UI path fields consuming space
  • ReflexFlow ADR sign calculation fixed, no longer breaks model and pushes toward noise
  • Fix for --text_encoder_x_precision that was not correctly quantising any text encoders when launched via webUI
  • Fix for multigpu training with Ramtorch (DDP tensors)
  • Added ffmpeg as a system / container dependency in the webui/api tutorials
  • Flux2 text encoder OOM (related to text encoder precision) fixed
  • Minor QoL improvements to web interface

What's Changed

  • ROCm: torch 2.9.1 dependencies by @bghira in #2147
  • LongCat Video 13.6B by @bghira in #2083
  • ROCm: support py 3.13 by @bghira in #2148
  • musubi blocks to swap requires model to remain on CPU by @bghira in #2149
  • resolve error when using bitsandbytes quant level without it being installed by @bghira in #2150
  • ramtorch: do not move full model to accelerator by @bghira in #2151
  • ramtorch: enable use of validation adapters during full model training by @bghira in #2152
  • fix validation for compiled model with validation adapter LoRA by @bghira in #2153
  • honour request to stop training and terminate subprocess when accelerate is used to launch by @bghira in #2154
  • prevent scrollbars from consuming too much space by @bghira in #2155
  • ReflexFlow: fix sign of ADR calc, resolving extremely high loss and noise by @bghira in #2156
  • merge by @bghira in #2157

Full Changelog: v3.2.2...v3.2.3

v3.2.2 - Better video training, built-in doc links for UI, and TE quant fix

12 Dec 19:01
1d02264

Features

  • CREPA for better motion alignment when training on videos
  • Documentation links added to all options in the dataset page and elsewhere that lead to the online Github docs
  • Speed statistics now published to webhook & visible in webUI
  • ReflexFlow now enabled by default for flow-matching models when scheduled_sampling is enabled
  • grad_absmax is now visible via webhook and webUI

Bugfixes

  • Text encoder precision level is now honoured by the API / webUI launch
  • Flux2 quantised text encoder now loads without resorting to fp32
  • HunyuanVideo 1.5 fixes for LoRA training
  • lr_end is now correctly numeric after saving config
  • Better validation for most common dataset config errors on dataset UI page
  • Renaming datasets no longer glitches out in UI
  • Correctly write epoch statistics out to checkpoint when saving by epoch interval
  • No longer filling in pretrained_model_name_or_path incorrectly when bootstrapping an environment from a model config via UI

What's Changed

  • HunyuanVideo 1.5: refactor to use Diffusers v0.36.0 implementation by @bghira in #2115
  • (#2116) add iterationtracker for calculating throughput and publishing rate statistics via webhook by @bghira in #2118
  • emit grad absmax to webhook integration, display in UI by @bghira in #2120
  • add more dataloader options for configuration via UI, and docLinks for all options by @bghira in #2121
  • add doc links across the whole UI by @bghira in #2122
  • lr_end should be coerced into numeric by @bghira in #2126
  • dataset uploading via webui / api for local backend by @bghira in #2128
  • add better immediately-visible error state validation for common issues on dataset configurations by @bghira in #2129
  • (#2124) use prefixed temporary dataset name instead of duplicating by @bghira in #2130
  • (#2127) bump epoch inside checkpoint after writing when track-by-epoch is in use by @bghira in #2131
  • (#2123) CREPA: Cross-Frame Representation Alignment for Fine-Tuning Video Diffusion Models (arXiv:2506.09229v2) by @bghira in #2132
  • CREPA: add docLinks to UI by @bghira in #2135
  • support running via Cog by @bghira in #2028
  • add ReflexFlow enhancements to scheduled sampling rollout for flow-matching models (2512.04904v1) by @bghira in #2133
  • ReflexFlow: enable by default when flow-matching scheduled sampling is enabled by @bghira in #2138
  • add ffmpeg to the deps for webui tutorial by @bghira in #2140
  • do not fill in pretrained_model_name_or_path with the model flavour default upon environment creation by @bghira in #2141
  • [UI] prevent quant from being used for full training; prevent LoRA combined with DeepSpeed by @bghira in #2142
  • fix text encoders not quantising when launched via API or WebUI by @bghira in #2144
  • merge by @bghira in #2145
  • Bump version from 3.2.1 to 3.2.2 by @bghira in #2146

Full Changelog: v3.2.1...v3.2.2

v3.2.1 - bugfix release

09 Dec 04:39
c85df6b

Bugfixes

  • UI update improvements behind Cloudflare tunnels (cache busting)
  • Text prompt cache lookup failure fixed when using instance prompts or captions with stray trailing whitespace
  • Hunyuanvideo 1.5 PEFT LoRA mixin no longer missing
  • Weights & Biases scatterplot log spam fixed
  • Hunyuanvideo 1.5 VAE optimisations for Conv3D patchifying and temporal roll
  • When validation was disabled, text encoders did not unload properly
  • Webhook spam when none configured

Multigpu fixes

  • Manual GPU selection via WebUI was not working
  • MultiGPU logging improved, duplicates removed, ANSI codes stripped
  • Batch-parallel multigpu validations no longer deadlock
  • Memory use on ranks > 0 reduced by 15% (on an A100 80G, about 10G of VRAM was wasted by T5, Qwen3, etc.)

What's Changed

  • break cache for more scripts by @bghira in #2084
  • fix prompt replacement when using scheduled sampling by @bghira in #2085
  • reduce noisy scatter plotting for wandb by @bghira in #2087
  • align computation strip() and retrieval which does not strip() by @bghira in #2088
  • merge by @bghira in #2089
  • hunyuanvideo-1.5: add Peft mixin (#2091) by @bghira in #2092
  • Update huggingface.py to respect HF_HOME by @StableLlama in #2096
  • add VAE rolling option for hunyuanvideo 3D VAE, ported from comfyUI by @bghira in #2093
  • remove s from default config path by @bghira in #2099
  • manual GPU selection assignment fix, ensuring correct GPU is used for job by @bghira in #2100
  • multigpu logging fixes: remove duplicates, strip ANSI codes by @bghira in #2101
  • send multigpu validation trigger through shared file by @bghira in #2102
  • multigpu validations: batch-parallel, assistant LoRA fix by @bghira in #2103
  • reload on subprocesses instead of returning empty buckets for hf dataset by @bghira in #2104
  • multigpu validation: improve schedule check to avoid hang by @bghira in #2109
  • memory use optimisations; disable grad calc for text encoder by @bghira in #2110
  • use official diffusers paths for hv 1.5 by @bghira in #2097
  • (#2106) kandinsky i2i scale factor should be the size scale factor, not shift + scale by @bghira in #2111
  • (#2105) reduce noise by not spamming the webhook when none is configured by @bghira in #2112
  • merge by @bghira in #2113
  • Bump version from 3.2.0 to 3.2.1 by @bghira in #2114

Full Changelog: v3.2.0...v3.2.1

v3.2.0 - py3.12 minimum, dataset page redesign, LongCat Image, diff2flow

05 Dec 17:39
1961b00

Bugfixes

  • CLIP evaluation datasets will preprocess correctly, useful for Qwen now
  • hunyuanvideo 1.5 VAE now more efficient, thanks to kohya-ss patch logic being ported
  • perflow has been redesigned and fully integrated, no longer partially unavailable
  • memory usage on crash should be reclaimed fully

Features

  • Longcat Image 6B, t2i and edit flavours. Quickstart is available in documentation/quickstart
  • MuonClip optimiser as an experimental option which uses a novel attention layer integration for stability
  • ModelSpec v1.0.1 now written to all saved model outputs (EMA, checkpoints, LyCORIS and LoRA)
  • diff2flow for DDPMs like SD1x, SD2x, DeepFloyd, SDXL, Stable Cascade (stage C) and PixArt Sigma (600M, 900M, MoE)
  • Ostris' de-turbo and turbo assistant lora v2 now easily selectable via webUI
  • Concept slider LoRA training across all model architectures (incl. Z-Image)
  • New Dataset page layout option in webUI, more intuitive layout for detailed view
  • Redesigned perflow distillation mechanism, now includes ODE endpoint pre-caching
  • Scheduled sampling for all models, massively improving training quality through reduction of exposure bias

What's Changed

  • add turbo-ostris-v2 flavour for zimage new assistant LoRA by @bghira in #2069
  • eval dataset type needs full pre-processing chain by @bghira in #2066
  • MuonClip for transformer models by @bghira in #2068
  • deprecate and remove python 3.11 support by @bghira in #2071
  • ModelSpec support for LoRA, checkpoints, EMA, and LyCORIS model metadata by @bghira in #2070
  • diff2flow and sequential training-time sampling by @bghira in #2053
  • hunyuanvideo-1.5: opt-in efficient patch-based Conv3D path for autoencoder w/ per-frame sliced attention and reduced causal masking by @bghira in #2073
  • PeRFlow: integrate segmented reflow distiller as backend option w/ ODE cache provider (arXiv:2405.20320) by @bghira in #2072
  • add ostris de-turbo model_flavour by @bghira in #2076
  • use detail blocks to clean up docs by @bghira in #2077
  • concept slider lycoris / lora by @bghira in #2075
  • dataloader builder redesign by @bghira in #2078
  • stop fetchers, unload model components, reclaim memory at multiple exit/crash points by @bghira in #2080
  • add Longcat Image 6B by @bghira in #2082
  • merge by @bghira in #2081

Full Changelog: v3.1.6...v3.2.0

v3.1.6 - comfyUI LoRA format support, FSDP2+LyCORIS compat

03 Dec 19:19
db0d7ab

Python deprecation notice

The next release (3.2) will no longer support Python 3.11-based installs; a minimum of Python 3.12 will be expected.

What's Changed

  • when checkpointing by epoch, set state BEFORE checkpoint by @bghira in #2041
  • pixart controlnet validation fix by @bghira in #2042
  • resolve background being white with white text in some browsers (Chrome) by @bghira in #2040
  • add --lora_format to allow switching to comfyUI style outputs by @bghira in #2032
  • fix indent issue that caused error when loading lora with comfyui format enabled by @bghira in #2044
  • log when we load assistant lora by @bghira in #2043
  • solve timeout / async DOM access issue when opt-in to create a new dataset config by @bghira in #2045
  • dataset builder: relocate save button to action bar, fix dropdown config selector sync by @bghira in #2046
  • add post_checkpoint_script entrypoint for external checkpoint renaming, cataloguing etc by @bghira in #2047
  • add log line for loading assistant lora at training time by @bghira in #2048
  • remove now-unnecessary check for comfyUI format etc by @bghira in #2049
  • fix issue where epoch-based validation tracking triggers on step 2 by @bghira in #2054
  • by default show quantize_via if lora mode enabled, and use a helper to set field visibility instead by @bghira in #2056
  • Z-Image: port fp16 clamping from ComfyUI in case RTX 20xx users need mixed_precision=fp16 by @bghira in #2058
  • evaluation needs mu calculation for dynamic shifting models, which we will delegate to the model implementation by @bghira in #2059
  • evaluation datasets should not require alignment with batch size or GPU count by @bghira in #2060
  • (#2052) add --eval_epoch_interval for fractional epoch tracking and periodic scheduled CLIP evaluations by @bghira in #2061
  • dataset can detect missing metadata entries and trigger a scan by @bghira in #2062
  • luminance values may be partially unavailable during collate by @bghira in #2065
  • fsdp2: compatibility with LyCORIS by @bghira in #2064
  • (#2024) add swanlab integration by @bghira in #2063
  • attempt stronger unload for torchao text encoder by @bghira in #2055
  • merge by @bghira in #2067
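The post_checkpoint_script entrypoint (#2047 above) allows an external script to rename or catalogue checkpoints. A minimal sketch of such a hook follows; the assumption that the checkpoint path arrives as the first CLI argument is ours, so check the SimpleTuner docs for the actual call contract:

```python
#!/usr/bin/env python
# Sketch of a post-checkpoint hook of the kind #2047 enables. The CLI
# argument convention below is an assumption, not SimpleTuner's
# documented interface.
import shutil
import sys
import time
from pathlib import Path

def catalogue_checkpoint(checkpoint_dir: str) -> str:
    """Rename a checkpoint directory with a timestamp suffix and return the new path."""
    src = Path(checkpoint_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}-{stamp}")
    shutil.move(str(src), str(dest))
    return str(dest)

if __name__ == "__main__" and len(sys.argv) > 1:
    print(catalogue_checkpoint(sys.argv[1]))
```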

Full Changelog: v3.1.5...v3.1.6

v3.1.5 - precision and loss masking fixes

01 Dec 00:25
d803db6

Important notice

This release is a required quality/precision upgrade for users of the following models:

  • auraflow
  • cosmos2
  • flux.1
  • kandinsky5 image

Technical details

Tensor dtype standardization and device placement:

  • All occurrences where tensors are created or cast (such as timesteps, latent image IDs, text IDs, and guidance values) are now explicitly set to torch.float32 and placed on the correct device, replacing previous usage of variable or inferred dtypes. This affects model and pipeline files for auraflow, cosmos, flux, and kandinsky5_image.

Masking fix

  • The logic for duplicating metadata between datasets in the data backend factory is updated to allow the "mask" conditioning type (in addition to "reference_strict") and to improve log messages for clarity.

v3.1.4 - Flux2, Z-Image Turbo, Flux Schnell, Hunyuan Video 1.5

29 Nov 23:34
7d37b51

Full Changelog: v3.1.3...v3.1.4

v3.1.3 - Kandinsky5, checkpoint providers (S3, Azure, Backblaze, Dropbox), background uploads

24 Nov 00:36
290df51

New Features

  • Kandinsky5 Video & Image training - t2i, t2v, i2v, i2i
  • FSDP2 grad norm clipping
  • Optional background uploads for checkpoints to avoid blocking training
  • Optional external script execution to run validation via cloud or 2nd GPU
  • Custom Accelerate tracker support via simpletuner/custom-trackers/ plugin dir

Bugfixes

  • Training config wizard now replaces the config correctly
  • Qwen Image validations should now work reliably again
  • Web UI issue with event lifecycle not showing / clearing prematurely

What's Changed

  • (#916) add --push_to_hub_background to publish models asynchronously by @bghira in #2003
  • support S3, Backblaze, Azure Blob and Dropbox checkpoint publishing targets by @bghira in #2004
  • clean up test outputs and fix minor/sporadic issues by @bghira in #2006
  • config wizard should overwrite entire environment it replaces by @bghira in #2007
  • add --validation_method which defaults to simpletuner-local; add --validation_external_script for user-provided path and arguments to run validation pipeline with by @bghira in #2008
  • (#1730) support custom accelerate trackers by @bghira in #2010
  • qwen edit: v2 should not instantiate embed processor; v1 should cache pixel grid and collect captions for later processing during ref-image embedding; ref-images should be embedded instead of target-images by @bghira in #2009
  • Revert "qwen edit: v2 should not instantiate embed processor; v1 should cache pixel grid and collect captions for later processing during ref-image embedding; ref-images should be embedded instead of target-images" by @bghira in #2012
  • merge by @bghira in #2011
  • ace-step: demo config by @bghira in #2014
  • qwen-image does not have image parameter on encode_prompt by @bghira in #2015
  • training event lifecycle was hidden when training state was running by @bghira in #2016
  • qwen-image: validation should pack tensors when input is not by @bghira in #2017
  • add problem-solving tips to AGENTS.md by @bghira in #2018
  • remove --cache_clear_validation_prompts as currently the prompts are replaced at every startup by @bghira in #2019
  • Kandinsky-5 Video and Image model training (T2V, I2V, T2I, I2I) by @bghira in #2013
  • FSDP2 support for clip max grad norm by @bghira in #2020
  • merge by @bghira in #2021

Full Changelog: v3.1.2...v3.1.3