
v3.3.3 - more memory optimisations

@bghira released this 24 Dec 15:16 · 843 commits to release since this release · 7fb0126

Features

  • SDNQ quantisation engine for weights and optimisers
  • Musubi block swap expanded to cover auraflow, chroma, longcat-image, lumina2, omnigen, pixart, hidream, sana, sd3, and z-image (see the block-swap sketch after this list)
  • Kandinsky5 memory-efficient VAE now used instead of Diffusers' HunyuanVideo implementation (runs on consumer hardware)
  • resolution_frames bucket strategy for video training, so that multi-length datasets are possible with a single config entry (see the bucketing sketch after this list)
  • WebUI: Training configuration wizard now allows setting the number of checkpoints to keep
  • Metadata is now written to the model / LoRA checkpoint for the ComfyUI LoRA Auto Trigger Words node to make use of
  • OmniGen & Lumina2: TREAD, TwinFlow, and LayerSync
  • Qwen Image: experimental tiled attention support that avoids OOM during the attention calculation (disabled by default; enabling it currently requires editing the code; see the tiled attention sketch below)
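
For context, block swapping keeps only a handful of transformer blocks resident on the accelerator at any one time, moving the rest to system RAM between forward passes. The snippet below is a minimal sketch of that general idea only, with hypothetical class and argument names; the real implementations also prefetch asynchronously and handle the backward pass.

```python
# Minimal sketch of the block-swap idea (forward pass only), not the actual
# musubi-tuner / SimpleTuner implementation. Names here are hypothetical.
import torch.nn as nn

class BlockSwapRunner(nn.Module):
    def __init__(self, blocks: nn.ModuleList, device="cuda", blocks_on_gpu=2):
        super().__init__()
        self.blocks = blocks              # all blocks start on the CPU
        self.device = device
        self.blocks_on_gpu = blocks_on_gpu

    def forward(self, hidden_states):
        for i, block in enumerate(self.blocks):
            block.to(self.device)         # swap the next block onto the GPU
            hidden_states = block(hidden_states)
            if i >= self.blocks_on_gpu - 1:
                # Evict the oldest resident block back to system RAM.
                self.blocks[i - self.blocks_on_gpu + 1].to("cpu")
        return hidden_states
```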
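
The resolution_frames strategy groups video samples by both spatial resolution and frame count, so each batch stays uniform in shape even when the dataset mixes clip lengths. A rough illustration of that bucketing idea follows; the keys and frame lengths are invented for the example and are not SimpleTuner's actual config schema.

```python
# Illustration of bucketing clips by (height, width, frames) so that clips of
# different lengths can coexist in one dataset. Values here are invented.
from collections import defaultdict

SUPPORTED_FRAME_COUNTS = (1, 13, 25, 45, 77)

def bucket_key(height: int, width: int, num_frames: int) -> tuple:
    # Snap the clip's frame count to the nearest supported length so that
    # near-misses land in the same bucket.
    frames = min(SUPPORTED_FRAME_COUNTS, key=lambda f: abs(f - num_frames))
    return (height, width, frames)

def build_buckets(samples):
    buckets = defaultdict(list)
    for sample in samples:
        key = bucket_key(sample["height"], sample["width"], sample["frames"])
        buckets[key].append(sample)
    return buckets

# A sampler can then draw each batch from a single bucket, so the tensors in
# a batch always share one shape and can be stacked directly.
buckets = build_buckets([
    {"height": 480, "width": 832, "frames": 45, "path": "a.mp4"},
    {"height": 480, "width": 832, "frames": 77, "path": "b.mp4"},
])
```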
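
Tiled attention trades a little extra launch overhead for a much smaller peak memory footprint by processing the query sequence in slices. The sketch below shows the general technique only, not the experimental Qwen Image code path; the tile size is an arbitrary illustrative value.

```python
# General sketch of tiled (chunked) attention: attend from one slice of
# queries at a time so a full (seq_len x seq_len) score matrix is never
# materialised at once. Not the project's actual implementation.
import torch
import torch.nn.functional as F

def tiled_attention(q, k, v, tile_size=1024):
    # q, k, v: (batch, heads, seq_len, head_dim)
    outputs = []
    for start in range(0, q.shape[2], tile_size):
        q_tile = q[:, :, start:start + tile_size]
        # Each call only ever scores tile_size queries against the full keys.
        outputs.append(F.scaled_dot_product_attention(q_tile, k, v))
    return torch.cat(outputs, dim=2)
```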

Bugfixes

  • RamTorch
    • Now applies to text encoders properly (including CLIP)
    • Extended to support Conv2D and Embedding layers (e.g. SDXL offload)
    • Compatibility with Quanto (tested with int2, int4, int8-quanto)
    • Reduced system memory use by skipping gradient calculation when requires_grad=False
  • Fixed text encoder memory not being unloaded for Qwen Image
  • Fixed the quantize_via=pipeline error that occurred when no quantisation was enabled
  • Fixed Qwen Image training with batch size > 1 (text embeds are now padded)
  • ROCm: bypass a PyTorch bug when building kernels, enabling full Quanto compatibility (int2, int4, int8, fp8)

What's Changed

  • add metadata for ComfyUI-Lora-Auto-Trigger-Words node by @bghira in #2222
  • auraflow: implement musubi block swap by @bghira in #2227
  • chroma: implement musubi block swap by @bghira in #2228
  • longcat image: implement musubi block swap by @bghira in #2230
  • modernise lumina2 implementation with TREAD, block swapping, twinflow and layersync by @bghira in #2231
  • modernise omnigen implementation with TREAD, block swapping, twinflow and layersync by @bghira in #2232
  • pixart: implement musubi block swap by @bghira in #2233
  • add qwen-edit-2511 support, and an edit-v2+ flavour which enables 2511 features on 2509 by @bghira in #2223
  • hidream: implement musubi block swap by @bghira in #2234
  • sana & sanavideo: implement musubi block swap by @bghira in #2235
  • sd3: implement musubi block swap by @bghira in #2236
  • z-image turbo & omni: implement musubi block swap by @bghira in #2237
  • use kandinsky5 optimised VAE with added temporal roll and chunked conv3d by @bghira in #2229
  • when preparing model with offload enabled, do not move to accelerator by @bghira in #2238
  • docs: document SIMPLETUNER_JOB_ID env var for webhook job_id by @rafstahelin in #2239
  • sdnq quant engine by @bghira in #2225
  • fix error str vs int comparison by @bghira in #2241
  • fix error when quantize_via=pipeline but no_change level was provided by @bghira in #2242
  • ramtorch: when using it for text encoders, do not move to gpu by @bghira in #2244
  • add resolution_frames bucket strategy for video datasets so that different lengths can exist in one dataset by @bghira in #2240
  • add checkpoints total limit to wizard by @bghira in #2243
  • qwen image: fix padding for text embeds by @bghira in #2246
  • quanto: fix ROCm compiler error for int2-quanto; fix for RamTorch compatibility by @bghira in #2248
  • qwen image: tiled attention fallback when we hit OOM by @bghira in #2249
  • ramtorch: fix for gradient memory ballooning; fix text encoder application; extend for Conv2D and Embedding offload by @bghira in #2250
  • merge by @bghira in #2251

New Contributors

  • @rafstahelin made their first contribution in #2239

Full Changelog: v3.3.2...v3.3.3