`documentation/DATALOADER.md` (+28, −1)
@@ -348,13 +348,40 @@ A video dataset should be a folder of (eg. mp4) video files and the usual method
In the `video` subsection, we have the following keys we can set:

- `num_frames` (optional, int) is how many frames of data we'll train on.
  - At 25 fps, 125 frames is 5 seconds of video, standard output. This should be your target.
- `min_frames` (optional, int) determines the minimum length of a video that will be considered for training.
  - This should be at least equal to `num_frames`. Not setting it ensures it'll be equal.
- `max_frames` (optional, int) determines the maximum length of a video that will be considered for training.
- `is_i2v` (optional, bool) determines whether i2v training will be done on a dataset.
  - This is set to True by default for LTX. You can disable it, however.
- `bucket_strategy` (optional, string) determines how videos are grouped into buckets:
  - `aspect_ratio` (default): Bucket by spatial aspect ratio only (e.g., `1.78`, `0.75`). Same behavior as image datasets.
  - `resolution_frames`: Bucket by resolution and frame count in `WxH@F` format (e.g., `1920x1080@125`). Useful for training on datasets with varying resolutions and durations.
- `frame_interval` (optional, int): when using `bucket_strategy: "resolution_frames"`, frame counts are rounded down to the nearest multiple of this value. Set this to your model's required frame count factor (some models require `num_frames - 1` to be divisible by a certain value).

**Note:** When using `bucket_strategy: "resolution_frames"` with `num_frames` set, you'll get a single frame bucket and videos shorter than `num_frames` will be discarded. Unset `num_frames` if you want multiple frame buckets with fewer discards.

Example using `resolution_frames` bucketing for mixed-resolution video datasets:

```json
{
  "id": "mixed-resolution-videos",
  "type": "local",
  "dataset_type": "video",
  "resolution": 720,
  "resolution_type": "pixel_area",
  "instance_data_dir": "datasets/videos",
  "video": {
    "bucket_strategy": "resolution_frames",
    "frame_interval": 25,
    "min_frames": 25,
    "max_frames": 250
  }
}
```

This configuration will create buckets like `1280x720@100`, `1920x1080@125`, `640x480@75`, etc. Videos are grouped by their training resolution and frame count (rounded to the nearest 25 frames).
`documentation/OPTIONS.md` (+64)
@@ -233,6 +233,59 @@ TorchAO includes generally-available 4bit and 8bit optimisers: `ao-adamw8bit`, `
It also provides two optimisers that are directed toward Hopper (H100 or better) users: `ao-adamfp8` and `ao-adamwfp8`.

#### SDNQ (SD.Next Quantization Engine)

[SDNQ](https://github.com/disty0/sdnq) is a quantization library optimized for training that works across all platforms: AMD (ROCm), Apple (MPS), and NVIDIA (CUDA). It provides quantized training with stochastic rounding and quantized optimizer states for memory efficiency.

##### Recommended Precision Levels

**For full finetuning** (model weights are updated):

- `uint8-sdnq` - Best balance of memory savings and training quality
- `uint16-sdnq` - Higher precision for maximum quality (e.g., Stable Cascade)
- `int16-sdnq` - Signed 16-bit alternative
- `fp16-sdnq` - Quantized FP16, maximum precision with SDNQ benefits

**For LoRA training** (frozen base model weights):

- `int8-sdnq` - Signed 8-bit, good general purpose choice

**Note:** `int7-sdnq` is available but not recommended (slow and not much smaller than int8).
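As a quick orientation for where these names plug in, here is a minimal sketch of a LoRA run quantized with SDNQ; the flags mirror SimpleTuner's usual CLI options, and the values are illustrative rather than recommendations:

```bash
# Sketch only: confirm flag names against your installed version's OPTIONS.md.
python train.py \
  --model_type=lora \
  --base_model_precision=int8-sdnq
```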
**Important:** Below 5-bit precision, SDNQ automatically enables SVD (Singular Value Decomposition) with 8 steps to maintain quality. SVD takes longer to quantize and is non-deterministic, which is why Disty0 provides pre-quantized SVD models on HuggingFace. SVD adds compute overhead during training, so avoid it for full finetuning where weights are actively updated.

**Key features:**

- Cross-platform: Works identically on AMD, Apple, and NVIDIA hardware
- Training-optimized: Uses stochastic rounding to reduce quantization error accumulation
- Memory efficient: Supports quantized optimizer state buffers
- Decoupled matmul: Weight precision and matmul precision are independent (INT8/FP8/FP16 matmul available)
##### SDNQ Optimisers

SDNQ includes optimizers with optional quantized state buffers for additional memory savings:

- `sdnq-adamw` - AdamW with quantized state buffers (uint8, group_size=32)
- `sdnq-adamw+no_quant` - AdamW without quantized states (for comparison)
- `sdnq-adafactor` - Adafactor with quantized state buffers
- `sdnq-came` - CAME optimizer with quantized state buffers
- `sdnq-lion` - Lion optimizer with quantized state buffers
- `sdnq-muon` - Muon optimizer with quantized state buffers
- `sdnq-muon+quantized_matmul` - Muon with INT8 matmul in zeropower computation

All SDNQ optimizers use stochastic rounding by default and can be configured with `--optimizer_config` for custom settings like `use_quantized_buffers=false` to disable state quantization.
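For instance, a minimal sketch of disabling the quantized state buffers (assuming `--optimizer_config` accepts comma-separated `key=value` pairs, as it does for other optimisers):

```bash
# Keep the optimizer's state buffers in full precision while still using the SDNQ optimiser.
python train.py \
  --optimizer=sdnq-adamw \
  --optimizer_config=use_quantized_buffers=false
```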
**Muon-specific options:**

- `use_quantized_matmul` - Enable INT8/FP8/FP16 matmul in `zeropower_via_newtonschulz5`
- `zeropower_dtype` - Precision for zeropower computation (ignored when `use_quantized_matmul=True`)
- Prefix args with `muon_` or `adamw_` to set different values for Muon vs the AdamW fallback (see the sketch below)
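A hedged illustration of that prefix convention; the parameter names after `muon_`/`adamw_` are placeholders rather than confirmed option names, so check the SDNQ optimizer source for the real arguments:

```bash
# Illustrative only: muon_momentum and adamw_weight_decay are assumed names.
python train.py \
  --optimizer=sdnq-muon \
  --optimizer_config=use_quantized_matmul=true,muon_momentum=0.95,adamw_weight_decay=0.01
```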
**Pre-quantized models:** Disty0 provides pre-quantized uint4 SVD models at [huggingface.co/collections/Disty0/sdnq](https://huggingface.co/collections/Disty0/sdnq). Load these normally, then convert with `convert_sdnq_model_to_training()` after importing SDNQ (SDNQ must be imported before loading to register with Diffusers).
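A rough sketch of that load-then-convert flow; the import location of `convert_sdnq_model_to_training` and the repo id are assumptions, so check the SDNQ README for the exact API:

```python
import sdnq  # import before loading so SDNQ registers itself with Diffusers
from diffusers import DiffusionPipeline

# Hypothetical repo id from the Disty0/sdnq collection.
pipe = DiffusionPipeline.from_pretrained("Disty0/example-uint4-svd-model")

# Assumed import path; the function name itself comes from the SDNQ docs.
from sdnq import convert_sdnq_model_to_training
pipe.transformer = convert_sdnq_model_to_training(pipe.transformer)
```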
**Note on checkpointing:** SDNQ training models are saved in both native PyTorch format (`.pt`) for training resumption and safetensors format for inference. The native format is required for proper training resumption, as SDNQ's `SDNQTensor` class uses custom serialization.

**Disk space tip:** To save disk space, you can keep only the quantized weights and use SDNQ's [dequantize_sdnq_training.py](https://github.com/Disty0/sdnq/blob/main/scripts/dequantize_sdnq_training.py) script to dequantize when needed for inference.
### `--quantization_config`
- **What**: JSON object or file path describing Diffusers `quantization_config` overrides when using `--quantize_via=pipeline`.
@@ -312,6 +365,17 @@ Using `--sageattention_usage` to enable training with SageAttention should be en
- **What**: Uploads to Hugging Face Hub from a background worker so checkpoint pushes do not pause the training loop.
- **Why**: Keeps training and validation running while Hub uploads proceed asynchronously. Final uploads are still awaited before the run exits so failures surface.
### `--webhook_config`

- **What**: Configuration for webhook targets (e.g., Discord, custom endpoints) to receive real-time training events.
- **Why**: Allows you to monitor training runs with external tools and dashboards, receiving notifications at key training stages.
- **Notes**: The `job_id` field in webhook payloads can be populated by setting the `SIMPLETUNER_JOB_ID` environment variable before training:

```bash
export SIMPLETUNER_JOB_ID="my-training-run-name"
python train.py
```

This is useful for monitoring tools that receive webhooks from multiple training runs, so they can identify which config sent each event. If `SIMPLETUNER_JOB_ID` is not set, `job_id` will be `null` in webhook payloads.
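For orientation, a webhook config passed via `--webhook_config` might look roughly like the following; the field names are illustrative assumptions rather than the confirmed schema, so check the webhook documentation for the exact keys:

```json
{
  "webhook_type": "discord",
  "webhook_url": "https://discord.com/api/webhooks/<id>/<token>",
  "message_prefix": "my-training-run-name"
}
```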
`documentation/quickstart/LONGCAT_VIDEO.md` (+6)
@@ -83,6 +83,12 @@ Or launch the Web UI and submit a job with the same config.
- For image‑to‑video runs, include a conditioning image per sample; it is placed in the first latent frame and kept fixed during sampling.
- LongCat‑Video is 30 fps by design. The default 93 frames is ~3.1 s; if you change frame counts, keep `(frames - 1) % 4 == 0` and remember duration scales with fps.

### Video bucket strategy

In your dataset's `video` section, you can configure how videos are grouped:

- `bucket_strategy`: `aspect_ratio` (default) groups by spatial aspect ratio. `resolution_frames` groups by `WxH@F` format (e.g., `480x832@93`) for mixed-resolution/duration datasets.
- `frame_interval`: When using `resolution_frames`, round frame counts to this interval (e.g., set to 4 to match the VAE temporal stride).
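A minimal illustrative sketch of that `video` block for a mixed-resolution LongCat dataset (the frame values are placeholders, not recommendations):

```json
"video": {
  "bucket_strategy": "resolution_frames",
  "frame_interval": 4,
  "min_frames": 29,
  "max_frames": 93
}
```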
`documentation/quickstart/LTXVIDEO.md` (+7, −2)
@@ -373,7 +373,8 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
```json
    "repeats": 0,
    "video": {
      "num_frames": 125,
      "min_frames": 125,
      "bucket_strategy": "aspect_ratio"
    }
  },
  {
```
@@ -392,13 +393,17 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
> See caption_strategy options and requirements in [DATALOADER.md](../DATALOADER.md#caption_strategy).

In the `video` subsection, we have the following keys we can set:

- `num_frames` (optional, int) is how many frames of data we'll train on.
  - At 25 fps, 125 frames is 5 seconds of video, standard output. This should be your target.
- `min_frames` (optional, int) determines the minimum length of a video that will be considered for training.
  - This should be at least equal to `num_frames`. Not setting it ensures it'll be equal.
- `max_frames` (optional, int) determines the maximum length of a video that will be considered for training.
- `is_i2v` (optional, bool) determines whether i2v training will be done on a dataset.
  - This is set to True by default for LTX. You can disable it, however.
- `bucket_strategy` (optional, string) determines how videos are grouped into buckets:
  - `aspect_ratio` (default): Group by spatial aspect ratio only (e.g., `1.78`, `0.75`).
  - `resolution_frames`: Group by resolution and frame count in `WxH@F` format (e.g., `768x512@125`). Useful for mixed-resolution/duration datasets.
- `frame_interval` (optional, int): when using `resolution_frames`, round frame counts to this interval. Set this to your model's required frame count factor.