
Commit 7fb0126

Merge pull request #2251 from bghira/main
merge
2 parents 8974b43 + 6128ccc commit 7fb0126

86 files changed, +7108 -884 lines changed


documentation/DATALOADER.md

Lines changed: 28 additions & 1 deletion
@@ -348,13 +348,40 @@ A video dataset should be a folder of (eg. mp4) video files and the usual method
 ```
 
 - In the `video` subsection, we have the following keys we can set:
-  - `num_frames` (optional, int) is how many seconds of data we'll train on.
+  - `num_frames` (optional, int) is how many frames of data we'll train on.
     - At 25 fps, 125 frames is 5 seconds of video, standard output. This should be your target.
   - `min_frames` (optional, int) determines the minimum length of a video that will be considered for training.
     - This should be at least equal to `num_frames`. Not setting it ensures it'll be equal.
   - `max_frames` (optional, int) determines the maximum length of a video that will be considered for training.
   - `is_i2v` (optional, bool) determines whether i2v training will be done on a dataset.
     - This is set to True by default for LTX. You can disable it, however.
+  - `bucket_strategy` (optional, string) determines how videos are grouped into buckets:
+    - `aspect_ratio` (default): Bucket by spatial aspect ratio only (e.g., `1.78`, `0.75`). Same behavior as image datasets.
+    - `resolution_frames`: Bucket by resolution and frame count in `WxH@F` format (e.g., `1920x1080@125`). Useful for training on datasets with varying resolutions and durations.
+  - `frame_interval` (optional, int): when using `bucket_strategy: "resolution_frames"`, frame counts are rounded down to the nearest multiple of this value. Set this to your model's required frame count factor (some models require `num_frames - 1` to be divisible by a certain value).
+
+**Note:** When using `bucket_strategy: "resolution_frames"` with `num_frames` set, you'll get a single frame bucket, and videos shorter than `num_frames` will be discarded. Unset `num_frames` if you want multiple frame buckets with fewer discards.
+
+Example using `resolution_frames` bucketing for mixed-resolution video datasets:
+
+```json
+{
+  "id": "mixed-resolution-videos",
+  "type": "local",
+  "dataset_type": "video",
+  "resolution": 720,
+  "resolution_type": "pixel_area",
+  "instance_data_dir": "datasets/videos",
+  "video": {
+    "bucket_strategy": "resolution_frames",
+    "frame_interval": 25,
+    "min_frames": 25,
+    "max_frames": 250
+  }
+}
+```
+
+This configuration will create buckets like `1280x720@100`, `1920x1080@125`, `640x480@75`, etc. Videos are grouped by their training resolution and frame count (rounded down to the nearest 25 frames).
 
 
 ##### Configuration

documentation/OPTIONS.md

Lines changed: 64 additions & 0 deletions
@@ -233,6 +233,59 @@ TorchAO includes generally-available 4bit and 8bit optimisers: `ao-adamw8bit`, 
 
 It also provides two optimisers that are directed toward Hopper (H100 or better) users: `ao-adamfp8`, and `ao-adamwfp8`
 
+#### SDNQ (SD.Next Quantization Engine)
+
+[SDNQ](https://github.com/disty0/sdnq) is a quantization library optimized for training that works across all platforms: AMD (ROCm), Apple (MPS), and NVIDIA (CUDA). It provides quantized training with stochastic rounding and quantized optimizer states for memory efficiency.
+
+##### Recommended Precision Levels
+
+**For full finetuning** (model weights are updated):
+- `uint8-sdnq` - Best balance of memory savings and training quality
+- `uint16-sdnq` - Higher precision for maximum quality (e.g., Stable Cascade)
+- `int16-sdnq` - Signed 16-bit alternative
+- `fp16-sdnq` - Quantized FP16, maximum precision with SDNQ benefits
+
+**For LoRA training** (frozen base model weights):
+- `int8-sdnq` - Signed 8-bit, a good general-purpose choice
+- `int6-sdnq`, `int5-sdnq` - Lower precision, smaller memory footprint
+- `uint5-sdnq`, `uint4-sdnq`, `uint3-sdnq`, `uint2-sdnq` - Aggressive compression
+
+**Note:** `int7-sdnq` is available but not recommended (slow, and not much smaller than int8).
+
+**Important:** Below 5-bit precision, SDNQ automatically enables SVD (Singular Value Decomposition) with 8 steps to maintain quality. SVD takes longer to quantize and is non-deterministic, which is why Disty0 provides pre-quantized SVD models on HuggingFace. SVD also adds compute overhead during training, so avoid it for full finetuning, where weights are actively updated.
+
+**Key features:**
+- Cross-platform: Works identically on AMD, Apple, and NVIDIA hardware
+- Training-optimized: Uses stochastic rounding to reduce quantization error accumulation
+- Memory efficient: Supports quantized optimizer state buffers
+- Decoupled matmul: Weight precision and matmul precision are independent (INT8/FP8/FP16 matmul available)
+
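As a concrete illustration, a full finetune quantizing the base model to SDNQ uint8 might carry the following in `config/config.json`. This is a sketch, not a verified recipe: it assumes the SDNQ precision levels above are selected via `base_model_precision`, the same way as the other quantization backends, and the exact key style can differ between SimpleTuner versions:

```json
{
  "model_type": "full",
  "base_model_precision": "uint8-sdnq"
}
```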
+##### SDNQ Optimisers
+
+SDNQ includes optimizers with optional quantized state buffers for additional memory savings:
+
+- `sdnq-adamw` - AdamW with quantized state buffers (uint8, group_size=32)
+- `sdnq-adamw+no_quant` - AdamW without quantized states (for comparison)
+- `sdnq-adafactor` - Adafactor with quantized state buffers
+- `sdnq-came` - CAME optimizer with quantized state buffers
+- `sdnq-lion` - Lion optimizer with quantized state buffers
+- `sdnq-muon` - Muon optimizer with quantized state buffers
+- `sdnq-muon+quantized_matmul` - Muon with INT8 matmul in zeropower computation
+
+All SDNQ optimizers use stochastic rounding by default and can be configured with `--optimizer_config` for custom settings like `use_quantized_buffers=false` to disable state quantization.
+
+**Muon-specific options:**
+- `use_quantized_matmul` - Enable INT8/FP8/FP16 matmul in `zeropower_via_newtonschulz5`
+- `quantized_matmul_dtype` - Matmul precision: `int8` (consumer GPUs), `fp8` (datacenter), `fp16`
+- `zeropower_dtype` - Precision for zeropower computation (ignored when `use_quantized_matmul=True`)
+- Prefix args with `muon_` or `adamw_` to set different values for Muon vs AdamW fallback
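For example, enabling INT8 matmul for Muon's zeropower step on a consumer GPU could look like the following sketch, assuming `--optimizer_config` takes the comma-separated `key=value` form implied by the `use_quantized_buffers=false` example above:

```json
{
  "optimizer": "sdnq-muon",
  "optimizer_config": "use_quantized_matmul=true,quantized_matmul_dtype=int8"
}
```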
+
+**Pre-quantized models:** Disty0 provides pre-quantized uint4 SVD models at [huggingface.co/collections/Disty0/sdnq](https://huggingface.co/collections/Disty0/sdnq). Load these normally, then convert with `convert_sdnq_model_to_training()` after importing SDNQ (SDNQ must be imported before loading to register with Diffusers).
+
+**Note on checkpointing:** SDNQ training models are saved in both native PyTorch format (`.pt`) for training resumption and safetensors format for inference. The native format is required for proper training resumption, as SDNQ's `SDNQTensor` class uses custom serialization.
+
+**Disk space tip:** To save disk space, you can keep only the quantized weights and use SDNQ's [dequantize_sdnq_training.py](https://github.com/Disty0/sdnq/blob/main/scripts/dequantize_sdnq_training.py) script to dequantize when needed for inference.
+
 ### `--quantization_config`
 
 - **What**: JSON object or file path describing Diffusers `quantization_config` overrides when using `--quantize_via=pipeline`.
@@ -312,6 +365,17 @@ Using `--sageattention_usage` to enable training with SageAttention should be en
 - **What**: Uploads to Hugging Face Hub from a background worker so checkpoint pushes do not pause the training loop.
 - **Why**: Keeps training and validation running while Hub uploads proceed asynchronously. Final uploads are still awaited before the run exits so failures surface.
 
+### `--webhook_config`
+
+- **What**: Configuration for webhook targets (e.g., Discord, custom endpoints) to receive real-time training events.
+- **Why**: Allows you to monitor training runs with external tools and dashboards, receiving notifications at key training stages.
+- **Notes**: The `job_id` field in webhook payloads can be populated by setting the `SIMPLETUNER_JOB_ID` environment variable before training:
+```bash
+export SIMPLETUNER_JOB_ID="my-training-run-name"
+python train.py
+```
+This is useful for monitoring tools that receive webhooks from multiple training runs, so they can identify which config sent each event. If `SIMPLETUNER_JOB_ID` is not set, `job_id` will be null in webhook payloads.
+
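For reference, the value of `--webhook_config` is a path to a JSON document describing the target. A minimal Discord-style sketch follows — the field names here are illustrative assumptions, so consult the webhooks documentation for the exact schema:

```json
{
  "webhook_type": "discord",
  "webhook_url": "https://discord.com/api/webhooks/...",
  "log_level": "info"
}
```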
 ### `--publishing_config`
 
 - **What**: Optional JSON/dict/file path describing non-Hugging Face publishing targets (S3-compatible storage, Backblaze B2, Azure Blob Storage, Dropbox).

documentation/quickstart/HUNYUANVIDEO.md

Lines changed: 11 additions & 1 deletion
@@ -186,7 +186,8 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
       "video": {
         "num_frames": 61,
         "min_frames": 61,
-        "frame_rate": 24
+        "frame_rate": 24,
+        "bucket_strategy": "aspect_ratio"
       },
       "repeats": 10
     },
@@ -201,6 +202,15 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
   ]
 ```
 
+In the `video` subsection:
+- `num_frames`: Target frame count for training. Must satisfy `(frames - 1) % 4 == 0`.
+- `min_frames`: Minimum video length (shorter videos are discarded).
+- `max_frames`: Maximum video length filter.
+- `bucket_strategy`: How videos are grouped into buckets:
+  - `aspect_ratio` (default): Group by spatial aspect ratio only.
+  - `resolution_frames`: Group by `WxH@F` format (e.g., `854x480@61`) for mixed-resolution/duration datasets.
+- `frame_interval`: When using `resolution_frames`, round frame counts down to this interval.
+
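If your dataset mixes resolutions or durations, the `video` subsection can opt into `resolution_frames` instead. A minimal sketch with illustrative values — `frame_interval: 4` follows the `(frames - 1) % 4 == 0` constraint above, and `num_frames` is left unset so multiple frame buckets can form (see the note in DATALOADER.md):

```json
"video": {
  "bucket_strategy": "resolution_frames",
  "frame_interval": 4,
  "min_frames": 61,
  "frame_rate": 24
}
```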
 > See caption_strategy options and requirements in [DATALOADER.md](../DATALOADER.md#caption_strategy).
 
 - **Text Embed Caching**: Highly recommended. Hunyuan uses a large LLM text encoder. Caching saves significant VRAM during training.

documentation/quickstart/KANDINSKY5_VIDEO.md

Lines changed: 11 additions & 1 deletion
@@ -136,7 +136,8 @@ Video datasets require careful setup. Create `config/multidatabackend.json`:
       "video": {
         "num_frames": 61,
         "min_frames": 61,
-        "frame_rate": 24
+        "frame_rate": 24,
+        "bucket_strategy": "aspect_ratio"
       },
       "repeats": 10
     },
@@ -151,6 +152,15 @@ Video datasets require careful setup. Create `config/multidatabackend.json`:
   ]
 ```
 
+In the `video` subsection:
+- `num_frames`: Target frame count for training.
+- `min_frames`: Minimum video length (shorter videos are discarded).
+- `max_frames`: Maximum video length filter.
+- `bucket_strategy`: How videos are grouped into buckets:
+  - `aspect_ratio` (default): Group by spatial aspect ratio only.
+  - `resolution_frames`: Group by `WxH@F` format (e.g., `1920x1080@61`) for mixed-resolution/duration datasets.
+- `frame_interval`: When using `resolution_frames`, round frame counts down to this interval.
+
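For mixed-resolution/duration datasets, the `video` subsection can switch to `resolution_frames`. A minimal sketch with illustrative values — the `frame_interval` of 4 is an assumption consistent with the default 61 frames (`(61 - 1) % 4 == 0`), and `num_frames` is left unset so multiple frame buckets can form:

```json
"video": {
  "bucket_strategy": "resolution_frames",
  "frame_interval": 4,
  "min_frames": 61,
  "frame_rate": 24
}
```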
 > See caption_strategy options and requirements in [DATALOADER.md](../DATALOADER.md#caption_strategy).
 
 #### Directory setup

documentation/quickstart/LONGCAT_VIDEO.md

Lines changed: 6 additions & 0 deletions
@@ -83,6 +83,12 @@ Or launch the Web UI and submit a job with the same config.
 - For image‑to‑video runs, include a conditioning image per sample; it is placed in the first latent frame and kept fixed during sampling.
 - LongCat‑Video is 30 fps by design. The default 93 frames is ~3.1 s; if you change frame counts, keep `(frames - 1) % 4 == 0` and remember duration scales with fps.
 
+### Video bucket strategy
+
+In your dataset's `video` section, you can configure how videos are grouped:
+- `bucket_strategy`: `aspect_ratio` (default) groups by spatial aspect ratio. `resolution_frames` groups by `WxH@F` format (e.g., `480x832@93`) for mixed-resolution/duration datasets.
+- `frame_interval`: When using `resolution_frames`, round frame counts down to this interval (e.g., set it to 4 to match the VAE temporal stride).
+
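A minimal sketch of such a `video` section — `frame_interval: 4` matches the VAE temporal stride mentioned above, and `min_frames: 93` mirrors the default frame count; the rest of the dataset entry stays as it is:

```json
"video": {
  "bucket_strategy": "resolution_frames",
  "frame_interval": 4,
  "min_frames": 93
}
```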
 ---
 
 ## 5) Validation & inference

documentation/quickstart/LTXVIDEO.md

Lines changed: 7 additions & 2 deletions
@@ -373,7 +373,8 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
       "repeats": 0,
       "video": {
         "num_frames": 125,
-        "min_frames": 125
+        "min_frames": 125,
+        "bucket_strategy": "aspect_ratio"
       }
     },
     {
@@ -392,13 +393,17 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
 > See caption_strategy options and requirements in [DATALOADER.md](../DATALOADER.md#caption_strategy).
 
 - In the `video` subsection, we have the following keys we can set:
-  - `num_frames` (optional, int) is how many seconds of data we'll train on.
+  - `num_frames` (optional, int) is how many frames of data we'll train on.
     - At 25 fps, 125 frames is 5 seconds of video, standard output. This should be your target.
   - `min_frames` (optional, int) determines the minimum length of a video that will be considered for training.
     - This should be at least equal to `num_frames`. Not setting it ensures it'll be equal.
   - `max_frames` (optional, int) determines the maximum length of a video that will be considered for training.
   - `is_i2v` (optional, bool) determines whether i2v training will be done on a dataset.
     - This is set to True by default for LTX. You can disable it, however.
+  - `bucket_strategy` (optional, string) determines how videos are grouped into buckets:
+    - `aspect_ratio` (default): Group by spatial aspect ratio only (e.g., `1.78`, `0.75`).
+    - `resolution_frames`: Group by resolution and frame count in `WxH@F` format (e.g., `768x512@125`). Useful for mixed-resolution/duration datasets.
+  - `frame_interval` (optional, int): when using `resolution_frames`, frame counts are rounded down to this interval. Set this to your model's required frame count factor.
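For example, a mixed-duration LTX dataset could bucket in one-second steps at 25 fps, mirroring the example in DATALOADER.md. A sketch with illustrative values; `num_frames` is left unset so multiple frame buckets can form:

```json
"video": {
  "bucket_strategy": "resolution_frames",
  "frame_interval": 25,
  "min_frames": 25,
  "max_frames": 250
}
```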
402407

403408
Then, create a `datasets` directory:
404409

documentation/quickstart/SANAVIDEO.md

Lines changed: 6 additions & 1 deletion
@@ -308,7 +308,8 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
       "repeats": 0,
       "video": {
         "num_frames": 81,
-        "min_frames": 81
+        "min_frames": 81,
+        "bucket_strategy": "aspect_ratio"
       }
     },
     {
@@ -331,6 +332,10 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
   - `min_frames` (optional, int) determines the minimum length of a video that will be considered for training.
   - `max_frames` (optional, int) determines the maximum length of a video that will be considered for training.
   - `is_i2v` (optional, bool) determines whether i2v training will be done on a dataset.
+  - `bucket_strategy` (optional, string) determines how videos are grouped into buckets:
+    - `aspect_ratio` (default): Group by spatial aspect ratio only (e.g., `1.78`, `0.75`).
+    - `resolution_frames`: Group by resolution and frame count in `WxH@F` format (e.g., `832x480@81`). Useful for mixed-resolution/duration datasets.
+  - `frame_interval` (optional, int): when using `resolution_frames`, frame counts are rounded down to this interval.
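A sketch of the `resolution_frames` variant for SANA Video. The `frame_interval` of 4 here is a placeholder assumption — set it to the model's required frame count factor — and `num_frames` is left unset so multiple frame buckets can form:

```json
"video": {
  "bucket_strategy": "resolution_frames",
  "frame_interval": 4,
  "min_frames": 81
}
```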
 
 Then, create a `datasets` directory:
 
documentation/quickstart/WAN.md

Lines changed: 7 additions & 2 deletions
@@ -542,7 +542,8 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
       "repeats": 0,
       "video": {
         "num_frames": 75,
-        "min_frames": 75
+        "min_frames": 75,
+        "bucket_strategy": "aspect_ratio"
       }
     },
     {
@@ -593,11 +594,15 @@ Create a `--data_backend_config` (`config/multidatabackend.json`) document conta
 </details>
 
 - In the `video` subsection, we have the following keys we can set:
-  - `num_frames` (optional, int) is how many seconds of data we'll train on.
+  - `num_frames` (optional, int) is how many frames of data we'll train on.
     - At 15 fps, 75 frames is 5 seconds of video, standard output. This should be your target.
   - `min_frames` (optional, int) determines the minimum length of a video that will be considered for training.
     - This should be at least equal to `num_frames`. Not setting it ensures it'll be equal.
   - `max_frames` (optional, int) determines the maximum length of a video that will be considered for training.
+  - `bucket_strategy` (optional, string) determines how videos are grouped into buckets:
+    - `aspect_ratio` (default): Group by spatial aspect ratio only (e.g., `1.78`, `0.75`).
+    - `resolution_frames`: Group by resolution and frame count in `WxH@F` format (e.g., `832x480@75`). Useful for mixed-resolution/duration datasets.
+  - `frame_interval` (optional, int): when using `resolution_frames`, frame counts are rounded down to this interval.
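For example, a mixed-duration Wan dataset could bucket in one-second steps at 15 fps. A sketch with illustrative values; `num_frames` is left unset so multiple frame buckets can form:

```json
"video": {
  "bucket_strategy": "resolution_frames",
  "frame_interval": 15,
  "min_frames": 15,
  "max_frames": 150
}
```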
 <!-- - `is_i2v` (optional, bool) determines whether i2v training will be done on a dataset.
   - This is set to True by default for Wan 2.1. You can disable it, however.
 -->

setup.py

Lines changed: 1 addition & 0 deletions
@@ -265,6 +265,7 @@ def _collect_package_files(*directories: str):
     "peft-singlora>=0.2.0",
     "cryptography>=41.0.0",
     "torchcodec>=0.8.1",
+    "sdnq>=0.1.2",
 ]
 
 platform_deps_for_install = get_platform_dependencies()

simpletuner/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -79,4 +79,4 @@ def _suppress_swigvarlink(message, *args, **kwargs):
 warnings.warn = _suppress_swigvarlink
 
 
-__version__ = "3.3.2"
+__version__ = "3.3.3"
