94 changes: 94 additions & 0 deletions IMPROVEMENTS.md
@@ -0,0 +1,94 @@
# Qwen-Image Specific Improvements

This document outlines the architecture-specific improvements made to optimize DyPE for Qwen-Image models.

## Key Improvements

### 1. **Intelligent Model Structure Detection**
- Added `_detect_qwen_model_structure()` function that automatically detects:
  - Transformer/diffusion_model location
  - Positional embedder path (`pos_embed` vs `pe_embedder`)
  - Patch size from the model config
  - VAE scale factor
  - Base training resolution
- Eliminates hardcoded assumptions and adapts to different Qwen model variants (see the sketch below)
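
A minimal sketch of what this detection could look like, assuming `getattr`-style probing (attribute names beyond those listed above are illustrative; the real helper lives in the patch module):

```python
def _detect_qwen_model_structure(model):
    """Sketch: locate the transformer and its positional embedder on a loaded model."""
    # The transformer may sit under either attribute depending on the loader.
    transformer = getattr(model.model, "transformer", None) \
        or getattr(model.model, "diffusion_model", None)
    if transformer is None:
        raise ValueError("No transformer/diffusion_model found on this model.")

    # The positional embedder path varies between Qwen model variants.
    pos_embed_path = "pos_embed" if hasattr(transformer, "pos_embed") else "pe_embedder"
    return transformer, pos_embed_path
```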

### 2. **Qwen-Specific Parameter Extraction**
- **Patch Size Detection**: Automatically extracts `patch_size` from model config (defaults to 2 for MMDiT)
- **VAE Scale Factor**: Detects actual VAE downsampling factor (typically 8x)
- **Base Resolution**: Attempts to detect from model config, falls back to 1024
- **Axes Dimensions**: Extracts from model or uses Qwen-Image defaults `[16, 56, 56]`
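
The extraction follows a probe-then-fall-back pattern throughout; a hedged sketch (the config attribute names here are assumptions, not a confirmed API):

```python
def _extract_qwen_params(transformer):
    """Sketch: pull architecture parameters from the model config, with Qwen-Image defaults."""
    config = getattr(transformer, "config", None)
    patch_size = getattr(config, "patch_size", None) or 2            # MMDiT default
    vae_scale_factor = 8                                             # typical VAE downsampling
    base_resolution = getattr(config, "sample_size", None) or 1024   # fall back to 1024
    axes_dim = getattr(config, "axes_dims_rope", None) or [16, 56, 56]
    return patch_size, vae_scale_factor, base_resolution, axes_dim
```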

### 3. **Optimized Base Patches Calculation**
```python
# Old: Hardcoded calculation
self.base_patches = (self.base_resolution // 8) // 2

# New: Uses detected patch_size and base_resolution
self.base_patches = (self.base_resolution // vae_scale_factor) // patch_size
```
- More accurate for different Qwen model variants
- Adapts to actual model architecture
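
With the Qwen-Image defaults (`base_resolution = 1024`, `vae_scale_factor = 8`, `patch_size = 2`) both formulas give `(1024 // 8) // 2 = 64` base patches per axis; they only diverge on variants with a non-standard VAE or patch size, which is exactly where the old hardcoded version broke down.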

### 4. **Enhanced Positional Embedding Class**
- Added `base_resolution` and `patch_size` parameters to `QwenPosEmbed`
- Better device-aware dtype selection (handles MPS, NPU, CUDA)
- Improved comments explaining Qwen-specific behavior
- More robust handling of different tensor formats
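
A minimal sketch of device-aware dtype selection of the kind described here, assuming the policy is simply "use float64 for the RoPE frequency math except on backends that lack it" (the exact rule in `QwenPosEmbed` may differ):

```python
import torch

def _rope_dtype(device: torch.device) -> torch.dtype:
    """Sketch: pick a dtype for RoPE frequency computation based on the device."""
    # MPS and NPU backends lack full float64 support, so fall back to float32
    # there; CUDA and CPU can compute the frequencies in float64 and cast down.
    if device.type in ("mps", "npu"):
        return torch.float32
    return torch.float64
```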

### 5. **Improved Scheduler Compatibility**
- Better fallback for non-Flux schedulers (FlowMatch, etc.)
- Conservative scaling approach for unknown scheduler types
- More robust error handling, catching `AttributeError` instead of using a bare `except`
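
In practice the change amounts to catching only the expected failure and degrading conservatively; a hedged sketch (`set_parameters` mirrors ComfyUI's model-sampling API, but treat the exact call as an assumption):

```python
def apply_shift(model_sampling, mu: float) -> None:
    """Sketch: apply a Flux-style shift if the sampling object supports it."""
    try:
        model_sampling.set_parameters(shift=mu)
    except AttributeError:
        # Non-Flux scheduler (e.g. a plain FlowMatch sampling object):
        # leave the noise schedule untouched rather than guessing a shift.
        pass
```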

### 6. **Better Sequence Length Calculation**
```python
import math

# Now uses detected vae_scale_factor and patch_size
latent_h, latent_w = height // vae_scale_factor, width // vae_scale_factor
padded_h = math.ceil(latent_h / patch_size) * patch_size
padded_w = math.ceil(latent_w / patch_size) * patch_size
image_seq_len = (padded_h // patch_size) * (padded_w // patch_size)
```
- More accurate for Qwen's specific architecture
- Accounts for both VAE downsampling and patch-based downsampling
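
As a worked example, a 2048×2048 target with `vae_scale_factor = 8` and `patch_size = 2` yields a 256×256 latent, needs no padding (256 is already a multiple of 2), and produces an image sequence length of `128 × 128 = 16384` tokens.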

### 7. **Enhanced Timestep Handling**
- Better handling of different timestep formats (tensor, scalar, etc.)
- More robust normalization logic
- Improved error handling for edge cases
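
A sketch of what robust timestep normalization can look like, assuming the usual convention that discrete schedules run on a ~1000-step scale while sigma-style values already live in [0, 1]:

```python
import torch

def normalize_timestep(timestep) -> float:
    """Sketch: coerce tensor or scalar timesteps to a float in [0, 1]."""
    if isinstance(timestep, torch.Tensor):
        # Batched timesteps share one value per step in practice; take the first.
        t = float(timestep.flatten()[0])
    else:
        t = float(timestep)
    # Values above 1.0 are assumed to be on the discrete 0-1000 scale.
    return t / 1000.0 if t > 1.0 else t
```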

### 8. **Architecture-Aware Defaults**
- Qwen-Image specific defaults:
  - `axes_dim = [16, 56, 56]` (MMDiT standard)
  - `theta = 10000` (RoPE base frequency)
  - `patch_size = 2` (MMDiT patch size)
  - `vae_scale_factor = 8` (standard VAE downsampling)
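
Collected in one place, these defaults amount to a small mapping (a sketch; the actual code may carry them as individual fallbacks rather than a dict):

```python
QWEN_IMAGE_DEFAULTS = {
    "axes_dim": [16, 56, 56],  # MMDiT standard RoPE axis split
    "theta": 10000,            # RoPE base frequency
    "patch_size": 2,           # MMDiT patch size
    "vae_scale_factor": 8,     # standard VAE downsampling
}
```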

## Benefits

1. **Better Compatibility**: Works with different Qwen-Image model variants
2. **More Accurate**: Uses actual model parameters instead of assumptions
3. **Robust**: Better error handling and fallbacks
4. **Optimized**: Qwen-specific optimizations for better performance
5. **Maintainable**: Clear structure detection makes debugging easier

## Testing Recommendations

When testing with your Qwen-Image model:

1. Check console output for detected parameters (add logging if needed; see the snippet after this list)
2. Verify `patch_size` matches your model (typically 2 for MMDiT)
3. Verify `base_resolution` matches the training resolution
4. Test with different resolutions to ensure proper extrapolation
5. Monitor for any warnings about fallback behavior
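
If the node does not already print what it detected, a throwaway probe like the following can surface the values (hypothetical: `detected` stands in for whatever the detection helper returns):

```python
# Hypothetical stand-in for the detection helper's return value.
detected = {"patch_size": 2, "vae_scale_factor": 8, "base_resolution": 1024}
print(f"[DyPE-Qwen] patch_size={detected['patch_size']} "
      f"vae_scale_factor={detected['vae_scale_factor']} "
      f"base_resolution={detected['base_resolution']}")
```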

## Future Enhancements

Potential further improvements:

1. **MSRoPE Integration**: Qwen uses Multimodal Scalable RoPE - could add specific support
2. **Aspect Ratio Presets**: Qwen supports specific aspect ratios - could add presets
3. **Text Rendering Optimization**: Qwen excels at text - could add text-specific optimizations
4. **Multi-Image Support**: Qwen-Image-Edit supports multi-image - could extend for that
5. **Config File Support**: Allow users to override detected parameters via config

28 changes: 23 additions & 5 deletions README.md
@@ -39,11 +39,12 @@ It works by taking advantage of the spectral progression inherent to the diffusi
<p><sub><i>A simple, single-node integration to patch your FLUX model for high-resolution generation.</i></sub></p>
</div>

This node provides a seamless, "plug-and-play" integration of DyPE into any FLUX-based workflow.
This node provides a seamless, "plug-and-play" integration of DyPE into FLUX-based and Qwen-Image workflows. Two specialized nodes are available, `DyPE for FLUX` and `DyPE for Qwen-Image`, each optimized for its architecture.

**✨ Key Features:**
* **True High-Resolution Generation:** Push FLUX models to 4096x4096 and beyond while maintaining global coherence and fine detail.
* **Single-Node Integration:** Simply place the `DyPE for FLUX` node after your model loader to patch the model. No complex workflow changes required.
* **True High-Resolution Generation:** Push FLUX and Qwen-Image models to 4096x4096 and beyond while maintaining global coherence and fine detail.
* **Dual Node Support:** Two specialized nodes, `DyPE for FLUX` and `DyPE for Qwen-Image`, each optimized for its architecture.
* **Single-Node Integration:** Simply place the appropriate DyPE node after your model loader to patch the model. No complex workflow changes required.
* **Full Compatibility:** Works seamlessly with your existing ComfyUI workflows, samplers, schedulers, and other optimization nodes like Self-Attention or quantization.
* **Fine-Grained Control:** Exposes key DyPE hyperparameters, allowing you to tune the algorithm's strength and behavior for optimal results at different target resolutions.
* **Zero Inference Overhead:** DyPE's adjustments happen on-the-fly with negligible performance impact.
@@ -75,15 +76,32 @@ Alternatively, to install manually:

Using the node is straightforward and designed for minimal workflow disruption.

### For FLUX Models

1. **Load Your FLUX Model:** Use a standard `Load Checkpoint` node to load your FLUX model (e.g., `FLUX.1-Krea-dev`).
2. **Add the DyPE Node:** Add the `DyPE for FLUX` node to your graph (found under `model_patches/unet`).
3. **Connect the Model:** Connect the `MODEL` output from your loader to the `model` input of the DyPE node.
4. **Set Resolution:** Set the `width` and `height` on the DyPE node to match the resolution of your `Empty Latent Image`.
5. **Connect to KSampler:** Use the `MODEL` output from the DyPE node as the input for your `KSampler`.
6. **Generate!** That's it. Your workflow is now DyPE-enabled.

### For Qwen-Image Models

1. **Load Your Qwen-Image Model:** Use a standard `Load Checkpoint` node to load your Qwen-Image model.
2. **Add the DyPE Node:** Add the `DyPE for Qwen-Image` node to your graph (found under `model_patches/unet`).
3. **Connect the Model:** Connect the `MODEL` output from your loader to the `model` input of the DyPE node.
4. **Set Resolution:** Set the `width` and `height` on the DyPE node to match the resolution of your `Empty Latent Image`.
5. **Connect to KSampler:** Use the `MODEL` output from the DyPE node as the input for your `KSampler`.
6. **Generate!** The node will automatically detect your Qwen-Image model structure and apply architecture-specific optimizations.

### Example Workflows

Ready-to-use example workflows are available in the [`example_workflows`](example_workflows) folder:
* **[DyPE-Flux-workflow.json](example_workflows/DyPE-Flux-workflow.json)** - Example workflow for FLUX models
* **[DyPE-Qwen-workflow.json](example_workflows/DyPE-Qwen-workflow.json)** - Example workflow for Qwen-Image models

> [!NOTE]
> This node specifically patches the **diffusion model (UNet)**. It does not modify the CLIP or VAE models. It is designed exclusively for **FLUX-based** architectures.
> This node specifically patches the **diffusion model (UNet)**. It does not modify the CLIP or VAE models. It is designed for **FLUX-based** architectures, with enhanced support for **Qwen-Image** models through intelligent model structure detection and architecture-specific optimizations.

### Node Inputs

@@ -130,7 +148,7 @@ Beyond the code, I believe in the power of community and continuous learning. I
<p align="center">══════════════════════════════════</p>

## ⚠️ Known Issues and Limitations
* **FLUX Only:** This implementation is highly specific to the architecture of the FLUX model and will not work on standard U-Net models (like SD 1.5/SDXL) or other Diffusion Transformers.
* **Supported Models:** This implementation is optimized for **FLUX-based** architectures and **Qwen-Image** models. It will not work on standard U-Net models (like SD 1.5/SDXL) or other Diffusion Transformers. For Qwen-Image models, the node automatically detects model structure and applies architecture-specific optimizations (see `IMPROVEMENTS.md` for details).
* **Parameter Tuning:** The optimal `dype_exponent` can vary based on your target resolution. Experimentation is key to finding the best setting for your use case. The default of `2.0` is optimized for 4K.

<p align="right">(<a href="#readme-top">back to top</a>)</p>
100 changes: 97 additions & 3 deletions __init__.py
@@ -1,6 +1,6 @@
import torch
from comfy_api.latest import ComfyExtension, io
from .src.patch import apply_dype_to_flux
from .src.patch import apply_dype_to_flux, apply_dype_to_qwen

class DyPE_FLUX(io.ComfyNode):
    """
@@ -82,11 +82,105 @@ def execute(cls, model, width: int, height: int, method: str, enable_dype: bool,
        patched_model = apply_dype_to_flux(model, width, height, method, enable_dype, dype_exponent, base_shift, max_shift)
        return io.NodeOutput(patched_model)

class DyPE_QWEN(io.ComfyNode):
    """
    Applies DyPE (Dynamic Position Extrapolation) to a Qwen-Image model.
    This allows generating images at resolutions far beyond the model's training scale
    by dynamically adjusting positional encodings and the noise schedule.
    """

    @classmethod
    def define_schema(cls) -> io.Schema:
        return io.Schema(
            node_id="DyPE_QWEN",
            display_name="DyPE for Qwen-Image",
            category="model_patches/unet",
            description="Applies DyPE (Dynamic Position Extrapolation) to a Qwen-Image model for ultra-high-resolution generation.",
            inputs=[
                io.Model.Input(
                    "model",
                    tooltip="The Qwen-Image model to patch with DyPE.",
                ),
                io.Int.Input(
                    "width",
                    default=1024, min=16, max=8192, step=8,
                    tooltip="Target image width. Must match the width of your empty latent."
                ),
                io.Int.Input(
                    "height",
                    default=1024, min=16, max=8192, step=8,
                    tooltip="Target image height. Must match the height of your empty latent."
                ),
                io.Combo.Input(
                    "method",
                    options=["yarn", "ntk", "base"],
                    default="yarn",
                    tooltip="Position encoding extrapolation method (YARN recommended).",
                ),
                io.Boolean.Input(
                    "enable_dype",
                    default=True,
                    label_on="Enabled",
                    label_off="Disabled",
                    tooltip="Enable or disable Dynamic Position Extrapolation for RoPE.",
                ),
                io.Float.Input(
                    "dype_exponent",
                    default=3.0, min=0.0, max=10.0, step=0.1,
                    optional=True,
                    tooltip="Controls DyPE strength over time (λt). 3.0=Very aggressive (best for 4K+), 2.0=Exponential, 1.0=Linear, 0.5=Sub-linear (better for ~2K). Higher values (up to 10.0) for extreme high-resolution generation."
                ),
                io.Float.Input(
                    "base_shift",
                    default=0.10, min=0.0, max=10.0, step=0.01,
                    optional=True,
                    tooltip="Advanced: Base shift for the noise schedule (mu). Default is 0.10."
                ),
                io.Float.Input(
                    "max_shift",
                    default=1.15, min=0.0, max=10.0, step=0.01,
                    optional=True,
                    tooltip="Advanced: Max shift for the noise schedule (mu) at high resolutions. Default is 1.15."
                ),
                io.Float.Input(
                    "editing_strength",
                    default=0.0, min=0.0, max=1.0, step=0.1,
                    optional=True,
                    tooltip="DyPE strength multiplier for image editing (0.0-1.0). Lower values preserve more original structure. Default 0.0 for maximum preservation. Set to 1.0 for pure generation."
                ),
                io.Combo.Input(
                    "editing_mode",
                    options=["adaptive", "timestep_aware", "resolution_aware", "minimal", "full"],
                    default="adaptive",
                    tooltip="Editing mode strategy: 'adaptive' (recommended) - timestep-aware scaling, 'timestep_aware' - more DyPE early/less late, 'resolution_aware' - only reduce at high res, 'minimal' - minimal DyPE for editing, 'full' - always full DyPE."
                ),
            ],
            outputs=[
                io.Model.Output(
                    display_name="Patched Model",
                    tooltip="The Qwen-Image model patched with DyPE.",
                ),
            ],
        )

    @classmethod
    def execute(cls, model, width: int, height: int, method: str, enable_dype: bool, dype_exponent: float = 3.0, base_shift: float = 0.10, max_shift: float = 1.15, editing_strength: float = 0.0, editing_mode: str = "adaptive") -> io.NodeOutput:
        """
        Clones the model and applies the DyPE patch for both the noise schedule and positional embeddings.
        """
        # Structural sanity check: DyPE needs a transformer/diffusion_model to patch.
        has_transformer = hasattr(model.model, "transformer") or hasattr(model.model, "diffusion_model")
        if not has_transformer:
            raise ValueError("No transformer/diffusion_model found on the provided model; this node is only compatible with Qwen-Image models.")

        patched_model = apply_dype_to_qwen(model, width, height, method, enable_dype, dype_exponent, base_shift, max_shift, editing_strength, editing_mode)
        return io.NodeOutput(patched_model)

class DyPEExtension(ComfyExtension):
    """Registers the DyPE node."""
    """Registers the DyPE nodes for both FLUX and Qwen-Image."""

    async def get_node_list(self) -> list[type[io.ComfyNode]]:
        return [DyPE_FLUX]
        return [DyPE_FLUX, DyPE_QWEN]

async def comfy_entrypoint() -> DyPEExtension:
    return DyPEExtension()