README.md

@@ -29,7 +29,7 @@

## About The Project

DyPE is a novel, training-free method that allows pre-trained diffusion transformers like FLUX (and now **Qwen Image**) to generate images at resolutions far beyond their training data, with no additional sampling cost.

It works by taking advantage of the spectral progression inherent to the diffusion process. By dynamically adjusting the model's positional encodings at each step, DyPE matches their frequency spectrum with the current stage of the generative process—focusing on low-frequency structures early on and resolving high-frequency details in later steps. This prevents the repeating artifacts and structural degradation typically seen when pushing models beyond their native resolution.
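To make the ramp concrete, here is a minimal sketch of one plausible reading of the power ramp described above (names and the exact formulation are illustrative; the package's real schedule may differ):

```python
def dype_weight(t: float, lam: float = 2.0) -> float:
    """One plausible reading of the DyPE power ramp (illustrative only).

    t:   the sampler's normalized timestep, 1.0 at the first (noisiest)
         step and 0.0 at the last.
    lam: the node's `dype_exponent`.

    Returns the blend weight for the *extended* position spectrum:
    near 0 early (stay on the training spectrum), near 1 late
    (full high-frequency coverage).
    """
    return 1.0 - t ** lam

for t in (1.0, 0.7, 0.3, 0.0):
    print(f"t={t:.1f} -> extended-spectrum weight {dype_weight(t):.2f}")
```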

@@ -43,9 +43,10 @@ This node provides a seamless, "plug-and-play" integration of DyPE into any FLUX

**✨ Key Features:**
* **True High-Resolution Generation:** Push FLUX models to 4096x4096 and beyond while maintaining global coherence and fine detail.
* **Single-Node Integration:** Simply place the `DyPE for FLUX` or `DyPE for Qwen Image` node after your model loader to patch the model. No complex workflow changes required.
* **Full Compatibility:** Works seamlessly with your existing ComfyUI workflows, samplers, schedulers, and other optimization nodes like Self-Attention or quantization.
* **Fine-Grained Control:** Exposes key DyPE hyperparameters, allowing you to tune the algorithm's strength and behavior for optimal results at different target resolutions.
* **Model-Aware Qwen Support:** Automatically infers Qwen patch geometry, adds editing-aware DyPE tapers, and gracefully patches non-FLUX samplers.
* **Zero Inference Overhead:** DyPE's adjustments happen on-the-fly with negligible performance impact.

<div align="center">
@@ -75,6 +76,8 @@ Alternatively, to install manually:

Using the node is straightforward and designed for minimal workflow disruption.

### FLUX Workflows

1. **Load Your FLUX Model:** Use a standard `Load Checkpoint` node to load your FLUX model (e.g., `FLUX.1-Krea-dev`).
2. **Add the DyPE Node:** Add the `DyPE for FLUX` node to your graph (found under `model_patches/unet`).
3. **Connect the Model:** Connect the `MODEL` output from your loader to the `model` input of the DyPE node.
@@ -85,6 +88,33 @@ Using the node is straightforward and designed for minimal workflow disruption.
> [!NOTE]
> This node specifically patches the **diffusion model** (the UNET slot in ComfyUI). It does not modify the CLIP or VAE models. The `DyPE for FLUX` node is designed exclusively for **FLUX-based** architectures; use `DyPE for Qwen Image` for Qwen models.
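For scripted (non-graph) ComfyUI pipelines, the node is a thin wrapper around a single helper, so the same patch can be applied in code. A minimal sketch, assuming you already hold a loaded FLUX `ModelPatcher`; the helper and its positional argument order come from this repo's `__init__.py`, while the import path and the `patch_flux_for_4k` wrapper are illustrative:

```python
from src.patch import apply_dype_to_flux  # adjust the import to your install path

def patch_flux_for_4k(model):
    """Apply DyPE with the node's defaults for a 4096x4096 target.

    Positional arguments mirror the node's call:
    (model, width, height, method, enable_dype,
     dype_exponent, base_shift, max_shift)
    """
    return apply_dype_to_flux(model, 4096, 4096, "yarn", True, 2.0, 0.5, 1.15)
```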

### Qwen Image Workflows

1. **Load Your Qwen Image Model:** Use the usual `Load Checkpoint` node for `QwenImage`.
2. **Add the DyPE Node:** Drop in the `DyPE for Qwen Image` node (under `model_patches/unet`).
3. **Set Width/Height:** Match the values to your target latent/image resolution (the same numbers you use in `Empty Latent Image`).
4. **Auto-Detect Geometry:** Leave `auto_detect` enabled to let the node read the Qwen patch size and base grid directly from the checkpoint. Disable it only if you need to override the base dimensions for custom fine-tunes.
5. **Dial In Editing:** Lower `editing_strength` (and pick an `editing_mode`) when working on inpainting or image-to-image tasks, so DyPE eases off and preserves the source structure.
6. **Choose Method:** `yarn` is recommended for aggressive extrapolation; switch to `ntk` if you prefer a smoother scaling curve.
7. **Run the KSampler:** Route the patched model output into your sampler as usual.

> [!TIP]
> `base_shift`/`max_shift` let you blend the flow-matching schedule as you scale to extremely large canvases. Keeping them at `1.15`/`1.35` mirrors the defaults we found stable in early tests—feel free to tune if you observe over-smoothing or excess repetition.
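To make the tip concrete, here is a sketch of how such a shift interpolation typically works in FLUX-style schedulers: the shift `mu` is interpolated linearly in the number of image tokens between a training sequence length and a reference maximum. The token counts and the 16px effective patch size below are assumptions for illustration, not values read from this repo:

```python
def interpolate_flow_shift(width: int, height: int,
                           base_shift: float = 1.15, max_shift: float = 1.35,
                           base_seq: int = 256, max_seq: int = 4096) -> float:
    """Linear interpolation of the flow-matching shift `mu` with token count,
    in the style of the FLUX scheduler. Assumes 16px effective patches."""
    seq_len = (width // 16) * (height // 16)
    slope = (max_shift - base_shift) / (max_seq - base_seq)
    return base_shift + slope * (seq_len - base_seq)

print(interpolate_flow_shift(1024, 1024))  # 4096 tokens -> exactly max_shift (1.35)
print(interpolate_flow_shift(4096, 4096))  # extrapolates well beyond max_shift
```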

#### How DyPE for Qwen Image Works

The native Qwen Image transformer was trained on a 1024×1024 latent grid, so every attention layer expects RoPE caches sized for 58×104 spatial tokens (plus text tokens). When you push beyond that window, the model reuses frequencies and starts repeating structures.

The DyPE node swaps the stock `EmbedND` for `QwenSpatialPosEmbed`, a drop-in replacement that:

* Clones the original positional embedder so the node can be removed without side-effects.
* Recomputes the rotary cache using YaRN or NTK scaling for the height/width axes while leaving the text index axis untouched.
* Tracks the sampler’s normalized timestep (via a lightweight wrapper) and applies the DyPE power ramp (`t^λ`) to blend from the base grid to the expanded grid over the course of sampling. Early steps stay close to the training spectrum; late steps receive the extra high-frequency coverage that keeps 4K images coherent.
* Interpolates the FLUX-style flow shift between `base_shift` and `max_shift` according to the requested canvas size so the noise schedule stays in sync with the wider attention field.
* Emits INFO logs (`[DyPE QwenImage] axis=…`) showing the current grid lengths, ramp strength, and YaRN/NTK factors. These diagnostics make it easy to correlate visual artifacts with the positional scaling parameters.

Because the embedder is swapped via `ModelPatcher`, you can chain other ComfyUI optimizations after the DyPE node, and disabling the node returns you to the stock Qwen behaviour instantly.
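As a sketch of the per-axis frequency math described above, here is the simpler NTK-aware variant combined with the DyPE ramp (the node's YaRN path is more involved; `dim`, the rotary base of 10000, and the linear blend are illustrative assumptions):

```python
import torch

def ntk_freqs(dim: int, scale: float, base: float = 10000.0) -> torch.Tensor:
    """NTK-aware RoPE inverse frequencies for one spatial axis.
    scale = target axis length / training axis length."""
    # Stretching the rotary base slows the lowest frequencies enough to
    # cover the enlarged axis while barely moving the highest ones.
    adjusted = base * scale ** (dim / (dim - 2))
    return 1.0 / adjusted ** (torch.arange(0, dim, 2).float() / dim)

def dype_axis_freqs(t: float, dim: int, scale: float, lam: float = 2.0) -> torch.Tensor:
    """Blend the training spectrum with the extended one via the DyPE ramp:
    early steps (t near 1.0) stay close to the base spectrum, late steps
    (t near 0.0) receive the full extension."""
    w = 1.0 - t ** lam  # weight on the extended spectrum
    return (1.0 - w) * ntk_freqs(dim, 1.0) + w * ntk_freqs(dim, scale)
```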

### Node Inputs

* **`model`**: The FLUX or Qwen Image model to be patched.
@@ -98,6 +128,9 @@ Using the node is straightforward and designed for minimal workflow disruption.
* `1.0` (Linear): A good starting point for **~2K-3K** resolutions.
* `0.5` (Sub-linear): A gentler schedule that may work best for resolutions just above the model's native 1K.
* **`base_shift` / `max_shift`** (Advanced): These parameters control the interpolation of the dynamic noise schedule shift (`mu`). The FLUX node's default values (`0.5`, `1.15`) are taken directly from the FLUX architecture and are generally optimal; the Qwen node defaults to `1.15`/`1.35`. Adjust only if you are an advanced user experimenting with the noise schedule.
* **`auto_detect`**: When enabled (default), the node inspects the loaded Qwen checkpoint to recover its training grid and patch size. Disable it if you need to supply `base_width`/`base_height` manually.
* **`base_width` / `base_height`**: Manual override for the training canvas; only consulted when `auto_detect` is turned off.
* **`editing_strength` & `editing_mode`**: Let you taper DyPE during edits. Reduce the strength (e.g., 0.5) and pick a mode like `adaptive` to keep structure intact during image-to-image or inpainting workflows.
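The exact taper logic behind each `editing_mode` is not spelled out here, but conceptually `editing_strength` scales the DyPE blend back toward stock behaviour. A deliberately simplified, hypothetical illustration:

```python
def edited_dype_weight(t: float, lam: float, editing_strength: float) -> float:
    """Hypothetical taper: scale the DyPE blend weight so img2img and
    inpainting keep more of the source image's positional behaviour.
    editing_strength=1.0 leaves DyPE at full strength; 0.0 disables it."""
    return (1.0 - t ** lam) * editing_strength
```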

> [!WARNING]
> It seems the width/height parameters in the node are buggy. Keep the values below 1024x1024; doing so won’t affect your output.
@@ -132,6 +165,8 @@ Beyond the code, I believe in the power of community and continuous learning. I
## ⚠️ Known Issues and Limitations
* **FLUX and Qwen Image Only:** This implementation is specific to the FLUX and Qwen Image transformer architectures and will not work on standard U-Net models (like SD 1.5/SDXL) or other Diffusion Transformers.
* **Parameter Tuning:** The optimal `dype_exponent` can vary based on your target resolution. Experimentation is key to finding the best setting for your use case. The default of `2.0` is optimized for 4K.
* **Qwen CLIP Diagnostics:** When a supplied CLIP encoder is missing the expected `transformer.model`, the extension now recursively searches typical attachment points (including nested module dictionaries). If the model is still unresolved, it raises an error that includes a structured snapshot in both the logs and the exception text to speed up debugging. Both DyPE nodes also emit INFO-level logs summarizing the requested patch parameters whenever they run.
* **Qwen Spatial Scaling:** Extremely aggressive aspect ratios (>3:1) may still require manual tuning of `max_shift` or method selection to maintain coherence. Start with `yarn` and step the exponent down (e.g. to `1.0`) if the model oversharpens.

<p align="right">(<a href="#readme-top">back to top</a>)</p>

__init__.py

@@ -1,6 +1,17 @@
import logging
import torch
from comfy_api.latest import ComfyExtension, io

try:
from .src.patch import apply_dype_to_flux
from .src.qwen_patch import apply_dype_to_qwen_clip
from .src.qwen_spatial import apply_dype_to_qwen_image
except ImportError: # pragma: no cover - fallback for direct execution contexts
from src.patch import apply_dype_to_flux # type: ignore[no-redef]
from src.qwen_patch import apply_dype_to_qwen_clip # type: ignore[no-redef]
from src.qwen_spatial import apply_dype_to_qwen_image # type: ignore[no-redef]

logger = logging.getLogger(__name__)

class DyPE_FLUX(io.ComfyNode):
"""
@@ -78,15 +89,278 @@ def execute(cls, model, width: int, height: int, method: str, enable_dype: bool,
"""
if not hasattr(model.model, "diffusion_model") or not hasattr(model.model.diffusion_model, "pe_embedder"):
raise ValueError("This node is only compatible with FLUX models.")

logger.info(
"DyPE_FLUX: patching model at %dx%d (method=%s, enable_dype=%s, "
"dype_exponent=%s, base_shift=%s, max_shift=%s).",
width,
height,
method,
enable_dype,
dype_exponent,
base_shift,
max_shift,
)

patched_model = apply_dype_to_flux(model, width, height, method, enable_dype, dype_exponent, base_shift, max_shift)
return io.NodeOutput(patched_model)


class DyPE_QWEN_IMAGE(io.ComfyNode):
"""
Applies DyPE-style spatial extrapolation to a Qwen Image diffusion model.
"""

@classmethod
def define_schema(cls) -> io.Schema:
return io.Schema(
node_id="DyPE_QwenImage",
display_name="DyPE for Qwen Image",
category="model_patches/unet",
description="Expands the spatial rotary embeddings inside the Qwen image transformer using DyPE.",
inputs=[
io.Model.Input(
"model",
tooltip="The Qwen Image model to patch.",
),
io.Int.Input(
"width",
default=1024,
min=16,
max=16384,
step=8,
tooltip="Target output width in pixels.",
),
io.Int.Input(
"height",
default=1024,
min=16,
max=16384,
step=8,
tooltip="Target output height in pixels.",
),
io.Boolean.Input(
"auto_detect",
default=True,
label_on="Auto",
label_off="Manual",
tooltip="Automatically detect Qwen patch size and training resolution from the model.",
),
io.Int.Input(
"base_width",
default=1024,
min=16,
max=16384,
step=8,
tooltip="Training width used by the base Qwen model (used when auto detection is disabled).",
),
io.Int.Input(
"base_height",
default=1024,
min=16,
max=16384,
step=8,
tooltip="Training height used by the base Qwen model (used when auto detection is disabled).",
),
io.Combo.Input(
"method",
options=["yarn", "ntk", "base"],
default="yarn",
tooltip="Spatial RoPE extrapolation strategy.",
),
io.Boolean.Input(
"enable_dype",
default=True,
label_on="Enabled",
label_off="Disabled",
tooltip="Enable Dynamic Position Extrapolation over the sampling trajectory.",
),
io.Float.Input(
"dype_exponent",
default=2.0,
min=0.0,
max=4.0,
step=0.1,
optional=True,
tooltip="Controls how strongly DyPE ramps across sampling timesteps.",
),
io.Float.Input(
"base_shift",
default=1.15,
min=0.0,
max=10.0,
step=0.01,
optional=True,
tooltip="Baseline shift applied to the flow-matching noise schedule.",
),
io.Float.Input(
"max_shift",
default=1.35,
min=0.0,
max=10.0,
step=0.01,
optional=True,
tooltip="Maximum shift applied when operating at the target resolution.",
),
io.Float.Input(
"editing_strength",
default=1.0,
min=0.0,
max=1.0,
step=0.05,
optional=True,
tooltip="Scale DyPE while editing images (1.0 = full strength, 0.0 = disable DyPE scaling in edits).",
),
io.Combo.Input(
"editing_mode",
options=["adaptive", "timestep_aware", "resolution_aware", "minimal", "full"],
default="adaptive",
tooltip="Strategy for tapering DyPE during edits. Adaptive is a balanced default.",
),
],
outputs=[
io.Model.Output(
display_name="Patched Model",
tooltip="The Qwen Image model patched with spatial DyPE.",
),
],
)

@classmethod
def execute(
cls,
model,
width: int,
height: int,
auto_detect: bool,
base_width: int,
base_height: int,
method: str,
enable_dype: bool,
dype_exponent: float = 2.0,
base_shift: float = 1.15,
max_shift: float = 1.35,
editing_strength: float = 1.0,
editing_mode: str = "adaptive",
) -> io.NodeOutput:
if not hasattr(model, "model") or not hasattr(model.model, "diffusion_model"):
raise ValueError("This node expects a Qwen Image diffusion model input.")

logger.info(
"DyPE_QwenImage: requested patch (width=%d, height=%d, method=%s, "
"enable_dype=%s, dype_exponent=%s, base_shift=%s, max_shift=%s, "
"auto_detect=%s, editing_mode=%s, editing_strength=%.3f).",
width,
height,
method,
enable_dype,
dype_exponent,
base_shift,
max_shift,
auto_detect,
editing_mode,
editing_strength,
)

patched_model = apply_dype_to_qwen_image(
model=model,
width=width,
height=height,
method=method,
enable_dype=enable_dype,
dype_exponent=dype_exponent,
base_width=base_width,
base_height=base_height,
base_shift=base_shift,
max_shift=max_shift,
auto_detect=auto_detect,
editing_strength=editing_strength,
editing_mode=editing_mode,
)
return io.NodeOutput(patched_model)

class DyPE_QWEN_CLIP(io.ComfyNode):
"""
Applies DyPE position extrapolation to a Qwen-based CLIP/text encoder.
"""

@classmethod
def define_schema(cls) -> io.Schema:
return io.Schema(
node_id="DyPE_QwenClip",
display_name="DyPE for Qwen CLIP",
category="model_patches/clip",
description="Extends Qwen text encoder RoPE for longer prompts using DyPE-style extrapolation.",
inputs=[
io.Clip.Input("clip"),
io.Combo.Input(
"method",
options=["yarn", "ntk", "base"],
default="ntk",
tooltip="RoPE extrapolation strategy. NTK is a good default for language models.",
),
io.Boolean.Input(
"enable_dype",
default=True,
label_on="Enabled",
label_off="Disabled",
tooltip="Toggle Dynamic Position Extrapolation scaling.",
),
io.Float.Input(
"dype_exponent",
default=2.0, min=0.0, max=4.0, step=0.1,
optional=True,
tooltip="Controls how aggressively DyPE ramps with context length.",
),
io.Int.Input(
"base_ctx_len",
default=8192, min=1024, max=512000, step=512,
tooltip="Context length the model was trained on. DyPE stays inactive at or below this size.",
),
io.Int.Input(
"max_ctx_len",
default=262144, min=4096, max=1048576, step=512,
tooltip="Target maximum context DyPE should support.",
),
],
outputs=[
io.Clip.Output(
display_name="Patched CLIP",
tooltip="Qwen text encoder with DyPE RoPE scaling.",
),
],
)

@classmethod
def execute(cls, clip, method: str, enable_dype: bool, dype_exponent: float = 2.0, base_ctx_len: int = 8192, max_ctx_len: int = 262144) -> io.NodeOutput:
if not hasattr(clip, "cond_stage_model"):
raise ValueError("This node expects a CLIP/text encoder input.")

logger.info(
"DyPE_QwenClip: requested patch (method=%s, enable_dype=%s, dype_exponent=%s, "
"base_ctx_len=%d, max_ctx_len=%d).",
method,
enable_dype,
dype_exponent,
base_ctx_len,
max_ctx_len,
)

patched_clip = apply_dype_to_qwen_clip(
clip,
method=method,
enable_dype=enable_dype,
dype_exponent=dype_exponent,
base_ctx_len=base_ctx_len,
max_ctx_len=max_ctx_len,
)
return io.NodeOutput(patched_clip)

class DyPEExtension(ComfyExtension):
"""Registers the DyPE node."""

async def get_node_list(self) -> list[type[io.ComfyNode]]:
return [DyPE_FLUX, DyPE_QWEN_CLIP, DyPE_QWEN_IMAGE]

async def comfy_entrypoint() -> DyPEExtension:
return DyPEExtension()