Add Qwen-Image Support with Intelligent Model Detection #13
Open · Encryptic1 wants to merge 6 commits into wildminder:main from Encryptic1:main
Conversation
…ure-specific optimizations
… strategies

This commit adds comprehensive image editing support to the DyPE implementation for Qwen-Image models, addressing the issue where DyPE's aggressive positional encoding modifications were interfering with image editing tasks.

## Problem Addressed
- DyPE was applying full-strength modifications during image editing
- This caused loss of original image structure and feature degradation
- Editing tasks (inpainting, image-to-image) were not preserving spatial relationships
- Users reported that while upscaling worked well, editing was problematic

## Solution Implemented

### 1. Multiple Editing Mode Strategies
Added 5 distinct editing mode strategies via the new `editing_mode` parameter:
- `adaptive` (default): Timestep-aware scaling: full DyPE early (structure), gradually reduced later (details). Best balance for most editing scenarios.
- `timestep_aware`: More aggressive version: 100% early, 20% late
- `resolution_aware`: Only reduces DyPE when editing at high resolutions (above base)
- `minimal`: Fixed reduced strength throughout (original approach, improved)
- `full`: Always full DyPE regardless of editing (for pure generation)

### 2. Timestep-Aware Dynamic Scaling
The adaptive and timestep_aware modes implement intelligent scaling:
- Early denoising steps (high noise): full DyPE strength for structure building
- Late denoising steps (low noise): reduced DyPE to preserve fine details
- Smooth transition based on the normalized timestep (1.0 = pure noise, 0.0 = clean)
- Formula: `timestep_factor = base + (current_timestep * range)`
  - adaptive: `0.3 + (timestep * 0.7)` → 1.0 down to 0.3
  - timestep_aware: `0.2 + (timestep * 0.8)` → 1.0 down to 0.2

### 3. Editing Strength Parameter
- New `editing_strength` parameter (0.0-1.0, default 0.6)
- Acts as a multiplier for DyPE strength during editing
- Lower values = more structure preservation
- Higher values = more quality, less preservation
- Works in conjunction with `editing_mode` for fine control

### 4. Improved Editing Detection
Enhanced detection of image editing scenarios:
- Checks the conditioning dict for image-related keys:
  - `image`, `image_embeds`, `image_tokens`
  - `concat_latent_image`, `concat_mask`, `concat_mask_image`
- Heuristic: analyzes input variance to distinguish edited images from pure noise
- More accurate detection of when editing is occurring

### 5. Resolution-Aware Behavior
The resolution-aware mode intelligently:
- Detects whether editing happens at base resolution (1024x1024) or high resolution
- Applies full DyPE strength at base resolution
- Only reduces strength when upscaling during editing
- Prevents unnecessary reduction when it is not needed

### 6. Extrapolation Scaling
For the YARN and NTK methods:
- Applies timestep-aware scaling to extrapolation ratios
- Adaptive modes use smoother transitions
- Minimal mode uses a fixed reduction
- Better balance between structure and quality

## Technical Implementation

### Changes to the QwenPosEmbed class
- Added `editing_strength` and `editing_mode` parameters
- Implemented timestep-aware strength calculation
- Dynamic scaling of `dype_exponent` based on mode and timestep
- Conditional extrapolation scaling for the YARN/NTK methods

### Changes to the apply_dype_to_qwen function
- Added `editing_strength` and `editing_mode` parameters
- Passes the parameters through to `QwenPosEmbed`
- Improved timestep normalization for editing scenarios
- Mode-specific handling in the wrapper function

### Changes to the node interface (`__init__.py`)
- Added an `editing_mode` combo input with 5 options
- Added an `editing_strength` float input (default 0.6)
- Updated tooltips with clear explanations
- Backward compatible (`editing_strength` defaults to 0.6, mode to `adaptive`)

## Benefits
1. **Better Editing Quality**: preserves original image structure while allowing edits
2. **Flexible Control**: 5 modes plus a strength parameter for fine-tuning
3. **Intelligent Adaptation**: timestep-aware modes adapt to the denoising stage
4. **Backward Compatible**: defaults work well; existing workflows are unaffected
5. **Resolution Aware**: only applies reduction when actually needed

## Usage Recommendations
- Most editing: `editing_mode='adaptive'`, `editing_strength=0.6` (default)
- Maximum preservation: `editing_mode='timestep_aware'`, `editing_strength=0.5`
- Base resolution: `editing_mode='resolution_aware'`, `editing_strength=0.7`
- Pure generation: `editing_mode='full'` (ignores `editing_strength`)

This implementation solves the image editing quality issues while maintaining excellent upscaling performance for pure generation tasks.
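The timestep-aware scaling described above can be sketched as a small helper. Only the adaptive/timestep_aware coefficients come from the commit notes; the function name and the handling of the other modes here are illustrative assumptions:

```python
def editing_dype_factor(timestep: float, mode: str, editing_strength: float = 0.6) -> float:
    """Illustrative sketch of timestep-aware DyPE strength during editing.

    `timestep` is normalized: 1.0 = pure noise (early), 0.0 = clean (late).
    The returned factor would scale the DyPE modification strength.
    """
    if mode == "full":
        return 1.0                       # pure generation: always full DyPE
    if mode == "adaptive":
        factor = 0.3 + timestep * 0.7    # 1.0 early -> 0.3 late
    elif mode == "timestep_aware":
        factor = 0.2 + timestep * 0.8    # 1.0 early -> 0.2 late
    elif mode == "minimal":
        return editing_strength          # fixed reduced strength throughout
    else:                                # e.g. resolution_aware, decided elsewhere
        factor = 1.0
    return factor * editing_strength     # editing_strength acts as a multiplier
```

At full strength (`editing_strength=1.0`), the adaptive mode thus moves smoothly from 1.0 at pure noise down to 0.3 at the final denoising steps.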
Updated default parameter values based on testing and optimization:

- `dype_exponent`: 2.0 → 3.0
  - More aggressive DyPE strength for better high-resolution generation
  - Better performance at 4K+ resolutions
- `base_shift`: 0.5 → 0.10
  - Lower base shift for the noise schedule
  - Improved balance for the Qwen-Image architecture
- `editing_strength`: 0.6 → 0.0
  - Maximum structure preservation during image editing
  - Prevents feature degradation when editing existing images
  - Users can increase it if needed for more aggressive editing

These defaults provide an optimal balance between:
- High-resolution generation quality (`dype_exponent=3.0`)
- Image editing structure preservation (`editing_strength=0.0`)
- Noise schedule tuning (`base_shift=0.10`)
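As a rough illustration, these defaults would surface in a ComfyUI-style `INPUT_TYPES` fragment along the lines below. The exact keys, ranges, and step sizes are assumptions; only the default values and mode names come from the PR:

```python
def qwen_dype_inputs():
    """Hypothetical INPUT_TYPES-style dict mirroring the updated defaults."""
    return {
        "required": {
            "dype_exponent": ("FLOAT", {"default": 3.0, "min": 0.0, "max": 10.0, "step": 0.1}),
            "base_shift": ("FLOAT", {"default": 0.10, "min": 0.0, "max": 1.0, "step": 0.01}),
            "editing_strength": ("FLOAT", {"default": 0.0, "min": 0.0, "max": 1.0, "step": 0.05}),
            "editing_mode": (["adaptive", "timestep_aware", "resolution_aware", "minimal", "full"],),
        }
    }
```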
Increased the maximum value for dype_exponent parameter from 4.0 to 10.0 for the Qwen-Image node only. This allows users to experiment with more aggressive DyPE settings for extreme high-resolution generation (8K+). The FLUX node remains capped at 4.0 as before.
OMG I should have checked to see that your PR existed before I spun up mine. Face palm. I'll have a look at this one. Will be interesting to see how your approach is similar / different.
Summary
This PR adds comprehensive support for Qwen-Image models to ComfyUI-DyPE, introducing a new `DyPE for Qwen-Image` node alongside the existing `DyPE for FLUX` node. The implementation includes intelligent model structure detection and architecture-specific optimizations that automatically adapt to different Qwen-Image model variants.

IMAGE EDIT scaling is still to come: image editing is not yet supported and currently requires an empty latent. Adaptive DyPE modes for image editing with Qwen will follow soon. For now, feed NO image into the Qwen Text Encoding node, or use an empty latent.
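The editing-detection heuristic described in the commit notes, which checks the conditioning dict for image-related keys, can be sketched as follows. The function name is illustrative, and the variance-based noise check mentioned in the commits is omitted for brevity:

```python
# Conditioning keys that signal an image-editing task (listed in the commit notes).
IMAGE_EDIT_KEYS = {
    "image", "image_embeds", "image_tokens",
    "concat_latent_image", "concat_mask", "concat_mask_image",
}

def looks_like_editing(conditioning: dict) -> bool:
    """Illustrative sketch: True if any image-related key is present."""
    return any(key in conditioning for key in IMAGE_EDIT_KEYS)
```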
What Was Added
1. New ComfyUI Node
- `DyPE for Qwen-Image` (under `model_patches/unet`)

2. Intelligent Model Structure Detection
The implementation includes a robust `_detect_qwen_model_structure()` function that automatically detects which positional-embedding attribute the model uses (`pos_embed` vs `pe_embedder`). This eliminates hardcoded assumptions and adapts to different Qwen model variants seamlessly.
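The attribute probing that `_detect_qwen_model_structure()` performs can be sketched like this; the helper name is an assumption, and only the two attribute names come from the description above:

```python
def find_pos_embed_attr(model):
    """Illustrative sketch: locate the positional-embedding module, which may
    be named 'pos_embed' or 'pe_embedder' depending on the Qwen variant."""
    for name in ("pos_embed", "pe_embedder"):
        module = getattr(model, name, None)
        if module is not None:
            return name, module
    return None, None
```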
3. Architecture-Specific Optimizations
- Enhanced `QwenPosEmbed` class with better device-aware dtype selection and robust tensor handling

4. Documentation & Examples
- `IMPROVEMENTS.md` documenting all Qwen-Image specific improvements
- `example_workflows/DyPE-Qwen-workflow.json` example workflow for the `DyPE for Qwen-Image` node

Key Benefits
Technical Details
For detailed technical information about all improvements, please see IMPROVEMENTS.md.
The key technical changes are summarized in that document.
Files Changed
- `__init__.py`: Added the `DyPE_QWEN` node class
- `src/patch.py`: Added Qwen-Image support with intelligent model detection
- `README.md`: Updated documentation for both nodes
- `IMPROVEMENTS.md`: Comprehensive documentation of all improvements (new file)
- `example_workflows/DyPE-Qwen-workflow.json`: Example workflow for Qwen-Image models (new file)

Testing
The implementation has been tested with Qwen-Image models.
Backward Compatibility
✅ Fully backward compatible: the existing `DyPE for FLUX` node remains unchanged and continues to work exactly as before. This PR only adds new functionality.

Note: This PR extends the original FLUX-only implementation to support Qwen-Image models while maintaining full backward compatibility with existing FLUX workflows.