[NeurIPS 2024] Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution

MCG-NJU/FlowDCN

[NeurIPS24] FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution

[Image: caps]

[NEWS] [9.26] 💐💐 FlowDCN has been accepted to NeurIPS 2024! 💐💐

[NEWS] [11.22] 🍺 The FlowDCN models and code are now available in the official repo!

Pretrained Models

Our models consistently achieve state-of-the-art sFID compared to SiT/DiT.

Metrics

Our models consistently have fewer parameters and GFLOPs than their Transformer counterparts. Our code also supports LogNorm and VAR (Various Aspect Ratio) training.

| Model-iters | Resolution | Solver | NFE-CFG | FID | sFID | Params | Link |
|---|---|---|---|---|---|---|---|
| FlowDCN-S-400k | 256x256 | EulerSDE-250 | 250x2 | 54.6 | 8.8 | 30.3M | HF |
| FlowDCN-B-400k | 256x256 | EulerSDE-250 | 250x2 | 28.5 | 6.09 | 120M | HF |
| VAR-FlowDCN-B-400k | 256x256 | EulerSDE-250 | 250x2 | 23.6 | 7.72 | 120M | HF |
| FlowDCN-L-400k | 256x256 | EulerSDE-250 | 250x2 | 13.8 | 4.69 | 421M | HF |
| FlowDCN-XL-2M | 256x256 | EulerODE-250 | 250x2 | 2.01 | 4.33 | 618M | HF |
| FlowDCN-XL-2M | 256x256 | EulerSDE-250 | 250x2 | 2.00 | 4.37 | 618M | HF |
| FlowDCN-XL-2M | 256x256 | NeuralSolver-10 | 10x2 | 2.35 | 5.07 | 618M | HF |
| FlowDCN-XL-100k | 512x512 | EulerODE-50 | 50x2 | 2.76 | 5.29 | 618M | HF |
| FlowDCN-XL-100k | 512x512 | EulerSDE-250 | 250x2 | 2.44 | 4.53 | 618M | HF |
| FlowDCN-XL-100k | 512x512 | NeuralSolver-10 | 10x2 | 2.77 | 4.68 | 618M | HF |
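
The "x2" in the NFE-CFG column most likely reflects classifier-free guidance (CFG), where each solver step evaluates the network twice: once with the class label and once without. The snippet below is a minimal sketch of that standard formulation, not the repository's actual implementation. `CountingModel`, `guided_velocity`, and the `null_label` convention are hypothetical names introduced only for illustration.

```python
class CountingModel:
    """Hypothetical stand-in for a velocity network; counts forward passes."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, t, label):
        self.calls += 1
        # Toy rule (assumption): the conditional branch predicts 2*x,
        # the unconditional branch predicts x.
        return 2.0 * x if label is not None else 1.0 * x


def guided_velocity(model, x, t, label, cfg_scale, null_label=None):
    """Standard CFG combination: two network evaluations per solver step."""
    v_cond = model(x, t, label)         # evaluation 1: class-conditional
    v_uncond = model(x, t, null_label)  # evaluation 2: unconditional
    return v_uncond + cfg_scale * (v_cond - v_uncond)


if __name__ == "__main__":
    model = CountingModel()
    v = guided_velocity(model, x=1.0, t=0.5, label=3, cfg_scale=1.375)
    print(v, model.calls)
```

Under this reading, a 250-step Euler run costs 250 guided steps, each with two forward passes, which matches the "250x2" notation.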

Usage

Remember to download the models and update the VAE and pretrained model paths before running.

For training

python3 main.py fit -c configs/CONFIG

For sampling

python3 main.py predict -c configs/CONFIG

Visualizations

Generated images with CFG 1.375:

| Models | Resolution | Link |
|---|---|---|
| FlowDCN-XL-100k | 512x512 | HF |
| FlowDCN-XL-2M | 256x256 | HF |

Selected generated images with CFG 4.0:

[Image: caps]

Various Resolution Extension

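| Models | 256x256 FID | 256x256 sFID | 256x256 IS | 320x320 FID | 320x320 sFID | 320x320 IS | 224x448 FID | 224x448 sFID | 224x448 IS | 160x480 FID | 160x480 sFID | 160x480 IS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DiT-B | 44.83 | 8.49 | 32.05 | 95.47 | 108.68 | 18.38 | 109.1 | 110.71 | 14.00 | 143.8 | 122.81 | 8.93 |
| with EI | 44.83 | 8.49 | 32.05 | 81.48 | 62.25 | 20.97 | 133.2 | 72.53 | 11.11 | 160.4 | 93.91 | 7.30 |
| with PI | 44.83 | 8.49 | 32.05 | 72.47 | 54.02 | 24.15 | 133.4 | 70.29 | 11.73 | 156.5 | 93.80 | 7.80 |
| FiT-B (+VAR) | 36.36 | 11.08 | 40.69 | 61.35 | 30.71 | 31.01 | 44.67 | 24.09 | 37.1 | 56.81 | 22.07 | 25.25 |
| with VisionYaRN | 36.36 | 11.08 | 40.69 | 44.76 | 38.04 | 44.70 | 41.92 | 42.79 | 45.87 | 62.84 | 44.82 | 27.84 |
| with VisionNTK | 36.36 | 11.08 | 40.69 | 57.31 | 31.31 | 33.97 | 43.84 | 26.25 | 39.22 | 56.76 | 24.18 | 26.40 |
| FlowDCN-B | 28.5 | 6.09 | 51 | 34.4 | 27.2 | 52.2 | 71.7 | 62.0 | 23.7 | 211 | 111 | 5.83 |
| FlowDCN-B (+VAR) | 23.6 | 7.72 | 62.8 | 29.1 | 15.8 | 69.5 | 31.4 | 17.0 | 62.4 | 44.7 | 17.8 | 35.8 |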

Linear Multi-step Solvers

We also provide an Adams-like linear multi-step solver for rectified flow sampling. The related configs have adam2 or adam4 in their names. The solver code is in ./src/diffusion/flow_matching/adam_sampling.py.

Compared to Heun/RK4, the linear multi-step solver is more stable and faster.

In some experiments, we were surprised to find that the linear multi-step solver achieves results comparable even to FlowTurbo.

Since the two methods are distinct, we believe FlowTurbo could become even more powerful when combined with the Adams solver.
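
To illustrate why a linear multi-step solver can be cheaper than Heun, here is a minimal sketch on a toy scalar ODE. It uses the textbook two-step Adams-Bashforth formula and is not the code in adam_sampling.py. The names `toy_velocity`, `heun`, and `adams_bashforth2` are hypothetical, and the toy velocity field stands in for the trained network.

```python
import math


def toy_velocity(x, t):
    """Toy velocity field dx/dt = -x (assumption; stands in for the network)."""
    return -x


def heun(f, x, t0, t1, steps):
    """Heun's method: two evaluations of f per step."""
    h = (t1 - t0) / steps
    t, nfe = t0, 0
    for _ in range(steps):
        k1 = f(x, t)
        k2 = f(x + h * k1, t + h)
        nfe += 2
        x = x + 0.5 * h * (k1 + k2)
        t += h
    return x, nfe


def adams_bashforth2(f, x, t0, t1, steps):
    """Two-step Adams-Bashforth: one new evaluation of f per step.

    The previous evaluation is cached and reused; the first step falls
    back to Euler because no history exists yet.
    """
    h = (t1 - t0) / steps
    t, nfe = t0, 0
    prev = None
    for _ in range(steps):
        cur = f(x, t)
        nfe += 1
        if prev is None:
            x = x + h * cur                       # Euler bootstrap
        else:
            x = x + h * (1.5 * cur - 0.5 * prev)  # AB2 update
        prev = cur
        t += h
    return x, nfe


if __name__ == "__main__":
    exact = math.exp(-1.0)
    for name, solver in [("heun", heun), ("ab2", adams_bashforth2)]:
        x, nfe = solver(toy_velocity, 1.0, 0.0, 1.0, 8)
        print(name, "nfe =", nfe, "abs error =", abs(x - exact))
```

With 8 steps, Heun spends 16 function evaluations while the two-step Adams-Bashforth sketch spends 8, which mirrors the Steps and NFE-CFG columns in the solver table.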

We also provide some "magic" solvers for rectified flow sampling. These solvers are heavily inspired by linear multi-step methods and consist of just a few magic numbers, yet they are powerful and interesting. The related code is in ./src/diffusion/flow_matching/ns_sampling.py.

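| SiT-XL-R256 | Steps | NFE-CFG | Extra-Parameters | FID | IS | PR | Recall |
|---|---|---|---|---|---|---|---|
| Heun | 8 | 16x2 | 0 | 3.68 | / | / | / |
| Heun | 11 | 22x2 | 0 | 2.79 | / | / | / |
| Heun | 15 | 30x2 | 0 | 2.42 | / | / | / |
| Adam2 | 6 | 6x2 | 0 | 6.35 | 190 | 0.75 | 0.55 |
| Adam2 | 8 | 8x2 | 0 | 4.16 | 212 | 0.78 | 0.56 |
| Adam2 | 16 | 16x2 | 0 | 2.42 | 237 | 0.80 | 0.60 |
| Adam4 | 16 | 16x2 | 0 | 2.27 | 243 | 0.80 | 0.60 |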

Citation

@inproceedings{
wang2024exploring,
title={Exploring {DCN}-like architecture for fast image generation with arbitrary resolution},
author={Shuai Wang and Zexian Li and Tianhui Song and Xubin Li and Tiezheng Ge and Bo Zheng and Limin Wang},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=e57B7BfA2B}
}
