Commit db435dd

integration diffsynth (#100)
* toc
* finish docs
1 parent 76aca0a commit db435dd

12 files changed: 288 additions and 12 deletions

.vitepress/en.ts

Lines changed: 1 addition & 0 deletions
@@ -162,6 +162,7 @@ function sidebarIntegration(): DefaultTheme.SidebarItem[] {
       items: [
         { text: 'Argparse', link:'integration-argparse' },
         { text: 'Ascend NPU & MindSpore', link: 'integration-ascend' },
+        { text: 'DiffSynth-Studio', link: 'integration-diffsynth-studio' },
         { text: 'EasyR1', link: 'integration-easyr1' },
         { text: 'Fastai', link: 'integration-fastai' },
       ]

.vitepress/zh.ts

Lines changed: 1 addition & 0 deletions
@@ -178,6 +178,7 @@ function sidebarIntegration(): DefaultTheme.SidebarItem[] {
       items: [
         { text: 'Argparse', link:'integration-argparse' },
         { text: 'Ascend NPU & MindSpore', link: 'integration-ascend' },
+        { text: 'DiffSynth-Studio', link: 'integration-diffsynth-studio' },
         { text: 'EasyR1', link: 'integration-easyr1' },
         { text: 'Fastai', link: 'integration-fastai' },
       ]

en/guide_cloud/general/what-is-swanlab.md

Lines changed: 12 additions & 11 deletions
@@ -92,18 +92,19 @@ Below is a list of frameworks we have integrated, please submit [Issue](https://
 - [Keras](/en/guide_cloud/integration/integration-keras.html)
 
 **Specialized/Fine-tuned Frameworks**
-- [PyTorch Lightning](/en/guide_cloud/integration/integration-pytorch-lightning.html)
-- [HuggingFace Transformers](/en/guide_cloud/integration/integration-huggingface-transformers.html)
+- [PyTorch Lightning](/guide_cloud/integration/integration-pytorch-lightning.html)
+- [HuggingFace Transformers](/guide_cloud/integration/integration-huggingface-transformers.html)
+- [LLaMA Factory](/guide_cloud/integration/integration-llama-factory.html)
+- [Modelscope Swift](/guide_cloud/integration/integration-swift.html)
+- [DiffSynth-Studio](/guide_cloud/integration/integration-diffsynth-studio.html)
+- [Sentence Transformers](/guide_cloud/integration/integration-sentence-transformers.html)
 - [OpenMind](https://modelers.cn/docs/zh/openmind-library/1.0.0/basic_tutorial/finetune/finetune_pt.html#%E8%AE%AD%E7%BB%83%E7%9B%91%E6%8E%A7)
-- [LLaMA Factory](/en/guide_cloud/integration/integration-llama-factory.html)
-- [Modelscope Swift](/en/guide_cloud/integration/integration-swift.html)
-- [Sentence Transformers](/en/guide_cloud/integration/integration-sentence-transformers.html)
-- [Torchtune](/en/guide_cloud/integration/integration-pytorch-torchtune.html)
-- [XTuner](/en/guide_cloud/integration/integration-xtuner.html)
-- [MMEngine](/en/guide_cloud/integration/integration-mmengine.html)
-- [FastAI](/en/guide_cloud/integration/integration-fastai.html)
-- [LightGBM](/en/guide_cloud/integration/integration-lightgbm.html)
-- [XGBoost](/en/guide_cloud/integration/integration-xgboost.html)
+- [Torchtune](/guide_cloud/integration/integration-pytorch-torchtune.html)
+- [XTuner](/guide_cloud/integration/integration-xtuner.html)
+- [MMEngine](/guide_cloud/integration/integration-mmengine.html)
+- [FastAI](/guide_cloud/integration/integration-fastai.html)
+- [LightGBM](/guide_cloud/integration/integration-lightgbm.html)
+- [XGBoost](/guide_cloud/integration/integration-xgboost.html)
 
 
 **Computer Vision**
Binary files changed (6.15 MB, 350 KB, 201 KB); previews not rendered.
Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
# DiffSynth Studio

[DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) is an open-source diffusion model engine from [ModelScope](https://modelscope.cn/) that focuses on image and video generation and style transfer. It optimizes architectural components such as the text encoder, UNet, and VAE to improve computational performance while staying compatible with open-source community models.

DiffSynth Studio supports a range of diffusion models, including Wan-Video, StepVideo, HunyuanVideo, CogVideoX, FLUX, ExVideo, Kolors, and Stable Diffusion 3.

![](./diffsynth/logo.png)

You can train diffusion models with DiffSynth Studio and use SwanLab to track and visualize the experiments.

[[toc]]

## Preparation

**1. Clone the Repository and Set Up the Environment**

```bash
git clone https://github.com/modelscope/DiffSynth-Studio.git
cd DiffSynth-Studio
pip install -e .
pip install swanlab
```

**2. Prepare the Dataset**

DiffSynth Studio expects the dataset in the following layout. For example, place the image data in the `data/dog` directory:

```bash
data/dog/
└── train
    ├── 00.jpg
    ├── 01.jpg
    ├── 02.jpg
    ├── 03.jpg
    ├── 04.jpg
    └── metadata.csv
```

The `metadata.csv` file pairs each image file name with its text description:

```csv
file_name,text
00.jpg,A small dog
01.jpg,A small dog
02.jpg,A small dog
03.jpg,A small dog
04.jpg,A small dog
```
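
For larger datasets, writing `metadata.csv` by hand gets tedious. The following is a minimal sketch, not part of DiffSynth-Studio, that generates the file for a directory of `.jpg` images. The helper name `write_metadata` and the choice of a single shared caption for every image are assumptions made for illustration.

```python
import csv
from pathlib import Path


def write_metadata(train_dir: str, caption: str) -> Path:
    """Write metadata.csv listing every .jpg in train_dir with one shared caption.

    Hypothetical helper for illustration; adapt it if each image needs its own text.
    """
    root = Path(train_dir)
    # Sort so that the row order is stable across runs.
    images = sorted(p.name for p in root.glob("*.jpg"))
    out_path = root / "metadata.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "text"])  # header expected by the format above
        for name in images:
            writer.writerow([name, caption])
    return out_path
```

Calling `write_metadata("data/dog/train", "A small dog")` on the example layout should produce a file equivalent to the one shown above.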

**3. Prepare the Model**

This guide uses the Kolors model as an example. Download the model weights and the VAE weights:

```bash
modelscope download --model=Kwai-Kolors/Kolors --local_dir models/kolors/Kolors
modelscope download --model=AI-ModelScope/sdxl-vae-fp16-fix --local_dir models/kolors/sdxl-vae-fp16-fix
```

## Setting SwanLab Parameters

When running the training script, add `--use_swanlab` to record the training process on the SwanLab platform.

For offline recording, pass `--swanlab_mode "local"` instead of `"cloud"`.

```bash {3,4}
CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
  ...
  --use_swanlab \
  --swanlab_mode "cloud"
```
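
To make the two flags concrete, here is a minimal `argparse` sketch of how a training script could expose them. It is an illustration only: the real parser in DiffSynth-Studio may use different defaults or validation, and the `choices` restriction, the `"cloud"` default, and the function names `build_parser` and `describe_logging` are assumptions.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the two SwanLab flags described above (sketch)."""
    parser = argparse.ArgumentParser(description="Sketch of SwanLab-related training flags")
    # A bare flag: present means logging is on, absent means off.
    parser.add_argument("--use_swanlab", action="store_true",
                        help="Record the training run with SwanLab.")
    # Assumed default and choices; the actual script may accept other values.
    parser.add_argument("--swanlab_mode", choices=["cloud", "local"], default="cloud",
                        help="'cloud' uploads to the SwanLab platform, 'local' records offline.")
    return parser


def describe_logging(argv: List[str]) -> str:
    """Return a human-readable summary of the logging setup for the given arguments."""
    args = build_parser().parse_args(argv)
    if not args.use_swanlab:
        return "SwanLab logging disabled"
    return f"SwanLab logging enabled in {args.swanlab_mode} mode"
```

With this sketch, `describe_logging(["--use_swanlab", "--swanlab_mode", "local"])` reports offline recording, while omitting `--use_swanlab` leaves logging off regardless of the mode.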

## Starting the Training

Run the following command to start training. SwanLab records the hyperparameters, training logs, loss curves, and other information:

```bash {11,12}
CUDA_VISIBLE_DEVICES="0" python examples/train/kolors/train_kolors_lora.py \
  --pretrained_unet_path models/kolors/Kolors/unet/diffusion_pytorch_model.safetensors \
  --pretrained_text_encoder_path models/kolors/Kolors/text_encoder \
  --pretrained_fp16_vae_path models/kolors/sdxl-vae-fp16-fix/diffusion_pytorch_model.safetensors \
  --dataset_path data/dog \
  --output_path ./models \
  --max_epochs 10 \
  --center_crop \
  --use_gradient_checkpointing \
  --precision "16-mixed" \
  --use_swanlab \
  --swanlab_mode "cloud"
```

![](./diffsynth/ui-1.png)

![](./diffsynth/ui-2.png)

## Additional Notes

To customize the SwanLab project name, experiment name, and other parameters, edit the logger setup for your task:

**1. Text-to-Image Tasks**

In `DiffSynth-Studio/diffsynth/trainers/text_to_image.py`, locate the `swanlab_logger` variable and modify the `project` and `name` arguments:

```python {6-7}
if args.use_swanlab:
    from swanlab.integration.pytorch_lightning import SwanLabLogger
    swanlab_config = {"UPPERFRAMEWORK": "DiffSynth-Studio"}
    swanlab_config.update(vars(args))  # log all CLI arguments as hyperparameters
    swanlab_logger = SwanLabLogger(
        project="diffsynth_studio",  # project name
        name="diffsynth_studio",  # experiment name
        config=swanlab_config,
        mode=args.swanlab_mode,
        logdir=args.output_path,
    )
    logger = [swanlab_logger]
```

**2. Wan-Video Text-to-Video Tasks**

In `DiffSynth-Studio/examples/wanvideo/train_wan_t2v.py`, locate the `swanlab_logger` variable and modify the `project` and `name` arguments:

```python {6-7}
if args.use_swanlab:
    from swanlab.integration.pytorch_lightning import SwanLabLogger
    swanlab_config = {"UPPERFRAMEWORK": "DiffSynth-Studio"}
    swanlab_config.update(vars(args))  # log all CLI arguments as hyperparameters
    swanlab_logger = SwanLabLogger(
        project="wan",  # project name
        name="wan",  # experiment name
        config=swanlab_config,
        mode=args.swanlab_mode,
        logdir=args.output_path,
    )
    logger = [swanlab_logger]
```
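
If you prefer not to hard-code the names, one option is to assemble the logger arguments in a small helper and read the names from the parsed arguments. The sketch below mirrors the keyword arguments used in the two snippets above. The helper `build_swanlab_logger_kwargs` and the attributes `swanlab_project` and `swanlab_name` are hypothetical and do not exist in DiffSynth-Studio; you would have to add matching CLI options yourself.

```python
from argparse import Namespace
from typing import Any, Dict, Optional


def build_swanlab_logger_kwargs(
    args: Namespace,
    default_project: str,
    default_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for SwanLabLogger from parsed CLI arguments.

    `swanlab_project` and `swanlab_name` are hypothetical attributes; when they
    are missing or empty, the defaults passed by the caller are used instead.
    """
    config: Dict[str, Any] = {"UPPERFRAMEWORK": "DiffSynth-Studio"}
    config.update(vars(args))  # same config payload as the snippets above
    project = getattr(args, "swanlab_project", None) or default_project
    name = getattr(args, "swanlab_name", None) or default_name or project
    return {
        "project": project,
        "name": name,
        "config": config,
        "mode": args.swanlab_mode,
        "logdir": args.output_path,
    }
```

Inside the `if args.use_swanlab:` branch, the logger could then be created with `SwanLabLogger(**build_swanlab_logger_kwargs(args, "wan"))`, which keeps the current behavior when no custom names are supplied.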

zh/guide_cloud/general/what-is-swanlab.md

Lines changed: 2 additions & 1 deletion
@@ -105,10 +105,11 @@ SwanLab is designed for AI researchers, with a friendly Python API and a polished U
 **Specialized/Fine-tuned Frameworks**
 - [PyTorch Lightning](/guide_cloud/integration/integration-pytorch-lightning.html)
 - [HuggingFace Transformers](/guide_cloud/integration/integration-huggingface-transformers.html)
-- [OpenMind](https://modelers.cn/docs/zh/openmind-library/1.0.0/basic_tutorial/finetune/finetune_pt.html#%E8%AE%AD%E7%BB%83%E7%9B%91%E6%8E%A7)
 - [LLaMA Factory](/guide_cloud/integration/integration-llama-factory.html)
 - [Modelscope Swift](/guide_cloud/integration/integration-swift.html)
+- [DiffSynth-Studio](/guide_cloud/integration/integration-diffsynth-studio.html)
 - [Sentence Transformers](/guide_cloud/integration/integration-sentence-transformers.html)
+- [OpenMind](https://modelers.cn/docs/zh/openmind-library/1.0.0/basic_tutorial/finetune/finetune_pt.html#%E8%AE%AD%E7%BB%83%E7%9B%91%E6%8E%A7)
 - [Torchtune](/guide_cloud/integration/integration-pytorch-torchtune.html)
 - [XTuner](/guide_cloud/integration/integration-xtuner.html)
 - [MMEngine](/guide_cloud/integration/integration-mmengine.html)
Binary files changed (6.15 MB, 350 KB); previews not rendered.
