This repository contains an implementation of the Stable Diffusion model for image generation. It is trained on the Flowers102 dataset. Everything is implemented from scratch using PyTorch.
Here are some samples generated by the model after training for 500 epochs. The model is able to generate realistic-looking flowers (and some that look like organic waste).
To get started with this project, follow these steps:

1. **Clone this repository:**

   ```bash
   git clone https://github.com/ProfessorNova/Stable-Diffusion.git
   cd Stable-Diffusion
   ```

2. **Set up a Python environment:** Make sure you have Python installed (tested with Python 3.10.11).

3. **Install PyTorch:** Visit the PyTorch website for the proper PyTorch installation based on your system configuration.

4. **Install additional dependencies:** There are two additional dependencies required for this project. `tqdm` is used for progress bars and `matplotlib` is used for plotting the results during inference:

   ```bash
   pip install tqdm matplotlib
   ```

5. **Run the pretrained model:** To generate images using the pretrained model, run:

   ```bash
   python sd_inference.py
   ```

   This will generate eight images and plot them using matplotlib.
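Conceptually, inference starts from pure Gaussian noise and iteratively denoises it with the U-Net. The sketch below illustrates that reverse process; the function names, the cosine-style noise schedule, the `max_noise` clamp, and the `model(noisy_images, noise_levels)` signature are assumptions for illustration, not the actual code in `sd_inference.py`:

```python
import torch

@torch.no_grad()
def sample(model, num_images=8, steps=20, size=128):
    """Hypothetical sketch of reverse diffusion sampling: start from pure
    noise and repeatedly (estimate clean image, re-noise to a lower level).
    The real sd_inference.py may differ in schedule and details."""
    max_noise = 0.98  # assumed clamp so the signal rate never hits exactly zero
    x = torch.randn(num_images, 3, size, size)  # start from pure Gaussian noise
    for step in range(steps, 0, -1):
        level = torch.full((x.size(0), 1, 1, 1), max_noise * step / steps)
        signal_rate = torch.cos(level * torch.pi / 2)   # assumed cosine-style schedule
        noise_rate = torch.sin(level * torch.pi / 2)
        pred_noise = model(x, level)                    # U-Net predicts the noise component
        pred_image = (x - noise_rate * pred_noise) / signal_rate  # estimate the clean image
        # re-noise the estimate down to the next (lower) noise level, DDIM-style
        next_level = torch.full_like(level, max_noise * (step - 1) / steps)
        x = (torch.cos(next_level * torch.pi / 2) * pred_image
             + torch.sin(next_level * torch.pi / 2) * pred_noise)
    return pred_image

# Tiny smoke test with a stand-in "model" that predicts zero noise:
imgs = sample(lambda x, t: torch.zeros_like(x), num_images=2, steps=5, size=32)
# imgs has shape (2, 3, 32, 32)
```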
To get a better understanding of how Stable Diffusion works and how it is implemented in this repository, I created a Jupyter notebook (notebook.ipynb) that explains the fundamentals of Stable Diffusion alongside the code. It also shows a creative way to generate images using a hand-drawn sketch of a flower.
Have a look at the unet.py file in the lib folder of this repository if you want to see the details of the model.
- **Noise Embedding**
  - We first map the scalar noise level (a single float) into a high-dimensional embedding (1×1×64).
  - This embedding is broadcast, upsampled, and fused with feature maps in the decoder, so the network "knows" how much noise to remove at each spatial location.
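A common way to build such an embedding is a sinusoidal frequency encoding, as used in the Keras DDIM example this project is inspired by. The sketch below is a hypothetical version of that idea; the function name, the frequency range, and the exact layout are assumptions, not the code from `lib/unet.py`:

```python
import math
import torch

def sinusoidal_embedding(noise_level: torch.Tensor, dim: int = 64) -> torch.Tensor:
    """Hypothetical sketch: map a scalar noise level (shape (B, 1, 1, 1))
    to a (B, 1, 1, dim) embedding using sin/cos at geometric frequencies."""
    half = dim // 2
    # geometrically spaced frequencies from 1 to 1000 (assumed range)
    freqs = torch.exp(torch.linspace(math.log(1.0), math.log(1000.0), half))
    angles = 2.0 * math.pi * noise_level * freqs  # broadcasts to (B, 1, 1, half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)

levels = torch.rand(4, 1, 1, 1)       # a batch of 4 scalar noise levels
emb = sinusoidal_embedding(levels)    # -> shape (4, 1, 1, 64)
```

Low frequencies let the network distinguish coarse noise regimes, high frequencies fine ones.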
- **Encoder (DownBlocks)**
  - The noisy image (128×128×3) is first processed by a `Conv2D` layer to lift it into a (128×128×64) feature map.
  - We then apply a series of DownBlocks, each of which:
    - Halves the spatial resolution (e.g. 128→64, 64→32, …)
    - Increases the number of channels (e.g. 64→128→256→512→1024)
    - Uses residual connections internally to ease gradient flow and preserve information.
  - At each stage we save the output feature map for later skip connections.
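A minimal DownBlock following this recipe could look like the sketch below. The layer choices (two 3×3 convolutions, a 1×1 skip projection, average pooling, SiLU) are assumptions for illustration; the actual implementation is in `lib/unet.py`:

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """Hypothetical DownBlock sketch: a residual double-convolution,
    then 2x pooling. Returns both the pooled map and the pre-pool map
    so the latter can be used as a skip connection in the decoder."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # 1x1 conv to match channels for the residual add
        self.conv1 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.SiLU()
        self.pool = nn.AvgPool2d(2)  # halves the spatial resolution

    def forward(self, x: torch.Tensor):
        h = self.act(self.conv1(x))
        h = self.conv2(h) + self.skip(x)  # residual connection
        return self.pool(h), h            # (downsampled output, feature map saved for the skip)

block = DownBlock(64, 128)
y, skip = block(torch.randn(1, 64, 128, 128))
# y: (1, 128, 64, 64), skip: (1, 128, 128, 128)
```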
- **Bottleneck (ResidualBlock ×2)**
  - Once we reach the smallest spatial size (8×8), we apply two ResidualBlocks at a constant channel width (1024).
  - These deepen the network's representational power without further downsampling.
- **Decoder (UpBlocks)**
  - We then reverse the process with a series of UpBlocks, each of which:
    - Upsamples spatially (e.g. 8→16, 16→32, …)
    - Reduces channel width symmetrically to the encoder (e.g. 1024→512→256→128→64)
    - Concatenates with the corresponding encoder feature map (the skip connection) at the same resolution
    - Fuses via convolution and residual connections
  - This combination of coarse, high-level features with fine, low-level details allows precise reconstruction of the denoised image.
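The upsample-concatenate-fuse pattern can be sketched as below. Bilinear upsampling and the specific layer layout are assumptions; the real UpBlock lives in `lib/unet.py`:

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    """Hypothetical UpBlock sketch: upsample 2x, concatenate the matching
    encoder feature map (skip connection), then fuse with convolutions."""

    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        # after concatenation the input has in_ch + skip_ch channels
        self.conv1 = nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)                   # e.g. 8x8 -> 16x16
        x = torch.cat([x, skip], dim=1)  # fuse coarse features with fine encoder details
        x = self.act(self.conv1(x))
        return self.act(self.conv2(x))

up = UpBlock(in_ch=1024, skip_ch=512, out_ch=512)
out = up(torch.randn(1, 1024, 8, 8), torch.randn(1, 512, 16, 16))
# out: (1, 512, 16, 16)
```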
- **Final Convolution**
  - After the last UpBlock (back to 128×128×64), a simple `Conv2D` layer reduces the channels to 3, yielding a predicted noise map (128×128×3).
Visualization:
To train the model from scratch, run the following command:

```bash
python sd_train.py
```

This will start the training process. The model will generate samples after every epoch and save them in the `output_sd` folder by default.
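Since the U-Net predicts the noise that was added to an image, one training step boils down to: noise an image by a random amount, predict that noise, and minimize the mean-squared error. The sketch below illustrates this; the schedule, the `model(noisy_images, noise_levels)` signature, and the helper names are assumptions, not the actual loop in `sd_train.py`:

```python
import torch
import torch.nn.functional as F

def training_step(model, images, optimizer):
    """Hypothetical sketch of one diffusion training step."""
    noise_levels = torch.rand(images.size(0), 1, 1, 1)     # one random noise level per image
    noise = torch.randn_like(images)
    signal_rates = torch.cos(noise_levels * torch.pi / 2)  # assumed cosine-style schedule
    noise_rates = torch.sin(noise_levels * torch.pi / 2)
    noisy_images = signal_rates * images + noise_rates * noise
    pred_noise = model(noisy_images, noise_levels)         # the U-Net predicts the added noise
    loss = F.mse_loss(pred_noise, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with a stand-in model (ignores the noise level for brevity):
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
    def forward(self, x, noise_levels):
        return self.conv(x)

model = TinyModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = training_step(model, torch.randn(2, 3, 32, 32), opt)
```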
Here are some images generated during training:

- **Epoch 1:** It is just pure noise at this point.
- **Epoch 10:** The model is starting to generate some larger blobs.
- **Epoch 50:** You can see some flower-like structures starting to form.
- **Epoch 100:** Colors are getting more vibrant and the shapes are more defined.
- **Epoch 300:** Now you can really spot the flowers, but some still look very weird.
- **Epoch 500:** Now almost all images look like flowers. Some are very realistic, some are not.
This project was heavily inspired by the Keras example "Denoising Diffusion Implicit Models" by András Béres.