In notebook_02.ipynb, we tried multiple pipelines with different combinations of ControlNet conditioning, such as (a multi-ControlNet sketch follows this list):
- depth
- depth, canny
- depth, normal
- depth, canny, normal
- depth, normal, canny
- canny, depth, normal
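A minimal sketch of how such a multi-ControlNet pipeline can be assembled with diffusers is shown below; the checkpoint names, prompt, and conditioning-image file names are illustrative and not necessarily the exact ones used in the notebook.

```python
# Sketch of a multi-ControlNet pipeline (depth + canny); passing a list of
# ControlNets makes diffusers build a MultiControlNet under the hood.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

depth_cn = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
canny_cn = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[depth_cn, canny_cn],   # order here defines the conditioning order
    torch_dtype=torch.float16,
).to("cuda")

depth_img = load_image("depth.png")    # hypothetical conditioning images
canny_img = load_image("canny.png")

image = pipe(
    "a cozy living room, photorealistic",      # illustrative prompt
    image=[depth_img, canny_img],              # one image per ControlNet
    controlnet_conditioning_scale=[1.0, 0.5],  # per-ControlNet weights
    num_inference_steps=50,
).images[0]
```

The depth+normal and depth+canny+normal variants follow the same pattern, with the corresponding ControlNet checkpoints and conditioning images appended to the lists.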
Some input depth images, such as 168x168 or 177x177, result in distorted outputs, sometimes even generating NSFW content. Additionally, a depth image with a resolution of 2668x2668 produces very noisy outputs. We therefore decided to resize all inputs to a standard size of 512x512.
We also normalized the input depth images to the range (0, 255).
- Normalize each depth image so that its pixel values are scaled between 0 and 255. This normalization ensures that all depth images are on a comparable scale for visualization and processing.
- Resize all depth images to a standard size of 512x512 pixels (per channel) to ensure uniform input to the model and to ease comparison across images. Using a fixed size simplifies the processing pipeline.
- Convert each depth image to a normal map, then use the ControlNet pipeline to apply depth conditioning followed by normal conditioning for improved generation (a preprocessing sketch follows this list).
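A short sketch of this preprocessing, assuming OpenCV/NumPy and a Sobel-gradient approximation for converting depth to surface normals; the function and file names are illustrative.

```python
# Normalize a depth map to 0-255, resize it to 512x512, and derive an
# approximate surface-normal map from its gradients.
import cv2
import numpy as np

def preprocess_depth(path, size=512):
    depth = cv2.imread(path, cv2.IMREAD_UNCHANGED).astype(np.float32)
    if depth.ndim == 3:
        depth = depth[..., 0]                 # keep a single channel
    # Scale pixel values to the 0-255 range.
    depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8) * 255.0
    # Resize to the standard 512x512 input size.
    depth = cv2.resize(depth, (size, size), interpolation=cv2.INTER_LINEAR)
    return depth.astype(np.uint8)

def depth_to_normals(depth):
    # Approximate surface normals from depth gradients (Sobel derivatives).
    d = depth.astype(np.float32)
    dx = cv2.Sobel(d, cv2.CV_32F, 1, 0, ksize=3)
    dy = cv2.Sobel(d, cv2.CV_32F, 0, 1, ksize=3)
    normals = np.dstack((-dx, -dy, np.ones_like(d)))
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    # Map from [-1, 1] to [0, 255] so the result can serve as a conditioning image.
    return ((normals * 0.5 + 0.5) * 255.0).astype(np.uint8)

depth = preprocess_depth("input_depth.png")   # hypothetical input file
normal = depth_to_normals(depth)
```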
Here, for each text prompt, three sets of images are provided:
- the input depth image, the canny edge image, and the normal map extracted from the given depth image
- the generated output image
- a comparison between the input depth image and the depth image extracted from the generated output
We also extracted depth from the generated output using monocular depth estimation and verified it visually; it is almost identical to the input depth image.
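A minimal sketch of this verification step, assuming the Hugging Face depth-estimation pipeline (e.g. Intel/dpt-large) as the monocular depth estimator; the exact model used in the notebook may differ.

```python
# Re-estimate depth from the generated image and compare it with the input depth map.
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
result = depth_estimator(generated_image)   # PIL image returned by the ControlNet pipeline
predicted_depth = result["depth"]           # PIL image of the estimated depth

# Display `predicted_depth` next to the input depth map for a visual side-by-side check.
```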
See final_pipeline.ipynb for the final output.
In aspect_ratio.ipynb notebook, it was demonstrated that for a given text prompt, we have two different depth images: one with an aspect ratio of 1:1 and the other with an aspect ratio close to 16:9.
Yes, we can generate images with different aspect ratios from Stable Diffusion.
This is the 1:1 aspect ratio image.
This is the 16:9 aspect ratio image.
The Stable Diffusion model is optimized for generating images with a 1:1 aspect ratio. In this case, a 1:1 aspect ratio depth image was created from the original 16:9 aspect ratio depth image. As a result, there is no distortion or strange output, since the image was only cropped, not stretched or compressed.
Cropping a depth image from 16:9 to 1:1 typically results in a more focused and condensed output, but it may reduce the richness and context of the generated image by removing peripheral details.
In this instance, the 1:1 aspect ratio output appears more detailed compared to the 16:9 aspect ratio output.
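A minimal sketch of the two aspect-ratio variants, assuming `pipe` is the depth-conditioned ControlNet pipeline from above, `prompt` is the text prompt, and `depth_16_9` is an approximately 16:9 PIL depth map; sizes and names are illustrative.

```python
# 16:9 generation: ask the pipeline for a wide output (dimensions must be multiples of 8).
wide = pipe(prompt, image=depth_16_9, width=912, height=512).images[0]

# 1:1 generation: center-crop the same depth map to a square first, then generate at 512x512.
w, h = depth_16_9.size
side = min(w, h)
left, top = (w - side) // 2, (h - side) // 2
depth_1_1 = depth_16_9.crop((left, top, left + side, top + side)).resize((512, 512))
square = pipe(prompt, image=depth_1_1, width=512, height=512).images[0]
```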
In the notebook on reducing generation latency, we demonstrate methods to reduce generation latency. The image below was generated with the default number of inference steps (i.e., 50) and the default resolution, taking 3.36 seconds to produce.
One way to reduce generation time is by decreasing the number of inference steps in the Stable Diffusion pipeline. In this case, the generation time was reduced to 2.1 seconds.
Another approach is to reduce the resolution of the generated output image, which resulted in a generation time of 2.7 seconds.
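A minimal sketch of the two latency knobs, assuming the same `pipe` and inputs as above; the step count and resolution used here are illustrative, and the timings quoted in this section come from the notebook runs, not from this snippet.

```python
import time

# Knob 1: fewer denoising steps.
start = time.time()
fewer_steps = pipe(prompt, image=depth_img, num_inference_steps=25).images[0]
print(f"fewer steps: {time.time() - start:.2f}s")

# Knob 2: lower output resolution (default step count).
start = time.time()
low_res = pipe(prompt, image=depth_img, width=384, height=384,
               num_inference_steps=50).images[0]
print(f"lower resolution: {time.time() - start:.2f}s")
```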
From visual inspection, we can conclude that these latency reductions also reduce the quality of the generated image.