ImageGenControlNet

In notebook_02.ipynb, multiple pipelines were tried with different combinations of ControlNet conditioning, such as:

  • depth
  • depth, canny
  • depth, normal
  • depth, canny, normal
  • depth, normal, canny
  • canny, depth, normal
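
A minimal sketch of how such a multi-ControlNet pipeline can be assembled in diffusers, assuming the public lllyasviel SD 1.5 checkpoints (the conditioning images and prompt below are placeholders, not taken from the notebook):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# One ControlNet per conditioning type; order matters, since the conditioning
# images must later be passed in the same order as the models listed here.
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-normal", torch_dtype=torch.float16),
]

pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

# depth_image, canny_image, normal_image are assumed to be 512x512 PIL images.
result = pipe(
    "beautiful landscape, mountains in the background",
    image=[depth_image, canny_image, normal_image],
).images[0]
```

Reordering the entries in `controlnets` (and the matching conditioning images) gives the different orderings listed above.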

Some input depth images, such as 168x168 or 177x177, result in distorted outputs, sometimes even generating NSFW content. Additionally, a depth image with a resolution of 2668x2668 produces very noisy outputs. Therefore, all inputs were resized to a standard size of 512x512.

Input depth images were also normalized to the range (0, 255).
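
A minimal preprocessing sketch covering both the normalization and the resize (the function name and interpolation choice are assumptions, not taken from the notebooks):

```python
import numpy as np
from PIL import Image

def preprocess_depth(depth: np.ndarray, size: int = 512) -> Image.Image:
    """Min-max normalize a raw depth map to [0, 255] and resize it to size x size."""
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8) * 255.0  # scale to [0, 255]
    return Image.fromarray(d.astype(np.uint8)).resize((size, size), Image.BILINEAR)
```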

Final Pipeline

  1. Normalize each depth image so that its pixel values are scaled between 0 and 255. This ensures that all depth images are on a comparable scale for visualization and processing.
  2. Resize all depth images to a standard size of 512x512 pixels to ensure uniform input to the model and ease of comparison across images. Using a fixed size simplifies the processing pipeline.
  3. Convert the depth image to normals, then use the ControlNet pipeline to apply depth conditioning followed by normals conditioning for improved generation (the conversion is sketched below).
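
One common way to derive a normal map from a depth map is via image gradients; a sketch using OpenCV Sobel filters (the z-scaling constant is an assumed tuning value, and the ControlNet normal checkpoint may expect a slightly different convention):

```python
import cv2
import numpy as np
from PIL import Image

def depth_to_normals(depth_img: Image.Image, z_scale: float = 50.0) -> Image.Image:
    """Approximate a surface normal map from a depth image using Sobel gradients."""
    d = np.asarray(depth_img.convert("L"), dtype=np.float32)
    dx = cv2.Sobel(d, cv2.CV_32F, 1, 0, ksize=3)  # horizontal depth gradient
    dy = cv2.Sobel(d, cv2.CV_32F, 0, 1, ksize=3)  # vertical depth gradient
    n = np.stack([-dx, -dy, np.full_like(d, z_scale)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)  # unit-length normals
    return Image.fromarray(((n * 0.5 + 0.5) * 255).astype(np.uint8))  # [-1,1] -> [0,255]
```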

For each text prompt below, three sets of images are provided:

  • the input depth image, the canny edge image, and the normal image extracted from the given depth image (extraction sketched after this list)
  • the generated output image
  • a comparison between the input depth image and the depth image extracted from the generated output image
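
The canny conditioning image can be extracted from the depth image with OpenCV (the edge thresholds below are assumed values):

```python
import cv2
import numpy as np
from PIL import Image

def depth_to_canny(depth_img: Image.Image, low: int = 100, high: int = 200) -> Image.Image:
    """Extract canny edges from a depth image and replicate them to 3 channels."""
    edges = cv2.Canny(np.asarray(depth_img.convert("L")), low, high)
    return Image.fromarray(np.stack([edges] * 3, axis=-1))
```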

beautiful landscape, mountains in the background


[images: input depth, canny, and normal conditioning · generated output · depth comparison]


luxury bedroom interior

[images: input depth, canny, and normal conditioning · generated output · depth comparison]


Beautiful snowy mountains

[images: input depth, canny, and normal conditioning · generated output · depth comparison]


luxurious bedroom interior

[images: input depth, canny, and normal conditioning · generated output · depth comparison]


walls with cupboard

[images: input depth, canny, and normal conditioning · generated output · depth comparison]


room with chair

[images: input depth, canny, and normal conditioning · generated output · depth comparison]


House in the forest

[images: input depth, canny, and normal conditioning · generated output · depth comparison]


The depth map was also extracted from the generated output using monocular depth estimation and verified visually; it is almost identical to the input depth map.
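
As a sketch, this verification can use an off-the-shelf monocular depth estimator, e.g. the transformers depth-estimation pipeline (Intel/dpt-large is an assumed model choice; the notebook's exact estimator is not stated here):

```python
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
predicted = depth_estimator(generated_image)["depth"]  # returns a PIL depth image
# Display `predicted` next to the input depth image for a visual comparison.
```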

See final_pipeline.ipynb for the final output.

Aspect ratio

The aspect_ratio.ipynb notebook demonstrates that, for a given text prompt, we have two different depth images: one with an aspect ratio of 1:1 and the other with an aspect ratio close to 16:9.

Stable Diffusion can indeed generate images at different aspect ratios.
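
Non-square generation only requires passing a height and width that are multiples of 8; a sketch reusing the pipeline from above (prompt and the two depth conditioning images are placeholders, and 904x512 is an assumed approximation of 16:9):

```python
# 1:1 output
square = pipe(prompt, image=depth_1_1, height=512, width=512).images[0]
# ~16:9 output; the conditioning image should match the output size
wide = pipe(prompt, image=depth_16_9, height=512, width=904).images[0]
```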

The 1:1 aspect ratio image: [image]

The 16:9 aspect ratio image: [image]

The Stable Diffusion model is optimized for generating images with a 1:1 aspect ratio. In this case, a 1:1 aspect ratio depth image was created from the original 16:9 aspect ratio depth image. As a result, there is no distortion in the output, since the image was not stretched or compressed, only cropped.

Cropping a depth image from 16:9 to 1:1 typically results in a more focused and condensed output, but it may reduce the richness and context of the generated image by removing peripheral details.
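
A center crop of this kind takes only a few lines with PIL (a sketch, not taken from the notebook):

```python
from PIL import Image

def center_crop_square(img: Image.Image) -> Image.Image:
    """Crop the largest centered square out of an image (e.g. 16:9 -> 1:1)."""
    w, h = img.size
    s = min(w, h)
    left, top = (w - s) // 2, (h - s) // 2
    return img.crop((left, top, left + s, top + s))
```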

In this instance, the 1:1 aspect ratio output appears more detailed compared to the 16:9 aspect ratio output.

Reducing Generation Latency

The reducing generation latency notebook demonstrates simple ways to reduce generation latency. The baseline image was generated with the default number of inference steps (50) and the default resolution, taking 3.36 seconds to produce.

One way to reduce generation time is by decreasing the number of inference steps in the Stable Diffusion pipeline. In this case, the generation time was reduced to 2.1 seconds.

[image: output generated with fewer inference steps]
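
A sketch of this change, reusing placeholder names from the earlier sketches (25 steps is an assumed value; the notebook's exact setting is not stated here):

```python
# Fewer denoising steps trade some quality for speed.
fast = pipe(prompt, image=conditioning, num_inference_steps=25).images[0]
```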

Another approach is to reduce the resolution of the generated output image, which resulted in a generation time of 2.7 seconds.

[image: output generated at reduced resolution]
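
And the reduced-resolution variant (384x384 is an assumed value):

```python
# A smaller output resolution also cuts latency, at the cost of detail.
small = pipe(prompt, image=conditioning, height=384, width=384).images[0]
```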

From visual inspection, we can conclude that reducing latency also reduces the quality of the generated image.
