This repository contains the complete project for the Ayna ML Assignment. The goal is to train a conditional UNet model to color a polygon based on its shape (from an image) and a desired color (from a text prompt). The entire implementation, from data loading to training and inference, is contained within a single Google Colab notebook.
## Table of Contents

- Project Overview
- Key Features
- Dataset Strategy
- Model and Training Methodology
- Experiments Summary
- Key Learnings
- How to Run
## Project Overview

The core of this project is a deep learning model that performs conditional image-to-image translation. It takes two inputs:
- An image of a polygon outline.
- A text prompt specifying a color (e.g., "blue", "red").
The model's output is an image of the input polygon filled with the specified color. This is achieved using a UNet architecture conditioned on text embeddings generated by OpenAI's CLIP model.
## Key Features

- **Conditional UNet:** Uses `UNet2DConditionModel` from Hugging Face `diffusers`, which injects conditioning information via cross-attention layers.
- **CLIP-Powered Text Conditioning:** Employs the `openai/clip-vit-base-patch32` model to transform color names into rich semantic embeddings.
- **Advanced Loss Function:** A composite loss combining pixel-wise (MSE), perceptual (LPIPS), structural (SSIM), and a domain-specific Color Loss in the LAB color space to achieve high-fidelity results.
- **Sophisticated Training Schedule:** Uses an `AdamW` optimizer with a linear-warmup, cosine-decay learning-rate scheduler for stable and effective training.
- **Robust Data Pipeline:** Leverages datasets from the Hugging Face Hub, combining synthetic and augmented data with on-the-fly transformations to ensure model generalization.
- **Comprehensive Experiment Tracking:** All experiments are logged to Weights & Biases for detailed analysis and reproducibility.
## Dataset Strategy

A robust dataset was created by combining two sources from the Hugging Face Hub: `bhavya777/synthetic-colored-shapes` and `bhavya777/augmented-colored-shapes`.
To ensure the model generalizes well, an on-the-fly augmentation pipeline was used during training. This included paired geometric transforms (random flips and rotations) and color jitter. This strategy was proven critical after an early model trained only on white backgrounds failed completely when tested on a noisy black background.
## Model and Training Methodology

The generator is a `UNet2DConditionModel` conditioned on text embeddings from a `CLIPTextModel` (512-dimensional for `openai/clip-vit-base-patch32`). The text embeddings guide the UNet's decoding path via cross-attention, allowing the model to inject the correct color information at multiple feature-map scales.
- **Optimizer:** `AdamW`
- **Learning Rate:** $1 \times 10^{-4}$ (base)
- **Weight Decay:** $1 \times 10^{-4}$
- **Scheduler:** Linear warmup for the first 10% of steps, followed by a cosine decay schedule.
- **Gradient Clipping:** Max norm of 1.0 to prevent exploding gradients.
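A minimal sketch of this optimizer-and-scheduler setup (the step counts are placeholders; the notebook would derive them from the dataloader length and epoch count):

```python
import math
import torch

model = torch.nn.Linear(4, 4)  # stand-in for the UNet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)

total_steps = 1000
warmup_steps = int(0.1 * total_steps)  # linear warmup over the first 10% of steps

def lr_lambda(step: int) -> float:
    if step < warmup_steps:
        return step / max(1, warmup_steps)                 # linear ramp 0 -> 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))      # cosine decay 1 -> 0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside the training step, gradients are clipped before optimizer.step():
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

print(1e-4 * lr_lambda(warmup_steps))  # 0.0001 (peak LR right after warmup)
```

The learning rate rises linearly from 0 to the base value of $1 \times 10^{-4}$, then decays smoothly to 0 over the remaining steps.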
The final loss was a weighted sum of four components, designed to balance different aspects of image quality:
$ \mathcal{L}_{total} = 1.0 \cdot L_{MSE} + 0.5 \cdot L_{LPIPS} + 0.2 \cdot L_{SSIM} + 0.3 \cdot L_{Color} $
The custom Color Loss converts both the prediction and the target to the LAB color space and penalizes the difference there, directly targeting the color accuracy that pixel-wise RGB losses tend to underweight.
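The README does not spell out the Color Loss implementation, so the following is a plausible pure-PyTorch sketch. The sRGB-to-LAB conversion constants (D65 white point) are standard; the notebook's actual implementation may differ, e.g. by using a library routine such as `kornia.color.rgb_to_lab`:

```python
import torch
import torch.nn.functional as F

def rgb_to_lab(rgb: torch.Tensor) -> torch.Tensor:
    """Convert an (N, 3, H, W) sRGB tensor in [0, 1] to CIELAB (D65 white point)."""
    # sRGB -> linear RGB (inverse gamma)
    lin = torch.where(rgb > 0.04045, ((rgb + 0.055) / 1.055) ** 2.4, rgb / 12.92)
    # linear RGB -> XYZ
    m = torch.tensor([[0.4124, 0.3576, 0.1805],
                      [0.2126, 0.7152, 0.0722],
                      [0.0193, 0.1192, 0.9505]], dtype=rgb.dtype)
    xyz = torch.einsum("ij,njhw->nihw", m, lin)
    # normalize by the D65 reference white, then apply the LAB transfer function
    xyz = xyz / torch.tensor([0.95047, 1.0, 1.08883], dtype=rgb.dtype).view(1, 3, 1, 1)
    f = torch.where(xyz > 0.008856, xyz ** (1.0 / 3.0), 7.787 * xyz + 16.0 / 116.0)
    fx, fy, fz = f[:, 0], f[:, 1], f[:, 2]
    return torch.stack([116.0 * fy - 16.0,      # L: lightness
                        500.0 * (fx - fy),      # a: green-red axis
                        200.0 * (fy - fz)],     # b: blue-yellow axis
                       dim=1)

def color_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """L1 distance in LAB space, emphasizing perceptual color differences."""
    return F.l1_loss(rgb_to_lab(pred), rgb_to_lab(target))
```

In the full objective, this term would be combined with the MSE, LPIPS, and SSIM terms using the weights in the formula above. Working in LAB separates lightness from chroma, so a wrong hue is penalized even when the RGB error is small.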
## Experiments Summary

A total of 8 models were trained to systematically find the optimal configuration. The detailed qualitative results for each epoch can be found in the final PDF report.
| Model | Key Change / Loss Function | Architecture Details | Parameters | Epochs | W&B Link | Hugging Face Hub |
|---|---|---|---|---|---|---|
| 1 | Baseline: MSE Only | (32, 64, 128), layers=1 | 68,148,747 | 5 | Link | Link |
| 2 | Added LPIPS | (32, 64, 128), layers=1 | 68,148,747 | 5 | Link | Link |
| 3 | Added SSIM | (32, 64, 128), layers=1 | 68,148,747 | 5 | Link | Link |
| 4 | Added Color Loss | (32, 64, 128), layers=1 | 68,148,747 | 5 | Link | Link |
| 5 | Deeper UNet | (64, 128, 256), layers=2 | 89,723,811 | 5 | Link | Link |
| 6 | Longer Training (10 Ep) | (32, 64, 128), head_dim=8 | 68,395,939 | 10 | Link | Link |
| 7 | Longer Training (15 Ep) | (32, 64, 128), head_dim=8 | 68,395,939 | 15 | Link | Link |
| 8 | Final Architecture | (64, 32, 64), head_dim=12 | 65,212,403 | 15 | Link | Link |
## Key Learnings

- **A Custom Loss is a Game-Changer:** The custom LAB-based Color Loss was the single most impactful change, directly addressing the core task of accurate color reproduction where other losses failed.
- **Augment for Generalization:** On-the-fly data augmentation is essential for building robust models that perform well on data outside their immediate training distribution.
- **Systematic Experimentation is Crucial:** The progression through the 8 models clearly shows how iterative improvements to the loss function and architecture lead to a superior final result.
- **Sophisticated Training Works:** The combination of a warmup-plus-cosine-decay scheduler, AdamW optimizer, and gradient clipping created a stable training environment that allowed models to converge effectively.
## How to Run

- Click the "Open in Colab" badge at the top of this README to launch the notebook in Google Colab.
- The notebook is self-contained. Run the cells in order from top to bottom.
- Dependencies will be installed, data will be downloaded, the model will be trained, and inference examples will be shown.