BhavyaGoyal777/POLYGON-COLOURIZATION


Conditional Polygon Colorization using a UNet Model

This repository contains the complete project for the Ayna ML Assignment. The goal is to train a conditional UNet model to color a polygon based on its shape (from an image) and a desired color (from a text prompt). The entire implementation, from data loading to training and inference, is contained within a single Google Colab notebook.

Open In Colab
W&B Report

Table of Contents

  • Project Overview
  • Key Features
  • Dataset Strategy
  • Model and Training Methodology
  • Experiments Summary
  • Key Learnings
  • How to Run

Project Overview

The core of this project is a deep learning model that performs conditional image-to-image translation. It takes two inputs:

  1. An image of a polygon outline.
  2. A text prompt specifying a color (e.g., "blue", "red").

The model's output is an image of the input polygon filled with the specified color. This is achieved using a UNet architecture conditioned on text embeddings generated by OpenAI's CLIP model.

Key Features

  • Conditional UNet: Utilizes the UNet2DConditionModel from Hugging Face diffusers, which injects conditional information via cross-attention layers.
  • CLIP-Powered Text Conditioning: Employs the openai/clip-vit-base-patch32 model to transform color names into rich semantic embeddings.
  • Advanced Loss Function: A composite loss combining pixel-wise (MSE), perceptual (LPIPS), structural (SSIM), and a domain-specific Color Loss in the LAB color space to achieve high-fidelity results.
  • Sophisticated Training Schedule: Uses an AdamW optimizer with a linear warmup and cosine decay learning rate scheduler for stable and effective training.
  • Robust Data Pipeline: Leverages datasets from the Hugging Face Hub, combining synthetic and augmented data with on-the-fly transformations to ensure model generalization.
  • Comprehensive Experiment Tracking: All experiments are logged to Weights & Biases for detailed analysis and reproducibility.

Dataset Strategy

A robust dataset was created by combining two sources from the Hugging Face Hub: bhavya777/synthetic-colored-shapes and bhavya777/augmented-colored-shapes.

To ensure the model generalizes well, an on-the-fly augmentation pipeline was used during training. This included paired geometric transforms (random flips and rotations) and color jitter. This strategy was proven critical after an early model trained only on white backgrounds failed completely when tested on a noisy black background.
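The paired geometric transforms can be sketched in plain PyTorch. The key point is that the input outline and the colored target receive the *same* random flip and rotation so the pair stays aligned. This simplified version covers flips and 90° rotations only; the project's pipeline also applies color jitter, and its exact transform set is not shown here.

```python
import torch

def paired_geometric_augment(inp, tgt, gen=None):
    """Apply identical random geometric transforms to an (input, target) pair."""
    if gen is None:
        gen = torch.Generator()
    # Random horizontal flip, applied to BOTH tensors or neither.
    if torch.rand(1, generator=gen).item() < 0.5:
        inp, tgt = torch.flip(inp, dims=[-1]), torch.flip(tgt, dims=[-1])
    # Random rotation by k * 90 degrees, again shared by the pair.
    k = int(torch.randint(0, 4, (1,), generator=gen).item())
    inp = torch.rot90(inp, k, dims=[-2, -1])
    tgt = torch.rot90(tgt, k, dims=[-2, -1])
    return inp, tgt

# Usage: a (C, H, W) outline and its identically-shaped colored target.
outline = torch.randn(3, 64, 64)
target = outline.clone()
aug_in, aug_tgt = paired_geometric_augment(outline, target)
```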

Model and Training Methodology

Model Architecture

The generator is a UNet2DConditionModel conditioned on 768-dimensional text embeddings from a CLIPTextModel. The text embeddings guide the UNet's decoding path via cross-attention, allowing the model to infuse the correct color information at multiple feature map scales.

Training Details

  • Optimizer: AdamW
  • Learning Rate: $1 \times 10^{-4}$ (Base)
  • Weight Decay: $1 \times 10^{-4}$
  • Scheduler: Linear warmup for the first 10% of steps, followed by a cosine decay schedule.
  • Gradient Clipping: Max norm of 1.0 to prevent exploding gradients.
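The setup in the bullets above can be sketched in PyTorch with a tiny stand-in model; the hand-rolled `lr_lambda` reproduces a linear warmup over the first 10% of steps followed by cosine decay (the placeholder model and loss are assumptions):

```python
import math
import torch

model = torch.nn.Linear(4, 4)  # tiny stand-in for the UNet (assumption)
base_lr, total_steps = 1e-4, 1000
warmup_steps = int(0.1 * total_steps)  # linear warmup for the first 10% of steps

def lr_lambda(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)                 # linear warmup: 0 -> 1
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))      # cosine decay: 1 -> 0

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

lrs = []
for step in range(total_steps):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()  # placeholder loss (assumption)
    loss.backward()
    # Clip gradients to max norm 1.0 before the optimizer step.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
```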

Composite Loss Function

The final loss was a weighted sum of four components, designed to balance different aspects of image quality:

$$\mathcal{L}_{total} = 1.0 \cdot L_{MSE} + 0.5 \cdot L_{LPIPS} + 0.2 \cdot L_{SSIM} + 0.3 \cdot L_{Color}$$

The custom $L_{Color}$ was computed in the perceptually uniform LAB color space, which was a key factor in achieving accurate color reproduction.
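A dependency-free sketch of this weighted sum. Two simplifications are assumptions: the LPIPS and SSIM terms are passed in as callables (in the project they come from external packages) and default to zero, and the color term is reduced to a per-channel mean difference in RGB rather than the project's LAB-space version.

```python
import torch
import torch.nn.functional as F

def composite_loss(pred, target, lpips_fn=None, ssim_fn=None):
    """Weighted sum of MSE, LPIPS, SSIM, and a color term.

    lpips_fn / ssim_fn: optional callables returning a scalar loss tensor;
    zero when omitted so this sketch runs without extra packages.
    """
    mse = F.mse_loss(pred, target)
    lpips = lpips_fn(pred, target) if lpips_fn else torch.tensor(0.0)
    ssim = ssim_fn(pred, target) if ssim_fn else torch.tensor(0.0)
    # Simplified color term: compare per-channel spatial means in RGB.
    # (The project computes this in LAB space, which matters for accuracy.)
    color = (pred.mean(dim=(-2, -1)) - target.mean(dim=(-2, -1))).abs().mean()
    return 1.0 * mse + 0.5 * lpips + 0.2 * ssim + 0.3 * color

# Usage with random stand-in batches:
pred, target = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
loss = composite_loss(pred, target)
```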

Experiments Summary

A total of 8 models were trained to systematically find the optimal configuration. The detailed qualitative results for each epoch can be found in the final PDF report.

| Model | Key Change / Loss Function | Architecture Details | Parameters | Epochs | W&B Link | Hugging Face Hub |
|-------|----------------------------|----------------------|------------|--------|----------|------------------|
| 1 | Baseline: MSE Only | (32, 64, 128), layers=1 | 68,148,747 | 5 | Link | Link |
| 2 | Added LPIPS | (32, 64, 128), layers=1 | 68,148,747 | 5 | Link | Link |
| 3 | Added SSIM | (32, 64, 128), layers=1 | 68,148,747 | 5 | Link | Link |
| 4 | Added Color Loss | (32, 64, 128), layers=1 | 68,148,747 | 5 | Link | Link |
| 5 | Deeper UNet | (64, 128, 256), layers=2 | 89,723,811 | 5 | Link | Link |
| 6 | Longer Training (10 Ep) | (32, 64, 128), head_dim=8 | 68,395,939 | 10 | Link | Link |
| 7 | Longer Training (15 Ep) | (32, 64, 128), head_dim=8 | 68,395,939 | 15 | Link | Link |
| 8 | Final Architecture | (64, 32, 64), head_dim=12 | 65,212,403 | 15 | Link | Link |

Key Learnings

  • A Custom Loss is a Game-Changer: The custom LAB-based Color Loss was the single most impactful change, directly addressing the core task of accurate color reproduction where other losses failed.
  • Augment for Generalization: On-the-fly data augmentation is essential for building robust models that perform well on data outside their immediate training distribution.
  • Systematic Experimentation is Crucial: The progression through the 8 models clearly shows how iterative improvements to the loss function and architecture lead to a superior final result.
  • Sophisticated Training Works: The combination of a warmup-plus-cosine-decay scheduler, AdamW optimizer, and gradient clipping created a stable training environment that allowed models to converge effectively.

How to Run

  1. Click the "Open in Colab" badge at the top of this README to launch the notebook in Google Colab.
  2. The notebook is self-contained. Run the cells in order from top to bottom.
  3. Dependencies will be installed, data will be downloaded, the model will be trained, and inference examples will be shown.
