Fashion Image Retrieval System

Natural language search engine for fashion images using CLIP, VLMs, and hybrid retrieval.

Features

Five search approaches: fine-tuned CLIP, hybrid, VLM, and enhanced versions of the hybrid and VLM approaches
Multi-attribute queries: Colors, clothing types, environments
Sub-100ms latency for warm (cached) queries in the enhanced approaches, via caching and two-stage retrieval
Gradio web interface for easy interaction
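
The caching and two-stage retrieval mentioned above can be illustrated with a minimal sketch. It is not the repository's implementation: the toy index, the cluster centroids, and the `search` function are all hypothetical, and plain Python tuples stand in for real CLIP embeddings. Stage one picks the closest coarse cluster, stage two ranks only the images in that cluster, and `functools.lru_cache` serves repeated queries from memory.

```python
import math
from functools import lru_cache

# Hypothetical toy index: image id -> (coarse cluster id, embedding vector).
INDEX = {
    "img_001": (0, (0.9, 0.1, 0.0)),
    "img_002": (0, (0.8, 0.2, 0.1)),
    "img_003": (1, (0.0, 0.1, 0.9)),
    "img_004": (1, (0.1, 0.0, 0.8)),
}

# Hypothetical cluster centroids used by the coarse first stage.
CENTROIDS = {0: (0.85, 0.15, 0.05), 1: (0.05, 0.05, 0.85)}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@lru_cache(maxsize=1024)
def search(query_vec, top_k=2):
    """Two-stage search; query_vec must be a tuple so the cache can hash it."""
    # Stage 1: choose the cluster whose centroid is closest to the query.
    best_cluster = max(CENTROIDS, key=lambda c: cosine(query_vec, CENTROIDS[c]))
    # Stage 2: exact ranking restricted to images in that cluster.
    candidates = [
        (image_id, cosine(query_vec, vec))
        for image_id, (cluster, vec) in INDEX.items()
        if cluster == best_cluster
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return tuple(image_id for image_id, _ in candidates[:top_k])
```

Calling `search((1.0, 0.0, 0.0))` twice computes the ranking once; the second call is answered from the cache, which is the cold versus warm distinction used in the Performance section.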

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Index Images

# Approach 2 (Fine-tuned CLIP - Recommended)
cd approach2_finetune_clip
python index_images.py --image-dir ../fashion-data

# Approach 3 (Hybrid) - from project root (cd .. first if you are still in approach2_finetune_clip)
python index_images.py --image-dir ./fashion-data

# Approach 4 (VLM) - from project root
python approach4_vlm/index_images.py --image-dir ./fashion-data

# Enhanced Approach 3 - requires base A3 indexed first
python approach3_enhanced/build_indices.py

# Enhanced Approach 4 - requires base A4 indexed first
python enhanced_approach4/build_indices.py

3. Search

# ═══════════════════════════════════════════════════════════
# Approach 2 (Fine-tuned CLIP)
# ═══════════════════════════════════════════════════════════
cd approach2_finetune_clip
python search.py "crimson red blazer"
python search.py --interactive
python search.py --run-eval

# ═══════════════════════════════════════════════════════════
# Approach 3 (Hybrid) - from project root (cd .. first if you are still in approach2_finetune_clip)
# ═══════════════════════════════════════════════════════════
python search.py "red dress in park"
python search.py --interactive

# ═══════════════════════════════════════════════════════════
# Approach 4 (VLM) - from project root
# ═══════════════════════════════════════════════════════════
python approach4_vlm/search.py "elegant evening gown"

# ═══════════════════════════════════════════════════════════
# Enhanced Approach 3 - from project root
# ═══════════════════════════════════════════════════════════
python approach3_enhanced/search_enhanced.py "blue jacket in office"
python approach3_enhanced/search_enhanced.py --benchmark

# ═══════════════════════════════════════════════════════════
# Enhanced Approach 4 - from project root
# ═══════════════════════════════════════════════════════════
python enhanced_approach4/search_enhanced.py "casual summer outfit"
python enhanced_approach4/search_enhanced.py --benchmark

# ═══════════════════════════════════════════════════════════
# Web Demo (All approaches)
# ═══════════════════════════════════════════════════════════
python demo.py

Dataset

The system uses the fashion-data dataset from Hugging Face, containing 617 fashion images generated using Google Whisk.

Approaches

Approach	Description	Best For
Approach 2	Fine-tuned CLIP ViT-L/14	Fashion-specific queries
Approach 3	CLIP + Color + Scene hybrid	Balanced accuracy
Approach 4	InternVL3 VLM captions	Interpretable results
Enhanced A3	+ Hierarchical index + Cache	High-scale search
Enhanced A4	+ Keyword index + Cache	Fast VLM search

Architecture Diagrams

Approach 2: Fine-tuned CLIP

Approach 3: Hybrid Search

Approach 4: VLM Caption-based Search
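
For caption-based search, a keyword index of the kind listed for Enhanced A4 might look like the sketch below. The captions are invented stand-ins for VLM output, and `build_keyword_index` and `keyword_search` are hypothetical names, not functions from this repository.

```python
from collections import defaultdict

# Hypothetical captions standing in for VLM-generated descriptions.
CAPTIONS = {
    "img_001": "a woman in an elegant black evening gown at a gala",
    "img_002": "a man wearing a casual summer outfit on the beach",
    "img_003": "an elegant red gown on a staircase",
}


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (word.strip(".,!?").lower() for word in text.split())
    return [word for word in words if word]


def build_keyword_index(captions):
    """Map each caption token to the set of image ids whose caption has it."""
    index = defaultdict(set)
    for image_id, caption in captions.items():
        for token in tokenize(caption):
            index[token].add(image_id)
    return index


def keyword_search(index, query):
    """Rank images by how many distinct query tokens their caption contains."""
    counts = defaultdict(int)
    for token in set(tokenize(query)):
        for image_id in index.get(token, ()):
            counts[image_id] += 1
    # Most matches first; ties broken by image id for stable output.
    return sorted(counts, key=lambda image_id: (-counts[image_id], image_id))


index = build_keyword_index(CAPTIONS)
matches = keyword_search(index, "elegant evening gown")
```

Because the lookup only touches the posting sets for the query tokens, it can act as a cheap first filter before any embedding comparison is done.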

Performance

Approach	Cold Query	Warm Query	Speedup (warm, vs. base approach)
Approach 2	50-100ms	50ms	-
Approach 3	200-300ms	200ms	-
Approach 4	800ms	800ms	-
Enhanced A3	220ms	20ms	10x
Enhanced A4	80ms	20ms	40x
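
The Speedup column appears to compare the warm-query latency of each enhanced approach with that of its base approach. That reading is an inference from the numbers rather than something the table states. Under it, the figures can be reproduced as follows:

```python
# Warm-query latencies in milliseconds, copied from the table above.
WARM_MS = {
    "Approach 3": 200,
    "Approach 4": 800,
    "Enhanced A3": 20,
    "Enhanced A4": 20,
}


def speedup(base, enhanced):
    """Ratio of base warm latency to enhanced warm latency."""
    return WARM_MS[base] / WARM_MS[enhanced]


a3_speedup = speedup("Approach 3", "Enhanced A3")  # 200 / 20
a4_speedup = speedup("Approach 4", "Enhanced A4")  # 800 / 20
```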

Project Structure

├── approach2_finetune_clip/    # Fine-tuned CLIP
│   ├── best_model/             # Fine-tuned model weights
│   ├── index_images.py         # Index command
│   └── search.py               # Search command
├── approach3_enhanced/         # Enhanced hybrid search
│   ├── build_indices.py        # Build enhanced indices
│   └── search_enhanced.py      # Search command
├── approach4_vlm/              # VLM caption-based search
│   ├── index_images.py         # Index command
│   └── search.py               # Search command
├── enhanced_approach4/         # Enhanced VLM search
│   ├── build_indices.py        # Build enhanced indices
│   └── search_enhanced.py      # Search command
├── indexer/                    # Feature extraction (A3)
├── retriever/                  # Search engine (A3)
├── demo.py                     # Gradio web interface
├── index_images.py             # Index command (A3)
└── search.py                   # Search command (A3)

Tech Stack

Models: CLIP ViT-L/14, InternVL3-1B, sentence-transformers
Vector DB: ChromaDB
Frameworks: PyTorch, Transformers, Gradio
