sprnjt/fashion-multi

Fashion Image Retrieval System

Natural language search engine for fashion images using CLIP, VLMs, and hybrid retrieval.

Features

  • Five search approaches: fine-tuned CLIP, hybrid, VLM, and enhanced versions of the hybrid and VLM approaches
  • Multi-attribute queries: colors, clothing types, environments
  • Sub-100ms warm-query latency in the enhanced approaches, using caching and two-stage retrieval
  • Gradio web interface for interactive search
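
To illustrate what a multi-attribute query involves, here is a minimal sketch that splits a query into color, clothing, and environment terms. The vocabularies and the `parse_query` function are hypothetical; this README does not show how the repository actually parses queries.

```python
# Hypothetical vocabularies for illustration only; the project's real
# attribute lists are not shown in this README.
COLORS = {"red", "blue", "crimson", "black", "white"}
CLOTHING = {"dress", "blazer", "jacket", "gown", "outfit"}
ENVIRONMENTS = {"park", "office", "beach", "street"}


def parse_query(query: str) -> dict:
    """Split a free-text query into color, clothing, and environment terms."""
    tokens = query.lower().split()
    return {
        "colors": [t for t in tokens if t in COLORS],
        "clothing": [t for t in tokens if t in CLOTHING],
        "environments": [t for t in tokens if t in ENVIRONMENTS],
    }


print(parse_query("red dress in park"))
# {'colors': ['red'], 'clothing': ['dress'], 'environments': ['park']}
```

A real system would more likely match attributes in embedding space than by exact token lookup; the sketch only shows the three attribute families named above.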

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Index Images

# Approach 2 (Fine-tuned CLIP - Recommended)
cd approach2_finetune_clip
python index_images.py --image-dir ../fashion-data

# Approach 3 (Hybrid) - from project root
python index_images.py --image-dir ./fashion-data

# Approach 4 (VLM) - from project root
python approach4_vlm/index_images.py --image-dir ./fashion-data

# Enhanced Approach 3 - requires the base Approach 3 index to be built first
python approach3_enhanced/build_indices.py

# Enhanced Approach 4 - requires the base Approach 4 index to be built first
python enhanced_approach4/build_indices.py

3. Search

# ═══════════════════════════════════════════════════════════
# Approach 2 (Fine-tuned CLIP)
# ═══════════════════════════════════════════════════════════
cd approach2_finetune_clip
python search.py "crimson red blazer"
python search.py --interactive
python search.py --run-eval

# ═══════════════════════════════════════════════════════════
# Approach 3 (Hybrid) - from project root
# ═══════════════════════════════════════════════════════════
python search.py "red dress in park"
python search.py --interactive

# ═══════════════════════════════════════════════════════════
# Approach 4 (VLM) - from project root
# ═══════════════════════════════════════════════════════════
python approach4_vlm/search.py "elegant evening gown"

# ═══════════════════════════════════════════════════════════
# Enhanced Approach 3 - from project root
# ═══════════════════════════════════════════════════════════
python approach3_enhanced/search_enhanced.py "blue jacket in office"
python approach3_enhanced/search_enhanced.py --benchmark

# ═══════════════════════════════════════════════════════════
# Enhanced Approach 4 - from project root
# ═══════════════════════════════════════════════════════════
python enhanced_approach4/search_enhanced.py "casual summer outfit"
python enhanced_approach4/search_enhanced.py --benchmark

# ═══════════════════════════════════════════════════════════
# Web Demo (All approaches)
# ═══════════════════════════════════════════════════════════
python demo.py

Dataset

The system uses the fashion-data dataset from Hugging Face, containing 617 fashion images generated using Google Whisk.

Fashion Dataset on Hugging Face

Approaches

| Approach | Description | Best For |
|---|---|---|
| Approach 2 | Fine-tuned CLIP ViT-L/14 | Fashion-specific queries |
| Approach 3 | CLIP + Color + Scene hybrid | Balanced accuracy |
| Approach 4 | InternVL3 VLM captions | Interpretable results |
| Enhanced A3 | + Hierarchical index + Cache | High-scale search |
| Enhanced A4 | + Keyword index + Cache | Fast VLM search |
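
The two-stage retrieval idea behind the enhanced approaches can be sketched as a cheap filter followed by a more expensive rerank. This is an illustrative assumption about the general pattern, not the repository's hierarchical index; the `two_stage_search` function, the `tags` field, and the toy vectors are all hypothetical.

```python
def two_stage_search(query_tag, query_vec, images, shortlist_size=50, k=5):
    """Filter by a cheap tag match, then rerank the shortlist by dot product."""
    # Stage 1: cheap filter on a precomputed tag (hypothetical field).
    shortlist = [img for img in images if query_tag in img["tags"]][:shortlist_size]

    # Stage 2: score only the shortlist with the embedding dot product.
    def score(img):
        return sum(q * v for q, v in zip(query_vec, img["vec"]))

    ranked = sorted(shortlist, key=score, reverse=True)
    return [img["name"] for img in ranked[:k]]


# Toy data standing in for indexed images.
images = [
    {"name": "a.jpg", "tags": {"blue"}, "vec": [0.9, 0.1]},
    {"name": "b.jpg", "tags": {"red"}, "vec": [1.0, 0.0]},
    {"name": "c.jpg", "tags": {"blue"}, "vec": [0.2, 0.8]},
]

print(two_stage_search("blue", [1.0, 0.0], images))
# ['a.jpg', 'c.jpg']
```

The trade-off is that stage 1 can discard an image that stage 2 would have ranked highly (here `b.jpg`), in exchange for scoring far fewer candidates.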

Architecture Diagrams

Approach 2: Fine-tuned CLIP

Approach 2 Architecture
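
Embedding-based retrieval of the kind CLIP enables usually comes down to ranking images by cosine similarity between a text embedding and precomputed image embeddings. The sketch below shows that ranking step with toy three-dimensional vectors; the file names and vectors are made up, and the actual model call and vector store used by the repository are not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, image_vecs, k=3):
    """Return the k image names most similar to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in image_vecs.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]


# Toy embeddings; real CLIP ViT-L/14 embeddings have hundreds of dimensions.
image_vecs = {
    "blazer.jpg": [0.9, 0.1, 0.0],
    "dress.jpg": [0.1, 0.9, 0.0],
    "gown.jpg": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.2, 0.0], image_vecs, k=2))
# ['blazer.jpg', 'dress.jpg']
```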

Approach 3: Hybrid Search

Approach 3 Architecture
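
One common way to combine CLIP, color, and scene signals is a weighted sum of per-signal scores. The sketch below assumes that form; the weights, the `hybrid_score` function, and the candidate scores are hypothetical and are not taken from the repository.

```python
# Assumed weights for illustration; the project's actual fusion rule and
# weights are not documented in this README.
WEIGHTS = {"clip": 0.6, "color": 0.25, "scene": 0.15}


def hybrid_score(signals):
    """Weighted sum of per-signal similarity scores, each in [0, 1]."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


# Made-up per-signal scores for two candidate images.
candidates = {
    "img_001.jpg": {"clip": 0.80, "color": 0.20, "scene": 0.90},
    "img_002.jpg": {"clip": 0.70, "color": 0.95, "scene": 0.85},
}

ranked = sorted(candidates, key=lambda name: hybrid_score(candidates[name]), reverse=True)
print(ranked)
# ['img_002.jpg', 'img_001.jpg']
```

Here `img_002.jpg` wins despite a lower CLIP score because its color match is much stronger, which is the kind of correction a hybrid score is meant to provide.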

Approach 4: VLM Caption-based Search

Approach 4 Architecture
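
Once a VLM has written a caption for each image, search can run over text alone. As a rough sketch of the keyword-index idea used by Enhanced A4, the code below builds an inverted index from caption words to images. The captions, function names, and ranking rule are assumptions for illustration, not the repository's code.

```python
from collections import defaultdict


def build_keyword_index(captions):
    """Map each lowercase caption word to the set of images containing it."""
    index = defaultdict(set)
    for image, caption in captions.items():
        for word in caption.lower().split():
            index[word].add(image)
    return index


def keyword_search(index, query):
    """Rank images by how many query words appear in their caption."""
    hits = defaultdict(int)
    for word in query.lower().split():
        for image in index.get(word, ()):
            hits[image] += 1
    # Most matches first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda image: (-hits[image], image))


# Made-up captions standing in for VLM output.
captions = {
    "a.jpg": "elegant black evening gown",
    "b.jpg": "casual summer outfit on a beach",
    "c.jpg": "black leather jacket",
}
index = build_keyword_index(captions)

print(keyword_search(index, "black gown"))
# ['a.jpg', 'c.jpg']
```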

Performance

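| Approach | Cold Query | Warm Query | Speedup (warm, vs. base approach) |
|---|---|---|---|
| Approach 2 | 50-100ms | 50ms | - |
| Approach 3 | 200-300ms | 200ms | - |
| Approach 4 | 800ms | 800ms | - |
| Enhanced A3 | 220ms | 20ms | 10x |
| Enhanced A4 | 80ms | 20ms | 40x |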
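
The gap between cold and warm queries is what a result cache produces: the first call pays the full search cost and repeats are served from memory. The sketch below shows this with `functools.lru_cache`; `expensive_search` is a hypothetical stand-in for the real search, and the repository's actual cache is not shown here. The speedup figures above are consistent with dividing the base approach's warm latency by the enhanced warm latency.

```python
from functools import lru_cache

calls = {"count": 0}


def expensive_search(query):
    """Hypothetical stand-in for a full search; counts how often it runs."""
    calls["count"] += 1
    return (f"result-for-{query}",)


@lru_cache(maxsize=1024)
def cached_search(query):
    """Serve repeated queries from an in-memory LRU cache."""
    return expensive_search(query)


cached_search("blue jacket in office")  # cold: runs the search
cached_search("blue jacket in office")  # warm: served from the cache
print(calls["count"])
# 1


def speedup(base_ms, warm_ms):
    """Ratio of base latency to warm latency."""
    return base_ms / warm_ms


print(speedup(200, 20), speedup(800, 20))
# 10.0 40.0
```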

Project Structure

├── approach2_finetune_clip/    # Fine-tuned CLIP
│   ├── best_model/             # Fine-tuned model weights
│   ├── index_images.py         # Index command
│   └── search.py               # Search command
├── approach3_enhanced/         # Enhanced hybrid search
│   ├── build_indices.py        # Build enhanced indices
│   └── search_enhanced.py      # Search command
├── approach4_vlm/              # VLM caption-based search
│   ├── index_images.py         # Index command
│   └── search.py               # Search command
├── enhanced_approach4/         # Enhanced VLM search
│   ├── build_indices.py        # Build enhanced indices
│   └── search_enhanced.py      # Search command
├── indexer/                    # Feature extraction (A3)
├── retriever/                  # Search engine (A3)
├── demo.py                     # Gradio web interface
├── index_images.py             # Index command (A3)
└── search.py                   # Search command (A3)

Tech Stack

  • Models: CLIP ViT-L/14, InternVL3-1B, sentence-transformers
  • Vector DB: ChromaDB
  • Frameworks: PyTorch, Transformers, Gradio
