S3OD: Towards Generalizable Salient Object Detection with Synthetic Data

Orest Kupyn¹ · Hirokatsu Kataoka¹,² · Christian Rupprecht¹

¹University of Oxford · ²AIST, Japan


S3OD is a large-scale, fully synthetic dataset for salient object detection and background removal, with 140K high-quality images generated using diffusion models. Our model, trained on large-scale synthetic data and fine-tuned on real data, achieves state-of-the-art performance on real-world benchmarks and enables effective single-step background removal on high-resolution images.


News

  • [2025/10/26] 🔥 Release Version 0.1.0 - Training code, synthetic dataset, and inference package!

S3OD Dataset Download Instructions

The S3OD dataset contains 140K synthetic images with high-quality masks for salient object detection.

Using HuggingFace Datasets Library (Recommended)

The simplest way to use the dataset is through the datasets library:

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("okupyn/s3od_dataset", split="train")

# Access data
for sample in dataset:
    image = sample["image"]        # PIL Image
    mask = sample["mask"]          # PIL Image (binary mask)
    caption = sample["caption"]    # str
    category = sample["category"]  # str
    image_id = sample["image_id"]  # str

# Or access specific samples
sample = dataset[0]
image = sample["image"]
mask = sample["mask"]

Dataset Statistics

  • Total images: 140,000+
  • Categories: 1,000+ ImageNet classes
  • Resolution: Variable (resized during training)
  • Format: JPEG (images), PNG (masks)
  • Storage size: ~35GB (Parquet format)

Dataset Structure

Each sample in the dataset contains:

  • image: RGB image (PIL Image)
  • mask: Binary segmentation mask (PIL Image)
  • caption: Descriptive caption generated for the image
  • category: Object category from ImageNet
  • image_id: Unique image identifier
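
As a quick sanity check on samples with the fields above, a mask can be converted to a numpy array and its foreground fraction computed. This is a minimal sketch: `foreground_coverage` is a hypothetical helper (not part of the dataset library), and it assumes masks are single-channel images with foreground values near 255.

```python
import numpy as np
from PIL import Image

def foreground_coverage(mask: Image.Image, threshold: int = 128) -> float:
    """Fraction of pixels labeled foreground in a PIL mask.

    Hypothetical inspection helper; assumes a single-channel mask with
    foreground pixel values at or above `threshold`.
    """
    arr = np.asarray(mask.convert("L"))
    return float((arr >= threshold).mean())

# Synthetic 4x4 mask: top half foreground, bottom half background.
demo = Image.fromarray(
    np.vstack([np.full((2, 4), 255, np.uint8), np.zeros((2, 4), np.uint8)])
)
print(foreground_coverage(demo))  # 0.5
```

In practice this can be applied to `sample["mask"]` while iterating the dataset, e.g. to filter out near-empty or near-full masks.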

Installation

For Inference Only

From GitHub:

pip install git+https://github.com/KupynOrest/s3od.git

For Training & Research

git clone https://github.com/KupynOrest/s3od.git
cd s3od
pip install -r requirements-training.txt

Usage

Quick Start

from s3od import BackgroundRemoval
from PIL import Image

# Initialize detector (automatically downloads model from HuggingFace)
detector = BackgroundRemoval()

# Load and process image
image = Image.open("your_image.jpg")
result = detector.remove_background(image)

# Save result with transparent background
result.rgba_image.save("output.png")

# Access predictions
best_mask = result.predicted_mask  # Best mask (H, W) numpy array
all_masks = result.all_masks       # All masks (N, H, W) numpy array  
all_ious = result.all_ious         # IoU scores (N,) numpy array
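
To make the relationship between `all_masks` and `all_ious` concrete, here is a minimal sketch of selecting the highest-scoring mask with plain numpy. `select_best_mask` is a hypothetical helper for illustration; the package already exposes this selection as `result.predicted_mask`.

```python
import numpy as np

def select_best_mask(all_masks: np.ndarray, all_ious: np.ndarray) -> np.ndarray:
    """Pick the (H, W) mask with the highest predicted IoU score.

    Hypothetical helper mirroring `result.predicted_mask`:
    `all_masks` has shape (N, H, W), `all_ious` has shape (N,).
    """
    return all_masks[int(np.argmax(all_ious))]

# Synthetic check: two 2x2 masks, the second scored higher.
masks = np.stack([np.zeros((2, 2)), np.ones((2, 2))])
ious = np.array([0.3, 0.9])
print(select_best_mask(masks, ious))  # the all-ones mask
```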

Model Variants

We provide multiple model variants optimized for different use cases:

| Model                   | Training Data                | Best For                                                      |
| ----------------------- | ---------------------------- | ------------------------------------------------------------- |
| okupyn/s3od (default)   | Synthetic + all real datasets | General-purpose background removal, best overall performance  |
| okupyn/s3od-synth       | Synthetic only               | Research on synthetic-to-real transfer, zero-shot evaluation  |
| okupyn/s3od-dis         | Synthetic + DIS5K            | High-precision dichotomous segmentation                       |
| okupyn/s3od-sod         | Synthetic + SOD datasets     | Salient object detection tasks                                |

All variants are hosted on the 🤗 HuggingFace Hub under the model IDs above.

Usage with different models:

# Default model (best general performance)
detector = BackgroundRemoval(model_id="okupyn/s3od")

# Synthetic-only model (pure zero-shot)
detector_synth = BackgroundRemoval(model_id="okupyn/s3od-synth")

# DIS-specialized model (high precision)
detector_dis = BackgroundRemoval(model_id="okupyn/s3od-dis")

# SOD-specialized model
detector_sod = BackgroundRemoval(model_id="okupyn/s3od-sod")

Key Differences:

  • okupyn/s3od: Trained on 140K synthetic images + fine-tuned on DUTS, DIS5K, HR-SOD and others. Best for production use.
  • okupyn/s3od-synth: Trained exclusively on synthetic data. Demonstrates strong zero-shot generalization.
  • okupyn/s3od-dis: Fine-tuned specifically for dichotomous image segmentation with highly accurate boundaries - use for evaluation on academic benchmarks.
  • okupyn/s3od-sod: Optimized for general salient object detection benchmarks - use for evaluation on academic benchmarks.
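
The variant choice can be wrapped in a tiny lookup so application code names the use case rather than the model ID. This is an illustrative convenience, not part of the s3od package; the model IDs are taken from the table above.

```python
# Model IDs from the variants table; the chooser itself is a hypothetical
# convenience helper, not part of the s3od package.
VARIANTS = {
    "general": "okupyn/s3od",
    "zero_shot": "okupyn/s3od-synth",
    "dichotomous": "okupyn/s3od-dis",
    "salient": "okupyn/s3od-sod",
}

def pick_variant(use_case: str) -> str:
    """Return a HuggingFace model ID for a use case, defaulting to the general model."""
    return VARIANTS.get(use_case, VARIANTS["general"])

print(pick_variant("dichotomous"))  # okupyn/s3od-dis
# detector = BackgroundRemoval(model_id=pick_variant("dichotomous"))
```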

Demo

🌐 Online Demo: HuggingFace Spaces

💻 Run Locally:

cd demo
pip install -r requirements.txt
python app.py

Training

S3OD uses Hydra for configuration management and PyTorch Lightning for training.

Training on Synthetic Data

# Single GPU
python -m synth_sod.model_training.train \
    dataset=synth \
    model=dinob \
    backend=1gpu

# Multi-GPU (DDP), e.g. on 4 GPUs
torchrun --standalone --nnodes=1 --nproc_per_node=4 \
    -m synth_sod.model_training.train \
    dataset=synth \
    model=dinob \
    backend=4gpu

Set --nproc_per_node to the number of GPUs you want to use and pick the matching backend config (1gpu, 2gpu, 4gpu, or 8gpu).

Available Configurations

Check synth_sod/model_training/config/ for available options:

Models (model=...):

  • dinob: DINOv3-Base (default)
  • dinol: DINOv3-Large
  • flux_teacher: FLUX-enhanced teacher model

Datasets (dataset=...):

  • synth: Synthetic S3OD dataset
  • duts: DUTS dataset
  • dis: DIS5K dataset
  • full: All datasets combined

Compute (backend=...):

  • 1gpu, 2gpu, 4gpu, 8gpu

Training Examples

# Train on real-world benchmark
python -m synth_sod.model_training.train dataset=duts model=dinob

# Train teacher model with FLUX features
python -m synth_sod.model_training.train \
    -cn train_teacher \
    dataset=full \
    model=flux_teacher \
    backend=4gpu

# Resume training from checkpoint
python -m synth_sod.model_training.train \
    dataset=synth \
    model=dinob \
    training_hyperparams.resume=True

Generating Synthetic Data

Our data generation pipeline uses FLUX diffusion models and concept-guided attention to create high-quality synthetic training data.

Prerequisites

  1. FLUX model weights
  2. OpenAI API key (for caption generation)
export OPENAI_API_KEY="your-api-key-here"

Generation Pipeline

  1. Generate Captions & Tags

cd synth_sod/data_generation/flux_finetune
python generate_captions.py --image_dir <dir> --output_file captions.json
python tag_data.py --image_dir <dir> --output_file tags.json

  2. Extract FLUX Features

python feature_extraction.py \
    --caption_file captions.json \
    --tag_file tags.json \
    --save_folder <features_dir> \
    --model_path <flux_model_path>

  3. Generate Images and Masks

python generate_train_images.py --config_path generation_config.yaml

  4. Filter Dataset (optional)

python run_filtering.py --config_path filtering_config.yaml

Update configuration files with your paths before running. See example configs in synth_sod/data_generation/.
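
The four steps above can be sketched as an ordered list of commands, which is handy for orchestration. This is a sketch only: the script names come from the steps above, the paths are placeholders, and actually running the commands requires the repo checkout, FLUX weights, and an OpenAI API key.

```python
# Build the pipeline's commands as argument lists (paths are placeholders).
def build_pipeline(image_dir: str, features_dir: str, flux_path: str) -> list[list[str]]:
    return [
        ["python", "generate_captions.py", "--image_dir", image_dir,
         "--output_file", "captions.json"],
        ["python", "tag_data.py", "--image_dir", image_dir,
         "--output_file", "tags.json"],
        ["python", "feature_extraction.py", "--caption_file", "captions.json",
         "--tag_file", "tags.json", "--save_folder", features_dir,
         "--model_path", flux_path],
        ["python", "generate_train_images.py", "--config_path",
         "generation_config.yaml"],
    ]

commands = build_pipeline("images/", "features/", "flux/")
for cmd in commands:
    print(" ".join(cmd))
# Each command could then be run in order with subprocess.run(cmd, check=True).
```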

Citation

If you use S3OD in your research, please cite:

@article{kupyn2025s3od,
  title={S3OD: Towards Generalizable Salient Object Detection with Synthetic Data},
  author={Kupyn, Orest and Kataoka, Hirokatsu and Rupprecht, Christian},
  journal={arXiv preprint arXiv:2510.21605},
  year={2025}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

Official repo for S3OD: Towards Generalizable Salient Object Detection with Synthetic Data
