Orest Kupyn¹ · Hirokatsu Kataoka¹,² · Christian Rupprecht¹
¹University of Oxford · ²AIST, Japan
S3OD is a large-scale, fully synthetic dataset for salient object detection and background removal, comprising 140K high-quality images generated with diffusion models. Our model, trained on this large-scale synthetic data and fine-tuned on real data, achieves state-of-the-art performance on real-world benchmarks and enables effective single-step background removal from high-resolution images.
- [2025/10/26] 🔥 Released version 0.1.0: training code, synthetic dataset, and inference package!
The S3OD dataset contains 140K synthetic images with high-quality masks for salient object detection.
The simplest way to use the dataset is through the Hugging Face datasets library:
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("okupyn/s3od_dataset", split="train")
# Access data
for sample in dataset:
    image = sample["image"]        # PIL Image
    mask = sample["mask"]          # PIL Image (mask)
    caption = sample["caption"]    # str
    category = sample["category"]  # str
    image_id = sample["image_id"]  # str
# Or access specific samples
sample = dataset[0]
image = sample["image"]
mask = sample["mask"]- Total images: 140,000+
- Categories: 1,000+ ImageNet classes
- Resolution: Variable (resized during training)
- Format: JPEG (images), PNG (masks)
- Storage size: ~35GB (Parquet format)
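Because the full dataset is roughly 35GB, you may prefer to stream it rather than download everything up front. Below is a minimal sketch using the standard streaming mode of the datasets library (same repository id as above):

from datasets import load_dataset

# Stream samples on the fly instead of downloading all ~35GB of Parquet shards
dataset = load_dataset("okupyn/s3od_dataset", split="train", streaming=True)

# Inspect a single sample without materializing the whole dataset
sample = next(iter(dataset))
print(sample["category"], sample["caption"])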
Each sample in the dataset contains:
- image: RGB image (PIL Image)
- mask: Binary segmentation mask (PIL Image)
- caption: Descriptive caption generated for the image
- category: Object category from ImageNet
- image_id: Unique image identifier
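For training or evaluation you will typically want the mask as a binary array. A small sketch building on the loading example above (the 127 threshold is an assumption for a binary PNG mask):

import numpy as np

# Convert one sample's PIL mask into a {0, 1} numpy array
mask = np.array(sample["mask"].convert("L"), dtype=np.uint8)
binary_mask = (mask > 127).astype(np.uint8)  # assumed threshold for a binary mask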
From GitHub:
pip install git+https://github.com/KupynOrest/s3od.git

Or clone the repository for development and training:

git clone https://github.com/KupynOrest/s3od.git
cd s3od
pip install -r requirements-training.txt

To remove the background from an image with the default model:

from s3od import BackgroundRemoval
from PIL import Image
# Initialize detector (automatically downloads model from HuggingFace)
detector = BackgroundRemoval()
# Load and process image
image = Image.open("your_image.jpg")
result = detector.remove_background(image)
# Save result with transparent background
result.rgba_image.save("output.png")
# Access predictions
best_mask = result.predicted_mask # Best mask (H, W) numpy array
all_masks = result.all_masks # All masks (N, H, W) numpy array
all_ious = result.all_ious # IoU scores (N,) numpy array
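As a follow-up to the quick-start snippet, here is a sketch of two common uses of the returned fields: picking a mask by predicted IoU and compositing the cutout onto a solid background. It assumes result.rgba_image is the PIL RGBA image saved above; the compositing itself is plain PIL:

import numpy as np
from PIL import Image

# Select the mask with the highest predicted IoU
# (result.predicted_mask already holds the best one by default)
best_idx = int(np.argmax(result.all_ious))
mask = result.all_masks[best_idx]

# Composite the RGBA cutout onto a white background using its alpha channel
background = Image.new("RGB", result.rgba_image.size, (255, 255, 255))
background.paste(result.rgba_image, mask=result.rgba_image.split()[-1])
background.save("output_on_white.jpg")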
We provide multiple model variants optimized for different use cases:

| Model | Training Data | Best For | HuggingFace |
|---|---|---|---|
| okupyn/s3od (default) | Synthetic + All Real Datasets | General-purpose background removal, best overall performance | 🤗 Hub |
| okupyn/s3od-synth | Synthetic Only | Research on synthetic-to-real transfer, zero-shot evaluation | 🤗 Hub |
| okupyn/s3od-dis | Synthetic + DIS5K | High-precision dichotomous segmentation | 🤗 Hub |
| okupyn/s3od-sod | Synthetic + SOD Datasets | Salient object detection tasks | 🤗 Hub |
Usage with different models:
# Default model (best general performance)
detector = BackgroundRemoval(model_id="okupyn/s3od")
# Synthetic-only model (pure zero-shot)
detector_synth = BackgroundRemoval(model_id="okupyn/s3od-synth")
# DIS-specialized model (high precision)
detector_dis = BackgroundRemoval(model_id="okupyn/s3od-dis")
# SOD-specialized model
detector_sod = BackgroundRemoval(model_id="okupyn/s3od-sod")

Key Differences:
- okupyn/s3od: Trained on 140K synthetic images + fine-tuned on DUTS, DIS5K, HR-SOD and others. Best for production use.
- okupyn/s3od-synth: Trained exclusively on synthetic data. Demonstrates strong zero-shot generalization.
- okupyn/s3od-dis: Fine-tuned specifically for dichotomous image segmentation with highly accurate boundaries - use for evaluation on academic benchmarks.
- okupyn/s3od-sod: Optimized for general salient object detection benchmarks - use for evaluation on academic benchmarks.
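If you need to process many images with one of the variants above, a simple batch loop over a folder works with the same API. A minimal sketch; the directory paths are placeholders and the model id can be any of the variants listed:

from pathlib import Path
from PIL import Image
from s3od import BackgroundRemoval

# Placeholder paths; adjust to your data
input_dir = Path("images")
output_dir = Path("cutouts")
output_dir.mkdir(exist_ok=True)

detector = BackgroundRemoval(model_id="okupyn/s3od")  # or any variant above
for path in sorted(input_dir.glob("*.jpg")):
    result = detector.remove_background(Image.open(path))
    result.rgba_image.save(output_dir / f"{path.stem}.png")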
🌐 Online Demo: HuggingFace Spaces
💻 Run Locally:
cd demo
pip install -r requirements.txt
python app.py

S3OD uses Hydra for configuration management and PyTorch Lightning for training.
# Single GPU
python -m synth_sod.model_training.train \
dataset=synth \
model=dinob \
backend=1gpu
# Multi-GPU (DDP)
torchrun --standalone --nnodes=1 --nproc_per_node=NUM_GPUS \
-m synth_sod.model_training.train \
dataset=synth \
model=dinob \
    backend=NUM_GPUS

Replace NUM_GPUS with the number of GPUs you want to use (e.g., 2, 4, 8). Note that the backend configs are named 1gpu, 2gpu, 4gpu, and 8gpu, so a 4-GPU run would use --nproc_per_node=4 and backend=4gpu.
Check synth_sod/model_training/config/ for available options:
Models (model=...):
- dinob: DINOv3-Base (default)
- dinol: DINOv3-Large
- flux_teacher: FLUX-enhanced teacher model
Datasets (dataset=...):
- synth: Synthetic S3OD dataset
- duts: DUTS dataset
- dis: DIS5K dataset
- full: All datasets combined
Compute (backend=...):
- 1gpu, 2gpu, 4gpu, 8gpu
# Train on real-world benchmark
python -m synth_sod.model_training.train dataset=duts model=dinob
# Train teacher model with FLUX features
python -m synth_sod.model_training.train \
-cn train_teacher \
dataset=full \
model=flux_teacher \
backend=4gpu
# Resume training from checkpoint
python -m synth_sod.model_training.train \
dataset=synth \
model=dinob \
    training_hyperparams.resume=True

Our data generation pipeline uses FLUX diffusion models and concept-guided attention to create high-quality synthetic training data.
- FLUX model weights
- OpenAI API key (for caption generation)
export OPENAI_API_KEY="your-api-key-here"

- Generate Captions & Tags
cd synth_sod/data_generation/flux_finetune
python generate_captions.py --image_dir <dir> --output_file captions.json
python tag_data.py --image_dir <dir> --output_file tags.json

- Extract FLUX Features
python feature_extraction.py \
--caption_file captions.json \
--tag_file tags.json \
--save_folder <features_dir> \
    --model_path <flux_model_path>

- Generate Images and Masks
python generate_train_images.py --config_path generation_config.yaml

- Filter Dataset (optional)
python run_filtering.py --config_path filtering_config.yaml

Update configuration files with your paths before running. See example configs in synth_sod/data_generation/.
If you use S3OD in your research, please cite:
@article{kupyn2025s3od,
title={S3OD: Towards Generalizable Salient Object Detection with Synthetic Data},
author={Kupyn, Orest and Kataoka, Hirokatsu and Rupprecht, Christian},
journal={arXiv preprint arXiv:2510.21605},
year={2025}
}

This project is licensed under the MIT License - see the LICENSE file for details.
