Orest Kupyn¹ · Hirokatsu Kataoka¹,² · Christian Rupprecht¹
¹University of Oxford · ²AIST, Japan
S3OD is a large-scale, fully synthetic dataset for salient object detection and background removal, comprising 140K high-quality images generated with diffusion models. Our model, trained on this large-scale synthetic data and fine-tuned on real data, achieves state-of-the-art performance on real-world benchmarks and enables effective single-step background removal from high-resolution images.
- [2025/10/26] 🔥 Released version 0.1.0: training code, synthetic dataset, and inference package!
The S3OD dataset contains 140K synthetic images with high-quality masks for salient object detection.
The simplest way to use the dataset is through the Hugging Face datasets library:
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("okupyn/s3od_dataset", split="train")
# Access data
for sample in dataset:
    image = sample["image"]        # PIL Image
    mask = sample["mask"]          # PIL Image (mask)
    caption = sample["caption"]    # str
    category = sample["category"]  # str
    image_id = sample["image_id"]  # str
# Or access specific samples
sample = dataset[0]
image = sample["image"]
mask = sample["mask"]- Total images: 140,000+
- Categories: 1,000+ ImageNet classes
- Resolution: Variable (resized during training)
- Format: JPEG (images), PNG (masks)
- Storage size: ~35GB (Parquet format)
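Because the full dataset is roughly 35GB, you may prefer to stream it rather than download everything up front. Below is a minimal sketch using the standard streaming mode of the datasets library (same repository id as above):

from datasets import load_dataset

# Stream samples on the fly instead of downloading all ~35GB of Parquet shards
dataset = load_dataset("okupyn/s3od_dataset", split="train", streaming=True)

# Inspect a single sample without materializing the whole dataset
sample = next(iter(dataset))
print(sample["category"], sample["caption"])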
Each sample in the dataset contains:
- image: RGB image (PIL Image)
- mask: Binary segmentation mask (PIL Image)
- caption: Descriptive caption generated for the image
- category: Object category from ImageNet
- image_id: Unique image identifier
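For training or evaluation you will typically want the mask as a binary array. A small sketch building on the loading example above (the 127 threshold is an assumption for a binary PNG mask):

import numpy as np

# Convert one sample's PIL mask into a {0, 1} numpy array
mask = np.array(sample["mask"].convert("L"), dtype=np.uint8)
binary_mask = (mask > 127).astype(np.uint8)  # assumed threshold for a binary mask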
From GitHub:
pip install git+https://github.com/KupynOrest/s3od.git

Or clone the repository for development and training:

git clone https://github.com/KupynOrest/s3od.git
cd s3od
pip install -r requirements-training.txt

To remove the background from an image with the default model:

from s3od import BackgroundRemoval
from PIL import Image
# Initialize detector (automatically downloads model from HuggingFace)
detector = BackgroundRemoval()
# Load and process image
image = Image.open("your_image.jpg")
result = detector.remove_background(image)
# Save result with transparent background
result.rgba_image.save("output.png")
# Access predictions
best_mask = result.predicted_mask # Best mask (H, W) numpy array
all_masks = result.all_masks # All masks (N, H, W) numpy array
all_ious = result.all_ious # IoU scores (N,) numpy array
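As a follow-up to the quick-start snippet, here is a sketch of two common uses of the returned fields: picking a mask by predicted IoU and compositing the cutout onto a solid background. It assumes result.rgba_image is the PIL RGBA image saved above; the compositing itself is plain PIL:

import numpy as np
from PIL import Image

# Select the mask with the highest predicted IoU
# (result.predicted_mask already holds the best one by default)
best_idx = int(np.argmax(result.all_ious))
mask = result.all_masks[best_idx]

# Composite the RGBA cutout onto a white background using its alpha channel
background = Image.new("RGB", result.rgba_image.size, (255, 255, 255))
background.paste(result.rgba_image, mask=result.rgba_image.split()[-1])
background.save("output_on_white.jpg")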
We provide multiple model variants optimized for different use cases:

| Model | Training Data | Best For | HuggingFace |
|---|---|---|---|
| okupyn/s3od (default) | Synthetic + All Real Datasets | General-purpose background removal, best overall performance | 🤗 Hub |
| okupyn/s3od-synth | Synthetic Only | Research on synthetic-to-real transfer, zero-shot evaluation | 🤗 Hub |
| okupyn/s3od-dis | Synthetic + DIS5K | High-precision dichotomous segmentation | 🤗 Hub |
| okupyn/s3od-sod | Synthetic + SOD Datasets | Salient object detection tasks | 🤗 Hub |
Usage with different models:
# Default model (best general performance)
detector = BackgroundRemoval(model_id="okupyn/s3od")
# Synthetic-only model (pure zero-shot)
detector_synth = BackgroundRemoval(model_id="okupyn/s3od-synth")
# DIS-specialized model (high precision)
detector_dis = BackgroundRemoval(model_id="okupyn/s3od-dis")
# SOD-specialized model
detector_sod = BackgroundRemoval(model_id="okupyn/s3od-sod")

Key Differences:
- okupyn/s3od: Trained on 140K synthetic images + fine-tuned on DUTS, DIS5K, HR-SOD and others. Best for production use.
- okupyn/s3od-synth: Trained exclusively on synthetic data. Demonstrates strong zero-shot generalization.
- okupyn/s3od-dis: Fine-tuned specifically for dichotomous image segmentation with highly accurate boundaries - use for evaluation on academic benchmarks.
- okupyn/s3od-sod: Optimized for general salient object detection benchmarks - use for evaluation on academic benchmarks.
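If you need to process many images with one of the variants above, a simple batch loop over a folder works with the same API. A minimal sketch; the directory paths are placeholders and the model id can be any of the variants listed:

from pathlib import Path
from PIL import Image
from s3od import BackgroundRemoval

# Placeholder paths; adjust to your data
input_dir = Path("images")
output_dir = Path("cutouts")
output_dir.mkdir(exist_ok=True)

detector = BackgroundRemoval(model_id="okupyn/s3od")  # or any variant above
for path in sorted(input_dir.glob("*.jpg")):
    result = detector.remove_background(Image.open(path))
    result.rgba_image.save(output_dir / f"{path.stem}.png")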
🌐 Online Demo: HuggingFace Spaces
💻 Run Locally:
cd demo
pip install -r requirements.txt
python app.py

S3OD uses Hydra for configuration management and PyTorch Lightning for training.
# Single GPU
python -m synth_sod.model_training.train \
dataset=synth \
model=dinob \
backend=1gpu
# Multi-GPU (DDP)
torchrun --standalone --nnodes=1 --nproc_per_node=NUM_GPUS \
-m synth_sod.model_training.train \
dataset=synth \
model=dinob \
    backend=NUM_GPUS

Replace NUM_GPUS with the number of GPUs you want to use (e.g., 2, 4, 8). Note that the backend configs are named 1gpu, 2gpu, 4gpu, and 8gpu, so a 4-GPU run would use --nproc_per_node=4 and backend=4gpu.
Check synth_sod/model_training/config/ for available options:
Models (model=...):
- dinob: DINOv3-Base (default)
- dinol: DINOv3-Large
- flux_teacher: FLUX-enhanced teacher model
Datasets (dataset=...):
- synth: Synthetic S3OD dataset
- duts: DUTS dataset
- dis: DIS5K dataset
- full: All datasets combined
Compute (backend=...):
- 1gpu, 2gpu, 4gpu, 8gpu
# Train on real-world benchmark
python -m synth_sod.model_training.train dataset=duts model=dinob
# Train teacher model with FLUX features
python -m synth_sod.model_training.train \
-cn train_teacher \
dataset=full \
model=flux_teacher \
backend=4gpu
# Resume training from checkpoint
python -m synth_sod.model_training.train \
dataset=synth \
model=dinob \
    training_hyperparams.resume=True

Our data generation pipeline uses FLUX diffusion models and concept-guided attention to create high-quality synthetic training data.
- FLUX model weights
- OpenAI API key (for caption generation)
export OPENAI_API_KEY="your-api-key-here"

- Generate Captions & Tags
cd synth_sod/data_generation/flux_finetune
python generate_captions.py --image_dir <dir> --output_file captions.json
python tag_data.py --image_dir <dir> --output_file tags.json

- Extract FLUX Features
python feature_extraction.py \
--caption_file captions.json \
--tag_file tags.json \
--save_folder <features_dir> \
    --model_path <flux_model_path>

- Generate Images and Masks
python generate_train_images.py --config_path generation_config.yaml

- Filter Dataset (optional)
python run_filtering.py --config_path filtering_config.yaml

Update configuration files with your paths before running. See example configs in synth_sod/data_generation/.
If you use S3OD in your research, please cite:
@article{kupyn2025s3od,
title={S3OD: Towards Generalizable Salient Object Detection with Synthetic Data},
author={Kupyn, Orest and Kataoka, Hirokatsu and Rupprecht, Christian},
journal={arXiv preprint arXiv:2510.21605},
year={2025}
}

This project is licensed under the MIT License - see the LICENSE file for details.
