Introduction | Demo | How to use | Train | Citation | Acknowledgements
- 06/28/2025: Training code release! Check out Train_Guide.md
- 06/25/2025: Our work has been accepted by ICCV 2025!
- 06/02/2025: Released GenStereo v2.1, which offers better performance and higher resolution! Check out the demo.
- 03/17/2025: Code and demos are released!
This repository is the official implementation of the paper "Towards Open-World Generation of Stereo Images and Unsupervised Matching". Given an arbitrary reference image, GenStereo generates the corresponding right-view image by enforcing constraints at three levels: input (disparity-aware coordinate and warped-image embeddings), feature (cross-view attention), and output (a pixel-level loss with adaptive fusion). These constraints yield stereo images with both geometric consistency and high visual quality. Our method achieves state-of-the-art performance in both stereo image generation and unsupervised stereo matching.
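As a rough illustration of the input-level constraint, the toy sketch below forward-warps a reference (left) image into the right view using a disparity map. It is only a simplified stand-in for the disparity-aware coordinate and warped-image embeddings described in the paper; the function and variable names are illustrative and not part of this codebase.

import numpy as np

def naive_forward_warp(left_img, disparity):
    # Toy forward warp: shift each left-view pixel to the right view.
    # left_img:  H x W x 3 uint8 array (the reference image)
    # disparity: H x W float array of horizontal shifts in pixels
    h, w = disparity.shape
    right_img = np.zeros_like(left_img)
    ys, xs = np.mgrid[0:h, 0:w]
    # A pixel at column x in the left view lands at x - d in the right view.
    target_xs = np.round(xs - disparity).astype(int)
    valid = (target_xs >= 0) & (target_xs < w)
    right_img[ys[valid], target_xs[valid]] = left_img[ys[valid], xs[valid]]
    # Holes (occluded or out-of-frame pixels) stay black; GenStereo instead
    # fills and refines such regions with its diffusion model.
    return right_img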
Try the demo here.
We tested our code on Ubuntu with an NVIDIA A100 GPU. If you are using another platform such as Windows, consider using Docker. You can either install the packages into your own Python environment or build one with Docker. All commands below are expected to be run from the root directory of the repository.
We tested the environment with Python >= 3.10 and CUDA 11.8. To install the mandatory dependencies, run the command below.
pip install -r requirements.txt
To run the development extras, such as the Jupyter notebook example and the Gradio live demo, install the extra dependencies with the command below.
pip install -r requirements_dev.txt
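After installation, an optional sanity check (not part of the official setup) can confirm that PyTorch was built against the expected CUDA version and can see your GPU:

import torch

print("torch:", torch.__version__)
print("CUDA build:", torch.version.cuda)        # expected: 11.8
print("GPU available:", torch.cuda.is_available())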
GenStereo uses pretrained models, which consist of both our finetuned models and publicly available third-party ones. Download all of them to the checkpoints directory (or anywhere of your choice), either manually or with the download_models.sh script.
bash scripts/download_models.sh
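If you prefer to fetch the third-party files manually from Python instead of using the script, a sketch with huggingface_hub could look like the following. The repo IDs are the standard public Hugging Face repositories, and the target directories are assumptions matching the layout below, not part of the official tooling.

from huggingface_hub import hf_hub_download

# Stable Diffusion VAE (sd-vae-ft-mse)
for name in ["config.json", "diffusion_pytorch_model.safetensors"]:
    hf_hub_download(repo_id="stabilityai/sd-vae-ft-mse", filename=name,
                    local_dir="checkpoints/sd-vae-ft-mse")

# CLIP image encoder from sd-image-variations-diffusers
for name in ["image_encoder/config.json", "image_encoder/pytorch_model.bin"]:
    hf_hub_download(repo_id="lambdalabs/sd-image-variations-diffusers", filename=name,
                    local_dir="checkpoints")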
Note
Models and checkpoints provided below may be distributed under different licenses. Please check the licenses carefully before use.
- Our finetuned models: we provide two versions of GenStereo.
  - v1.5: 512px, faster, model card.
  - v2.1: 768px, better performance, higher resolution, takes more time, model card.
- Pretrained models:
  - sd-vae-ft-mse
    - Download config.json and diffusion_pytorch_model.safetensors to checkpoints/sd-vae-ft-mse.
  - sd-image-variations-diffusers
    - Download image_encoder/config.json and image_encoder/pytorch_model.bin to checkpoints/image_encoder.
- MDE (Monocular Depth Estimation) models
  - We use Depth Anything V2 as the MDE model to obtain the disparity maps; a minimal usage sketch follows this list.
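A minimal sketch of getting such a disparity-like map from Depth Anything V2, following the usage shown in its own repository (the exact call GenStereo makes internally may differ; the image path is a placeholder):

import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2

# ViT-L configuration matching depth_anything_v2_vitl.pth
model = DepthAnythingV2(encoder="vitl", features=256, out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load("checkpoints/depth_anything_v2_vitl.pth", map_location="cpu"))
model.eval()

img = cv2.imread("/path/to/your/image")   # BGR image, H x W x 3
disparity = model.infer_image(img)        # H x W relative inverse depth (disparity-like)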
The final checkpoints directory must look like this:
.
├── depth_anything_v2_vitl.pth
├── genstereo-v1.5
│ ├── config.json
│ ├── denoising_unet.pth
│ ├── fusion_layer.pth
│ ├── pose_guider.pth
│ └── reference_unet.pth
├── genstereo-v2.1
│ ├── config.json
│ ├── denoising_unet.pth
│ ├── fusion_layer.pth
│ ├── pose_guider.pth
│ └── reference_unet.pth
├── image_encoder
│ ├── config.json
│ └── pytorch_model.bin
└── sd-vae-ft-mse
├── config.json
└── diffusion_pytorch_model.safetensors
You can easily run the inference code with the following command; the results will be saved under the ./vis folder.
python test.py /path/to/your/image
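To process a whole folder of images, you can simply call the documented command once per file. The loop below is a hypothetical helper, and the input folder path is a placeholder:

import subprocess
from pathlib import Path

for image_path in sorted(Path("/path/to/your/images").glob("*.png")):
    # test.py takes a single image path; results are written under ./vis
    subprocess.run(["python", "test.py", str(image_path)], check=True)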
An interactive live demo is also available. Start the Gradio demo by running the command below, then go to http://127.0.0.1:7860/. If you are running it on a remote server, be sure to forward port 7860 (e.g., via SSH local port forwarding).
python app.py
Alternatively, you can visit the Space hosted by Hugging Face to try it now.
Please read Train_Guide.md.
@inproceedings{qiao2025genstereo,
  author = {Qiao, Feng and Xiong, Zhexiao and Xing, Eric and Jacobs, Nathan},
  title = {Towards Open-World Generation of Stereo Images and Unsupervised Matching},
  booktitle = {Proceedings of the {IEEE/CVF} International Conference on Computer Vision ({ICCV})},
  year = {2025},
  eprint = {2503.12720},
  archiveprefix = {arXiv},
  primaryclass = {cs.CV}
}
Our code is based on GenWarp, Moore-AnimateAnyone, and other repositories. We thank the authors of the relevant repositories and papers.