
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction (ICCV 2025)

(Teaser animations: object-level results on ABO, GSO, real, and in-the-wild images; generation from SD- and FLUX-synthesized prompt images; scene-level results on plaza, town, cliff, and art gallery scenes.)

Introduction

This is an implementation of our work "Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction". The code and checkpoints here are a re-implementation and re-training that differ from the original version developed at Adobe. Our DiffusionGS is single-stage and does not rely on a 2D multi-view diffusion model. DiffusionGS can be applied to single-view 3D object generation and scene reconstruction without a depth estimator, in ~6 seconds. If you find our repo useful, please give it a star ⭐ and consider citing our paper. Thank you :)

(Figure: overall pipeline of DiffusionGS.)

News

  • 2025.10.17 : Added visual comparisons between Hunyuan-v2.5 and our open-source model. Our method is over 7.5x faster than the Hunyuan-v2.5 model. 🚀
  • 2025.10.10 : Code and models have been released. Feel free to check and use them. 💫
  • 2024.11.22 : Our project page is now live. Feel free to check the video and interactive generation results on the project page.
  • 2024.11.21 : We have uploaded the prompt images and our generation results to our Hugging Face dataset. Feel free to download them and compare with your method. 🤗
  • 2024.11.20 : Our paper is on arxiv now. 🚀

Comparison with State-of-the-Art Methods

Quantitative Comparison in the Paper

(Figure: quantitative comparison table from the paper.)

Qualitative Comparison in the Paper

(Figure: qualitative comparison results from the paper.)

Qualitative Comparison between Hunyuan-v2.5 and Our Open-source Model

Note: The first row shows the prompt images, the second row Hunyuan-v2.5, and the third row our open-source model. Our model takes only 24s for inference, while Hunyuan-v2.5 takes about 180s, so our model is about 7.5x faster. As for training cost, our open-source model needs only 16-32 GPUs to train and can also be applied to scene-level generation, while Hunyuan-v2.5 is much more expensive to train.

(Image grid: prompt images (row 1), Hunyuan-v2.5 results (row 2), and our results (row 3) for three examples.)

 

 

1. Create Environment

conda create -n diffusiongs python=3.11 -y
conda activate diffusiongs
# conda install -c "nvidia/label/cuda-12.1.1" cudatoolkit
# conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install torch==2.5.1 torchvision==0.20.1
pip install -r requirements.txt
pip install -e submodules/diff-gaussian-rasterization
pip install -e submodules/simple-knn
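
After installation, you can quickly verify that PyTorch sees your GPU and that the two CUDA submodules were built. This is only a sanity-check sketch; the import names below (diff_gaussian_rasterization, simple_knn) follow the upstream 3D Gaussian Splatting packages and are assumed to be unchanged in this repo.

# Sanity-check sketch: verify the environment (import names assumed, see note above)
import torch
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())

# These import names come from the upstream 3DGS submodules; adjust if renamed.
import diff_gaussian_rasterization  # noqa: F401
import simple_knn                   # noqa: F401
print("diff-gaussian-rasterization and simple-knn imported successfully.")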

 

2. Quick Demo

For the object-centric image-to-3D generation model, we provide a one-line command:

python run.py

This command will automatically download the model checkpoints and config files from Hugging Face. Alternatively, you can manually download them from this link and point the code to the local directory.
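
If you prefer to fetch the weights manually with Python instead of the automatic download, a minimal sketch using huggingface_hub is shown below. The repo_id here is a placeholder; use the actual Hugging Face repository from the link above, then point the code to the downloaded directory.

# Sketch: manual checkpoint download (repo_id is a placeholder, see link above)
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ORG_OR_USER/Open-DiffusionGS",  # placeholder: replace with the repo from the link above
    local_dir="./checkpoints/diffusiongs",
)
print("Checkpoints downloaded to", local_dir)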

 

3. Data Preparation

3.1 Scene-level Dataset

Download the RealEstate10K dataset from this link (provided by pixelSplat), unzip it, and put the data in YOUR_RAW_DATAPATH. Then run the following command to preprocess the data into our format:

python process_data.py --base_path YOUR_RAW_DATAPATH --output_dir YOUR_PROCESSED_DATAPATH --mode train   # or --mode test

3.2 Object-level Dataset

We retrained our model using only the Objaverse dataset, which differs from the approaches adopted by Adobe. Additionally, we provide a dataloader that allows you to leverage the open-source G-Objaverse to train object models from scratch.

To prepare the G-Objaverse dataset, please follow the instructions in G-Objaverse.

After you download and unzip the dataset, you should see the following structure:

gobjaverse
├──0
    ├── 10010
    ├── 10013
    └── ...          

After that, you need to prepare a folder named json that contains three JSON files:

json
├── test.json   ## a subset for evaluation
├── train.json  ## the json from the download script, used for training
└── val.json    ## a subset for evaluation
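
If you only need small evaluation subsets, a minimal sketch for carving val.json and test.json out of train.json is shown below. It assumes train.json (the json from the download script) is a flat JSON list; verify the actual format before using it.

# Sketch: build val/test subsets from train.json (assumes a flat JSON list)
import json, random

with open("json/train.json") as f:
    train_items = json.load(f)

random.seed(0)
subset = random.sample(train_items, k=min(100, len(train_items)))  # subset size is arbitrary

with open("json/val.json", "w") as f:
    json.dump(subset, f)
with open("json/test.json", "w") as f:
    json.dump(subset, f)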

Then, set local_dir to this json folder and image_dir to the gobjaverse folder in the config file (diffusionGS/configs/diffusionGS_rel.yaml) so that you can train our model on G-Objaverse.

 

4. Evaluation for Single-view Scene Reconstruction

The scene-level evaluation is conducted on the RealEstate10K dataset preprocessed by pixelSplat. The model checkpoints are hosted on Hugging Face.

Model                     | PSNR  | SSIM  | LPIPS
Open-DiffusionGS (res256) | 21.26 | 0.672 | 0.257
Open-DiffusionGS (res512) |   -   |   -   |   -

We use ./extra_files/evaluation_index_re10k.json to specify the input and target view indices. This json file originally comes from pixelSplat.
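
If you want to check which views are used for a particular scene, you can load the index directly. The sketch below only assumes the file is a dictionary keyed by scene id; print one entry to see the exact field names that hold the input and target view indices.

# Sketch: inspect the pixelSplat evaluation index (top-level dict assumed)
import json

with open("extra_files/evaluation_index_re10k.json") as f:
    index = json.load(f)

scene_id, entry = next(iter(index.items()))
print(scene_id, entry)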

We currently provide evaluation code only for the scene-level model:

bash script/eval.sh

This script evaluates all test sets and generates .pt files for computing metrics. If you want to store the scene videos and Gaussians, please set system.save_intermediate_video to True in the config file (diffusiongs/configs/diffusionGS_scene_eval.yaml).

After running this script, the results will be saved to {exp_root_dir}/{name}/{tags} as specified in the config file.

For the provided config, the results are organized like:

outputs/diffusion_gs_scene_re10k_256_stage1_eval/diffusion-gs-model-scene+lr0.0001/save/it0
├── 0a3b5fb184936a83.pt
├── 0a4cf8d9b81b4c6e.pt
├── ...

Each .pt file stores the rendered images and ground-truth images for computing metrics. If you set system.save_intermediate_video = True, you will also get rendered videos of the scene.

If you want to calculate metrics, please run:

bash cal_metrics.sh

After replacing exp_root_dir in cal_metrics.sh with your output directory, run this script to compute the metrics.
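
If you want to inspect a single scene manually instead of running cal_metrics.sh, the sketch below computes PSNR from one saved .pt file. The keys "render" and "gt" are hypothetical placeholders; each .pt stores the rendered and ground-truth images, but print the loaded object's keys to find the real field names.

# Sketch: compute PSNR for one saved scene (tensor keys are hypothetical)
import torch

result = torch.load(
    "outputs/diffusion_gs_scene_re10k_256_stage1_eval/diffusion-gs-model-scene+lr0.0001/save/it0/0a3b5fb184936a83.pt",
    map_location="cpu",
)
render, gt = result["render"].float(), result["gt"].float()  # images expected in [0, 1]

mse = torch.mean((render - gt) ** 2)
psnr = -10.0 * torch.log10(mse)
print(f"PSNR: {psnr.item():.2f} dB")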

 

5. Training

We provide four training scripts (two stages each for the scene and object models) so you can train your own models:

bash scripts/train_scene_stage1.py # train scene model (res256)
bash scripts/train_scene_stage2.py # train scene model (res512)
bash scripts/train_obj_stage1.py  # train object model (res256)
bash scripts/train_obj_stage2.py  # train object model (res512)

Before training, you need to specify your data paths in the config files: set local_dir to your processed RealEstate10K data (for scene), or set image_dir and local_dir to the gobjaverse folder and the prepared json folder (for object).

Note: when you train the second-stage model, remember to set shape_model.pretrained_model_name_or_path to the trained first-stage checkpoint.

 

6. Citation

@inproceedings{diffusiongs,
  title={Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction},
  author={Yuanhao Cai and He Zhang and Kai Zhang and Yixun Liang and Mengwei Ren and Fujun Luan and Qing Liu and Soo Ye Kim and Jianming Zhang and Zhifei Zhang and Yuqian Zhou and Yulun Zhang and Xiaokang Yang and Zhe Lin and Alan Yuille},
  booktitle={ICCV},
  year={2025}
}

 

Acknowledgments

We would like to thank the following projects: DiffSplat, CraftsMan3D, LVSM.
