This is an implementation of our work "Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction". The code and checkpoints here are a re-implementation and re-training that differ from the original version developed at Adobe. DiffusionGS is single-stage and does not rely on a 2D multi-view diffusion model. It can be applied to single-view 3D object generation and to scene reconstruction without a depth estimator in ~6 seconds. If you find our repo useful, please give it a star ⭐ and consider citing our paper. Thank you :)
- 2025.10.17 : Add visual comparisons between Hunyuan-v2.5 and our open-source model. Our model is about 7.5x faster than Hunyuan-v2.5. 🚀
- 2025.10.10 : Code and models have been released. Feel free to check and use them. 💫
- 2024.11.22 : Our project page is now live. Feel free to check the videos and interactive generation results there.
- 2024.11.21 : We have uploaded the prompt images and our generation results to our Hugging Face dataset. Feel free to download them and compare with your method. 🤗
- 2024.11.20 : Our paper is now on arXiv. 🚀
Qualitative Comparison between Hunyuan-v2.5 and Our Open-source Version Model
Note:
The first row is the prompt image, the second row is Hunyuan-v2.5, and the third row is our open-source model. Our model takes only 24s for inference, while Hunyuan-v2.5 takes about 180s, making ours 7.5x faster. As for training cost, our open-source model needs only 16-32 GPUs to train and can also be applied to scene-level generation, whereas Hunyuan-v2.5 is much more expensive to train.
```bash
conda create -n diffusiongs python=3.11 -y
conda activate diffusiongs
# conda install -c "nvidia/label/cuda-12.1.1" cudatoolkit
# conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install torch==2.5.1 torchvision==0.20.1
pip install -r requirements.txt
pip install -e submodules/diff-gaussian-rasterization
pip install -e submodules/simple-knn
```
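After installation, a quick way to confirm that the compiled extensions are visible to Python is to look them up without importing them. The sketch below is a hypothetical helper (`check_environment` is not part of this repo), and it assumes the import names `diff_gaussian_rasterization` and `simple_knn`, which follow the upstream 3D Gaussian Splatting submodules; adjust them if your build differs.

```python
import importlib.util

# Import names are assumptions based on the upstream 3DGS submodules.
DEFAULT_MODULES = ("torch", "torchvision", "diff_gaussian_rasterization", "simple_knn")


def check_environment(modules=DEFAULT_MODULES):
    """Return {module_name: found?} without importing anything.

    importlib.util.find_spec only locates each top-level module,
    so no CUDA kernels are loaded by this check.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:30s} {'ok' if found else 'MISSING'}")
```

Run it inside the `diffusiongs` environment; a `MISSING` entry suggests that the corresponding install step did not complete.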
For the object-centric image-to-3D generation model, we provide a one-line script:
```bash
python run.py
```
This script automatically downloads the model checkpoints and config files from HuggingFace. Alternatively, you can download them manually from this link and point the code to the local directory.
Download the RealEstate10K dataset from this link (provided by pixelSplat), unzip it, and put the data in `YOUR_RAW_DATAPATH`.
Run the following command to preprocess the data into our format, setting `--mode` to `train` or `test` depending on the split:
```bash
python process_data.py --base_path YOUR_RAW_DATAPATH --output_dir YOUR_PROCESSED_DATAPATH --mode train  # or --mode test
```
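Training uses the `train` split and evaluation uses the `test` split, so you will typically run the command once per split. The hypothetical helper below only builds and optionally runs both invocations; it assumes `process_data.py` is in the current directory and that reusing the same `--output_dir` for both modes is acceptable, which we have not verified against the script.

```python
import shlex
import subprocess
import sys


def build_preprocess_commands(raw_dir, out_dir, modes=("train", "test")):
    """One process_data.py invocation per split, mirroring the command above."""
    return [
        [sys.executable, "process_data.py",
         "--base_path", raw_dir,
         "--output_dir", out_dir,
         "--mode", mode]
        for mode in modes
    ]


def run_preprocess(raw_dir, out_dir, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_preprocess_commands(raw_dir, out_dir):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

With `dry_run=True` the commands are only printed, so you can check the paths before starting a long preprocessing job.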
We retrained our model using only the Objaverse dataset, which differs from the approach adopted at Adobe. Additionally, we provide a dataloader that lets you train object models from scratch on the open-source G-Objaverse dataset.
To prepare the G-Objaverse dataset, please follow the instructions in G-Objaverse. After downloading and unzipping the dataset, you should see the following structure:
```
gobjaverse
├── 0
│   ├── 10010
│   ├── 10013
│   └── ...
└── ...
```
Next, prepare a folder named `json` that contains three JSON files:
```
json
├── test.json   # a subset for evaluation
├── train.json  # the JSON list from the download script, used for training
└── val.json    # a subset for evaluation
```
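If you need to build `val.json` and `test.json` yourself, the sketch below samples evaluation subsets from the training list. It assumes each file is a JSON array of object ids such as `"0/10010"` (matching the folder layout above); please check this against the format the provided dataloader actually reads. `make_eval_splits` is a hypothetical helper, not part of the repo.

```python
import json
import random
from pathlib import Path


def make_eval_splits(train_json, out_dir, n_val=100, n_test=100, seed=0, holdout=False):
    """Write train/val/test JSON lists of object ids into out_dir.

    Assumed (hypothetical) format: train_json is a JSON array of object ids
    such as "0/10010". val and test are disjoint random subsets of it.
    """
    ids = json.loads(Path(train_json).read_text())
    if n_val + n_test > len(ids):
        raise ValueError("not enough objects to sample val/test subsets")
    rng = random.Random(seed)  # fixed seed -> reproducible splits
    sampled = rng.sample(ids, n_val + n_test)
    val, test = sampled[:n_val], sampled[n_val:]
    if holdout:
        # Drop the evaluation objects from training, preserving the original order.
        held_out = set(val) | set(test)
        train = [i for i in ids if i not in held_out]
    else:
        # train.json keeps the full downloaded list.
        train = list(ids)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, subset in (("train", train), ("val", val), ("test", test)):
        (out / f"{name}.json").write_text(json.dumps(subset, indent=2))
    return train, val, test
```

By default `train.json` keeps the full list, as the comments in the folder layout describe; pass `holdout=True` if you prefer to remove the sampled evaluation objects from training.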
Then, in the config file (`diffusionGS/configs/diffusionGS_rel.yaml`), set `local_dir` to this `json` folder and `image_dir` to the `gobjaverse` folder so that you can train our model on G-Objaverse.
The scene-level evaluation is conducted on the RealEstate10K dataset preprocessed by pixelSplat. The model checkpoints are hosted on HuggingFace.
| Model | PSNR ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|
| Open-DiffusionGS (res256) | 21.26 | 0.672 | 0.257 |
| Open-DiffusionGS (res512) | - | - | - |
We use `./extra_files/evaluation_index_re10k.json` to specify the input and target view indices. This JSON file originally comes from pixelSplat.
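To inspect which views a given scene uses, the index file can be read directly. This sketch assumes the pixelSplat layout, where each scene id maps either to `{"context": [...], "target": [...]}` or to `null` for scenes that are skipped; `load_evaluation_index` is a hypothetical helper, not part of the repo.

```python
import json
from pathlib import Path


def load_evaluation_index(path):
    """Map scene id -> (context view indices, target view indices).

    Assumes the pixelSplat layout: each value is either
    {"context": [...], "target": [...]} or null for skipped scenes.
    """
    raw = json.loads(Path(path).read_text())
    index = {}
    for scene, entry in raw.items():
        if entry is None:  # scene excluded from evaluation
            continue
        index[scene] = (list(entry["context"]), list(entry["target"]))
    return index
```

For example, `load_evaluation_index("./extra_files/evaluation_index_re10k.json")` returns a dict keyed by scene id, which can be matched against the `.pt` file names produced by evaluation.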
We currently provide evaluation code for the scene model only:
```bash
bash script/eval.sh
```
This script evaluates all test sets and generates `.pt` files for computing metrics. If you want to store the scene videos and Gaussians, set `system.save_intermediate_video` to `True` in the config file (`diffusiongs/configs/diffusionGS_scene_eval.yaml`).
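Editing the YAML by hand is the simplest way to change this flag. If you prefer to do it from a script, the sketch below sets a dotted key such as `system.save_intermediate_video` on a config loaded as a plain nested dict. `set_dotted` is a hypothetical helper, and it assumes `system` is a top-level section of the YAML file.

```python
import copy


def set_dotted(cfg, dotted_key, value):
    """Return a copy of nested dict cfg with cfg[a][b][...] = value for key "a.b....".

    Missing intermediate sections are created as empty dicts;
    the input dict is left unchanged.
    """
    out = copy.deepcopy(cfg)
    *parents, leaf = dotted_key.split(".")
    node = out
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return out
```

For example, load the file with PyYAML's `yaml.safe_load`, apply `set_dotted(cfg, "system.save_intermediate_video", True)`, and write the result back with `yaml.safe_dump(..., sort_keys=False)`; note that this round trip drops any comments in the YAML file.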
After running the script, the results are saved in `{exp_root_dir}/{name}/{tags}`, as specified in the config file.
With the provided config, the output looks like:
```
outputs/diffusion_gs_scene_re10k_256_stage1_eval/diffusion-gs-model-scene+lr0.0001/save/it0
├── 0a3b5fb184936a83.pt
├── 0a4cf8d9b81b4c6e.pt
└── ...
```
Each `.pt` file stores the rendered images and the ground-truth images for computing metrics. If you set `system.save_intermediate_video = True`, you will also find rendered videos of each scene.
To calculate metrics, replace `exp_root_dir` in `cal_metrics.sh` with your output directory, then run:
```bash
bash cal_metrics.sh
```
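For reference, PSNR between a rendered image and its ground truth is 10 * log10(MAX^2 / MSE). The stdlib-only sketch below computes it for flattened pixel values in [0, 1]. It is not the code used by `cal_metrics.sh`, and the keys under which each `.pt` file stores its images are not documented here, so loading them (for example with `torch.load`) is left to you.

```python
import math


def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    PSNR = 10 * log10(max_val**2 / MSE); identical inputs give +inf.
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)
    return math.inf if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)
```

For example, a constant error of 0.1 on every pixel gives an MSE of 0.01 and therefore a PSNR of 20 dB.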
We provide training scripts for four settings (two stages each for the scene and object models):
```bash
bash scripts/train_scene_stage1.py # train scene model (res256)
bash scripts/train_scene_stage2.py # train scene model (res512)
bash scripts/train_obj_stage1.py # train object model (res256)
bash scripts/train_obj_stage2.py # train object model (res512)
```
Before training, specify your data paths in the config files: for the scene model, set `local_dir` to your processed RealEstate10K data; for the object model, set `image_dir` to the `gobjaverse` folder and `local_dir` to the prepared `json` folder.
Note: when training the second-stage model, set `shape_model.pretrained_model_name_or_path` to the trained first-stage checkpoint.
```bibtex
@inproceedings{diffusiongs,
  title={Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction},
  author={Yuanhao Cai and He Zhang and Kai Zhang and Yixun Liang and Mengwei Ren and Fujun Luan and Qing Liu and Soo Ye Kim and Jianming Zhang and Zhifei Zhang and Yuqian Zhou and Yulun Zhang and Xiaokang Yang and Zhe Lin and Alan Yuille},
  booktitle={ICCV},
  year={2025}
}
```
We would like to thank the following projects: DiffSplat, CraftsMan3D, LVSM.