This repository presents the Splatter Image framework, an ultra-fast approach for single-view 3D object reconstruction that operates at 38 FPS. It builds on Gaussian Splatting, a method that has proven successful in multi-view reconstruction thanks to real-time rendering, fast training, and good scalability. Our research extends this method to monocular reconstruction by incorporating additional depth information into the model during training.
Concretely, the Splatter Image framework modifies the UNet architecture to take depth channels alongside RGB, with the goal of improving 3D reconstruction quality as measured by PSNR, SSIM, and LPIPS across multiple datasets.
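To make the depth integration concrete, here is a minimal PyTorch sketch, not the repository's actual network code: a UNet-style first convolution widened from 3 RGB channels to 4 RGB+D channels, with the depth map concatenated as an extra input channel. `TinyEncoder` is a hypothetical stand-in for the real UNet encoder.

```python
import torch
import torch.nn as nn

# Minimal sketch (not the repository's actual network code): a UNet-style
# first convolution widened from 3 RGB channels to 4 RGB+D channels.
# TinyEncoder is a hypothetical stand-in for the real UNet encoder.
class TinyEncoder(nn.Module):
    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 64, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv1(x))

rgb = torch.rand(1, 3, 128, 128)       # source view
depth = torch.rand(1, 1, 128, 128)     # normalized per-pixel depth map
rgbd = torch.cat([rgb, depth], dim=1)  # stack depth as a fourth channel

features = TinyEncoder(in_channels=4)(rgbd)
print(features.shape)  # torch.Size([1, 64, 128, 128])
```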
*(Figure: Ground Truth Model | RGB Baseline Reconstruction | RGB+D DepthAnything Reconstruction | RGB+D Splatter-Image Reconstruction)*
- 🔍 Monocular 3D object reconstruction using a fast feed-forward neural network.
- 🛠️ Integration of depth channels to improve the quality of reconstructions.
- 🧪 Evaluation of the approach on multiple datasets, including SRN Cars and CO3D Cars.
- 📊 Quantitative improvements measured using PSNR, SSIM, and LPIPS.
The project evaluates the performance of the Splatter Image framework on the following datasets:
- SRN Cars
  - Subsets used: 100%, 50%, 20%
- CO3D Cars with Background
For each dataset, we tested baseline models using only RGB inputs, followed by models that integrate depth information.
We conducted experiments with two depth configurations:
- RGB+D using Splatter Image Depth Output: Depth maps were generated by the Splatter Image model itself.
- RGB+D using Depth Anything Model Output: Depth maps were generated using external depth estimation models, providing more robust depth predictions.
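For context, producing an external depth map with a Depth Anything checkpoint can look like the sketch below, which uses the Hugging Face `transformers` depth-estimation pipeline; the specific checkpoint name is an assumed choice, not necessarily the one used in our experiments.

```python
from PIL import Image
from transformers import pipeline

# Minimal sketch of external depth estimation; the checkpoint name is an
# assumed choice, not necessarily the one used in our experiments.
depth_estimator = pipeline(
    "depth-estimation",
    model="LiheYoung/depth-anything-small-hf",
)

image = Image.open("car.png").convert("RGB")
result = depth_estimator(image)

# The pipeline returns a PIL depth image under "depth" and a raw
# torch tensor under "predicted_depth".
result["depth"].save("car_depth.png")
```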
For each dataset, results were evaluated based on the following metrics:
- PSNR (Peak Signal-to-Noise Ratio)
- SSIM (Structural Similarity Index)
- LPIPS (Learned Perceptual Image Patch Similarity)
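For reference, per-image computation of these metrics typically looks like the following sketch, here using `scikit-image` and the `lpips` package with a VGG backbone; this mirrors common practice rather than the repository's exact evaluation code.

```python
import lpips
import numpy as np
import torch
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# The LPIPS network is loaded once and reused across image pairs.
lpips_fn = lpips.LPIPS(net="vgg")

def to_lpips_tensor(img: np.ndarray) -> torch.Tensor:
    """HxWx3 float array in [0, 1] -> 1x3xHxW tensor in [-1, 1]."""
    return torch.from_numpy(img).permute(2, 0, 1)[None].float() * 2.0 - 1.0

def evaluate_pair(gt: np.ndarray, pred: np.ndarray) -> dict:
    """Compute PSNR, SSIM, and LPIPS for one ground-truth/prediction pair."""
    return {
        "psnr": peak_signal_noise_ratio(gt, pred, data_range=1.0),
        "ssim": structural_similarity(gt, pred, channel_axis=2, data_range=1.0),
        "lpips": lpips_fn(to_lpips_tensor(gt), to_lpips_tensor(pred)).item(),
    }
```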
Integrating depth information affected reconstruction quality differently depending on its source: depth maps from the Depth Anything model improved SSIM and LPIPS over the RGB-only baseline on SRN Cars, while using the Splatter Image model's own depth output did not. The results are summarized below:
| Dataset | Configuration | PSNR | SSIM | LPIPS |
|---|---|---|---|---|
| SRN Cars (100%) | Baseline (RGB only) | 19.5569 | 0.8334 | 0.2559 |
| SRN Cars (100%) | RGB+D using Splatter Image Depth Output | 18.9316 | 0.8244 | 0.2639 |
| SRN Cars (100%) | RGB+D using Depth Anything Model Output | 19.4645 | 0.8361 | 0.2530 |
| SRN Cars (50%) | Baseline (RGB only) | 19.5290 | 0.8326 | 0.2539 |
| SRN Cars (50%) | RGB+D using Splatter Image Depth Output | 18.9742 | 0.8225 | 0.2651 |
| SRN Cars (50%) | RGB+D using Depth Anything Model Output | 19.4829 | 0.8374 | 0.2494 |
| SRN Cars (20%) | Baseline (RGB only) | 19.3081 | 0.8298 | 0.2554 |
| SRN Cars (20%) | RGB+D using Splatter Image Depth Output | 18.7255 | 0.8193 | 0.2663 |
| SRN Cars (20%) | RGB+D using Depth Anything Model Output | 19.3170 | 0.8329 | 0.2567 |
| CO3D Cars | Baseline (RGB only) | 14.0015 | 0.3806 | 0.6762 |
| CO3D Cars | RGB+D using CO3D Depth Output | 13.9242 | 0.3730 | 0.6883 |
*(Figure: Ground Truth | RGB Baseline | RGB+D DepthAnything | RGB+D Splatter-Image)*
- Operating System: Windows 11
- Python Version: Python 3.8
- CUDA Toolkit: CUDA 11.7
- Anaconda/Miniconda: For managing the Python environment
- Git: Install Git from the official website.
- Anaconda: Install Anaconda or Miniconda from the official website.
- Visual Studio 2019 Community: Download and install from the official website. During installation, select "Desktop Development with C++".
- CUDA Toolkit v11.7: Download and install from the NVIDIA website.
- COLMAP: Install COLMAP as per the official instructions.
- ImageMagick: Install from the official website.
- FFmpeg: Install from the official website.
Clone the repository:

```bash
git clone https://github.com/dinakog/CV_Project-Splatter_Image
cd CV_Project-Splatter_Image
```

Create and activate the conda environment, then install the dependencies:

```bash
conda create --name splatter-image python=3.8
conda activate splatter-image
conda install --file conda_requirements.txt
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
```
Open the Command Prompt and run:

```bat
"C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Auxiliary\Build\vcvars64.bat"
set DISTUTILS_USE_SDK=1
```
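For context: the `vcvars64.bat` call loads the MSVC x64 compiler environment into the current shell, and `DISTUTILS_USE_SDK=1` tells Python's build tooling to use that environment rather than trying to locate a compiler itself; both are typically needed on Windows when pip compiles the CUDA extensions that Gaussian Splatting relies on.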
Create a file named `cuda_check.py` with the following content:

```python
import torch

# Verify that PyTorch can see the CUDA runtime and the GPU.
print(torch.cuda.is_available())      # True if CUDA is usable
print(torch.cuda.device_count())      # number of visible GPUs
print(torch.cuda.get_device_name(0))  # name of the first GPU
```

Then run it:

```bash
python cuda_check.py
```
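If the environment is set up correctly, the script prints `True` followed by your GPU count and name; if it prints `False`, revisit the CUDA Toolkit and PyTorch installation steps above.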
The training process involves two steps:
Step 1: Initial Training

Run the initial training script:

```bash
python train_network.py +dataset=cars/cars_co3d
```
Step 2: Update Configuration and Continue Training

- Update the configuration file: open `configs/experiment_configs/lpips_100k.yaml` and set the `load_network_path` parameter to the path of the model created in Step 1 (see the excerpt after these steps).
- Continue training:

```bash
python train_network.py +dataset=cars/cars_co3d +experiment=lpips_100k.yaml
```
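For reference, the edit in Step 2 amounts to something like the excerpt below; the `load_network_path` key name comes from the instructions above, while the file's surrounding structure is omitted and the path is a placeholder you must replace with your own Step 1 output.

```yaml
# configs/experiment_configs/lpips_100k.yaml (excerpt; other keys omitted)
load_network_path: <path_to_step1_model>   # replace with your Step 1 output path
```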
To evaluate the trained model, use the following command:

```bash
python eval.py cars/cars_co3d --experiment_path <path_to_experiment>
```

- Replace `<path_to_experiment>` with the actual path to your trained model.

If you wish to generate visualizations of the model's output, add the `--save_vis` flag:

```bash
python eval.py cars/cars_co3d --experiment_path <path_to_experiment> --save_vis <number_of_visualizations>
```

- Replace `<number_of_visualizations>` with the desired number of visualizations to generate.
- Scale the experiments: Due to limited resources, our experiments were constrained. We hypothesize that with more computational power and larger datasets the observed improvement trends would hold; scaling the dataset and training for more iterations is the natural next step.
- Improve the model architecture: Investigate more advanced neural architectures that can dynamically adapt to varying depth inputs.
- Multi-view inputs: Extend the Splatter Image framework to handle multi-view inputs and real-time dynamic object reconstruction.
Szymanowicz, S., Rupprecht, C., & Vedaldi, A. (2024). Splatter Image: Ultra-Fast Single-View 3D Reconstruction. arXiv:2312.13150.
Kerbl, B., Kopanas, G., Leimkühler, T., & Drettakis, G. (2023). 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics, 42(4).
Yang, L., Kang, B., Huang, Z., Xu, X., Feng, J., & Zhao, H. (2024). Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. arXiv:2401.10891.