
hapgaoyi/SceneGen


SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass

This repository contains the official PyTorch implementation of SceneGen (https://arxiv.org/abs/2508.15769).

The training code, inference code, and pretrained models have all been released. Feel free to reach out for discussions!

🌟 Some Information

Project Page · Paper · Checkpoints

โฉ News

  • [2025.9] Our training code (with configs) and data processing code have been released.
  • [2025.8] The inference code and checkpoints are released.
  • [2025.8] Our pre-print paper has been released on arXiv.

📦 Installation & Pretrained Models

Prerequisites

  • Hardware: An NVIDIA GPU with at least 16 GB of memory is required. The code has been verified on NVIDIA A100 and RTX 3090 GPUs.
  • Software:
    • The CUDA Toolkit is needed to compile certain submodules. The code has been tested with CUDA 12.1.
    • Python version 3.8 or higher is required.
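Before installing, a quick sanity check can save a failed build. The minimal Python sketch below checks the two numeric requirements listed above (Python 3.8 or newer, at least 16 GB of GPU memory). It is not part of the repository: the helper names are hypothetical, and it assumes PyTorch is available for querying GPU memory, reporting "no information" otherwise.

```python
import sys
from typing import List, Optional, Tuple

# Thresholds from the Prerequisites list in this README.
MIN_PYTHON: Tuple[int, int] = (3, 8)
MIN_GPU_MEM_GB = 16  # decimal gigabytes (10**9 bytes), as GPU vendors quote them


def check_prerequisites(python_version: Tuple[int, int],
                        gpu_mem_bytes: Optional[int]) -> List[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems: List[str] = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(f"Python {python_version[0]}.{python_version[1]} is older than 3.8")
    if gpu_mem_bytes is None:
        problems.append("no GPU memory information available")
    elif gpu_mem_bytes < MIN_GPU_MEM_GB * 10**9:
        problems.append(f"GPU has {gpu_mem_bytes / 10**9:.1f} GB, need at least 16 GB")
    return problems


def detect_gpu_mem_bytes() -> Optional[int]:
    """Total memory of GPU 0 via PyTorch (an assumption), or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    issues = check_prerequisites(sys.version_info[:2], detect_gpu_mem_bytes())
    print("All prerequisites met." if not issues else "\n".join(issues))
```

Decimal gigabytes are used here because cards sold as "16 GB" can report slightly less than 16 GiB.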

Installation Steps

  1. Clone the repo:

    git clone https://github.com/Mengmouxu/SceneGen.git
    cd SceneGen
  2. Install the dependencies: create a new conda environment named scenegen and install all dependencies with:

    . ./setup.sh --new-env --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast --demo

    For detailed usage of setup.sh, run . ./setup.sh --help.
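setup.sh compiles several optional extensions, and a component that fails to build may only surface later at import time. The sketch below checks whether each component can be located by Python without importing it. The mapping from setup.sh flags to import names is an assumption (based on the TRELLIS-style dependencies these flags appear to install), so adjust any name that differs in your environment. Finding a module does not guarantee that its CUDA kernels were built for your GPU.

```python
import importlib.util
from typing import Dict

# ASSUMPTION: import names that each setup.sh flag is expected to provide.
# These are not confirmed by the SceneGen README; edit them if they differ.
EXPECTED_MODULES: Dict[str, str] = {
    "--xformers": "xformers",
    "--flash-attn": "flash_attn",
    "--diffoctreerast": "diffoctreerast",
    "--spconv": "spconv",
    "--mipgaussian": "diff_gaussian_rasterization",
    "--kaolin": "kaolin",
    "--nvdiffrast": "nvdiffrast",
    "--demo": "gradio",
}


def check_modules(expected: Dict[str, str]) -> Dict[str, bool]:
    """Map each flag to whether its top-level module can be found (without importing it)."""
    return {flag: importlib.util.find_spec(name) is not None
            for flag, name in expected.items()}


if __name__ == "__main__":
    for flag, ok in check_modules(EXPECTED_MODULES).items():
        print(f"{flag:<16} {'ok' if ok else 'MISSING'}")
```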

Pretrained Models

  1. First, create a directory in the SceneGen folder to store the checkpoints:
    mkdir -p checkpoints
  2. Download the pretrained models for SAM2-Hiera-Large and VGGT-1B from SAM2 and VGGT, then place them in the checkpoints directory. (SAM2 installation and its checkpoints are required for interactive generation with segmentation.)
  3. Download our pretrained SceneGen model from here and place it in the checkpoints directory as follows:
    SceneGen/
    ├── checkpoints/
    │   ├── sam2-hiera-large
    │   ├── VGGT-1B
    │   └── scenegen
    │       ├── ckpts
    │       └── pipeline.json
    └── ...
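To confirm the layout above before running inference, a small Python check can list anything that is missing. This is a hypothetical helper, not part of SceneGen; it only tests that each path in the tree exists and does not validate file contents. Note that sam2-hiera-large is only needed for interactive generation with segmentation.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the checkpoint tree in this README.
REQUIRED_PATHS: List[str] = [
    "checkpoints/sam2-hiera-large",
    "checkpoints/VGGT-1B",
    "checkpoints/scenegen/ckpts",
    "checkpoints/scenegen/pipeline.json",
]


def missing_checkpoints(repo_root: str = ".") -> List[str]:
    """Return the required checkpoint paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the SceneGen folder (or pass its path to missing_checkpoints).
    missing = missing_checkpoints()
    print("All checkpoints in place." if not missing else f"Missing: {missing}")
```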
    

💡 Inference

We provide two scripts for inference: inference.py for batch processing and interactive_demo.py for an interactive Gradio demo.

Interactive Demo

This script launches a Gradio web interface for interactive scene generation.

  • Features: It uses SAM2 for interactive image segmentation, allows for adjusting various generation parameters, and supports scene generation from single or multiple images.
  • Usage:
    python interactive_demo.py

    🚀 Quick Start Guide

    📷 Step 1: Input & Segment

    1. Upload your scene image.
    2. Use the mouse to draw bounding boxes around objects.
    3. Click "Run Segmentation" to segment objects.

    ※ For multi-image generation: keep the object annotation order consistent across all images.

    ๐Ÿ—ƒ๏ธ Step 2: Manage Cache

    1. Click "Add to Cache" when satisfied with the segmentation.
    2. Repeat Steps 1 and 2 for each additional image.
    3. Use "Delete Selected" or "Clear All" to manage cached images.

    🎮 Step 3: Generate Scene

    1. Adjust generation parameters (optional).
    2. Click "Generate 3D Scene".
    3. Download the generated GLB file when ready.

    💡 Pro Tip: Try the examples below to get started quickly!
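The note about consistent annotation order suggests that the k-th object annotated in every cached image is treated as the same physical object. If you keep your own record of which object each box corresponds to, a check like the hypothetical one below can catch ordering mistakes before you click "Generate 3D Scene". The demo itself does not take object names; the labels here are purely the user's own bookkeeping.

```python
from typing import List, Sequence


def check_annotation_order(per_image_labels: Sequence[Sequence[str]]) -> List[str]:
    """Compare the object order of every image against the first image.

    per_image_labels[i][k] is a user-chosen name for the k-th box drawn on
    image i (hypothetical bookkeeping, not something the demo records).
    """
    problems: List[str] = []
    if not per_image_labels:
        return problems
    reference = list(per_image_labels[0])
    for i, labels in enumerate(per_image_labels[1:], start=1):
        if len(labels) != len(reference):
            problems.append(f"image {i}: {len(labels)} objects, expected {len(reference)}")
        elif list(labels) != reference:
            problems.append(f"image {i}: order {list(labels)} differs from {reference}")
    return problems
```

For example, check_annotation_order([["chair", "table"], ["table", "chair"]]) reports that the second image lists the objects in a different order.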

Demo video: Interactive_Demo_of_SceneGen.mp4

Pre-segmented Image Inference

This script processes a directory of pre-segmented images.

  • Input: The input folder structure should be similar to assets/masked_image_test, containing segmented scene images.
  • Visualization: For scenes with ground truth data, you can use the --gradio flag to launch a Gradio interface that visualizes both the ground truth and the generated model. We provide data from the 3D-FUTURE test set as a demonstration.
  • Usage:
    python inference.py --gradio
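Since the expected input structure is only described as similar to assets/masked_image_test, one practical approach is to compare the layout of your own folder with that example. The sketch below counts image files per directory. The set of image extensions is an assumption, and the function makes no claim about which files inference.py actually reads.

```python
from pathlib import Path
from typing import Dict

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumption: common image formats


def count_images_per_dir(root: str) -> Dict[str, int]:
    """Count image files in every directory under root (paths relative to root)."""
    base = Path(root)
    counts: Dict[str, int] = {}
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            key = str(path.parent.relative_to(base))
            counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Print the example layout, then run the same function on your own folder.
    for folder, n in count_images_per_dir("assets/masked_image_test").items():
        print(f"{folder}: {n} image(s)")
```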

📚 Dataset

To train and evaluate SceneGen, we use the 3D-FUTURE dataset. Please download and preprocess the dataset as follows:

  1. Download the 3D-FUTURE dataset from here (access must be requested).
  2. Follow the TRELLIS data processing instructions to preprocess the dataset. Keep their directory structure for compatibility, and make sure all necessary files, including metadata.csv, are fully generated.
  3. Run the dataset_toolkits/build_metadata_scene.py script to create the scene-level metadata file:
    python dataset_toolkits/build_metadata_scene.py 3D-FUTURE \
        --output_dir <path_to_3D-FUTURE> \
        --set <train or test> \
        --vggt_ckpt checkpoints/VGGT-1B --save_mask
    This will generate a metadata_scene.csv or metadata_scene_test.csv file, depending on --set, in the specified dataset directory.
  4. For evaluation, run the dataset_toolkits/build_scene.sh script to render an image of each test scene (Blender must be installed and the configs in the script set correctly):
    bash dataset_toolkits/build_scene.sh
    This will create a scene_test_render folder in the dataset directory containing the Blender-rendered images of the test scenes, which are later used for evaluation.
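After working through the steps above, the following sketch reports which of the files and folders named in this section exist in your dataset directory. Which entries you need depends on whether you built the train set, the test set, or both; the train/test mapping of the two scene-level CSV files is inferred from the --set flag rather than stated explicitly in the README.

```python
from pathlib import Path
from typing import Dict

# Names taken from the dataset steps in this README, with the step that creates each.
EXPECTED: Dict[str, str] = {
    "metadata.csv": "TRELLIS preprocessing (step 2)",
    "metadata_scene.csv": "build_metadata_scene.py, presumably --set train (step 3)",
    "metadata_scene_test.csv": "build_metadata_scene.py, presumably --set test (step 3)",
    "scene_test_render": "build_scene.sh (step 4, evaluation only)",
}


def dataset_status(dataset_dir: str) -> Dict[str, bool]:
    """Map each expected entry to whether it exists inside dataset_dir."""
    root = Path(dataset_dir)
    return {name: (root / name).exists() for name in EXPECTED}


if __name__ == "__main__":
    import sys
    status = dataset_status(sys.argv[1] if len(sys.argv) > 1 else "3D-FUTURE")
    for name, ok in status.items():
        print(f"{'ok' if ok else '--'}  {name:<24} <- {EXPECTED[name]}")
```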

๐Ÿ‹๏ธโ€โ™‚๏ธ Training

With the processed 3D-FUTURE dataset and the pretrained ss_flow_img_dit_L_16l8_fp16.safetensors model checkpoint from TRELLIS correctly placed in the checkpoints/scenegen/ckpts directory, you can train SceneGen using the following command:

bash scripts/train.sh

For detailed training configurations, please refer to configs/generation/ss_scenegen_flow_img_train.json and change the parameters as needed.
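When adjusting the training config, it can help to apply changes programmatically rather than hand-editing the JSON. The sketch below applies dotted-path overrides to a loaded config and raises KeyError for paths that do not already exist, so a typo cannot silently add an unused key. The helper is hypothetical and the config's internal structure is not documented here; inspect ss_scenegen_flow_img_train.json for the real key names.

```python
import json
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with dotted-path overrides such as {"a.b": 2} applied."""
    result = json.loads(json.dumps(config))  # deep copy for JSON-compatible data
    for dotted, value in overrides.items():
        *parents, leaf = dotted.split(".")
        node = result
        for key in parents:
            if not isinstance(node.get(key), dict):
                raise KeyError(f"'{key}' not found while resolving '{dotted}'")
            node = node[key]
        if leaf not in node:
            raise KeyError(f"'{leaf}' not found while resolving '{dotted}'")
        node[leaf] = value
    return result


if __name__ == "__main__":
    src = "configs/generation/ss_scenegen_flow_img_train.json"
    try:
        with open(src) as f:
            cfg = json.load(f)
    except FileNotFoundError:
        print(f"{src} not found; run this from the SceneGen repo root.")
    else:
        print("Top-level keys:", sorted(cfg))  # find real key paths before overriding
```

For example, apply_overrides(cfg, {"some_section.batch_size": 4}) would change a nested value, where "some_section.batch_size" is a placeholder path. Writing the result to a new JSON file leaves the original config untouched.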

Evaluation

To be updated soon...

📜 Citation

If you use this code and data for your research or project, please cite:

@article{meng2025scenegen,
  author    = {Meng, Yanxu and Wu, Haoning and Zhang, Ya and Xie, Weidi},
  title     = {SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass},
  journal   = {arXiv preprint arXiv:2508.15769},
  year      = {2025},
}

TODO

  • [x] Release Paper
  • [x] Release Checkpoints & Inference Code
  • [x] Release Training Code
  • [x] Release Data Processing Code
  • [ ] Release Evaluation Code

Acknowledgements

Many thanks to the code bases from TRELLIS, DINOv2, and VGGT.

Contact

If you have any questions, please feel free to contact [email protected] and [email protected].

About

Official repo for paper "SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass"

Languages

  • Jupyter Notebook 67.8%
  • Python 31.8%
  • Shell 0.4%