Overview of UniUGG, the first unified framework for spatial understanding and generation. (A) UniUGG supports spatial-level VQA and generates geometrically consistent 3D scenes. (B) Given a reference image, it can creatively generate 3D variations and describe them accurately. (C) UniUGG outperforms baselines in both spatial understanding and generation, and its specially tuned vision encoder excels in downstream tasks.
🎞️ Demo Video
If you find this project or dataset helpful, please consider citing our paper:
@article{xu2025uniugg,
title={UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding},
author={Xu, Yueming and Zhang, Jiahui and Huang, Ze and Chen, Yurui and Zhou, Yanpeng and Chen, Zhenyu and Yuan, Yujie and Xia, Pengxiang and Huang, Guowei and Cai, Xinyue and Qi, Zhongang and Quan, Xingyue and Hao, Jianye and Xu, Hang and Zhang, Li},
year={2025},
journal={arXiv preprint arXiv:2508.11952},
}