Official implementation of TVVE
Yongjie Bai, Zhouxia Wang, Yang Liu, Kaijun Luo, Yifan Wen, Mingtong Dai, Weixing Chen, Ziliang Chen, Lingbo Liu, Guanbin Li, Liang Lin
TVVE learns task-aware virtual viewpoints for robotic manipulation. The method combines a Multi-Viewpoint Exploration Policy (MVEP) with a Task-aware Mixture-of-Experts visual encoder (TaskMoE), improving 3D perception, feature discrimination, and cross-domain generalization on RLBench, RLBench-OG, and real-world robot setups.
RLBench-OG is an extension benchmark built on top of RLBench to evaluate model robustness under occlusion and generalization to environment perturbations. The benchmark selects ten tasks from RLBench (covering simple and long-horizon tasks) and contains two main components: the Occlusion Suite and the Generalization Suite.
- Highlights
- Project Status
- Installation
- Data Preparation
- Training and Evaluation on RLBench
- RLBench-OG Benchmark
- Repository Layout
- Citation
- Acknowledgements
- Accepted by 🔥CVPR 2026🔥.
- Official code release for TVVE.
- Training scripts for stage 1 and stage 2/3 optimization.
- Evaluation entry points for RLBench and RLBench-OG.
- Public release of the RLBench-OG benchmark dataset and codebase.
- Show more simulation results on RLBench and RLBench-OG
- Show more real-world robot results on Dobot and Franka
- Release the model and code
- Release the RLBench-OG benchmark
- Release the RLBench-OG test code
conda create -n tvve python=3.8 -y
conda activate tvve
pip install pip==21 setuptools==65.5.0 wheel==0.38.0
git clone https://github.com/HCPLab-SYSU/TAVP.git
cd TAVP
pip install -r requirements.txt
Skip the CUDA toolkit installation below if your CUDA environment is already configured and compatible.
bash ./cuda_12.3.2_545.23.08_linux.run --silent --toolkit --toolkitpath=$HOME/cuda-12.3
export CUDA_HOME=$HOME/cuda-12.3
export NVCC_FLAGS="--generate-code arch=compute_80,code=sm_80 --generate-code arch=compute_86,code=sm_86 --generate-code arch=compute_87,code=sm_87 --generate-code arch=compute_89,code=sm_89"
pip install git+https://github.com/facebookresearch/pytorch3d.git@stable
pip install ninja
export MAX_JOBS=1
export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.7;8.9"
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
Adjust the CUDA architectures in NVCC_FLAGS and TORCH_CUDA_ARCH_LIST to match your GPU.
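To build only for the GPUs you actually have, both variables can be derived from compute capabilities (for example 8.6 for an RTX 30-series card). The bash function below is a hypothetical helper, not part of the TVVE repository; it assumes the flag format shown above and emits one --generate-code entry per capability. Call it before the pytorch3d and xformers builds.

```bash
# Hypothetical helper (not part of the TVVE repo): build TORCH_CUDA_ARCH_LIST and
# NVCC_FLAGS from compute capabilities such as "8.6" or "8.9".
set_cuda_arch() {
  local cap code arch_list="" nvcc_flags=""
  for cap in "$@"; do
    code="${cap//./}"                                   # "8.6" -> "86"
    arch_list="${arch_list:+${arch_list};}${cap}"       # semicolon-separated list for PyTorch
    nvcc_flags="${nvcc_flags:+${nvcc_flags} }--generate-code arch=compute_${code},code=sm_${code}"
  done
  export TORCH_CUDA_ARCH_LIST="${arch_list}"
  export NVCC_FLAGS="${nvcc_flags}"
}

# Example: pass capabilities by hand ...
# set_cuda_arch 8.6
# ... or, on drivers whose nvidia-smi supports the compute_cap query field:
# set_cuda_arch $(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | sort -u)
```

Restricting the list to installed architectures mainly shortens compile time; a binary built for sm_86 will not run on an sm_80-only card, so include every GPU you plan to use.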
TVVE depends on the RLBench simulation stack.
cd ..
wget https://downloads.coppeliarobotics.com/V4_1_0/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz
tar -xf CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz
mv CoppeliaSim_Edu_V4_1_0_Ubuntu20_04 CoppeliaSim
export COPPELIASIM_ROOT=$PWD/CoppeliaSim
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT
git clone https://github.com/stepjam/PyRep.git
cd PyRep
pip install -r requirements.txt
pip install -e .
cd ..
git clone https://github.com/mlzxy/RLBench.arp.git
cd RLBench.arp
pip install -r requirements.txt
python setup.py develop
cd ../TAVP
If you evaluate on a headless server, make sure the corresponding RLBench and PyRep headless rendering requirements are satisfied.
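PyRep needs the CoppeliaSim variables exported above in every new shell, so a quick check before launching training can save debugging time. The function below is a hypothetical sketch, not part of TVVE, PyRep, or RLBench; it assumes the CoppeliaSim 4.1 Linux tarball places libcoppeliaSim.so at the top level of COPPELIASIM_ROOT.

```bash
# Hypothetical sanity check (not part of the TVVE repo) for the CoppeliaSim variables.
# Prints every problem it finds and returns non-zero if any check fails.
check_coppeliasim_env() {
  local status=0
  if [ -z "${COPPELIASIM_ROOT:-}" ] || [ ! -d "${COPPELIASIM_ROOT}" ]; then
    echo "COPPELIASIM_ROOT is unset or not a directory" >&2
    return 1
  fi
  # Assumption: the simulator library sits directly under COPPELIASIM_ROOT.
  if [ ! -f "${COPPELIASIM_ROOT}/libcoppeliaSim.so" ]; then
    echo "libcoppeliaSim.so not found in ${COPPELIASIM_ROOT}" >&2
    status=1
  fi
  # Match COPPELIASIM_ROOT as a whole entry of the colon-separated LD_LIBRARY_PATH.
  case ":${LD_LIBRARY_PATH:-}:" in
    *":${COPPELIASIM_ROOT}:"*) ;;
    *) echo "LD_LIBRARY_PATH does not contain ${COPPELIASIM_ROOT}" >&2; status=1 ;;
  esac
  if [ "${QT_QPA_PLATFORM_PLUGIN_PATH:-}" != "${COPPELIASIM_ROOT}" ]; then
    echo "QT_QPA_PLATFORM_PLUGIN_PATH should equal ${COPPELIASIM_ROOT}" >&2
    status=1
  fi
  return "${status}"
}
```

Running check_coppeliasim_env right after the exports (or from your shell profile) reports all missing pieces at once instead of failing on the first one.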
The point renderer from RVT, installed below, is the recommended renderer for TVVE.
cd ..
git clone https://github.com/NVlabs/RVT.git
cp -rf RVT/rvt/libs/point-renderer ./point-renderer
rm -rf RVT
cd point-renderer
pip install -e .
cd ../TAVP
Then remove the following import from point-renderer/point_renderer/rvt_renderer.py:
from mvt.utils import ForkedPdb
If you do not want to use the C++ renderer, set render_with_cpp=false in the config files.
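The import removal can also be done non-interactively. Below is a minimal sed-based sketch; the function name is hypothetical and it assumes the line appears exactly as shown. From the TAVP directory the file is at ../point-renderer/point_renderer/rvt_renderer.py, because point-renderer was copied into the parent directory.

```bash
# Hypothetical helper (not part of the TVVE repo): delete the debugging import
# "from mvt.utils import ForkedPdb" in place, keeping a .bak copy of the original.
strip_forkedpdb_import() {
  local file="$1"
  [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }
  # Delete only lines consisting of that import (surrounding whitespace allowed).
  sed -i.bak '/^[[:space:]]*from mvt\.utils import ForkedPdb[[:space:]]*$/d' "$file"
}

# Usage from the TAVP directory:
# strip_forkedpdb_import ../point-renderer/point_renderer/rvt_renderer.py
```

Because point-renderer was installed with pip install -e, editing the Python source in place takes effect without reinstalling.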
The training scripts expect the compact RLBench dataset format released with ARP.
mkdir -p data
cd data
Download datasets/RLBench.tar from:
Then extract it:
tar xvf RLBench.tar
rm -f RLBench.tar
cd ..
After extraction, the default layout is expected to look like:
data/
  train/
  test/
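An incomplete extraction is easy to catch before training starts. The check below is a hypothetical helper, not part of the repository, and assumes only the train/ and test/ layout shown above.

```bash
# Hypothetical check (not part of the TVVE repo): confirm that the extracted
# RLBench data contains the expected train/ and test/ sub-directories.
check_rlbench_layout() {
  local root="${1:-data}" split missing=0
  for split in train test; do
    if [ ! -d "${root}/${split}" ]; then
      echo "missing directory: ${root}/${split}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Usage from the TAVP directory:
# check_rlbench_layout data
```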
The benchmark resources are released here:
- Dataset: https://huggingface.co/datasets/baiyu858/RLBench-OG
- Code: https://github.com/baiyu858/rlbench-og.git
The provided shell scripts read paths from environment variables. If a variable is not set, the scripts fall back to /path/to/... placeholders.
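The fallback described above corresponds to the shell's ${VAR:-default} expansion. The sketch below illustrates that pattern and adds a hypothetical guard, require_real_paths (not part of the repository), that stops a launch while a path is still a /path/to/... placeholder. The actual lines inside start_tvve_stage1.sh and start_tvve_stage23.sh may be written differently.

```bash
# Sketch of the fallback pattern: keep the caller's value if set, else use a placeholder.
TRAIN_DEMO_FOLDER="${TRAIN_DEMO_FOLDER:-/path/to/rlbench/data/train}"
EVAL_DATA_FOLDER="${EVAL_DATA_FOLDER:-/path/to/rlbench/data/test}"

# Hypothetical guard: fail if any named variable is empty or still a placeholder.
require_real_paths() {
  local name value status=0
  for name in "$@"; do
    value="${!name:-}"                     # indirect expansion: value of the variable called $name
    case "$value" in
      ""|/path/to/*)
        echo "${name} is unset or still a placeholder: '${value}'" >&2
        status=1 ;;
    esac
  done
  return "$status"
}

# Usage: require_real_paths TRAIN_DEMO_FOLDER EVAL_DATA_FOLDER || exit 1
```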
export TRAIN_DEMO_FOLDER=/path/to/rlbench/data/train
export EVAL_DATA_FOLDER=/path/to/rlbench/data/test
export STAGE1_INIT_WEIGHTS=/path/to/stage1_checkpoint.pth
export STAGE23_ONLY_IL_WEIGHTS=/path/to/stage23_only_il_checkpoint.pth
export STAGE23_EVAL_WEIGHTS=/path/to/stage23_eval_checkpoint.pth
Depending on your workflow, you may also want to set:
export STAGE23_JOINT_INIT_WEIGHTS=/path/to/stage23_joint_checkpoint.pth
export STAGE23_PPO_IL_WEIGHTS=/path/to/stage23_ppo_il_checkpoint.pth

| Command | Purpose |
|---|---|
| ./start_tvve_stage1.sh | Stage 1 training |
| ./start_tvve_stage23.sh a 1 op | Stage 2 training with PPO-only updates |
| ./start_tvve_stage23.sh a 1 oi | Stage 3 training with IL-only updates |
| ./start_tvve_stage23.sh b 1 oi | RLBench evaluation |
- In start_tvve_stage23.sh, the first argument a starts training and b starts evaluation.
- Evaluation uses xvfb-run; install the required headless rendering dependencies first.
- Logs are written to logs/, and Hydra outputs are written to outputs/.
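To avoid mistyping the mode flags from the table above, the launch commands can be wrapped in a small dispatcher. This is a hypothetical bash helper, not shipped with TVVE; by default (DRY_RUN=1) it only prints the command, and it runs the command when DRY_RUN=0. The second argument (1) is passed through unchanged because its meaning is not described here.

```bash
# Hypothetical wrapper (not part of the TVVE repo) mapping phase names to the
# launch commands in the table. Prints the command unless DRY_RUN=0.
tvve_run() {
  local -a cmd
  case "$1" in
    stage1) cmd=(./start_tvve_stage1.sh) ;;
    stage2) cmd=(./start_tvve_stage23.sh a 1 op) ;;   # PPO-only updates
    stage3) cmd=(./start_tvve_stage23.sh a 1 oi) ;;   # IL-only updates
    eval)   cmd=(./start_tvve_stage23.sh b 1 oi) ;;   # RLBench evaluation
    *) echo "usage: tvve_run {stage1|stage2|stage3|eval}" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage from the TAVP directory:
# DRY_RUN=0 tvve_run stage1
```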
mkdir -p env
cd env
git clone https://github.com/baiyu858/rlbench-og.git
cd rlbench-og
pip install -r requirements.txt
pip install -e .
cd ../..
huggingface-cli download baiyu858/RLBench-OG \
--repo-type dataset \
--local-dir ./data/RLBench-OG
cd data/RLBench-OG
tar -xvf Occlusion.tar.xz
rm -f Occlusion.tar.xz
cd Generalization
tar -xvf train.tar.xz
rm -f train.tar.xz
tar -xvf test.tar.xz
rm -f test.tar.xz
cd ../../..
Use the same training and evaluation commands as RLBench after switching the dataset paths and task lists.
Occlusion1 split
Update env.tasks in both configs/tvve_stage1.yaml and configs/tvve_stage23.yaml:
[
"basketball_in_hoop_occlusion",
"scoop_with_spatula_occlusion",
"take_plate_off_colored_dish_rack_occlusion",
"water_plants_occlusion",
"block_pyramid_occlusion",
"solve_puzzle_occlusion",
"take_usb_out_of_computer_occlusion",
"close_drawer_occlusion",
"straighten_rope_occlusion",
"toilet_seat_down_occlusion"
]
Set:
train.demo_folder: ./data/RLBench-OG/Occlusion/train
eval.datafolder: ./data/RLBench-OG/Occlusion/test
Occlusion2 split
Update env.tasks in both configs/tvve_stage1.yaml and configs/tvve_stage23.yaml:
[
"basketball_in_hoop",
"scoop_with_spatula",
"take_plate_off_colored_dish_rack",
"water_plants",
"block_pyramid",
"solve_puzzle",
"take_usb_out_of_computer",
"close_drawer_occlusion",
"straighten_rope",
"toilet_seat_down"
]
Set:
train.demo_folder: ./data/RLBench-OG/Generalization/train
eval.datafolder: ./data/RLBench-OG/Occlusion/test
Then run the same commands from Training and Evaluation on RLBench.
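Since the launch scripts read TRAIN_DEMO_FOLDER and EVAL_DATA_FOLDER, one way to switch between the two splits is a helper like the one below. It is a hypothetical sketch and assumes those variables feed train.demo_folder and eval.datafolder; if the scripts take these values only from the YAML configs, edit the configs as described above instead. The env.tasks list still has to be updated in the configs either way.

```bash
# Hypothetical helper (not part of the TVVE repo). Assumption: the launch scripts pass
# TRAIN_DEMO_FOLDER / EVAL_DATA_FOLDER through to train.demo_folder / eval.datafolder.
use_og_split() {
  local og_root="${OG_ROOT:-./data/RLBench-OG}"   # OG_ROOT is a hypothetical override
  case "$1" in
    occlusion1)
      export TRAIN_DEMO_FOLDER="${og_root}/Occlusion/train"
      export EVAL_DATA_FOLDER="${og_root}/Occlusion/test" ;;
    occlusion2)
      # Occlusion2 trains on the Generalization training data and evaluates on the Occlusion test set.
      export TRAIN_DEMO_FOLDER="${og_root}/Generalization/train"
      export EVAL_DATA_FOLDER="${og_root}/Occlusion/test" ;;
    *)
      echo "usage: use_og_split {occlusion1|occlusion2}" >&2
      return 2 ;;
  esac
}

# Usage: use_og_split occlusion2 && ./start_tvve_stage1.sh
```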
You can directly evaluate with the STAGE23_EVAL_WEIGHTS checkpoint trained for Occlusion2.
First install the local YARR and PerAct packages:
cd env/YARR
pip install -e .
cd ../peract
pip install -e .
cd ../..
Then edit the placeholders in eval_og.sh:
epoch=<select_which_epoch_to_eval>
model_folder=<path_to_STAGE23_EVAL_WEIGHTS_directory>
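The two placeholders can also be filled in from the command line. Below is a minimal sed-based sketch with a hypothetical helper name, fill_eval_og. It assumes the placeholders appear literally as shown above and that the values contain no "|" or "&" characters, which sed would interpret specially in this command. A .bak copy of the original script is kept.

```bash
# Hypothetical helper (not part of the TVVE repo): replace both placeholders in eval_og.sh.
fill_eval_og() {
  local epoch="$1" model_folder="$2" script="${3:-./eval_og.sh}"
  [ -f "$script" ] || { echo "not found: $script" >&2; return 1; }
  sed -i.bak \
    -e "s|<select_which_epoch_to_eval>|${epoch}|" \
    -e "s|<path_to_STAGE23_EVAL_WEIGHTS_directory>|${model_folder}|" \
    "$script"
}

# Usage from the TAVP directory (values are illustrative):
# fill_eval_og <epoch> /path/to/stage23_eval_weights_dir
```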
Finally run:
bash ./eval_og.sh
TAVP/
  configs/               experiment configuration files
  env/                   RLBench-OG, PerAct, and YARR integrations
  static/                website images and videos
  train_tvve_stage1.py   stage 1 training entry
  train_tvve_stage23.py  stage 2/3 training entry
  eval.py                RLBench evaluation entry
  eval_og.py             RLBench-OG evaluation entry
  start_tvve_stage1.sh   stage 1 launch script
  start_tvve_stage23.sh  stage 2/3 launch script
  eval_og.sh             RLBench-OG batch evaluation script
If you find TVVE useful in your research, please cite:
@InProceedings{Bai_2026_CVPR,
author = {Bai, Yongjie and Wang, Zhouxia and Liu, Yang and Luo, Kaijun and Wen, Yifan and Dai, Mingtong and Chen, Weixing and Chen, Ziliang and Liu, Lingbo and Li, Guanbin and Lin, Liang},
title = {Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2026}
}
This repository builds on several excellent open-source projects, including RLBench, PyRep, RVT, ARP, PerAct, and YARR. Please also cite their original work if you use this codebase in your research.
