Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation

Official implementation of TVVE

Yongjie Bai, Zhouxia Wang, Yang Liu, Kaijun Luo, Yifan Wen, Mingtong Dai, Weixing Chen, Ziliang Chen, Lingbo Liu, Guanbin Li, Liang Lin

TVVE learns task-aware virtual viewpoints for robotic manipulation. The method combines a Multi-Viewpoint Exploration Policy (MVEP) with a Task-aware Mixture-of-Experts visual encoder (TaskMoE), improving 3D perception, feature discrimination, and cross-domain generalization on RLBench, RLBench-OG, and real-world robot setups.
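For readers new to mixture-of-experts encoders, the sketch below shows generic task-conditioned top-k gating. This is a hypothetical illustration in plain Python, not the TaskMoE implementation (see taskmoe_mvt.py in the repository for that). The gate weights, the task embedding, and the names `route` and `mix` are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(task_embedding: Sequence[float],
          gate_weights: Sequence[Sequence[float]],
          k: int = 2) -> List[Tuple[int, float]]:
    """Pick the top-k experts for a task and renormalise their weights.

    gate_weights has one row per expert (hypothetical setup); each row is
    dotted with the task embedding to give that expert's gating logit.
    """
    logits = [sum(w * x for w, x in zip(row, task_embedding)) for row in gate_weights]
    probs = softmax(logits)
    # Keep the k most probable experts, highest probability first.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return [(i, probs[i] / mass) for i in top]


def mix(expert_outputs: Sequence[Sequence[float]],
        routing: List[Tuple[int, float]]) -> List[float]:
    """Weighted sum of the selected experts' feature vectors."""
    out = [0.0] * len(expert_outputs[0])
    for idx, weight in routing:
        for d, value in enumerate(expert_outputs[idx]):
            out[d] += weight * value
    return out


if __name__ == "__main__":
    # 4 hypothetical experts gated by a 2-d task embedding.
    gates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
    print(route([2.0, 0.0], gates, k=2))
```

Only the selected experts contribute to the output, which is what lets different tasks use different subsets of the encoder.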

RLBench-OG: A Benchmark for Evaluating Robustness and Generalization for Robotic Manipulation


RLBench-OG is an extension benchmark built on top of RLBench to evaluate model robustness under occlusion and generalization to environment perturbations. The benchmark selects ten tasks from RLBench (covering simple and long-horizon tasks) and contains two main components: the Occlusion Suite and the Generalization Suite.

Highlights

Accepted by 🔥CVPR 2026🔥.
Official code release for TVVE.
Training scripts for stage 1 and stage 2/3 optimization.
Evaluation entry points for RLBench and RLBench-OG.
Public release of the RLBench-OG benchmark dataset and codebase.

Project Status

Show more simulation results on RLBench and RLBench-OG
Show more real-world robot results on Dobot and Franka
Release the model and code
Release the RLBench-OG benchmark
Release the RLBench-OG test code

Installation

1. Create a Python environment

conda create -n tvve python=3.8 -y
conda activate tvve
pip install pip==21 setuptools==65.5.0 wheel==0.38.0

2. Clone the repository

git clone https://github.com/HCPLab-SYSU/TAVP.git
cd TAVP

3. Install Python dependencies

pip install -r requirements.txt

4. Install CUDA-dependent extras

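Skip the CUDA toolkit installation below if a compatible CUDA toolkit is already installed and CUDA_HOME points to it. Otherwise, download the cuda_12.3.2_545.23.08_linux.run installer from NVIDIA first. The commands below install only the toolkit (not the driver) into $HOME/cuda-12.3.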

bash ./cuda_12.3.2_545.23.08_linux.run --silent --toolkit --toolkitpath=$HOME/cuda-12.3
export CUDA_HOME=$HOME/cuda-12.3

Install PyTorch3D

export NVCC_FLAGS="--generate-code arch=compute_80,code=sm_80 --generate-code arch=compute_86,code=sm_86 --generate-code arch=compute_87,code=sm_87 --generate-code arch=compute_89,code=sm_89"
pip install git+https://github.com/facebookresearch/pytorch3d.git@stable

Install xFormers

pip install ninja
export MAX_JOBS=1
export TORCH_CUDA_ARCH_LIST="8.0;8.6;8.7;8.9"
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers

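The NVCC_FLAGS and TORCH_CUDA_ARCH_LIST values above target compute capabilities 8.0, 8.6, 8.7, and 8.9 (Ampere and Ada Lovelace GPUs). Adjust them to match your GPU's compute capability.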
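To keep the two settings consistent, you can generate both strings from one list of compute capabilities. The helper below is a hypothetical sketch, not part of this repository. On a machine with PyTorch and a visible GPU, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple you would pass in.

```python
from typing import Iterable, Tuple


def arch_settings(capabilities: Iterable[Tuple[int, int]]) -> Tuple[str, str]:
    """Return (TORCH_CUDA_ARCH_LIST, NVCC_FLAGS) for the given compute capabilities."""
    caps = sorted(set(capabilities))  # de-duplicate and order, e.g. 8.0 before 8.9
    arch_list = ";".join(f"{major}.{minor}" for major, minor in caps)
    nvcc_flags = " ".join(
        f"--generate-code arch=compute_{major}{minor},code=sm_{major}{minor}"
        for major, minor in caps
    )
    return arch_list, nvcc_flags


if __name__ == "__main__":
    # Example: an A100 (8.0) and an RTX 4090 (8.9) in the same machine.
    arch_list, nvcc_flags = arch_settings([(8, 0), (8, 9)])
    print(f'export TORCH_CUDA_ARCH_LIST="{arch_list}"')
    print(f'export NVCC_FLAGS="{nvcc_flags}"')
```

Passing `[(8, 0), (8, 6), (8, 7), (8, 9)]` reproduces the exact values used above.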

5. Install CoppeliaSim, PyRep, and RLBench

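TVVE depends on the RLBench simulation stack: CoppeliaSim 4.1.0 (the version PyRep requires), PyRep, and the RLBench fork from mlzxy/RLBench.arp.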

cd ..
wget https://downloads.coppeliarobotics.com/V4_1_0/CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz
tar -xf CoppeliaSim_Edu_V4_1_0_Ubuntu20_04.tar.xz
mv CoppeliaSim_Edu_V4_1_0_Ubuntu20_04 CoppeliaSim
export COPPELIASIM_ROOT=$PWD/CoppeliaSim
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT

git clone https://github.com/stepjam/PyRep.git
cd PyRep
pip install -r requirements.txt
pip install -e .
cd ..

git clone https://github.com/mlzxy/RLBench.arp.git
cd RLBench.arp
pip install -r requirements.txt
python setup.py develop
cd ../TAVP

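The export lines above apply only to the current shell session. Add them to ~/.bashrc (or your conda environment's activation script) so later training and evaluation sessions can find CoppeliaSim. If you evaluate on a headless server, also make sure the PyRep and RLBench headless rendering requirements are met, for example Xvfb, which evaluation uses through xvfb-run.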
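A missing variable usually surfaces later as a confusing PyRep import or library-loading error, so it can help to check the exports first. The sketch below is a hypothetical helper, not part of TVVE, PyRep, or RLBench. It checks only the three variables exported above, either in the current environment or in a mapping you pass in.

```python
import os
from typing import List, Mapping, Optional


def coppeliasim_env_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems with the CoppeliaSim environment variables (empty if OK)."""
    env = os.environ if env is None else env
    root = env.get("COPPELIASIM_ROOT", "")
    if not root:
        return ["COPPELIASIM_ROOT is not set"]
    problems = []
    if not os.path.isdir(root):
        problems.append(f"COPPELIASIM_ROOT is not a directory: {root}")
    # LD_LIBRARY_PATH is colon-separated; skip the empty entry a leading ':' produces.
    lib_dirs = {os.path.normpath(p) for p in env.get("LD_LIBRARY_PATH", "").split(":") if p}
    if os.path.normpath(root) not in lib_dirs:
        problems.append("LD_LIBRARY_PATH does not include COPPELIASIM_ROOT")
    qt_path = env.get("QT_QPA_PLATFORM_PLUGIN_PATH", "")
    if not qt_path or os.path.normpath(qt_path) != os.path.normpath(root):
        problems.append("QT_QPA_PLATFORM_PLUGIN_PATH is not set to COPPELIASIM_ROOT")
    return problems


if __name__ == "__main__":
    issues = coppeliasim_env_problems()
    print("\n".join(issues) if issues else "CoppeliaSim environment looks OK")
```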

6. Install the faster point renderer

This is the recommended renderer used by TVVE.

cd ..
git clone https://github.com/NVlabs/RVT.git
cp -rf RVT/rvt/libs/point-renderer ./point-renderer
rm -rf RVT
cd point-renderer
pip install -e .
cd ../TAVP

Then remove the following import from ../point-renderer/point_renderer/rvt_renderer.py (relative to TAVP). The import depends on RVT's mvt package, which is not installed here:

from mvt.utils import ForkedPdb

If you do not want to use the C++ renderer, set render_with_cpp=false in the config files.
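The import removal can also be scripted. This is a minimal sketch that assumes point-renderer was copied next to TAVP as in the commands above, so the file is at ../point-renderer/point_renderer/rvt_renderer.py when run from TAVP. `remove_line` is a hypothetical helper. It deletes only lines that exactly match the import, ignoring surrounding whitespace, and is safe to run more than once.

```python
from pathlib import Path

IMPORT_LINE = "from mvt.utils import ForkedPdb"


def remove_line(path: Path, target: str = IMPORT_LINE) -> int:
    """Delete every line equal to `target` (ignoring surrounding whitespace).

    Returns the number of lines removed; 0 means the file was already clean.
    """
    lines = path.read_text().splitlines(keepends=True)
    kept = [line for line in lines if line.strip() != target]
    removed = len(lines) - len(kept)
    if removed:
        path.write_text("".join(kept))
    return removed


if __name__ == "__main__":
    renderer = Path("../point-renderer/point_renderer/rvt_renderer.py")
    print(f"removed {remove_line(renderer)} line(s) from {renderer}")
```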

Data Preparation

RLBench demonstrations

The training scripts expect the compact RLBench dataset format released with ARP.

mkdir -p data
cd data

Download datasets/RLBench.tar from:

https://rutgers.box.com/s/uzozemx67kje58ycy3lyzf1zgddz8tyq

Then extract it:

tar xvf RLBench.tar
rm -f RLBench.tar
cd ..

After extraction, the layout should be:

data/
  train/
  test/
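Before launching training, you can confirm that the extracted data matches this layout. The function below is a hypothetical check, not part of the repository. It only verifies that the train/ and test/ directories exist and are non-empty. It does not validate the ARP data format itself.

```python
from pathlib import Path
from typing import List, Sequence


def missing_splits(data_root: str, splits: Sequence[str] = ("train", "test")) -> List[str]:
    """Return the expected split directories that are absent or empty."""
    root = Path(data_root)
    missing = []
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir() or not any(split_dir.iterdir()):
            missing.append(split)
    return missing


if __name__ == "__main__":
    problems = missing_splits("data")
    print("missing or empty:", problems if problems else "none")
```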

RLBench-OG benchmark data

The benchmark resources are released here:

Dataset: https://huggingface.co/datasets/baiyu858/RLBench-OG
Code: https://github.com/baiyu858/rlbench-og.git

Training and Evaluation on RLBench

The provided shell scripts read paths from environment variables. If a variable is not set, the scripts fall back to /path/to/... placeholders, which do not exist, so export every variable your run uses before launching it.

1. Set runtime paths

export TRAIN_DEMO_FOLDER=/path/to/rlbench/data/train
export EVAL_DATA_FOLDER=/path/to/rlbench/data/test
export STAGE1_INIT_WEIGHTS=/path/to/stage1_checkpoint.pth
export STAGE23_ONLY_IL_WEIGHTS=/path/to/stage23_only_il_checkpoint.pth
export STAGE23_EVAL_WEIGHTS=/path/to/stage23_eval_checkpoint.pth

Depending on your workflow, you may also want to set:

export STAGE23_JOINT_INIT_WEIGHTS=/path/to/stage23_joint_checkpoint.pth
export STAGE23_PPO_IL_WEIGHTS=/path/to/stage23_ppo_il_checkpoint.pth
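Because an unset variable silently becomes a /path/to/... placeholder, a quick pre-flight check can prevent a failed run. The sketch below is hypothetical and assumes the variable names listed above. It flags any value that is empty, starts with /path/to/, or points to a path that does not exist. The shell scripts determine which variables a given stage actually reads, so pass only the ones your run needs.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

PLACEHOLDER_PREFIX = "/path/to/"


def check_paths(names: Iterable[str],
                env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Map each problematic variable name to a short reason."""
    env = os.environ if env is None else env
    problems = {}
    for name in names:
        value = env.get(name, "")
        if not value:
            problems[name] = "not set"
        elif value.startswith(PLACEHOLDER_PREFIX):
            problems[name] = "still a placeholder"
        elif not os.path.exists(value):
            problems[name] = f"path does not exist: {value}"
    return problems


if __name__ == "__main__":
    # Example selection for an RLBench evaluation run (an assumption; check the script).
    needed = ["EVAL_DATA_FOLDER", "STAGE23_EVAL_WEIGHTS"]
    for var, reason in check_paths(needed).items():
        print(f"{var}: {reason}")
```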

2. Run training or evaluation

Command	Purpose
`./start_tvve_stage1.sh`	Stage 1 training
`./start_tvve_stage23.sh a 1 op`	Stage 2 training with PPO-only updates
`./start_tvve_stage23.sh a 1 oi`	Stage 3 training with IL-only updates
`./start_tvve_stage23.sh b 1 oi`	RLBench evaluation

3. Notes

In start_tvve_stage23.sh, the first argument selects the mode: a starts training and b starts evaluation.
Evaluation uses xvfb-run; install the required headless rendering dependencies first.
Logs are written to logs/, and Hydra outputs are written to outputs/.
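If you launch these runs from Python (for example, from a sweep script), the documented invocations can live in one table. This is a minimal sketch. The `COMMANDS` keys and the `run` helper are hypothetical, and the argument lists are copied verbatim from the command table above. The meaning of the second argument (`1`) is not documented here, so it is passed through unchanged.

```python
import subprocess
from typing import Dict, List

# Argument lists copied from the command table; the keys are hypothetical labels.
COMMANDS: Dict[str, List[str]] = {
    "stage1_train": ["./start_tvve_stage1.sh"],
    "stage2_train_ppo": ["./start_tvve_stage23.sh", "a", "1", "op"],
    "stage3_train_il": ["./start_tvve_stage23.sh", "a", "1", "oi"],
    "rlbench_eval": ["./start_tvve_stage23.sh", "b", "1", "oi"],
}


def run(purpose: str, dry_run: bool = False) -> List[str]:
    """Look up the command for `purpose` and run it from the repository root."""
    if purpose not in COMMANDS:
        raise KeyError(f"unknown purpose {purpose!r}; choose from {sorted(COMMANDS)}")
    argv = COMMANDS[purpose]
    if not dry_run:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    print(" ".join(run("rlbench_eval", dry_run=True)))
```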

RLBench-OG Benchmark

1. Install the RLBench-OG environment

mkdir -p env
cd env
git clone https://github.com/baiyu858/rlbench-og.git
cd rlbench-og
pip install -r requirements.txt
pip install -e .
cd ../..

2. Download and extract the dataset

huggingface-cli download baiyu858/RLBench-OG \
    --repo-type dataset \
    --local-dir ./data/RLBench-OG

cd data/RLBench-OG
tar -xvf Occlusion.tar.xz
rm -f Occlusion.tar.xz

cd Generalization
tar -xvf train.tar.xz
rm -f train.tar.xz
tar -xvf test.tar.xz
rm -f test.tar.xz

cd ../../..
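The same download and extraction can also be scripted in Python. This sketch assumes the huggingface_hub package is installed, since it provides `snapshot_download`, the Python counterpart of `huggingface-cli download`. It also assumes the archives sit at the paths used in the shell commands above. `extract_and_remove` and `download_and_extract` are hypothetical helpers.

```python
import tarfile
from pathlib import Path


def extract_and_remove(archive: Path) -> None:
    """Extract a .tar.xz archive into its own directory, then delete the archive."""
    # extractall trusts member paths, so only use it on archives from trusted sources.
    with tarfile.open(archive, mode="r:xz") as tar:
        tar.extractall(path=archive.parent)
    archive.unlink()


def download_and_extract(local_dir: str = "./data/RLBench-OG") -> None:
    # Imported lazily so extract_and_remove works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="baiyu858/RLBench-OG", repo_type="dataset", local_dir=local_dir)
    root = Path(local_dir)
    for archive in (root / "Occlusion.tar.xz",
                    root / "Generalization" / "train.tar.xz",
                    root / "Generalization" / "test.tar.xz"):
        if archive.exists():  # skip archives already extracted and removed
            extract_and_remove(archive)


if __name__ == "__main__":
    download_and_extract()
```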

3. Occlusion Suite

Use the same training and evaluation commands as RLBench after switching the dataset paths and task lists.

Occlusion1 split

Update env.tasks in both configs/tvve_stage1.yaml and configs/tvve_stage23.yaml:

[
  "basketball_in_hoop_occlusion",
  "scoop_with_spatula_occlusion",
  "take_plate_off_colored_dish_rack_occlusion",
  "water_plants_occlusion",
  "block_pyramid_occlusion",
  "solve_puzzle_occlusion",
  "take_usb_out_of_computer_occlusion",
  "close_drawer_occlusion",
  "straighten_rope_occlusion",
  "toilet_seat_down_occlusion"
]

Set:

train.demo_folder: ./data/RLBench-OG/Occlusion/train
eval.datafolder: ./data/RLBench-OG/Occlusion/test

Occlusion2 split

This split trains on the Generalization training set and evaluates on the Occlusion test set. Update env.tasks in both configs/tvve_stage1.yaml and configs/tvve_stage23.yaml:

[
  "basketball_in_hoop",
  "scoop_with_spatula",
  "take_plate_off_colored_dish_rack",
  "water_plants",
  "block_pyramid",
  "solve_puzzle",
  "take_usb_out_of_computer",
  "close_drawer_occlusion",
  "straighten_rope",
  "toilet_seat_down"
]

Set:

train.demo_folder: ./data/RLBench-OG/Generalization/train
eval.datafolder: ./data/RLBench-OG/Occlusion/test

Then run the same commands from Training and Evaluation on RLBench.
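The two splits differ only in their task names and data folders. It can therefore help to keep both settings in one place and check them against the extracted data before training. The sketch below is hypothetical: `OCCLUSION_SPLITS` and `missing_task_dirs` are not part of the repository, and the values are copied from the lists above. It also assumes each task is stored as a subdirectory named after the task inside each data folder, so confirm that against your extracted data before relying on its output.

```python
from pathlib import Path
from typing import Dict, List

BASE_TASKS = [
    "basketball_in_hoop", "scoop_with_spatula", "take_plate_off_colored_dish_rack",
    "water_plants", "block_pyramid", "solve_puzzle", "take_usb_out_of_computer",
    "close_drawer", "straighten_rope", "toilet_seat_down",
]

OCCLUSION_SPLITS: Dict[str, dict] = {
    "occlusion1": {
        "env.tasks": [f"{t}_occlusion" for t in BASE_TASKS],
        "train.demo_folder": "./data/RLBench-OG/Occlusion/train",
        "eval.datafolder": "./data/RLBench-OG/Occlusion/test",
    },
    "occlusion2": {
        # Copied as listed above: only close_drawer keeps the _occlusion suffix here.
        "env.tasks": [f"{t}_occlusion" if t == "close_drawer" else t for t in BASE_TASKS],
        "train.demo_folder": "./data/RLBench-OG/Generalization/train",
        "eval.datafolder": "./data/RLBench-OG/Occlusion/test",
    },
}


def missing_task_dirs(split: str) -> Dict[str, List[str]]:
    """For each data folder of `split`, list configured tasks with no matching subdirectory."""
    cfg = OCCLUSION_SPLITS[split]
    result = {}
    for key in ("train.demo_folder", "eval.datafolder"):
        folder = Path(cfg[key])
        result[key] = [t for t in cfg["env.tasks"] if not (folder / t).is_dir()]
    return result


if __name__ == "__main__":
    for name in OCCLUSION_SPLITS:
        print(name, missing_task_dirs(name))
```

If the check reports tasks as missing, compare the names against the directories actually present before editing the configs.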

4. Generalization Suite

The Generalization Suite needs no additional training. Evaluate directly with the STAGE23_EVAL_WEIGHTS checkpoint trained on the Occlusion2 split.

First install the local YARR and PerAct packages:

cd env/YARR
pip install -e .

cd ../peract
pip install -e .

cd ../..

Then edit the placeholders in eval_og.sh:

epoch=<select_which_epoch_to_eval>
model_folder=<path_to_STAGE23_EVAL_WEIGHTS_directory>

Finally run:

bash ./eval_og.sh
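If you evaluate several checkpoints, you can fill in the two placeholder lines programmatically instead of by hand. This is a minimal sketch that assumes eval_og.sh assigns them on their own lines as `epoch=...` and `model_folder=...`, as shown above. `fill_eval_og` is a hypothetical helper, and the epoch and folder values in the `__main__` block are illustrative only.

```python
import re
import shlex
from pathlib import Path


def fill_eval_og(script: str, epoch: int, model_folder: str) -> str:
    """Return `script` with the epoch= and model_folder= assignments replaced."""
    values = {"epoch": str(epoch), "model_folder": shlex.quote(model_folder)}
    for name, value in values.items():
        # Match the whole assignment line, keeping any leading indentation.
        pattern = re.compile(rf"^(\s*){name}=.*$", flags=re.MULTILINE)
        script, count = pattern.subn(lambda m: f"{m.group(1)}{name}={value}", script)
        if count == 0:
            raise ValueError(f"no '{name}=' line found in eval_og.sh")
    return script


if __name__ == "__main__":
    path = Path("eval_og.sh")
    path.write_text(fill_eval_og(path.read_text(), epoch=100, model_folder="/path/to/checkpoints"))
```

`shlex.quote` keeps folder paths containing spaces valid as a single shell word.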

Repository Layout

TAVP/
  configs/                experiment configuration files
  env/                    RLBench-OG, PerAct, and YARR integrations
  static/                 website images and videos
  train_tvve_stage1.py    stage 1 training entry
  train_tvve_stage23.py   stage 2/3 training entry
  eval.py                 RLBench evaluation entry
  eval_og.py              RLBench-OG evaluation entry
  start_tvve_stage1.sh    stage 1 launch script
  start_tvve_stage23.sh   stage 2/3 launch script
  eval_og.sh              RLBench-OG batch evaluation script

Citation

If you find TVVE useful in your research, please cite:

@InProceedings{Bai_2026_CVPR,
  author    = {Bai, Yongjie and Wang, Zhouxia and Liu, Yang and Luo, Kaijun and Wen, Yifan and Dai, Mingtong and Chen, Weixing and Chen, Ziliang and Liu, Lingbo and Li, Guanbin and Lin, Liang},
  title     = {Learning to See and Act: Task-Aware Virtual View Exploration for Robotic Manipulation},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2026}
}

Acknowledgements

This repository builds on several excellent open-source projects, including RLBench, PyRep, RVT, ARP, PerAct, and YARR. Please also cite their original work if you use this codebase in your research.
