Conversation

@kellyguo11
Contributor

Description

Until now, we have been working under the assumption of consistency between the simulation device and the task device: data returned from simulation always matches the device the simulation runs on, which in turn is also the environment device and the training device.

The recent surface gripper interface allows running simulation on the GPU, but it does not support the full end-to-end GPU pipeline; instead, it requires data to be copied back to the CPU for additional processing. To support this type of workflow, we have to relax our assumption and allow the device that simulation computes on to differ from the device that simulation data is returned on.

This means we can now mix and match between simulation, environment, and training devices (a minimal configuration sketch follows the list):

  • simulation device: GPU, enable_cpu_readback: True (simulation data returned on CPU), environment device: CPU, training device: optional GPU/CPU (default follows training config)
  • simulation device: GPU, enable_cpu_readback: False (previous behavior and still the default), environment device: GPU, training device: optional GPU/CPU (default follows training config)
  • simulation device: CPU, enable_cpu_readback: ignored, environment device: CPU, training device: optional GPU/CPU (default follows training config)
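
For concreteness, here is a minimal configuration sketch of the three combinations, assuming the `enable_cpu_readback` field this PR adds to `SimulationCfg` (the exact field names and defaults follow this description, not the merged code):

```python
from isaaclab.sim import SimulationCfg

# 1. GPU physics with CPU readback: simulation data is returned on the CPU,
#    so the environment/task device should be "cpu".
sim_cfg = SimulationCfg(device="cuda:0", enable_cpu_readback=True)

# 2. GPU physics, no readback (previous behavior, still the default):
#    the environment/task device stays "cuda:0".
sim_cfg = SimulationCfg(device="cuda:0", enable_cpu_readback=False)

# 3. CPU physics: enable_cpu_readback is ignored; everything lives on the CPU.
sim_cfg = SimulationCfg(device="cpu")

# In all three cases the training device may be CPU or GPU; by default it
# follows the training config.
```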

I'm not 100% convinced we should do this, though: it requires a lot of additional code paths and device checks that feel error-prone, and we likely won't need this flexibility in most cases...

Type of change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions bot added the enhancement (New feature or request) and isaac-lab (Related to Isaac Lab team) labels on Nov 7, 2025
@kellyguo11 kellyguo11 moved this to In progress in Isaac Lab on Nov 7, 2025

@greptile-apps greptile-apps bot left a comment


Greptile Summary

This PR introduces a three-layer device architecture that separates the simulation device (where physics runs), the environment/task device (where data buffers are allocated), and the training device (where the RL agent runs). The key addition is the enable_cpu_readback parameter in SimulationCfg, which enables running GPU physics while automatically copying data to the CPU for tasks that require it (like the surface gripper).

Major changes:

  • Added enable_cpu_readback configuration to control PhysX data readback behavior
  • Added device parameter to environment configurations and InteractiveScene for explicit task device control
  • Environment device property now intelligently defaults to CPU when enable_cpu_readback=True (see the sketch after this list)
  • Updated RL wrappers (RSL-RL, RL-Games) to handle device transfers between environment and RL agent
  • Modified surface gripper validation to accept GPU simulation with CPU readback enabled
  • Fixed device allocation for environment buffers (reset_buf, episode_length_buf) to use environment device instead of simulation device
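
A hypothetical sketch of the defaulting logic described in the third bullet, written as a standalone helper (the actual property in direct_rl_env.py may resolve things differently):

```python
def resolve_env_device(task_device: str | None, sim_device: str, enable_cpu_readback: bool) -> str:
    """Resolve the environment/task device from the configuration (illustrative helper)."""
    if task_device is not None:
        # An explicitly configured task device takes priority.
        return task_device
    if enable_cpu_readback and "cuda" in sim_device:
        # GPU physics with CPU readback: simulation data arrives on the CPU,
        # so environment buffers default to the CPU as well.
        return "cpu"
    # Otherwise, follow the simulation device (previous behavior).
    return sim_device

# e.g. resolve_env_device(None, "cuda:0", True) -> "cpu"
```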

Use cases enabled:

  1. GPU physics + CPU readback → CPU task → CPU/GPU training (for surface gripper workflows)
  2. GPU physics → GPU task → CPU/GPU training (existing behavior, remains default)
  3. CPU physics → CPU task → CPU/GPU training (existing behavior)

Confidence Score: 4/5

  • This PR is generally safe to merge, given careful testing of the device-separation workflows
  • The implementation is well-structured, with comprehensive device-handling logic across the codebase. The author acknowledges concerns about the complexity and error-proneness of supporting multiple device configurations. While the core changes are sound, the added device-management complexity warrants thorough testing, especially for edge cases and device-mismatch scenarios
  • Pay close attention to scripts/demos/pick_and_place.py for the multi-environment vectorization changes, and verify device handling in all RL wrapper integrations

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| source/isaaclab/isaaclab/sim/simulation_cfg.py | 5/5 | Adds `enable_cpu_readback` parameter to `SimulationCfg` with clear documentation explaining its behavior for GPU/CPU physics |
| source/isaaclab/isaaclab/sim/simulation_context.py | 5/5 | Implements `_apply_cpu_readback_setting()` to configure PhysX readback behavior based on the new configuration parameter |
| source/isaaclab/isaaclab/scene/interactive_scene.py | 5/5 | Adds `device` parameter to the `InteractiveScene` constructor to support explicit device specification for scene tensors, enabling separation from the simulation device |
| source/isaaclab/isaaclab/envs/direct_rl_env.py | 5/5 | Updates the `device` property to intelligently default based on `enable_cpu_readback`, passes the device to scene initialization, fixes `reset_buf` allocation |
| source/isaaclab/isaaclab/envs/manager_based_rl_env.py | 5/5 | Fixes `episode_length_buf` initialization to compute the device from the config before environment setup, respecting CPU readback settings |
| source/isaaclab/isaaclab/assets/surface_gripper/surface_gripper.py | 5/5 | Updates validation to allow GPU simulation with CPU readback; provides clearer error messages with actionable guidance |
| source/isaaclab_rl/isaaclab_rl/rsl_rl/vecenv_wrapper.py | 5/5 | Adds `rl_device` parameter to enable device separation between environment and RL agent; implements device transfers in `reset`/`step`/`get_observations` |
| source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py | 5/5 | Adds device-transfer logic in `_process_obs()` to move observations from the environment device to the RL device when they differ |
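
The wrapper changes boil down to shuttling tensors across the environment/RL device boundary. A simplified sketch, assuming an `rl_device` constructor argument as described above (this is not the actual RSL-RL or RL-Games wrapper code, and the step return signature is illustrative):

```python
import torch

class DeviceSeparatedWrapper:
    """Minimal sketch: moves tensors between the env device and the RL device."""

    def __init__(self, env, rl_device: str):
        self.env = env              # assumed to expose .device and .step()
        self.rl_device = rl_device  # e.g. "cuda:0" while env.device is "cpu"

    def step(self, actions: torch.Tensor):
        # Agent actions arrive on the RL device; the env expects its own device.
        obs, rew, terminated, truncated, extras = self.env.step(actions.to(self.env.device))
        # Results go back to the RL device for the agent.
        return (
            obs.to(self.rl_device),
            rew.to(self.rl_device),
            terminated.to(self.rl_device),
            truncated.to(self.rl_device),
            extras,
        )
```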

Sequence Diagram

```mermaid
sequenceDiagram
    participant Config as Environment Config
    participant SimCtx as SimulationContext
    participant Scene as InteractiveScene
    participant Env as Environment
    participant Wrapper as RL Wrapper
    participant Agent as RL Agent

    Note over Config: enable_cpu_readback=True<br/>sim.device="cuda:0"<br/>device="cpu"
    
    Config->>SimCtx: Initialize with cfg
    SimCtx->>SimCtx: _apply_cpu_readback_setting()
    Note over SimCtx: Set PhysX suppressReadback=False<br/>(enables GPU→CPU copy)
    
    Config->>Env: Create environment
    Env->>Env: device property resolves to "cpu"
    Note over Env: Falls back to CPU when<br/>enable_cpu_readback=True
    
    Env->>Scene: Initialize scene with device="cpu"
    Note over Scene: env_origins, tensors on CPU
    
    Scene-->>Env: Scene ready
    
    Env->>Wrapper: Wrap with rl_device
    Note over Wrapper: env_device="cpu"<br/>rl_device="cuda:0"
    
    loop Training Step
        Agent->>Wrapper: actions (on cuda:0)
        Wrapper->>Wrapper: actions.to("cpu")
        Wrapper->>Env: step(actions)
        
        Note over Env: Physics on GPU
        Env->>SimCtx: sim.step()
        SimCtx->>SimCtx: PhysX computes on GPU
        SimCtx-->>Env: Data automatically copied to CPU
        
        Env->>Env: Process on CPU device
        Env-->>Wrapper: obs, rewards (on cpu)
        
        Wrapper->>Wrapper: obs.to("cuda:0")<br/>rewards.to("cuda:0")
        Wrapper-->>Agent: obs, rewards (on cuda:0)
    end
```
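
For reference, a speculative sketch of what `_apply_cpu_readback_setting()` could look like, written as a standalone helper. The `/physics/suppressReadback` carb setting name is an assumption based on the diagram note above; the real implementation in simulation_context.py may differ:

```python
import carb.settings

def apply_cpu_readback_setting(sim_device: str, enable_cpu_readback: bool) -> None:
    """Speculative helper mirroring _apply_cpu_readback_setting()."""
    settings = carb.settings.get_settings()
    if "cuda" in sim_device:
        # Assumption: PhysX readback is toggled via this carb setting.
        # suppressReadback=False -> PhysX copies GPU results back to the CPU;
        # suppressReadback=True  -> data stays on the GPU (previous default path).
        settings.set_bool("/physics/suppressReadback", not enable_cpu_readback)
    # With CPU physics the flag is a no-op: data already lives on the CPU.
```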

24 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +251 to +263 in scripts/demos/pick_and_place.py
```python
                self.instant_controls[env_idx] = torch.tensor(
                    [d_cube_robot_x * 5.0, d_cube_robot_y * 5.0, 0.0], device=self.device
                )
            elif self.go_to_target[env_idx]:
                # Effort based proportional controller to track the target position
                head_pos_x = self.pick_and_place.data.joint_pos[env_idx, self._x_dof_idx[0]]
                head_pos_y = self.pick_and_place.data.joint_pos[env_idx, self._y_dof_idx[0]]
                target_pos_x = self.target_pos[env_idx, 0]
                target_pos_y = self.target_pos[env_idx, 1]
                d_target_robot_x = target_pos_x - head_pos_x
                d_target_robot_y = target_pos_y - head_pos_y
                self.instant_controls[env_idx] = torch.tensor(
                    [d_target_robot_x * 5.0, d_target_robot_y * 5.0, 0.0], device=self.device
```

style: creating new tensors inside the for loop is inefficient

Each iteration creates a new tensor on the device. For better performance with vectorized environments, compute all controls using vectorized operations:

Suggested change
```diff
-                self.instant_controls[env_idx] = torch.tensor(
-                    [d_cube_robot_x * 5.0, d_cube_robot_y * 5.0, 0.0], device=self.device
-                )
-            elif self.go_to_target[env_idx]:
-                # Effort based proportional controller to track the target position
-                head_pos_x = self.pick_and_place.data.joint_pos[env_idx, self._x_dof_idx[0]]
-                head_pos_y = self.pick_and_place.data.joint_pos[env_idx, self._y_dof_idx[0]]
-                target_pos_x = self.target_pos[env_idx, 0]
-                target_pos_y = self.target_pos[env_idx, 1]
-                d_target_robot_x = target_pos_x - head_pos_x
-                d_target_robot_y = target_pos_y - head_pos_y
-                self.instant_controls[env_idx] = torch.tensor(
-                    [d_target_robot_x * 5.0, d_target_robot_y * 5.0, 0.0], device=self.device
+            # Process all environments with vectorized operations
+            cube_mask = self.go_to_cube
+            target_mask = self.go_to_target
+            # Vectorized cube tracking
+            if cube_mask.any():
+                head_pos_x = self.pick_and_place.data.joint_pos[cube_mask, self._x_dof_idx[0]]
+                head_pos_y = self.pick_and_place.data.joint_pos[cube_mask, self._y_dof_idx[0]]
+                cube_pos_x = self.cube.data.root_pos_w[cube_mask, 0] - self.scene.env_origins[cube_mask, 0]
+                cube_pos_y = self.cube.data.root_pos_w[cube_mask, 1] - self.scene.env_origins[cube_mask, 1]
+                d_cube_robot = torch.stack([
+                    (cube_pos_x - head_pos_x) * 5.0,
+                    (cube_pos_y - head_pos_y) * 5.0,
+                    torch.zeros_like(cube_pos_x),
+                ], dim=-1)
+                self.instant_controls[cube_mask] = d_cube_robot
+            # Vectorized target tracking
+            if target_mask.any():
+                head_pos_x = self.pick_and_place.data.joint_pos[target_mask, self._x_dof_idx[0]]
+                head_pos_y = self.pick_and_place.data.joint_pos[target_mask, self._y_dof_idx[0]]
+                target_pos_x = self.target_pos[target_mask, 0]
+                target_pos_y = self.target_pos[target_mask, 1]
+                d_target_robot = torch.stack([
+                    (target_pos_x - head_pos_x) * 5.0,
+                    (target_pos_y - head_pos_y) * 5.0,
+                    torch.zeros_like(target_pos_x),
+                ], dim=-1)
+                self.instant_controls[target_mask] = d_target_robot
```
