Conversation

@kellyguo11
Contributor

Description

Until now, we have been working under the assumption of consistency between the simulation device and the task device: data returned from simulation always matches the device the simulation runs on, which in turn is also the environment device and the training device.

The recent surface gripper interface allows running simulation on the GPU, but it does not support the full end-to-end GPU pipeline; instead, it requires data to be copied back to the CPU for additional processing. To support this type of workflow, we have to relax our assumption and allow the device that simulation computes on to differ from the device that simulation data is returned on.

This means we can now mix and match between simulation, environment, and training devices (a minimal configuration sketch follows the list):

  • simulation device: GPU, enable_cpu_readback: True (simulation data returned on CPU), environment device: CPU, training device: optional GPU/CPU (default follows training config)
  • simulation device: GPU, enable_cpu_readback: False (previous behavior and still the default), environment device: GPU, training device: optional GPU/CPU (default follows training config)
  • simulation device: CPU, enable_cpu_readback: ignored, environment device: CPU, training device: optional GPU/CPU (default follows training config)
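
For concreteness, here is a minimal configuration sketch of the three combinations, assuming the `enable_cpu_readback` field this PR adds to `SimulationCfg` (the exact field names and defaults follow this description, not the merged code):

```python
from isaaclab.sim import SimulationCfg

# 1. GPU physics with CPU readback: simulation data is returned on the CPU,
#    so the environment/task device should be "cpu".
sim_cfg = SimulationCfg(device="cuda:0", enable_cpu_readback=True)

# 2. GPU physics, no readback (previous behavior, still the default):
#    the environment/task device stays "cuda:0".
sim_cfg = SimulationCfg(device="cuda:0", enable_cpu_readback=False)

# 3. CPU physics: enable_cpu_readback is ignored; everything lives on the CPU.
sim_cfg = SimulationCfg(device="cpu")

# In all three cases the training device may be CPU or GPU; by default it
# follows the training config.
```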

I'm not 100% convinced we should do this, though: it requires a lot of additional code paths and device checks that feel error-prone, and we likely won't need this flexibility in most cases...

Type of change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions bot added the enhancement (New feature or request) and isaac-lab (Related to Isaac Lab team) labels on Nov 7, 2025
@kellyguo11 kellyguo11 moved this to In progress in Isaac Lab on Nov 7, 2025

@greptile-apps greptile-apps bot left a comment


Greptile Summary

This PR introduces a three-layer device architecture that separates the simulation device (where physics runs), the environment/task device (where data buffers are allocated), and the training device (where the RL agent runs). The key addition is the enable_cpu_readback parameter in SimulationCfg, which enables running GPU physics while automatically copying data to the CPU for tasks that require it (like the surface gripper).

Major changes:

  • Added enable_cpu_readback configuration to control PhysX data readback behavior
  • Added device parameter to environment configurations and InteractiveScene for explicit task device control
  • Environment device property now intelligently defaults to CPU when enable_cpu_readback=True (see the sketch after this list)
  • Updated RL wrappers (RSL-RL, RL-Games) to handle device transfers between environment and RL agent
  • Modified surface gripper validation to accept GPU simulation with CPU readback enabled
  • Fixed device allocation for environment buffers (reset_buf, episode_length_buf) to use environment device instead of simulation device
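
A hypothetical sketch of the defaulting logic described in the third bullet, written as a standalone helper (the actual property in direct_rl_env.py may resolve things differently):

```python
def resolve_env_device(task_device: str | None, sim_device: str, enable_cpu_readback: bool) -> str:
    """Resolve the environment/task device from the configuration (illustrative helper)."""
    if task_device is not None:
        # An explicitly configured task device takes priority.
        return task_device
    if enable_cpu_readback and "cuda" in sim_device:
        # GPU physics with CPU readback: simulation data arrives on the CPU,
        # so environment buffers default to the CPU as well.
        return "cpu"
    # Otherwise, follow the simulation device (previous behavior).
    return sim_device

# e.g. resolve_env_device(None, "cuda:0", True) -> "cpu"
```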

Use cases enabled:

  1. GPU physics + CPU readback → CPU task → CPU/GPU training (for surface gripper workflows)
  2. GPU physics → GPU task → CPU/GPU training (existing behavior, remains default)
  3. CPU physics → CPU task → CPU/GPU training (existing behavior)

Confidence Score: 4/5

  • This PR is generally safe to merge, given careful testing of the device-separation workflows
  • The implementation is well-structured, with comprehensive device-handling logic across the codebase. The author acknowledges concerns about the complexity and error-proneness of supporting multiple device configurations. While the core changes are sound, the added device-management complexity warrants thorough testing, especially for edge cases and device-mismatch scenarios
  • Pay close attention to scripts/demos/pick_and_place.py for the multi-environment vectorization changes, and verify device handling in all RL wrapper integrations

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| source/isaaclab/isaaclab/sim/simulation_cfg.py | 5/5 | Adds `enable_cpu_readback` parameter to `SimulationCfg` with clear documentation explaining its behavior for GPU/CPU physics |
| source/isaaclab/isaaclab/sim/simulation_context.py | 5/5 | Implements `_apply_cpu_readback_setting()` to configure PhysX readback behavior based on the new configuration parameter |
| source/isaaclab/isaaclab/scene/interactive_scene.py | 5/5 | Adds `device` parameter to the `InteractiveScene` constructor to support explicit device specification for scene tensors, enabling separation from the simulation device |
| source/isaaclab/isaaclab/envs/direct_rl_env.py | 5/5 | Updates the `device` property to intelligently default based on `enable_cpu_readback`, passes the device to scene initialization, fixes `reset_buf` allocation |
| source/isaaclab/isaaclab/envs/manager_based_rl_env.py | 5/5 | Fixes `episode_length_buf` initialization to compute the device from the config before environment setup, respecting CPU readback settings |
| source/isaaclab/isaaclab/assets/surface_gripper/surface_gripper.py | 5/5 | Updates validation to allow GPU simulation with CPU readback; provides clearer error messages with actionable guidance |
| source/isaaclab_rl/isaaclab_rl/rsl_rl/vecenv_wrapper.py | 5/5 | Adds `rl_device` parameter to enable device separation between environment and RL agent; implements device transfers in `reset`/`step`/`get_observations` |
| source/isaaclab_rl/isaaclab_rl/rl_games/rl_games.py | 5/5 | Adds device-transfer logic in `_process_obs()` to move observations from the environment device to the RL device when they differ |
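
The wrapper changes boil down to shuttling tensors across the environment/RL device boundary. A simplified sketch, assuming an `rl_device` constructor argument as described above (this is not the actual RSL-RL or RL-Games wrapper code, and the step return signature is illustrative):

```python
import torch

class DeviceSeparatedWrapper:
    """Minimal sketch: moves tensors between the env device and the RL device."""

    def __init__(self, env, rl_device: str):
        self.env = env              # assumed to expose .device and .step()
        self.rl_device = rl_device  # e.g. "cuda:0" while env.device is "cpu"

    def step(self, actions: torch.Tensor):
        # Agent actions arrive on the RL device; the env expects its own device.
        obs, rew, terminated, truncated, extras = self.env.step(actions.to(self.env.device))
        # Results go back to the RL device for the agent.
        return (
            obs.to(self.rl_device),
            rew.to(self.rl_device),
            terminated.to(self.rl_device),
            truncated.to(self.rl_device),
            extras,
        )
```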

Sequence Diagram

```mermaid
sequenceDiagram
    participant Config as Environment Config
    participant SimCtx as SimulationContext
    participant Scene as InteractiveScene
    participant Env as Environment
    participant Wrapper as RL Wrapper
    participant Agent as RL Agent

    Note over Config: enable_cpu_readback=True<br/>sim.device="cuda:0"<br/>device="cpu"
    
    Config->>SimCtx: Initialize with cfg
    SimCtx->>SimCtx: _apply_cpu_readback_setting()
    Note over SimCtx: Set PhysX suppressReadback=False<br/>(enables GPU→CPU copy)
    
    Config->>Env: Create environment
    Env->>Env: device property resolves to "cpu"
    Note over Env: Falls back to CPU when<br/>enable_cpu_readback=True
    
    Env->>Scene: Initialize scene with device="cpu"
    Note over Scene: env_origins, tensors on CPU
    
    Scene-->>Env: Scene ready
    
    Env->>Wrapper: Wrap with rl_device
    Note over Wrapper: env_device="cpu"<br/>rl_device="cuda:0"
    
    loop Training Step
        Agent->>Wrapper: actions (on cuda:0)
        Wrapper->>Wrapper: actions.to("cpu")
        Wrapper->>Env: step(actions)
        
        Note over Env: Physics on GPU
        Env->>SimCtx: sim.step()
        SimCtx->>SimCtx: PhysX computes on GPU
        SimCtx-->>Env: Data automatically copied to CPU
        
        Env->>Env: Process on CPU device
        Env-->>Wrapper: obs, rewards (on cpu)
        
        Wrapper->>Wrapper: obs.to("cuda:0")<br/>rewards.to("cuda:0")
        Wrapper-->>Agent: obs, rewards (on cuda:0)
    end
```
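
For reference, a speculative sketch of what `_apply_cpu_readback_setting()` could look like, written as a standalone helper. The `/physics/suppressReadback` carb setting name is an assumption based on the diagram note above; the real implementation in simulation_context.py may differ:

```python
import carb.settings

def apply_cpu_readback_setting(sim_device: str, enable_cpu_readback: bool) -> None:
    """Speculative helper mirroring _apply_cpu_readback_setting()."""
    settings = carb.settings.get_settings()
    if "cuda" in sim_device:
        # Assumption: PhysX readback is toggled via this carb setting.
        # suppressReadback=False -> PhysX copies GPU results back to the CPU;
        # suppressReadback=True  -> data stays on the GPU (previous default path).
        settings.set_bool("/physics/suppressReadback", not enable_cpu_readback)
    # With CPU physics the flag is a no-op: data already lives on the CPU.
```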

24 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment on lines +251 to +263 in scripts/demos/pick_and_place.py
```python
                self.instant_controls[env_idx] = torch.tensor(
                    [d_cube_robot_x * 5.0, d_cube_robot_y * 5.0, 0.0], device=self.device
                )
            elif self.go_to_target[env_idx]:
                # Effort based proportional controller to track the target position
                head_pos_x = self.pick_and_place.data.joint_pos[env_idx, self._x_dof_idx[0]]
                head_pos_y = self.pick_and_place.data.joint_pos[env_idx, self._y_dof_idx[0]]
                target_pos_x = self.target_pos[env_idx, 0]
                target_pos_y = self.target_pos[env_idx, 1]
                d_target_robot_x = target_pos_x - head_pos_x
                d_target_robot_y = target_pos_y - head_pos_y
                self.instant_controls[env_idx] = torch.tensor(
                    [d_target_robot_x * 5.0, d_target_robot_y * 5.0, 0.0], device=self.device
```

style: creating new tensors inside the for loop is inefficient

Each iteration creates a new tensor on the device. For better performance with vectorized environments, compute all controls using vectorized operations:

Suggested change
```diff
-                self.instant_controls[env_idx] = torch.tensor(
-                    [d_cube_robot_x * 5.0, d_cube_robot_y * 5.0, 0.0], device=self.device
-                )
-            elif self.go_to_target[env_idx]:
-                # Effort based proportional controller to track the target position
-                head_pos_x = self.pick_and_place.data.joint_pos[env_idx, self._x_dof_idx[0]]
-                head_pos_y = self.pick_and_place.data.joint_pos[env_idx, self._y_dof_idx[0]]
-                target_pos_x = self.target_pos[env_idx, 0]
-                target_pos_y = self.target_pos[env_idx, 1]
-                d_target_robot_x = target_pos_x - head_pos_x
-                d_target_robot_y = target_pos_y - head_pos_y
-                self.instant_controls[env_idx] = torch.tensor(
-                    [d_target_robot_x * 5.0, d_target_robot_y * 5.0, 0.0], device=self.device
+            # Process all environments with vectorized operations
+            cube_mask = self.go_to_cube
+            target_mask = self.go_to_target
+            # Vectorized cube tracking
+            if cube_mask.any():
+                head_pos_x = self.pick_and_place.data.joint_pos[cube_mask, self._x_dof_idx[0]]
+                head_pos_y = self.pick_and_place.data.joint_pos[cube_mask, self._y_dof_idx[0]]
+                cube_pos_x = self.cube.data.root_pos_w[cube_mask, 0] - self.scene.env_origins[cube_mask, 0]
+                cube_pos_y = self.cube.data.root_pos_w[cube_mask, 1] - self.scene.env_origins[cube_mask, 1]
+                d_cube_robot = torch.stack([
+                    (cube_pos_x - head_pos_x) * 5.0,
+                    (cube_pos_y - head_pos_y) * 5.0,
+                    torch.zeros_like(cube_pos_x),
+                ], dim=-1)
+                self.instant_controls[cube_mask] = d_cube_robot
+            # Vectorized target tracking
+            if target_mask.any():
+                head_pos_x = self.pick_and_place.data.joint_pos[target_mask, self._x_dof_idx[0]]
+                head_pos_y = self.pick_and_place.data.joint_pos[target_mask, self._y_dof_idx[0]]
+                target_pos_x = self.target_pos[target_mask, 0]
+                target_pos_y = self.target_pos[target_mask, 1]
+                d_target_robot = torch.stack([
+                    (target_pos_x - head_pos_x) * 5.0,
+                    (target_pos_y - head_pos_y) * 5.0,
+                    torch.zeros_like(target_pos_x),
+                ], dim=-1)
+                self.instant_controls[target_mask] = d_target_robot
```
