-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Description
Bug Description
1. Context & Objective
I am running a differentiable simulation in Genesis using the Franka Emika Panda robot. My goal is to perform trajectory optimization or policy learning where gradients must flow backward from a loss function through the simulation steps (Forward Kinematics) to the robot's control inputs.
2. The Problem
The differentiable computational graph is broken. When loss.backward() is triggered, the simulation hangs indefinitely or crashes, failing to compute gradients. This indicates that the link between the robot's physical state and the loss function is either severed or causing an infinite loop in the autograd engine.
3. Technical Investigation & Discrepancies
A. Missing API compared to Examples
I attempted to follow the pattern used in examples/differentiable_push.py.
Example Usage: The example utilizes obj1.get_state() to retrieve differentiable tensors.
Current Reality: The RigidLink object (used for the Franka robot parts) does not implement get_state().
B. Broken Gradient Flow via Standard Getters
I attempted to retrieve the state manually using available getters:
Method: Calling obj.get_pos() and obj.get_quat().
Result: These methods appear to return values (likely detached tensors) that are not tracked by the computation graph.
Consequence: The gradient chain is severed immediately; grad_fn is missing from these outputs.
C. Unstable Solver Access
I attempted to bypass the getters and access the state directly via the Rigid solver:
Method: Accessing internal solver state tensors directly. (check get_link_state_differentiable in provide code)
Result: While this retrieves data, invoking loss.backward() on a loss derived from this source causes the simulation to hang.
Steps to Reproduce
import genesis as gs
import torch
import numpy as np
def get_link_state_differentiable(link):
"""
Get link position and quaternion with GRADIENT CHAIN PRESERVED.
"""
solver_state = link._solver.get_state()
pos = solver_state.links_pos[:, link.idx] # (n_envs, 3)
quat = solver_state.links_quat[:, link.idx] # (n_envs, 4)
return pos, quat
def main():
print("Initializing Genesis...")
gs.init(backend=gs.gpu, precision='32')
# 1. Setup Scene
scene = gs.Scene(
sim_options=gs.options.SimOptions(
dt=0.01,
requires_grad=True, # Critical for diff sim
),
rigid_options=gs.options.RigidOptions(
enable_joint_limit=False, # Simplify for debug
enable_collision=False, # Simplify for debug
gravity=(0, 0, 0), # valid FK test
),
show_viewer=False,
)
pad = scene.add_entity(
gs.morphs.Plane(),
)
robot = scene.add_entity(
morph=gs.morphs.MJCF(file="xml/franka_emika_panda/panda_nohand.xml"),
)
scene.build(n_envs=1)
# 2. Setup Inputs and Targets
ee_link = robot.get_link("link7")
# Target Pose (Goal)
# "create any end effector pos as a y ... size would be [1,7] x,y,z qw,qx,qy,qz"
target_pose_np = np.array([[0.5, 0.0, 0.5, 1.0, 0.0, 0.0, 0.0]], dtype=float) # z=0.5
target_pose = gs.tensor(target_pose_np) # (1, 7) # its automatically on GPU by genesis since genesis device is set to gpu
# Initial Joint Position (Input)
# "createa control_point variable [1,7] ... (you can assume its a weight of neural network)"
# Franka has 7 DOFs (panda_nohand), user said "simple 6 dof joint postion matrix" but Franka is 7.
# I will use 7 to match the robot.
q_init_np = np.zeros((1, 7), dtype=float)
# Move it a bit so it's not at zero singularity if needed, though simple FK is fine
q_init_np[0, 1] = -0.5
# "control_point [1,7] will be a tensor with require grad"
# Important: In Genesis, we often need to ensure the tensor is created with requires_grad=True
# OR we enable it after.
control_point = gs.tensor(q_init_np, requires_grad=True) # its automatically on GPU by genesis since genesis device is set to gpu
print("\n=== TENSOR INFO ===")
print(f"control_point: requires_grad={control_point.requires_grad}, shape={control_point.shape}")
print(f"target_pose: requires_grad={target_pose.requires_grad}, shape={target_pose.shape}")
# 3. Simulation Loop (Single Step)
scene.reset()
# The loop
print("\n=== RUNNING SIMULATION ===")
# "once we run fk and get ee_pos for one interation"
# Important: User mentioned "load no finger franka... get ee_pos using this way" -> scene.step() is implicit in simulation
# In differentiable simulators, we usually need to step to propagate kinematics if we are using the physics engine's FK.
# Set robot joint positions
robot.set_qpos(control_point)
# Step simulation to update kinematics
scene.step()
# Get EE pos
ee_pos, ee_quat = get_link_state_differentiable(ee_link)
print(f"ee_pos: requires_grad={ee_pos.requires_grad}, shape={ee_pos.shape}")
# 4. Loss Calculation
# "find ee_distance between the actual_y minus the ee_pos"
target_pos = target_pose[:, :3] # Take first 3 for XYZ
# check require grad of target_pos
print(f"target_pos: requires_grad={target_pos.requires_grad}, shape={target_pos.shape}")
# Euclidean distance
dist = torch.norm(ee_pos - target_pos, dim=1)
loss = dist.mean()
print(f"Loss: {loss.item()}")
print(f"Loss grad_fn: {loss.grad_fn}")
# 5. Backward Pass
print("\n=== BACKWARD PASS ===")
try:
loss.backward()
print("Backward pass completed.")
except Exception as e:
print(f"Backward pass FAILED: {e}")
import traceback
traceback.print_exc()
# 6. Check Gradients
if control_point.grad is not None:
print("\nGradient found!")
print(f"control_point.grad: \n{control_point.grad}")
print(f"Grad norm: {control_point.grad.norm()}")
else:
print("\nGradient is None!")
if __name__ == "__main__":
main()Expected Behavior
Gradients should calculate when loss.backwards is called.
Screenshots/Videos
No response
Relevant log output
.venv) (base) cognitron@cognitron-workstation:~/Projects/AI_Projects/paint-diff$ python scripts/debug/test_fk_grad.py
Initializing Genesis...
[Genesis] [20:47:22] [INFO] ╭─────────────────────────────────────────────────────────────────────────────────────╮
[Genesis] [20:47:22] [INFO] │┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉ Genesis ┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉┈┉│
[Genesis] [20:47:22] [INFO] ╰─────────────────────────────────────────────────────────────────────────────────────╯
[Genesis] [20:47:22] [INFO] Running on [NVIDIA GeForce RTX 5090] with backend gs.cuda. Device memory: 31.33 GB.
[Genesis] [20:47:22] [INFO] 🚀 Genesis initialized. 🔖 version: 0.3.10, 🎨 theme: dark, 🌱 seed: None, 🐛 debug: False, 📏 precision: 32, 🔥 performance: False, 💬 verbose: INFO
[Genesis] [20:47:23] [INFO] Use GsTaichi dynamic array mode while enabling gradient computation is not recommended. Please enable performance mode at init for efficiency, i.e. 'gs.init(..., performance_mode=True)'.
[Genesis] [20:47:23] [INFO] Scene <6bf9c73> created.
[Genesis] [20:47:23] [INFO] Adding <gs.RigidEntity>. idx: 0, uid: <66d303c>, morph: <gs.morphs.Plane>, material: <gs.materials.Rigid>.
[Genesis] [20:47:23] [INFO] Adding <gs.RigidEntity>. idx: 1, uid: <59d6e4d>, morph: <gs.morphs.MJCF(file='/home/cognitron/shahzad/LearningProjects/Genesis/genesis/assets/xml/franka_emika_panda/panda_nohand.xml')>, material: <gs.materials.Rigid>.
[Genesis] [20:47:23] [INFO] Applying offset to base link's pose with user provided value in morph.
[Genesis] [20:47:23] [INFO] Building scene <6bf9c73>...
[Genesis] [20:47:24] [WARNING] Reference robot position exceeds joint limits.
[Genesis] [20:47:24] [INFO] Compiling simulation kernels...
[Genesis] [20:47:24] [INFO] Building visualizer...
=== TENSOR INFO ===
control_point: requires_grad=True, shape=torch.Size([1, 7])
target_pose: requires_grad=False, shape=torch.Size([1, 7])
=== RUNNING SIMULATION ===
ee_pos: requires_grad=True, shape=torch.Size([1, 3])
target_pos: requires_grad=False, shape=torch.Size([1, 3])
Loss: 0.9026259779930115
Loss grad_fn: <AliasBackward0 object at 0x72afe5f094b0>
=== BACKWARD PASS ===
^\Quit (core dumped)Environment
- OS: Ubuntu 24.04
- GPU/CPU : [NVIDIA GeForce RTX 5090]/ Ryzen 9 9950x
- GPU-driver version : (NVIDIA-SMI 580.105.08 Driver Version: 580.105.08 CUDA Version: 13.0 )
- CUDA / CUDA-toolkit version : Build cuda_12.8.r12.8/compiler.35583870_0
Release version or Commit ID
SHA/hash : 23470f4
Additional Context
I tried accesing the Rigid_entity of the link using this function but outcome is same , it just hangs until i terminate the process
def get_link_state_differentiable(link):
"""
Get link position and quaternion with GRADIENT CHAIN PRESERVED.
"""
get_rigid_entity = link._entity
solver_state = get_rigid_entity.get_state()
pos = solver_state.pos
quat = solver_state.quat
return pos, quatAlso Tried following the example and changing rigidOptions and Using Newton Solver
rigid_options=gs.options.RigidOptions(
enable_collision=False,
enable_self_collision=False,
enable_joint_limit=False,
disable_constraint=True,
use_contact_island=False,
use_hibernation=False,
gravity=(0, 0, 0), # valid FK test
constraint_solver=gs.constraint_solver.Newton,
)