
Commit 5b00942

Merge branch 'main' into feature/remove-carb-utils
2 parents 2dec8c3 + 64a97f2 commit 5b00942

File tree

10 files changed: +92 −8 lines changed


.github/workflows/pre-commit.yaml

Lines changed: 2 additions & 0 deletions
@@ -15,4 +15,6 @@ jobs:
     steps:
       - uses: actions/checkout@v3
       - uses: actions/setup-python@v3
+        with:
+          python-version: "3.12"
       - uses: pre-commit/[email protected]

CONTRIBUTORS.md

Lines changed: 1 addition & 0 deletions
@@ -133,6 +133,7 @@ Guidelines for modifications:
 * Stefan Van de Mosselaer
 * Stephan Pleines
 * Tiffany Chen
+* Trushant Adeshara
 * Tyler Lum
 * Victor Khaustov
 * Virgilio Gómez Lambo

docs/source/overview/imitation-learning/teleop_imitation.rst

Lines changed: 33 additions & 3 deletions
@@ -292,6 +292,15 @@ Using the Mimic generated data we can now train a state-based BC agent for ``Isa
 Visualizing results
 ^^^^^^^^^^^^^^^^^^^

+.. tip::
+
+   **Important: Testing Multiple Checkpoint Epochs**
+
+   When evaluating policy performance, it is common for different training epochs to yield significantly different results.
+   If you don't see the expected performance, **always test policies from various epochs** (not just the final checkpoint)
+   to find the best-performing model. Model performance can vary substantially across training, and the final epoch
+   is not always optimal.
+
 By inferencing using the generated model, we can visualize the results of the policy:

 .. tab-set::
@@ -315,6 +324,11 @@ By inferencing using the generated model, we can visualize the results of the po
          --device cpu --enable_cameras --task Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0 --num_rollouts 50 \
          --checkpoint /PATH/TO/desired_model_checkpoint.pth

+.. tip::
+
+   **If you don't see expected performance results:** Test policies from multiple checkpoint epochs, not just the final one.
+   Policy performance can vary significantly across training epochs, and intermediate checkpoints often outperform the final model.
+
 .. note::

    **Expected Success Rates and Timings for Franka Cube Stack Task**
@@ -323,6 +337,7 @@ By inferencing using the generated model, we can visualize the results of the po
    * Data generation time: ~30 mins for state, ~4 hours for visuomotor (varies based on num envs the user runs)
    * BC RNN training time: 1000 epochs + ~30 mins (for state), 600 epochs + ~6 hours (for visuomotor)
    * BC RNN policy success rate: ~40-60% (for both state + visuomotor)
+   * **Recommendation:** Evaluate checkpoints from various epochs throughout training to identify the best-performing model


 Demo 1: Data Generation and Policy Training for a Humanoid Robot
@@ -513,6 +528,11 @@ Visualize the results of the trained policy by running the following command, us
 .. note::
    Change the ``NORM_FACTOR`` in the above command with the values generated in the training step.

+.. tip::
+
+   **If you don't see expected performance results:** It is critical to test policies from various checkpoint epochs.
+   Performance can vary significantly between epochs, and the best-performing checkpoint is often not the final one.
+
 .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/gr-1_steering_wheel_pick_place_policy.gif
    :width: 100%
    :align: center
@@ -528,7 +548,7 @@ Visualize the results of the trained policy by running the following command, us
    * Success rate for data generation depends on the quality of human demonstrations (how well the user performs them) and dataset annotation quality. Both data generation and downstream policy success are sensitive to these factors and can show high variance. See :ref:`Common Pitfalls when Generating Data <common-pitfalls-generating-data>` for tips to improve your dataset.
    * Data generation success for this task is typically 65-80% over 1000 demonstrations, taking 18-40 minutes depending on GPU hardware and success rate (19 minutes on a RTX ADA 6000 @ 80% success rate).
    * Behavior Cloning (BC) policy success is typically 75-86% (evaluated on 50 rollouts) when trained on 1000 generated demonstrations for 2000 epochs (default), depending on demonstration quality. Training takes approximately 29 minutes on a RTX ADA 6000.
-   * Recommendation: Train for 2000 epochs with 1000 generated demonstrations, and evaluate multiple checkpoints saved between the 1500th and 2000th epochs to select the best-performing policy.
+   * **Recommendation:** Train for 2000 epochs with 1000 generated demonstrations, and **evaluate multiple checkpoints saved between the 1000th and 2000th epochs** to select the best-performing policy. Testing various epochs is essential for finding optimal performance.


 Demo 2: Data Generation and Policy Training for Humanoid Robot Locomanipulation with Unitree G1
@@ -642,6 +662,11 @@ Visualize the trained policy performance:
 .. note::
    Change the ``NORM_FACTOR`` in the above command with the values generated in the training step.

+.. tip::
+
+   **If you don't see expected performance results:** Always test policies from various checkpoint epochs.
+   Different epochs can produce significantly different results, so evaluate multiple checkpoints to find the optimal model.
+
 .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/locomanipulation-g-1_steering_wheel_pick_place.gif
    :width: 100%
    :align: center
@@ -657,7 +682,7 @@ Visualize the trained policy performance:
    * Success rate for data generation depends on the quality of human demonstrations (how well the user performs them) and dataset annotation quality. Both data generation and downstream policy success are sensitive to these factors and can show high variance. See :ref:`Common Pitfalls when Generating Data <common-pitfalls-generating-data>` for tips to improve your dataset.
    * Data generation success for this task is typically 65-82% over 1000 demonstrations, taking 18-40 minutes depending on GPU hardware and success rate (18 minutes on a RTX ADA 6000 @ 82% success rate).
    * Behavior Cloning (BC) policy success is typically 75-85% (evaluated on 50 rollouts) when trained on 1000 generated demonstrations for 2000 epochs (default), depending on demonstration quality. Training takes approximately 40 minutes on a RTX ADA 6000.
-   * Recommendation: Train for 2000 epochs with 1000 generated demonstrations, and evaluate multiple checkpoints saved between the 1500th and 2000th epochs to select the best-performing policy.
+   * **Recommendation:** Train for 2000 epochs with 1000 generated demonstrations, and **evaluate multiple checkpoints saved between the 1000th and 2000th epochs** to select the best-performing policy. Testing various epochs is essential for finding optimal performance.

 Generate the dataset with manipulation and point-to-point navigation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -851,6 +876,11 @@ Visualize the results of the trained policy by running the following command, us
 .. note::
    Change the ``NORM_FACTOR`` in the above command with the values generated in the training step.

+.. tip::
+
+   **If you don't see expected performance results:** Test policies from various checkpoint epochs, not just the final one.
+   Policy performance can vary substantially across training, and intermediate checkpoints often yield better results.
+
 .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/gr-1_nut_pouring_policy.gif
    :width: 100%
    :align: center
@@ -866,7 +896,7 @@ Visualize the results of the trained policy by running the following command, us
    * Success rate for data generation depends on the quality of human demonstrations (how well the user performs them) and dataset annotation quality. Both data generation and downstream policy success are sensitive to these factors and can show high variance. See :ref:`Common Pitfalls when Generating Data <common-pitfalls-generating-data>` for tips to improve your dataset.
    * Data generation for 1000 demonstrations takes approximately 10 hours on a RTX ADA 6000.
    * Behavior Cloning (BC) policy success is typically 50-60% (evaluated on 50 rollouts) when trained on 1000 generated demonstrations for 600 epochs (default). Training takes approximately 15 hours on a RTX ADA 6000.
-   * Recommendation: Train for 600 epochs with 1000 generated demonstrations, and evaluate multiple checkpoints saved between the 300th and 600th epochs to select the best-performing policy.
+   * **Recommendation:** Train for 600 epochs with 1000 generated demonstrations, and **evaluate multiple checkpoints saved between the 300th and 600th epochs** to select the best-performing policy. Testing various epochs is critical for achieving optimal performance.

 .. _common-pitfalls-generating-data:

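The new tips above repeatedly recommend evaluating checkpoints from several epochs rather than only the final one. A small helper along the following lines can automate that sweep. This is a minimal sketch: the ./isaaclab.sh -p launcher, the play-script path, the checkpoint directory, and the model_epoch_*.pth filename pattern are assumptions to adapt to your setup, while the CLI flags mirror the command quoted in the documentation above.

    # Sketch: evaluate every saved checkpoint with the robomimic play script and
    # compare success rates across epochs. Paths, the launcher, and the filename
    # pattern below are placeholders/assumptions -- adjust them to your setup.
    import glob
    import subprocess

    PLAY_SCRIPT = "scripts/imitation_learning/robomimic/play.py"  # assumed script path
    CHECKPOINT_DIR = "/PATH/TO/logs/robomimic/models"             # placeholder directory
    TASK = "Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0"

    for ckpt in sorted(glob.glob(f"{CHECKPOINT_DIR}/model_epoch_*.pth")):
        print(f"Evaluating {ckpt}")
        # Each run reports its own success rate over --num_rollouts episodes;
        # keep the checkpoint whose reported rate is highest.
        subprocess.run(
            [
                "./isaaclab.sh", "-p", PLAY_SCRIPT,
                "--device", "cpu", "--enable_cameras",
                "--task", TASK,
                "--num_rollouts", "50",
                "--checkpoint", ckpt,
            ],
            check=True,
        )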

scripts/reinforcement_learning/rsl_rl/play.py

Lines changed: 3 additions & 1 deletion
@@ -185,7 +185,9 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
             # agent stepping
             actions = policy(obs)
             # env stepping
-            obs, _, _, _ = env.step(actions)
+            obs, _, dones, _ = env.step(actions)
+            # reset recurrent states for episodes that have terminated
+            policy_nn.reset(dones)
             if args_cli.video:
                 timestep += 1
                 # Exit the play loop after recording one video
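The patched loop depends on the policy object exposing a reset(dones) method that clears recurrent state for environments whose episode just terminated; otherwise a new episode would start from the hidden state of the previous one. A stripped-down sketch of that pattern, using a hypothetical GRU wrapper rather than RSL-RL's actual classes:

    # Minimal sketch of per-environment recurrent-state resetting during evaluation.
    # RecurrentPolicy is a hypothetical wrapper, not RSL-RL's implementation.
    import torch
    import torch.nn as nn

    class RecurrentPolicy:
        def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64, num_envs: int = 4):
            self.gru = nn.GRU(obs_dim, hidden_dim)
            self.head = nn.Linear(hidden_dim, act_dim)
            self.hidden = torch.zeros(1, num_envs, hidden_dim)

        def __call__(self, obs: torch.Tensor) -> torch.Tensor:
            # obs: (num_envs, obs_dim); advance the GRU by one step
            with torch.no_grad():
                out, self.hidden = self.gru(obs.unsqueeze(0), self.hidden)
                return self.head(out.squeeze(0))

        def reset(self, dones: torch.Tensor) -> None:
            # Zero the hidden state only for environments whose episode ended,
            # so the next episode does not inherit memory from the previous one.
            self.hidden[:, dones.bool(), :] = 0.0

    # Usage mirrors the patched loop above:
    #   actions = policy(obs)
    #   obs, _, dones, _ = env.step(actions)
    #   policy.reset(dones)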

source/isaaclab/config/extension.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.47.5"
+version = "0.47.7"

 # Description
 title = "Isaac Lab framework for Robot Learning"

source/isaaclab/docs/CHANGELOG.rst

Lines changed: 19 additions & 0 deletions
@@ -1,6 +1,25 @@
 Changelog
 ---------

+0.47.7 (2025-10-31)
+~~~~~~~~~~~~~~~~~~~
+
+Changed
+^^^^^^^
+
+* Changed Pink IK controller qpsolver from osqp to daqp.
+* Changed Null Space matrix computation in Pink IK's Null Space Posture Task to a faster matrix pseudo inverse computation.
+
+
+0.47.6 (2025-11-01)
+~~~~~~~~~~~~~~~~~~~~
+
+Fixed
+^^^^^
+
+* Fixed an issue in recurrent policy evaluation in RSL-RL framework where the recurrent state was not reset after an episode termination.
+
+
 0.47.5 (2025-10-30)
 ~~~~~~~~~~~~~~~~~~~

source/isaaclab/isaaclab/controllers/pink_ik/null_space_posture_task.py

Lines changed: 30 additions & 1 deletion
@@ -4,6 +4,8 @@
 # SPDX-License-Identifier: BSD-3-Clause

 import numpy as np
+import scipy.linalg.blas as blas
+import scipy.linalg.lapack as lapack

 import pinocchio as pin
 from pink.configuration import Configuration
@@ -75,6 +77,9 @@ class NullSpacePostureTask(Task):

     """

+    # Regularization factor for pseudoinverse computation to ensure numerical stability
+    PSEUDOINVERSE_DAMPING_FACTOR: float = 1e-9
+
     def __init__(
         self,
         cost: float,
@@ -237,6 +242,30 @@ def compute_jacobian(self, configuration: Configuration) -> np.ndarray:
         J_combined = np.concatenate(J_frame_tasks, axis=0)

         # Compute null space projector: N = I - J^+ * J
-        N_combined = np.eye(J_combined.shape[1]) - np.linalg.pinv(J_combined) @ J_combined
+        # Use fast pseudoinverse computation with direct LAPACK/BLAS calls
+        m, n = J_combined.shape
+
+        # Wide matrix (typical for robotics): use left pseudoinverse
+        # J^+ = J^T @ inv(J @ J^T + λ²I)
+        # This is faster because we invert an m×m matrix instead of n×n
+
+        # Compute J @ J^T using BLAS (faster than numpy)
+        JJT = blas.dgemm(1.0, J_combined, J_combined.T)
+        np.fill_diagonal(JJT, JJT.diagonal() + self.PSEUDOINVERSE_DAMPING_FACTOR**2)
+
+        # Use LAPACK's Cholesky factorization (dpotrf = Positive definite TRiangular Factorization)
+        L, info = lapack.dpotrf(JJT, lower=1, clean=False, overwrite_a=True)
+
+        if info != 0:
+            # Fallback if not positive definite: use numpy's pseudoinverse
+            J_pinv = np.linalg.pinv(J_combined)
+            return np.eye(n) - J_pinv @ J_combined
+
+        # Solve (J @ J^T + λ²I) @ X = J using LAPACK's triangular solver (dpotrs)
+        # This directly solves the system without computing the full inverse
+        X, _ = lapack.dpotrs(L, J_combined, lower=1)
+
+        # Compute null space projector: N = I - J^T @ X
+        N_combined = np.eye(n) - J_combined.T @ X

         return N_combined
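One way to sanity-check the new LAPACK/BLAS path is to compare its projector against the previous np.linalg.pinv formulation on a random wide Jacobian. The following standalone sketch uses the same dgemm/dpotrf/dpotrs calls as the patch; the matrix sizes and damping value are arbitrary choices for illustration.

    # Standalone check: the damped Cholesky-based projector should match the
    # pinv-based projector for a well-conditioned wide Jacobian.
    import numpy as np
    import scipy.linalg.blas as blas
    import scipy.linalg.lapack as lapack

    damping = 1e-9
    rng = np.random.default_rng(0)
    J = rng.standard_normal((6, 29))  # wide: 6 task rows, 29 joints (arbitrary sizes)

    # Old formulation: N = I - pinv(J) @ J
    N_ref = np.eye(J.shape[1]) - np.linalg.pinv(J) @ J

    # New formulation: factor (J @ J^T + damping^2 * I) and solve for X
    JJT = blas.dgemm(1.0, J, J.T)
    np.fill_diagonal(JJT, JJT.diagonal() + damping**2)
    L, info = lapack.dpotrf(JJT, lower=1, clean=False, overwrite_a=True)
    assert info == 0, "J @ J^T is not positive definite"
    X, _ = lapack.dpotrs(L, J, lower=1)
    N_fast = np.eye(J.shape[1]) - J.T @ X

    print(np.allclose(N_ref, N_fast, atol=1e-6))  # expected: True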

source/isaaclab/isaaclab/controllers/pink_ik/pink_ik.py

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ def compute(
             self.pink_configuration,
             self.cfg.variable_input_tasks + self.cfg.fixed_input_tasks,
             dt,
-            solver="osqp",
+            solver="daqp",
             safety_break=self.cfg.fail_on_joint_limit_violation,
         )
         joint_angle_changes = velocity * dt
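The solver switch assumes the daqp backend is installed and visible to qpsolvers, which is why setup.py (next file) pins daqp==0.7.2. A quick check, assuming only the standard qpsolvers.available_solvers list:

    # Verify the daqp QP backend is importable and registered with qpsolvers
    # before Pink tries to use it.
    import qpsolvers

    print(qpsolvers.available_solvers)
    if "daqp" not in qpsolvers.available_solvers:
        raise RuntimeError("daqp not found; install it with `pip install daqp`")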

source/isaaclab/setup.py

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@
 INSTALL_REQUIRES += [
     # required by isaaclab.isaaclab.controllers.pink_ik
     f"pin-pink==3.1.0 ; platform_system == 'Linux' and ({SUPPORTED_ARCHS_ARM})",
+    f"daqp==0.7.2 ; platform_system == 'Linux' and ({SUPPORTED_ARCHS_ARM})",
     # required by isaaclab.devices.openxr.retargeters.humanoid.fourier.gr1_t2_dex_retargeting_utils
     f"dex-retargeting==0.4.6 ; platform_system == 'Linux' and ({SUPPORTED_ARCHS})",
 ]

tools/template/templates/extension/config/extension.toml

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ keywords = ["extension", "template", "isaaclab"]
 [[python.module]]
 name = "{{ name }}"

-[isaaclab_settings]
+[isaac_lab_settings]
 # TODO: Uncomment and list any apt dependencies here.
 # If none, leave it commented out.
 # apt_deps = ["example_package"]
