
Commit 5b00942

Merge branch 'main' into feature/remove-carb-utils
2 parents 2dec8c3 + 64a97f2 commit 5b00942

File tree

10 files changed: +92 −8 lines changed


.github/workflows/pre-commit.yaml

Lines changed: 2 additions & 0 deletions
@@ -15,4 +15,6 @@ jobs:
     steps:
       - uses: actions/checkout@v3
       - uses: actions/setup-python@v3
+        with:
+          python-version: "3.12"
       - uses: pre-commit/[email protected]

CONTRIBUTORS.md

Lines changed: 1 addition & 0 deletions
@@ -133,6 +133,7 @@ Guidelines for modifications:
 * Stefan Van de Mosselaer
 * Stephan Pleines
 * Tiffany Chen
+* Trushant Adeshara
 * Tyler Lum
 * Victor Khaustov
 * Virgilio Gómez Lambo

docs/source/overview/imitation-learning/teleop_imitation.rst

Lines changed: 33 additions & 3 deletions
@@ -292,6 +292,15 @@ Using the Mimic generated data we can now train a state-based BC agent for ``Isa
 Visualizing results
 ^^^^^^^^^^^^^^^^^^^

+.. tip::
+
+   **Important: Testing Multiple Checkpoint Epochs**
+
+   When evaluating policy performance, it is common for different training epochs to yield significantly different results.
+   If you don't see the expected performance, **always test policies from various epochs** (not just the final checkpoint)
+   to find the best-performing model. Model performance can vary substantially across training, and the final epoch
+   is not always optimal.
+
 By inferencing using the generated model, we can visualize the results of the policy:

 .. tab-set::
@@ -315,6 +324,11 @@ By inferencing using the generated model, we can visualize the results of the po
          --device cpu --enable_cameras --task Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0 --num_rollouts 50 \
          --checkpoint /PATH/TO/desired_model_checkpoint.pth

+.. tip::
+
+   **If you don't see expected performance results:** Test policies from multiple checkpoint epochs, not just the final one.
+   Policy performance can vary significantly across training epochs, and intermediate checkpoints often outperform the final model.
+
 .. note::

    **Expected Success Rates and Timings for Franka Cube Stack Task**
@@ -323,6 +337,7 @@ By inferencing using the generated model, we can visualize the results of the po
    * Data generation time: ~30 mins for state, ~4 hours for visuomotor (varies based on num envs the user runs)
    * BC RNN training time: 1000 epochs + ~30 mins (for state), 600 epochs + ~6 hours (for visuomotor)
    * BC RNN policy success rate: ~40-60% (for both state + visuomotor)
+   * **Recommendation:** Evaluate checkpoints from various epochs throughout training to identify the best-performing model


 Demo 1: Data Generation and Policy Training for a Humanoid Robot
@@ -513,6 +528,11 @@ Visualize the results of the trained policy by running the following command, us
 .. note::
    Change the ``NORM_FACTOR`` in the above command with the values generated in the training step.

+.. tip::
+
+   **If you don't see expected performance results:** It is critical to test policies from various checkpoint epochs.
+   Performance can vary significantly between epochs, and the best-performing checkpoint is often not the final one.
+
 .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/gr-1_steering_wheel_pick_place_policy.gif
    :width: 100%
    :align: center
@@ -528,7 +548,7 @@ Visualize the results of the trained policy by running the following command, us
    * Success rate for data generation depends on the quality of human demonstrations (how well the user performs them) and dataset annotation quality. Both data generation and downstream policy success are sensitive to these factors and can show high variance. See :ref:`Common Pitfalls when Generating Data <common-pitfalls-generating-data>` for tips to improve your dataset.
    * Data generation success for this task is typically 65-80% over 1000 demonstrations, taking 18-40 minutes depending on GPU hardware and success rate (19 minutes on a RTX ADA 6000 @ 80% success rate).
    * Behavior Cloning (BC) policy success is typically 75-86% (evaluated on 50 rollouts) when trained on 1000 generated demonstrations for 2000 epochs (default), depending on demonstration quality. Training takes approximately 29 minutes on a RTX ADA 6000.
-   * Recommendation: Train for 2000 epochs with 1000 generated demonstrations, and evaluate multiple checkpoints saved between the 1500th and 2000th epochs to select the best-performing policy.
+   * **Recommendation:** Train for 2000 epochs with 1000 generated demonstrations, and **evaluate multiple checkpoints saved between the 1000th and 2000th epochs** to select the best-performing policy. Testing various epochs is essential for finding optimal performance.


 Demo 2: Data Generation and Policy Training for Humanoid Robot Locomanipulation with Unitree G1
@@ -642,6 +662,11 @@ Visualize the trained policy performance:
 .. note::
    Change the ``NORM_FACTOR`` in the above command with the values generated in the training step.

+.. tip::
+
+   **If you don't see expected performance results:** Always test policies from various checkpoint epochs.
+   Different epochs can produce significantly different results, so evaluate multiple checkpoints to find the optimal model.
+
 .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/locomanipulation-g-1_steering_wheel_pick_place.gif
    :width: 100%
    :align: center
@@ -657,7 +682,7 @@ Visualize the trained policy performance:
    * Success rate for data generation depends on the quality of human demonstrations (how well the user performs them) and dataset annotation quality. Both data generation and downstream policy success are sensitive to these factors and can show high variance. See :ref:`Common Pitfalls when Generating Data <common-pitfalls-generating-data>` for tips to improve your dataset.
    * Data generation success for this task is typically 65-82% over 1000 demonstrations, taking 18-40 minutes depending on GPU hardware and success rate (18 minutes on a RTX ADA 6000 @ 82% success rate).
    * Behavior Cloning (BC) policy success is typically 75-85% (evaluated on 50 rollouts) when trained on 1000 generated demonstrations for 2000 epochs (default), depending on demonstration quality. Training takes approximately 40 minutes on a RTX ADA 6000.
-   * Recommendation: Train for 2000 epochs with 1000 generated demonstrations, and evaluate multiple checkpoints saved between the 1500th and 2000th epochs to select the best-performing policy.
+   * **Recommendation:** Train for 2000 epochs with 1000 generated demonstrations, and **evaluate multiple checkpoints saved between the 1000th and 2000th epochs** to select the best-performing policy. Testing various epochs is essential for finding optimal performance.

 Generate the dataset with manipulation and point-to-point navigation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -851,6 +876,11 @@ Visualize the results of the trained policy by running the following command, us
 .. note::
    Change the ``NORM_FACTOR`` in the above command with the values generated in the training step.

+.. tip::
+
+   **If you don't see expected performance results:** Test policies from various checkpoint epochs, not just the final one.
+   Policy performance can vary substantially across training, and intermediate checkpoints often yield better results.
+
 .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/gr-1_nut_pouring_policy.gif
    :width: 100%
    :align: center
@@ -866,7 +896,7 @@ Visualize the results of the trained policy by running the following command, us
    * Success rate for data generation depends on the quality of human demonstrations (how well the user performs them) and dataset annotation quality. Both data generation and downstream policy success are sensitive to these factors and can show high variance. See :ref:`Common Pitfalls when Generating Data <common-pitfalls-generating-data>` for tips to improve your dataset.
    * Data generation for 1000 demonstrations takes approximately 10 hours on a RTX ADA 6000.
    * Behavior Cloning (BC) policy success is typically 50-60% (evaluated on 50 rollouts) when trained on 1000 generated demonstrations for 600 epochs (default). Training takes approximately 15 hours on a RTX ADA 6000.
-   * Recommendation: Train for 600 epochs with 1000 generated demonstrations, and evaluate multiple checkpoints saved between the 300th and 600th epochs to select the best-performing policy.
+   * **Recommendation:** Train for 600 epochs with 1000 generated demonstrations, and **evaluate multiple checkpoints saved between the 300th and 600th epochs** to select the best-performing policy. Testing various epochs is critical for achieving optimal performance.

 .. _common-pitfalls-generating-data:

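The new tips above repeatedly recommend evaluating checkpoints from several epochs rather than only the final one. A small helper along the following lines can automate that sweep. This is a minimal sketch: the ./isaaclab.sh -p launcher, the play-script path, the checkpoint directory, and the model_epoch_*.pth filename pattern are assumptions to adapt to your setup, while the CLI flags mirror the command quoted in the documentation above.

    # Sketch: evaluate every saved checkpoint with the robomimic play script and
    # compare success rates across epochs. Paths, the launcher, and the filename
    # pattern below are placeholders/assumptions -- adjust them to your setup.
    import glob
    import subprocess

    PLAY_SCRIPT = "scripts/imitation_learning/robomimic/play.py"  # assumed script path
    CHECKPOINT_DIR = "/PATH/TO/logs/robomimic/models"             # placeholder directory
    TASK = "Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0"

    for ckpt in sorted(glob.glob(f"{CHECKPOINT_DIR}/model_epoch_*.pth")):
        print(f"Evaluating {ckpt}")
        # Each run reports its own success rate over --num_rollouts episodes;
        # keep the checkpoint whose reported rate is highest.
        subprocess.run(
            [
                "./isaaclab.sh", "-p", PLAY_SCRIPT,
                "--device", "cpu", "--enable_cameras",
                "--task", TASK,
                "--num_rollouts", "50",
                "--checkpoint", ckpt,
            ],
            check=True,
        )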

scripts/reinforcement_learning/rsl_rl/play.py

Lines changed: 3 additions & 1 deletion
@@ -185,7 +185,9 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
             # agent stepping
             actions = policy(obs)
             # env stepping
-            obs, _, _, _ = env.step(actions)
+            obs, _, dones, _ = env.step(actions)
+            # reset recurrent states for episodes that have terminated
+            policy_nn.reset(dones)
             if args_cli.video:
                 timestep += 1
                 # Exit the play loop after recording one video
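The patched loop depends on the policy object exposing a reset(dones) method that clears recurrent state for environments whose episode just terminated; otherwise a new episode would start from the hidden state of the previous one. A stripped-down sketch of that pattern, using a hypothetical GRU wrapper rather than RSL-RL's actual classes:

    # Minimal sketch of per-environment recurrent-state resetting during evaluation.
    # RecurrentPolicy is a hypothetical wrapper, not RSL-RL's implementation.
    import torch
    import torch.nn as nn

    class RecurrentPolicy:
        def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64, num_envs: int = 4):
            self.gru = nn.GRU(obs_dim, hidden_dim)
            self.head = nn.Linear(hidden_dim, act_dim)
            self.hidden = torch.zeros(1, num_envs, hidden_dim)

        def __call__(self, obs: torch.Tensor) -> torch.Tensor:
            # obs: (num_envs, obs_dim); advance the GRU by one step
            with torch.no_grad():
                out, self.hidden = self.gru(obs.unsqueeze(0), self.hidden)
                return self.head(out.squeeze(0))

        def reset(self, dones: torch.Tensor) -> None:
            # Zero the hidden state only for environments whose episode ended,
            # so the next episode does not inherit memory from the previous one.
            self.hidden[:, dones.bool(), :] = 0.0

    # Usage mirrors the patched loop above:
    #   actions = policy(obs)
    #   obs, _, dones, _ = env.step(actions)
    #   policy.reset(dones)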

source/isaaclab/config/extension.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.47.5"
+version = "0.47.7"

 # Description
 title = "Isaac Lab framework for Robot Learning"

source/isaaclab/docs/CHANGELOG.rst

Lines changed: 19 additions & 0 deletions
@@ -1,6 +1,25 @@
 Changelog
 ---------

+0.47.7 (2025-10-31)
+~~~~~~~~~~~~~~~~~~~
+
+Changed
+^^^^^^^
+
+* Changed Pink IK controller qpsolver from osqp to daqp.
+* Changed Null Space matrix computation in Pink IK's Null Space Posture Task to a faster matrix pseudo inverse computation.
+
+
+0.47.6 (2025-11-01)
+~~~~~~~~~~~~~~~~~~~~
+
+Fixed
+^^^^^
+
+* Fixed an issue in recurrent policy evaluation in RSL-RL framework where the recurrent state was not reset after an episode termination.
+
+
 0.47.5 (2025-10-30)
 ~~~~~~~~~~~~~~~~~~~

source/isaaclab/isaaclab/controllers/pink_ik/null_space_posture_task.py

Lines changed: 30 additions & 1 deletion
@@ -4,6 +4,8 @@
 # SPDX-License-Identifier: BSD-3-Clause

 import numpy as np
+import scipy.linalg.blas as blas
+import scipy.linalg.lapack as lapack

 import pinocchio as pin
 from pink.configuration import Configuration
@@ -75,6 +77,9 @@ class NullSpacePostureTask(Task):

     """

+    # Regularization factor for pseudoinverse computation to ensure numerical stability
+    PSEUDOINVERSE_DAMPING_FACTOR: float = 1e-9
+
     def __init__(
         self,
         cost: float,
@@ -237,6 +242,30 @@ def compute_jacobian(self, configuration: Configuration) -> np.ndarray:
         J_combined = np.concatenate(J_frame_tasks, axis=0)

         # Compute null space projector: N = I - J^+ * J
-        N_combined = np.eye(J_combined.shape[1]) - np.linalg.pinv(J_combined) @ J_combined
+        # Use fast pseudoinverse computation with direct LAPACK/BLAS calls
+        m, n = J_combined.shape
+
+        # Wide matrix (typical for robotics): use left pseudoinverse
+        # J^+ = J^T @ inv(J @ J^T + λ²I)
+        # This is faster because we invert an m×m matrix instead of n×n
+
+        # Compute J @ J^T using BLAS (faster than numpy)
+        JJT = blas.dgemm(1.0, J_combined, J_combined.T)
+        np.fill_diagonal(JJT, JJT.diagonal() + self.PSEUDOINVERSE_DAMPING_FACTOR**2)
+
+        # Use LAPACK's Cholesky factorization (dpotrf = Positive definite TRiangular Factorization)
+        L, info = lapack.dpotrf(JJT, lower=1, clean=False, overwrite_a=True)
+
+        if info != 0:
+            # Fallback if not positive definite: use numpy's pseudoinverse
+            J_pinv = np.linalg.pinv(J_combined)
+            return np.eye(n) - J_pinv @ J_combined
+
+        # Solve (J @ J^T + λ²I) @ X = J using LAPACK's triangular solver (dpotrs)
+        # This directly solves the system without computing the full inverse
+        X, _ = lapack.dpotrs(L, J_combined, lower=1)
+
+        # Compute null space projector: N = I - J^T @ X
+        N_combined = np.eye(n) - J_combined.T @ X

         return N_combined
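One way to sanity-check the new LAPACK/BLAS path is to compare its projector against the previous np.linalg.pinv formulation on a random wide Jacobian. The following standalone sketch uses the same dgemm/dpotrf/dpotrs calls as the patch; the matrix sizes and damping value are arbitrary choices for illustration.

    # Standalone check: the damped Cholesky-based projector should match the
    # pinv-based projector for a well-conditioned wide Jacobian.
    import numpy as np
    import scipy.linalg.blas as blas
    import scipy.linalg.lapack as lapack

    damping = 1e-9
    rng = np.random.default_rng(0)
    J = rng.standard_normal((6, 29))  # wide: 6 task rows, 29 joints (arbitrary sizes)

    # Old formulation: N = I - pinv(J) @ J
    N_ref = np.eye(J.shape[1]) - np.linalg.pinv(J) @ J

    # New formulation: factor (J @ J^T + damping^2 * I) and solve for X
    JJT = blas.dgemm(1.0, J, J.T)
    np.fill_diagonal(JJT, JJT.diagonal() + damping**2)
    L, info = lapack.dpotrf(JJT, lower=1, clean=False, overwrite_a=True)
    assert info == 0, "J @ J^T is not positive definite"
    X, _ = lapack.dpotrs(L, J, lower=1)
    N_fast = np.eye(J.shape[1]) - J.T @ X

    print(np.allclose(N_ref, N_fast, atol=1e-6))  # expected: True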

source/isaaclab/isaaclab/controllers/pink_ik/pink_ik.py

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ def compute(
             self.pink_configuration,
             self.cfg.variable_input_tasks + self.cfg.fixed_input_tasks,
             dt,
-            solver="osqp",
+            solver="daqp",
             safety_break=self.cfg.fail_on_joint_limit_violation,
         )
         joint_angle_changes = velocity * dt
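The solver switch assumes the daqp backend is installed and visible to qpsolvers, which is why setup.py (next file) pins daqp==0.7.2. A quick check, assuming only the standard qpsolvers.available_solvers list:

    # Verify the daqp QP backend is importable and registered with qpsolvers
    # before Pink tries to use it.
    import qpsolvers

    print(qpsolvers.available_solvers)
    if "daqp" not in qpsolvers.available_solvers:
        raise RuntimeError("daqp not found; install it with `pip install daqp`")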

source/isaaclab/setup.py

Lines changed: 1 addition & 0 deletions
@@ -52,6 +52,7 @@
 INSTALL_REQUIRES += [
     # required by isaaclab.isaaclab.controllers.pink_ik
     f"pin-pink==3.1.0 ; platform_system == 'Linux' and ({SUPPORTED_ARCHS_ARM})",
+    f"daqp==0.7.2 ; platform_system == 'Linux' and ({SUPPORTED_ARCHS_ARM})",
     # required by isaaclab.devices.openxr.retargeters.humanoid.fourier.gr1_t2_dex_retargeting_utils
     f"dex-retargeting==0.4.6 ; platform_system == 'Linux' and ({SUPPORTED_ARCHS})",
 ]

tools/template/templates/extension/config/extension.toml

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ keywords = ["extension", "template", "isaaclab"]
 [[python.module]]
 name = "{{ name }}"

-[isaaclab_settings]
+[isaac_lab_settings]
 # TODO: Uncomment and list any apt dependencies here.
 # If none, leave it commented out.
 # apt_deps = ["example_package"]
