V-JEPA2-AC in Isaac Lab / Omniverse: Franka arm stops above cube without grasping

Dear all,

I am trying to integrate **V-JEPA2-AC** into an **Isaac Lab / Omniverse** manipulation simulation with a Franka arm and cubes.

The robot uses V-JEPA2-AC as the only action source. There are no scripted pick/place phases.

## Setup

**Simulator**: Isaac Lab / Omniverse
**Robot**: Franka Panda
**Task**: Isaac-Stack-Cube-Franka-IK-Rel-Blueprint-v0
**Control mode**: IK relative action task
**Camera**: external RGB camera, droid_exterior_image_1_left
**Policy source**: V-JEPA2-AC only

### Hyperparameters

```yaml
rollout: 2
samples: 64
topk: 5
cem_steps: 10
momentum_mean: 0.15
momentum_mean_gripper: 0.15
momentum_std: 0.25
momentum_std_gripper: 0.15
maxnorm: 0.075
verbose: false
```

### Goal

<img width="256" height="256" alt="Image" src="https://github.com/user-attachments/assets/fadf223b-c70c-41fa-b6d2-49600b84853a" />

### Simulation screenshot

<img width="256" height="256" alt="Image" src="https://github.com/user-attachments/assets/ef2280c2-c9de-456c-959c-1505f65eeecb" />

## Issue

The gripper moves toward the cube, but consistently stops a few centimeters above it. It does not descend far enough to make contact or grasp the cube.

I tried using:

1. A target image where the cube has been moved
2. A target image where the gripper is already holding the cube.

In both cases, the behavior is similar: the robot approaches the cube but does not complete the final descent / grasp.

## Expected Behavior

Given a target image where the cube has been moved, I would expect the model/controller to produce actions that:

1. Move the gripper above the cube,
2. Descend to the cube,
3. Close the gripper,
4. Lift or move the cube toward the target state.

## Actual behavior

The arm moves toward the cube, but the end-effector hovers slightly above or next to it and does not complete the grasp.

The failure mode is very consistent: the approach motion works, but the final downward/contact/grasp phase does not happen. Additionally, the gripper does not open and close consistently.


## What I tried

* Increasing rollout up to `4`.
* Increasing number of samples up to `256`.
* Increasing number of CEM steps up to `30`

## Current integration

### Inference

I call V-JEPA 2-AC as follows:

```python
...
with torch.no_grad():
    z_current = self.encode_single_frame(current_rgb)

    s = torch.tensor(
        ee_state7.reshape(1, 1, 7),
        dtype=torch.float32,
        device=self.device,
    )

    action = self.world_model.infer_next_action(
        z_current,
        s,
        self.z_goal,
    )

    action_np = action.detach().float().cpu().numpy().reshape(-1)
```

The returned action is mapped directly to the Isaac Lab action:

```python
actions[env_id, 0:3] = pos_gain * vjepa_action.delta_xyz
actions[env_id, 3:6] = rot_gain * vjepa_action.delta_rpy
actions[env_id, -1]  = gripper_gain * vjepa_action.gripper
```
with:

```python
pos_gain = 1.0
rot_gain = 1.0
gripper_gain = 1.0
```

### Minimal controller code

```python
@dataclass
class VJepaAction:
    delta_xyz: np.ndarray
    delta_rpy: np.ndarray
    gripper: float

class VJEPAMPC:
    def __init__(self, ...):
        ...

    def encode_single_frame(self, rgb: np.ndarray):
        ...

    def infer_action(
        self,
        current_rgb: np.ndarray,
        ee_state7: np.ndarray,
    ) -> VJepaAction:
        if not self.available:
            raise RuntimeError("V-JEPA2-AC is unavailable.")

        current_rgb = to_uint8_rgb(current_rgb)

        with torch.no_grad():
            ...

        return parse_vjepa_action(action_np)

class VJepaIsaacLabActionController:
    def __init__(self, ...):
        ...

    def _make_empty_actions(self):
        ...

    def _get_camera_rgb_batch(self):
        ...

    def _get_ee_state7_batch(self):
        ...

    def forward(self):
        if not self.vjepa.available:
            raise RuntimeError("V-JEPA2-AC is unavailable.")

        rgb_batch = self._get_camera_rgb_batch()
        ee_state7_batch = self._get_ee_state7_batch()

        actions = self._make_empty_actions()

        for env_id in range(self.num_envs):
            vjepa_action = self.vjepa.infer_action(
                current_rgb=rgb_batch[env_id],
                ee_state7=ee_state7_batch[env_id],
            )

            delta_xyz = np.asarray(
                vjepa_action.delta_xyz,
                dtype=np.float32,
            ).reshape(3)

            delta_rpy = np.asarray(
                vjepa_action.delta_rpy,
                dtype=np.float32,
            ).reshape(3)

            gripper = float(vjepa_action.gripper)

            actions[env_id, 0:3] = torch.as_tensor(
                self.pos_gain * delta_xyz,
                device=self.device,
                dtype=torch.float32,
            )

            actions[env_id, 3:6] = torch.as_tensor(
                self.rot_gain * delta_rpy,
                device=self.device,
                dtype=torch.float32,
            )

            actions[env_id, -1] = self.gripper_gain * gripper

        return actions
```

## Questions

* Could this behavior be caused by unsuitable inference/planning hyperparameters (e.g. planning horizon, number of candidate actions, etc.)?
* Could this behavior be caused by an integration convention mismatch?
* Is this expected out-of-distribution behavior? If so, would you recommend post-training the model?
* I also tried a target image where the gripper is already holding the cube. Is this a supported / recommended type of target for V-JEPA2-AC? Or should the target image show only the final object configuration, without requiring the model to infer grasp contact from the image?
* Does V-JEPA2-AC require a specific camera viewpoint, and are the original DROID camera conventions important for successful grasping?

Thank you very much in advance for you help and insights!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

V-JEPA2-AC in Isaac Lab / Omniverse: Franka arm stops above cube without grasping #169

Setup

Hyperparameters

Goal

Simulation screenshot

Issue

Expected Behavior

Actual behavior

What I tried

Current integration

Inference

Minimal controller code

Questions

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

V-JEPA2-AC in Isaac Lab / Omniverse: Franka arm stops above cube without grasping #169

Description

Setup

Hyperparameters

Goal

Simulation screenshot

Issue

Expected Behavior

Actual behavior

What I tried

Current integration

Inference

Minimal controller code

Questions

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions