DOT training was very slow on a Google Colab A100 instance

We migrated to the latest LeRobot and ran DOT training on my dataset using an A100 instance on Google Colab.
Code and dataset:
https://github.com/masato-ka/lerobot/tree/policy/dot_policy
https://huggingface.co/datasets/masato-ka/so100_lego_sort

However, training was slower than ACT in the same environment. According to the logs, both data loading (`data_s`) and the update step (`updt_s`) take longer than expected.


ACT on Colab A100
```
INFO 2025-05-02 05:02:17 ts/train.py:232 step:200 smpl:2K ep:2 epch:0.03 loss:6.786 grdn:154.530 lr:1.0e-05 updt_s:0.066 data_s:0.004
INFO 2025-05-02 05:02:29 ts/train.py:232 step:400 smpl:3K ep:4 epch:0.07 loss:3.049 grdn:85.140 lr:1.0e-05 updt_s:0.055 data_s:0.000 
INFO 2025-05-02 05:02:40 ts/train.py:232 step:600 smpl:5K ep:5 epch:0.10 loss:2.572 grdn:75.739 lr:1.0e-05 updt_s:0.056 data_s:0.000
```


DOT on Colab A100
```
INFO 2025-05-02 05:04:38 ts/train.py:232 step:200 smpl:2K ep:2 epch:0.03 loss:0.205 grdn:2.208 lr:1.0e-04 updt_s:0.111 data_s:0.241
INFO 2025-05-02 05:05:45 ts/train.py:232 step:400 smpl:3K ep:4 epch:0.07 loss:0.126 grdn:1.835 lr:1.0e-04 updt_s:0.101 data_s:0.235
INFO 2025-05-02 05:06:52 ts/train.py:232 step:600 smpl:5K ep:5 epch:0.10 loss:0.117 grdn:1.733 lr:1.0e-04 updt_s:0.097 data_s:0.238
```

ACT on M2 MacBook Air
```
INFO 2025-05-02 13:42:05 ts/train.py:232 step:200 smpl:2K ep:2 epch:0.03 loss:6.827 grdn:155.095 lr:1.0e-05 updt_s:0.759 data_s:0.037
INFO 2025-05-02 13:45:06 ts/train.py:232 step:400 smpl:3K ep:4 epch:0.07 loss:3.058 grdn:85.390 lr:1.0e-05 updt_s:0.901 data_s:0.001
```

DOT on M2 MacBook Air
```
INFO 2025-05-02 13:49:46 ts/train.py:232 step:200 smpl:2K ep:2 epch:0.03 loss:0.234 grdn:19.501 lr:1.0e-04 updt_s:0.246 data_s:0.040
INFO 2025-05-02 13:50:37 ts/train.py:232 step:400 smpl:3K ep:4 epch:0.07 loss:0.134 grdn:19.544 lr:1.0e-04 updt_s:0.249 data_s:0.001
```
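To compare the runs more directly, the per-step timings can be pulled out of the log lines above. The sketch below assumes only the `key:value` format visible in these logs; `summarize` is a hypothetical helper, not part of LeRobot. It averages `updt_s` and `data_s` and also estimates wall-clock seconds per step from the timestamps, which have one-second resolution, so that figure is approximate.

```python
import re
from datetime import datetime
from statistics import mean

# Patterns for the fields in the train.py log lines quoted above.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
STEP_RE = re.compile(r"\bstep:(\d+)")
UPDT_RE = re.compile(r"\bupdt_s:([\d.]+)")
DATA_RE = re.compile(r"\bdata_s:([\d.]+)")


def summarize(log_text):
    """Summarize per-step timings from a block of train.py log lines.

    Returns the mean updt_s, the mean data_s, and the wall-clock seconds
    per step measured between the first and last logged steps.
    """
    lines = [ln for ln in log_text.strip().splitlines() if "updt_s:" in ln]
    updt = [float(UPDT_RE.search(ln).group(1)) for ln in lines]
    data = [float(DATA_RE.search(ln).group(1)) for ln in lines]
    steps = [int(STEP_RE.search(ln).group(1)) for ln in lines]
    stamps = [datetime.strptime(TS_RE.search(ln).group(0), "%Y-%m-%d %H:%M:%S")
              for ln in lines]
    wall = (stamps[-1] - stamps[0]).total_seconds() / (steps[-1] - steps[0])
    return {"updt_s": mean(updt), "data_s": mean(data), "wall_s_per_step": wall}


# Log lines copied from the Colab A100 runs above.
ACT_COLAB = """
INFO 2025-05-02 05:02:17 ts/train.py:232 step:200 smpl:2K ep:2 epch:0.03 loss:6.786 grdn:154.530 lr:1.0e-05 updt_s:0.066 data_s:0.004
INFO 2025-05-02 05:02:29 ts/train.py:232 step:400 smpl:3K ep:4 epch:0.07 loss:3.049 grdn:85.140 lr:1.0e-05 updt_s:0.055 data_s:0.000
INFO 2025-05-02 05:02:40 ts/train.py:232 step:600 smpl:5K ep:5 epch:0.10 loss:2.572 grdn:75.739 lr:1.0e-05 updt_s:0.056 data_s:0.000
"""
DOT_COLAB = """
INFO 2025-05-02 05:04:38 ts/train.py:232 step:200 smpl:2K ep:2 epch:0.03 loss:0.205 grdn:2.208 lr:1.0e-04 updt_s:0.111 data_s:0.241
INFO 2025-05-02 05:05:45 ts/train.py:232 step:400 smpl:3K ep:4 epch:0.07 loss:0.126 grdn:1.835 lr:1.0e-04 updt_s:0.101 data_s:0.235
INFO 2025-05-02 05:06:52 ts/train.py:232 step:600 smpl:5K ep:5 epch:0.10 loss:0.117 grdn:1.733 lr:1.0e-04 updt_s:0.097 data_s:0.238
"""

for name, text in [("ACT", ACT_COLAB), ("DOT", DOT_COLAB)]:
    s = summarize(text)
    print(f"{name}: updt {s['updt_s']:.4f} s, data {s['data_s']:.4f} s, "
          f"wall {s['wall_s_per_step']:.4f} s/step")
# ACT: updt 0.0590 s, data 0.0013 s, wall 0.0575 s/step
# DOT: updt 0.1030 s, data 0.2380 s, wall 0.3350 s/step
```

On these numbers, `updt_s + data_s` roughly matches the wall time per step for both runs, so on Colab the DOT run appears to spend about 70% of each step waiting for data, and overall runs close to 6x slower per step than ACT.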

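Unless I am mistaken, the DOT update step should be faster than ACT's, and data loading should take about the same time for both. On the M2 MacBook Air this is what happens: DOT's `updt_s` is about 0.25 s versus 0.76 to 0.90 s for ACT, and `data_s` is similar for both. On the Colab A100, however, DOT's `updt_s` is about 0.10 s versus about 0.06 s for ACT, and its `data_s` is about 0.24 s per step versus under 0.005 s for ACT.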
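One way to check the expectation that data loading should cost the same for both policies is to time the data loader on its own, with no model in the loop. The sketch below is an assumption-laden helper, not LeRobot code: `time_batches` and `fake_loader` are hypothetical names, and `fake_loader` only stands in for a real loader. `time_batches` accepts any iterable and reports the mean wait per batch after a few warm-up batches.

```python
import time
from itertools import islice
from statistics import mean


def time_batches(loader, n_batches=50, warmup=5):
    """Return the mean seconds spent waiting for one batch from `loader`.

    `loader` can be any iterable, e.g. a torch.utils.data.DataLoader. The
    first `warmup` batches are discarded so worker start-up and file-open
    costs do not dominate the average.
    """
    it = iter(loader)
    for _ in islice(it, warmup):
        pass  # discard warm-up batches
    durations = []
    for _ in range(n_batches):
        start = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break  # loader exhausted early; average what we have
        durations.append(time.perf_counter() - start)
    if not durations:
        raise ValueError("loader yielded no batches after warm-up")
    return mean(durations)


def fake_loader(delay_s, n=100):
    """Hypothetical stand-in for a DataLoader that needs `delay_s` per batch."""
    for i in range(n):
        time.sleep(delay_s)
        yield i


print(f"{time_batches(fake_loader(0.01), n_batches=20):.3f} s/batch")
```

With a real setup, one could wrap a `torch.utils.data.DataLoader` built with the same dataset and loader settings each policy uses and compare the result to `data_s` in the logs. If the DOT loader is slow on its own, the cause is more likely on the dataset side (for example, how many frames are decoded per sample, or the `num_workers` setting) than in the model; this is a hypothesis to verify, not a confirmed diagnosis.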

Is this an implementation problem?
