I migrated to the latest LeRobot and ran DOT training with my dataset on an A100 instance of Google Colab.
See:
https://github.com/masato-ka/lerobot/tree/policy/dot_policy
https://huggingface.co/datasets/masato-ka/so100_lego_sort
However, training was slower than ACT in the same environment. The logs show that both data loading and the gradient computation are slower than expected.
ACT on ColabA100
INFO 2025-05-02 05:02:17 ts/train.py:232 step:200 smpl:2K ep:2 epch:0.03 loss:6.786 grdn:154.530 lr:1.0e-05 updt_s:0.066 data_s:0.004
INFO 2025-05-02 05:02:29 ts/train.py:232 step:400 smpl:3K ep:4 epch:0.07 loss:3.049 grdn:85.140 lr:1.0e-05 updt_s:0.055 data_s:0.000
INFO 2025-05-02 05:02:40 ts/train.py:232 step:600 smpl:5K ep:5 epch:0.10 loss:2.572 grdn:75.739 lr:1.0e-05 updt_s:0.056 data_s:0.000
DOT on ColabA100
INFO 2025-05-02 05:04:38 ts/train.py:232 step:200 smpl:2K ep:2 epch:0.03 loss:0.205 grdn:2.208 lr:1.0e-04 updt_s:0.111 data_s:0.241
INFO 2025-05-02 05:05:45 ts/train.py:232 step:400 smpl:3K ep:4 epch:0.07 loss:0.126 grdn:1.835 lr:1.0e-04 updt_s:0.101 data_s:0.235
INFO 2025-05-02 05:06:52 ts/train.py:232 step:600 smpl:5K ep:5 epch:0.10 loss:0.117 grdn:1.733 lr:1.0e-04 updt_s:0.097 data_s:0.238
ACT on M2 MacBook Air
INFO 2025-05-02 13:42:05 ts/train.py:232 step:200 smpl:2K ep:2 epch:0.03 loss:6.827 grdn:155.095 lr:1.0e-05 updt_s:0.759 data_s:0.037
INFO 2025-05-02 13:45:06 ts/train.py:232 step:400 smpl:3K ep:4 epch:0.07 loss:3.058 grdn:85.390 lr:1.0e-05 updt_s:0.901 data_s:0.001
DOT on M2 MacBook Air
INFO 2025-05-02 13:49:46 ts/train.py:232 step:200 smpl:2K ep:2 epch:0.03 loss:0.234 grdn:19.501 lr:1.0e-04 updt_s:0.246 data_s:0.040
INFO 2025-05-02 13:50:37 ts/train.py:232 step:400 smpl:3K ep:4 epch:0.07 loss:0.134 grdn:19.544 lr:1.0e-04 updt_s:0.249 data_s:0.001
Unless I am mistaken, DOT's gradient computation should be faster than ACT's, and data loading should take about the same time for both.
Is this an implementation problem?
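For anyone who wants to reproduce the comparison outside the training script: the data_s / updt_s split in the logs can be measured in any loop by timing the wait for the next batch separately from the optimizer step. Below is a minimal, self-contained sketch of that measurement pattern (the sleeps and names are illustrative stand-ins, not LeRobot code):

```python
import time

def fake_batches(n, load_delay):
    """Simulated dataloader; sleep stands in for decode/augmentation cost."""
    for _ in range(n):
        time.sleep(load_delay)
        yield object()

def timed_loop(batches, step_delay):
    """Return average (data_s, updt_s): time waiting on the dataloader
    vs. time spent in the (simulated) forward/backward/optimizer step."""
    data_s, updt_s = [], []
    t0 = time.perf_counter()
    for _batch in batches:
        t1 = time.perf_counter()
        data_s.append(t1 - t0)   # wait for next batch
        time.sleep(step_delay)   # stand-in for the gradient step
        t0 = time.perf_counter()
        updt_s.append(t0 - t1)   # step duration
    return sum(data_s) / len(data_s), sum(updt_s) / len(updt_s)

avg_data, avg_updt = timed_loop(fake_batches(5, 0.02), 0.01)
```

With a pattern like this you can check whether the extra time for DOT really comes from the dataloader (e.g. more frames decoded per sample) or from the backward pass itself.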