[DTensor] Add transfer plan computation for cross-mesh DTensor redistribution #1639
Open
vmoens wants to merge 1 commit into gh/vmoens/80/base from
Contributor
PR Title Label Error
Unknown or invalid prefix
Current title:
Supported Prefixes
Your PR title must start with exactly one of these prefixes (case-insensitive):
Note: Matching is case-insensitive. Common variations (singular/plural) are supported.
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 40.5100μs | 15.3135μs | 65.3018 KOps/s | 66.6704 KOps/s | |
| test_plain_set_stack_nested | 37.0400μs | 15.1823μs | 65.8661 KOps/s | 66.1368 KOps/s | |
| test_plain_set_nested_inplace | 44.0410μs | 16.6865μs | 59.9287 KOps/s | 59.9007 KOps/s | |
| test_plain_set_stack_nested_inplace | 42.1510μs | 16.5489μs | 60.4270 KOps/s | 59.7537 KOps/s | |
| test_items | 38.4900μs | 6.1109μs | 163.6408 KOps/s | 165.9487 KOps/s | |
| test_items_nested | 0.5131ms | 0.4636ms | 2.1568 KOps/s | 2.1567 KOps/s | |
| test_items_nested_locked | 0.5155ms | 0.4675ms | 2.1393 KOps/s | 2.1354 KOps/s | |
| test_items_nested_leaf | 0.1661ms | 96.7166μs | 10.3395 KOps/s | 10.1746 KOps/s | |
| test_items_stack_nested | 0.5120ms | 0.4674ms | 2.1397 KOps/s | 2.1545 KOps/s | |
| test_items_stack_nested_leaf | 0.1355ms | 98.6486μs | 10.1370 KOps/s | 10.0290 KOps/s | |
| test_items_stack_nested_locked | 0.5117ms | 0.4687ms | 2.1335 KOps/s | 2.1468 KOps/s | |
| test_keys | 36.4610μs | 4.2331μs | 236.2341 KOps/s | 234.7756 KOps/s | |
| test_keys_nested | 0.1964ms | 0.1296ms | 7.7185 KOps/s | 7.7226 KOps/s | |
| test_keys_nested_locked | 0.7704ms | 0.1397ms | 7.1581 KOps/s | 7.2127 KOps/s | |
| test_keys_nested_leaf | 0.1654ms | 0.1217ms | 8.2157 KOps/s | 8.2975 KOps/s | |
| test_keys_stack_nested | 0.1769ms | 0.1309ms | 7.6422 KOps/s | 7.6467 KOps/s | |
| test_keys_stack_nested_leaf | 0.1556ms | 0.1213ms | 8.2449 KOps/s | 8.2653 KOps/s | |
| test_keys_stack_nested_locked | 0.1834ms | 0.1399ms | 7.1489 KOps/s | 7.2519 KOps/s | |
| test_values | 11.3022μs | 1.0240μs | 976.5860 KOps/s | 975.2531 KOps/s | |
| test_values_nested | 83.0610μs | 52.7437μs | 18.9596 KOps/s | 19.0916 KOps/s | |
| test_values_nested_locked | 94.1610μs | 56.3215μs | 17.7552 KOps/s | 17.8145 KOps/s | |
| test_values_nested_leaf | 96.5820μs | 60.6813μs | 16.4795 KOps/s | 17.0871 KOps/s | |
| test_values_stack_nested | 88.0610μs | 52.9292μs | 18.8932 KOps/s | 18.7569 KOps/s | |
| test_values_stack_nested_leaf | 93.4920μs | 60.6546μs | 16.4868 KOps/s | 16.5513 KOps/s | |
| test_values_stack_nested_locked | 87.2910μs | 56.4233μs | 17.7232 KOps/s | 17.8506 KOps/s | |
| test_membership | 11.4202μs | 0.8372μs | 1.1945 MOps/s | 1.1639 MOps/s | |
| test_membership_nested | 30.2100μs | 2.8838μs | 346.7629 KOps/s | 344.6520 KOps/s | |
| test_membership_nested_leaf | 25.4910μs | 2.8968μs | 345.2087 KOps/s | 341.1969 KOps/s | |
| test_membership_stacked_nested | 36.3910μs | 2.8713μs | 348.2736 KOps/s | 347.8237 KOps/s | |
| test_membership_stacked_nested_leaf | 22.2300μs | 2.8790μs | 347.3448 KOps/s | 348.6078 KOps/s | |
| test_membership_nested_last | 29.6910μs | 4.3460μs | 230.0944 KOps/s | 228.8973 KOps/s | |
| test_membership_nested_leaf_last | 30.6700μs | 4.3463μs | 230.0814 KOps/s | 227.5173 KOps/s | |
| test_membership_stacked_nested_last | 71.0810μs | 4.3516μs | 229.8011 KOps/s | 227.8177 KOps/s | |
| test_membership_stacked_nested_leaf_last | 31.7900μs | 4.3232μs | 231.3092 KOps/s | 230.6605 KOps/s | |
| test_nested_getleaf | 51.1410μs | 21.6296μs | 46.2330 KOps/s | 45.8269 KOps/s | |
| test_nested_get | 58.2610μs | 20.7229μs | 48.2557 KOps/s | 49.1516 KOps/s | |
| test_stacked_getleaf | 47.4010μs | 21.5452μs | 46.4141 KOps/s | 45.5110 KOps/s | |
| test_stacked_get | 57.5610μs | 20.5948μs | 48.5560 KOps/s | 47.8800 KOps/s | |
| test_nested_getitemleaf | 89.6220μs | 22.0759μs | 45.2983 KOps/s | 45.1702 KOps/s | |
| test_nested_getitem | 45.3810μs | 21.0461μs | 47.5147 KOps/s | 47.0940 KOps/s | |
| test_stacked_getitemleaf | 47.8200μs | 22.0222μs | 45.4087 KOps/s | 44.4461 KOps/s | |
| test_stacked_getitem | 48.8010μs | 21.0484μs | 47.5096 KOps/s | 46.5242 KOps/s | |
| test_lock_nested | 7.8906ms | 0.4856ms | 2.0595 KOps/s | 2.0952 KOps/s | |
| test_lock_stack_nested | 0.5357ms | 0.4798ms | 2.0842 KOps/s | 2.0583 KOps/s | |
| test_unlock_nested | 0.4606ms | 0.3885ms | 2.5739 KOps/s | 2.5737 KOps/s | |
| test_unlock_stack_nested | 0.4593ms | 0.3900ms | 2.5638 KOps/s | 2.5179 KOps/s | |
| test_flatten_speed | 0.1729ms | 0.1213ms | 8.2414 KOps/s | 8.1155 KOps/s | |
| test_unflatten_speed | 0.6417ms | 0.5774ms | 1.7319 KOps/s | 1.7350 KOps/s | |
| test_common_ops | 0.9198ms | 0.7015ms | 1.4255 KOps/s | 1.4382 KOps/s | |
| test_creation | 90.0210μs | 3.1642μs | 316.0374 KOps/s | 318.5614 KOps/s | |
| test_creation_empty | 51.4610μs | 6.9460μs | 143.9673 KOps/s | 141.6286 KOps/s | |
| test_creation_nested_1 | 49.0510μs | 11.5697μs | 86.4325 KOps/s | 86.5407 KOps/s | |
| test_creation_nested_2 | 46.6510μs | 13.3625μs | 74.8364 KOps/s | 74.7257 KOps/s | |
| test_creation_many_keys[10] | 54.0910μs | 20.9429μs | 47.7490 KOps/s | 47.0981 KOps/s | |
| test_creation_many_keys[50] | 0.1253ms | 89.2396μs | 11.2058 KOps/s | 10.9006 KOps/s | |
| test_creation_many_keys[100] | 0.2831ms | 0.1763ms | 5.6725 KOps/s | 5.5082 KOps/s | |
| test_creation_nested_many_keys[10] | 73.2210μs | 45.3257μs | 22.0625 KOps/s | 21.7019 KOps/s | |
| test_creation_nested_many_keys[50] | 0.2274ms | 0.1831ms | 5.4623 KOps/s | 5.4804 KOps/s | |
| test_clone | 95.7920μs | 13.5147μs | 73.9933 KOps/s | 73.9038 KOps/s | |
| test_getitem[int] | 1.5267ms | 15.1437μs | 66.0342 KOps/s | 59.4539 KOps/s | |
| test_getitem[slice_int] | 0.1332ms | 24.1946μs | 41.3316 KOps/s | 41.3898 KOps/s | |
| test_getitem[range] | 0.1765ms | 63.8715μs | 15.6564 KOps/s | 15.6007 KOps/s | |
| test_getitem[tuple] | 0.1398ms | 24.1541μs | 41.4008 KOps/s | 42.0665 KOps/s | |
| test_getitem[list] | 0.1786ms | 58.6637μs | 17.0463 KOps/s | 16.6128 KOps/s | |
| test_setitem_dim[int] | 46.8710μs | 26.4326μs | 37.8320 KOps/s | 37.0551 KOps/s | |
| test_setitem_dim[slice_int] | 65.5210μs | 43.1571μs | 23.1711 KOps/s | 22.1555 KOps/s | |
| test_setitem_dim[range] | 0.1196ms | 95.6660μs | 10.4530 KOps/s | 9.9936 KOps/s | |
| test_setitem_dim[tuple] | 69.8410μs | 40.5963μs | 24.6328 KOps/s | 24.2068 KOps/s | |
| test_setitem | 54.7410μs | 18.0037μs | 55.5442 KOps/s | 55.2244 KOps/s | |
| test_set | 51.8710μs | 17.2381μs | 58.0110 KOps/s | 58.5751 KOps/s | |
| test_set_shared | 0.6216ms | 0.2130ms | 4.6959 KOps/s | 4.8817 KOps/s | |
| test_update | 0.2031ms | 21.7312μs | 46.0168 KOps/s | 44.9664 KOps/s | |
| test_update_nested | 68.6210μs | 33.1134μs | 30.1993 KOps/s | 30.0251 KOps/s | |
| test_update__nested | 0.4477ms | 34.1817μs | 29.2554 KOps/s | 28.9103 KOps/s | |
| test_set_nested | 60.7110μs | 20.3960μs | 49.0291 KOps/s | 51.5897 KOps/s | |
| test_set_nested_new | 58.2110μs | 26.0347μs | 38.4103 KOps/s | 41.6554 KOps/s | |
| test_select | 75.1310μs | 43.2440μs | 23.1246 KOps/s | 24.4331 KOps/s | |
| test_select_nested | 0.1050ms | 74.9398μs | 13.3441 KOps/s | 13.5116 KOps/s | |
| test_exclude_nested | 0.1314ms | 91.9688μs | 10.8733 KOps/s | 10.9017 KOps/s | |
| test_empty[True] | 0.4634ms | 0.3995ms | 2.5029 KOps/s | 2.5096 KOps/s | |
| test_empty[False] | 7.8475μs | 1.3270μs | 753.5974 KOps/s | 768.0517 KOps/s | |
| test_to | 0.1024ms | 71.7361μs | 13.9400 KOps/s | 13.4951 KOps/s | |
| test_to_nonblocking | 0.1222ms | 65.4454μs | 15.2799 KOps/s | 15.2487 KOps/s | |
| test_unbind_speed | 0.3731ms | 0.3338ms | 2.9956 KOps/s | 2.9876 KOps/s | |
| test_unbind_speed_stack0 | 0.4247ms | 0.3290ms | 3.0398 KOps/s | 3.0080 KOps/s | |
| test_unbind_speed_stack1 | 0.1035s | 0.8384ms | 1.1927 KOps/s | 1.1673 KOps/s | |
| test_split | 0.1035s | 1.2668ms | 789.4121 Ops/s | 782.4156 Ops/s | |
| test_chunk | 0.1032s | 1.2125ms | 824.7189 Ops/s | 918.6403 Ops/s | |
| test_to_cpu_blocking | 19.8678ms | 19.7660ms | 50.5919 Ops/s | 34.1241 Ops/s | |
| test_to_cpu_global_sync | 11.7691ms | 11.6839ms | 85.5879 Ops/s | 76.8206 Ops/s | |
| test_to_cpu_event_sync | 13.4181ms | 12.6749ms | 78.8960 Ops/s | 78.7987 Ops/s | |
| test_to_cpu_default | 0.1161s | 14.0025ms | 71.4159 Ops/s | 78.5336 Ops/s | |
| test_consolidate[False-None] | 4.2335ms | 4.1546ms | 240.6945 Ops/s | 214.6916 Ops/s | |
| test_consolidate[default-None] | 2.6717ms | 2.0491ms | 488.0300 Ops/s | 479.1963 Ops/s | |
| test_consolidate[reduce-overhead-None] | 2.0520ms | 1.9743ms | 506.5149 Ops/s | 504.8202 Ops/s | |
| test_consolidate_njt[False-None] | 0.1913s | 10.1671ms | 98.3560 Ops/s | 116.7773 Ops/s | |
| test_to[False-False-None] | 2.2287ms | 2.1288ms | 469.7501 Ops/s | 465.5847 Ops/s | |
| test_to[True-False-None] | 2.1483ms | 1.9124ms | 522.9123 Ops/s | 519.6354 Ops/s | |
| test_to[within-False-None] | 6.3868ms | 6.1781ms | 161.8616 Ops/s | 162.6492 Ops/s | |
| test_to[True-default-None] | 9.0233ms | 8.8430ms | 113.0842 Ops/s | 109.1644 Ops/s | |
| test_to_njt[False-False-None] | 8.8336ms | 8.5131ms | 117.4664 Ops/s | 116.1406 Ops/s | |
| test_to_njt[True-False-None] | 7.1127ms | 6.9275ms | 144.3515 Ops/s | 141.6555 Ops/s | |
| test_to_njt[within-False-None] | 15.8162ms | 15.6220ms | 64.0124 Ops/s | 62.8856 Ops/s | |
| test_creation[device0] | 0.4486ms | 0.1167ms | 8.5680 KOps/s | 8.6259 KOps/s | |
| test_creation_from_tensor | 0.5865ms | 0.1142ms | 8.7602 KOps/s | 8.6896 KOps/s | |
| test_add_one[memmap_tensor0] | 0.3086ms | 6.6452μs | 150.4854 KOps/s | 146.8191 KOps/s | |
| test_contiguous[memmap_tensor0] | 26.3900μs | 0.6691μs | 1.4945 MOps/s | 2.1624 MOps/s | |
| test_stack[memmap_tensor0] | 32.7000μs | 4.6024μs | 217.2789 KOps/s | 219.1664 KOps/s | |
| test_memmaptd_index | 1.0726ms | 0.2649ms | 3.7755 KOps/s | 3.7607 KOps/s | |
| test_memmaptd_index_astensor | 0.5351ms | 0.3686ms | 2.7128 KOps/s | 2.7020 KOps/s | |
| test_memmaptd_index_op | 0.1639s | 0.7315ms | 1.3671 KOps/s | 1.5870 KOps/s | |
| test_serialize_model | 0.1394s | 0.1371s | 7.2957 Ops/s | 7.2560 Ops/s | |
| test_serialize_model_pickle | 1.3973s | 1.2125s | 0.8247 Ops/s | 0.8249 Ops/s | |
| test_serialize_weights | 0.1375s | 0.1359s | 7.3585 Ops/s | 7.3159 Ops/s | |
| test_serialize_weights_returnearly | 0.2825s | 94.1937ms | 10.6164 Ops/s | 14.4578 Ops/s | |
| test_serialize_weights_pickle | 1.3687s | 1.2137s | 0.8239 Ops/s | 0.8236 Ops/s | |
| test_reshape_pytree | 0.2005ms | 32.9318μs | 30.3658 KOps/s | 30.1674 KOps/s | |
| test_reshape_td | 89.1320μs | 45.2927μs | 22.0786 KOps/s | 22.0490 KOps/s | |
| test_view_pytree | 0.2194ms | 32.5197μs | 30.7506 KOps/s | 30.3671 KOps/s | |
| test_view_td | 0.1118ms | 53.1348μs | 18.8200 KOps/s | 18.3961 KOps/s | |
| test_unbind_pytree | 0.2346ms | 36.5464μs | 27.3625 KOps/s | 26.8394 KOps/s | |
| test_unbind_td | 0.1909ms | 49.7230μs | 20.1114 KOps/s | 19.9336 KOps/s | |
| test_split_pytree | 0.2483ms | 42.7336μs | 23.4008 KOps/s | 23.2711 KOps/s | |
| test_split_td | 0.2188ms | 63.6834μs | 15.7027 KOps/s | 15.5271 KOps/s | |
| test_add_pytree | 0.2045ms | 42.5387μs | 23.5080 KOps/s | 23.1369 KOps/s | |
| test_add_td | 0.1176ms | 55.3437μs | 18.0689 KOps/s | 17.8144 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.1924ms | 0.1401ms | 7.1368 KOps/s | 6.8622 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.2816ms | 0.2025ms | 4.9384 KOps/s | 4.9537 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.1421ms | 0.1074ms | 9.3133 KOps/s | 8.8475 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4575ms | 0.1841ms | 5.4311 KOps/s | 5.3750 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.3024ms | 9.8977μs | 101.0339 KOps/s | 94.6967 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 0.5811ms | 53.3188μs | 18.7551 KOps/s | 18.3925 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.1393ms | 9.8219μs | 101.8129 KOps/s | 102.0449 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.4914ms | 69.3723μs | 14.4150 KOps/s | 14.0529 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.2376ms | 0.1770ms | 5.6507 KOps/s | 3.3181 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.3988ms | 0.2820ms | 3.5464 KOps/s | 3.4817 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.2138ms | 0.1174ms | 8.5144 KOps/s | 7.9935 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1159ms | 73.5535μs | 13.5955 KOps/s | 13.7048 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.2050ms | 0.1570ms | 6.3685 KOps/s | 6.0588 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.8087ms | 0.5387ms | 1.8563 KOps/s | 1.8497 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.4379ms | 0.3344ms | 2.9904 KOps/s | 2.9294 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.2322ms | 0.1786ms | 5.5994 KOps/s | 5.0613 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1805ms | 90.2627μs | 11.0788 KOps/s | 11.1964 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.6740ms | 0.1211ms | 8.2573 KOps/s | 7.7812 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.8842ms | 0.4464ms | 2.2401 KOps/s | 2.1966 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.1906ms | 0.1573ms | 6.3591 KOps/s | 6.0389 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 0.1201ms | 13.6590μs | 73.2117 KOps/s | 73.3661 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 0.5035ms | 42.2612μs | 23.6624 KOps/s | 23.7952 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 0.4519ms | 10.8009μs | 92.5848 KOps/s | 91.8209 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.5001ms | 53.0541μs | 18.8487 KOps/s | 18.7992 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 2.0126ms | 0.1778ms | 5.6231 KOps/s | 5.4066 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.8599ms | 3.3187ms | 301.3217 Ops/s | 297.6538 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 1.9557ms | 0.1609ms | 6.2162 KOps/s | 5.9751 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 3.2885ms | 2.8536ms | 350.4325 Ops/s | 350.5013 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.2543ms | 0.1096ms | 9.1201 KOps/s | 8.7245 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.5133ms | 75.0424μs | 13.3258 KOps/s | 13.2396 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.6187ms | 95.5601μs | 10.4646 KOps/s | 10.1551 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.5210ms | 44.9058μs | 22.2688 KOps/s | 20.7297 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.5621ms | 96.0202μs | 10.4145 KOps/s | 10.0847 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2492ms | 44.4537μs | 22.4953 KOps/s | 21.7287 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.5253ms | 56.5248μs | 17.6913 KOps/s | 17.0063 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2172ms | 27.4556μs | 36.4224 KOps/s | 35.8536 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 0.1304ms | 44.4844μs | 22.4798 KOps/s | 22.2755 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.4814ms | 22.6257μs | 44.1975 KOps/s | 44.1138 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 0.4725ms | 45.3822μs | 22.0351 KOps/s | 21.5143 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.4622ms | 22.6451μs | 44.1596 KOps/s | 43.6134 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.4997ms | 57.0164μs | 17.5388 KOps/s | 16.7974 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.5809ms | 27.5512μs | 36.2961 KOps/s | 36.0663 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 84.4320μs | 44.9726μs | 22.2357 KOps/s | 21.8972 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.4635ms | 22.5697μs | 44.3072 KOps/s | 43.6325 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 0.4873ms | 45.1276μs | 22.1594 KOps/s | 22.0899 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.4532ms | 22.4719μs | 44.5001 KOps/s | 43.7609 KOps/s | |
| test_compile_replace[single-eager] | 0.4921ms | 47.7454μs | 20.9444 KOps/s | 21.0288 KOps/s | |
| test_compile_replace[single-compile] | 0.1804ms | 0.1060ms | 9.4366 KOps/s | 9.1537 KOps/s | |
| test_compile_replace[multi-eager] | 1.0405ms | 0.5729ms | 1.7456 KOps/s | 1.7741 KOps/s | |
| test_compile_replace[multi-compile] | 0.1438ms | 0.1127ms | 8.8754 KOps/s | 8.6665 KOps/s | |
| test_compile_tc_getattr_20[eager] | 0.6056ms | 0.1731ms | 5.7768 KOps/s | 5.7892 KOps/s | |
| test_compile_tc_getattr_20[compile] | 0.2723ms | 0.1196ms | 8.3583 KOps/s | 8.1217 KOps/s | |
| test_compile_clone_shallow[20-eager] | 0.4468ms | 19.8725μs | 50.3207 KOps/s | 52.5398 KOps/s | |
| test_compile_clone_shallow[20-compile] | 0.4872ms | 11.2274μs | 89.0676 KOps/s | 85.1435 KOps/s | |
| test_compile_clone_shallow[40-eager] | 0.4535ms | 34.5778μs | 28.9203 KOps/s | 29.2583 KOps/s | |
| test_compile_clone_shallow[40-compile] | 67.6610μs | 12.7920μs | 78.1740 KOps/s | 75.8066 KOps/s | |
| test_compile_clone_shallow[80-eager] | 0.5073ms | 63.0851μs | 15.8516 KOps/s | 15.5398 KOps/s | |
| test_compile_clone_shallow[80-compile] | 0.4475ms | 15.0108μs | 66.6186 KOps/s | 64.0099 KOps/s | |
| test_compile_update_inplace[eager] | 0.5093ms | 59.5563μs | 16.7908 KOps/s | 17.0130 KOps/s | |
| test_compile_update_inplace[compile] | 0.5708ms | 0.1393ms | 7.1775 KOps/s | 6.9047 KOps/s | |
| test_mod_add[eager] | 0.4843ms | 49.8419μs | 20.0635 KOps/s | 20.2420 KOps/s | |
| test_mod_add[compile] | 0.3205ms | 0.1043ms | 9.5910 KOps/s | 9.4114 KOps/s | |
| test_mod_add[compile-overhead] | 0.2357ms | 0.1495ms | 6.6886 KOps/s | 6.5218 KOps/s | |
| test_mod_wrap[eager] | 0.7431ms | 0.2907ms | 3.4397 KOps/s | 3.4074 KOps/s | |
| test_mod_wrap[compile] | 0.4420ms | 0.3607ms | 2.7724 KOps/s | 2.8072 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.0034ms | 3.7894ms | 263.8942 Ops/s | 250.1506 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6084ms | 1.4942ms | 669.2355 Ops/s | 661.8537 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.5289ms | 1.4413ms | 693.8331 Ops/s | 678.4257 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.3034ms | 0.8939ms | 1.1187 KOps/s | 1.0945 KOps/s | |
| test_seq_add[eager] | 0.2372ms | 0.1610ms | 6.2105 KOps/s | 6.4532 KOps/s | |
| test_seq_add[compile] | 0.5209ms | 0.1187ms | 8.4236 KOps/s | 7.9138 KOps/s | |
| test_seq_add[compile-overhead] | 0.2302ms | 0.1614ms | 6.1952 KOps/s | 6.2071 KOps/s | |
| test_seq_wrap[eager] | 0.6240ms | 0.5464ms | 1.8302 KOps/s | 1.9115 KOps/s | |
| test_seq_wrap[compile] | 0.4235ms | 0.3667ms | 2.7273 KOps/s | 2.6553 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.3346ms | 0.2640ms | 3.7884 KOps/s | 3.6608 KOps/s | |
| test_func_call_runtime[False-eager] | 0.9486ms | 0.8753ms | 1.1424 KOps/s | 1.1775 KOps/s | |
| test_func_call_runtime[False-compile] | 1.1083ms | 0.9222ms | 1.0843 KOps/s | 1.0664 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.5218ms | 0.4641ms | 2.1546 KOps/s | 2.1392 KOps/s | |
| test_func_call_runtime[True-eager] | 1.2159ms | 1.0936ms | 914.4053 Ops/s | 910.2943 Ops/s | |
| test_func_call_runtime[True-compile] | 1.0116ms | 0.9388ms | 1.0652 KOps/s | 1.0605 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.5583ms | 0.4766ms | 2.0982 KOps/s | 2.0642 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 1.0179ms | 0.8406ms | 1.1897 KOps/s | 1.1708 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 1.0422ms | 0.9502ms | 1.0524 KOps/s | 1.0668 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.6329ms | 0.4631ms | 2.1593 KOps/s | 2.1140 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.3953ms | 1.2321ms | 811.5934 Ops/s | 800.7785 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 1.1294ms | 0.9556ms | 1.0465 KOps/s | 1.0183 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5987ms | 0.5091ms | 1.9642 KOps/s | 1.9209 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.8791ms | 2.3773ms | 420.6493 Ops/s | 415.8002 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.0982ms | 0.9811ms | 1.0193 KOps/s | 991.3028 Ops/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5636ms | 0.5146ms | 1.9434 KOps/s | 1.9079 KOps/s | |
| test_distributed | 0.6722ms | 0.1534ms | 6.5190 KOps/s | 6.3736 KOps/s | |
| test_tdmodule | 0.3947ms | 27.6611μs | 36.1519 KOps/s | 35.5631 KOps/s | |
| test_tdmodule_dispatch | 78.6320μs | 45.1207μs | 22.1628 KOps/s | 21.9093 KOps/s | |
| test_tdseq | 77.5110μs | 26.7269μs | 37.4154 KOps/s | 37.3301 KOps/s | |
| test_tdseq_dispatch | 69.4310μs | 46.7983μs | 21.3683 KOps/s | 20.9876 KOps/s | |
| test_instantiation_functorch | 2.1454ms | 2.0844ms | 479.7445 Ops/s | 476.9101 Ops/s | |
| test_exec_functorch | 0.2475ms | 0.1777ms | 5.6268 KOps/s | 5.4605 KOps/s | |
| test_exec_functional_call | 0.2105ms | 0.1615ms | 6.1918 KOps/s | 6.0892 KOps/s | |
| test_exec_td_decorator | 0.4380ms | 0.2362ms | 4.2330 KOps/s | 4.0755 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.0419ms | 0.8321ms | 1.2018 KOps/s | 1.1580 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 1.0400ms | 0.8300ms | 1.2048 KOps/s | 1.1590 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 1.0039ms | 0.7186ms | 1.3916 KOps/s | 1.3467 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.8929ms | 0.7156ms | 1.3974 KOps/s | 1.3345 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 21.2028ms | 20.6326ms | 48.4671 Ops/s | 47.9364 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 21.3554ms | 20.6374ms | 48.4558 Ops/s | 47.9084 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 21.1218ms | 20.4469ms | 48.9072 Ops/s | 48.3888 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.6631ms | 20.4102ms | 48.9951 Ops/s | 48.3589 Ops/s | |
| test_to_module_speed[True] | 1.5686ms | 1.4869ms | 672.5447 Ops/s | 675.5950 Ops/s | |
| test_to_module_speed[False] | 1.5552ms | 1.4690ms | 680.7153 Ops/s | 687.2723 Ops/s | |
| test_tc_init | 73.7910μs | 44.3123μs | 22.5671 KOps/s | 22.3258 KOps/s | |
| test_tc_init_tensor_only | 33.5810μs | 9.7679μs | 102.3764 KOps/s | 103.3791 KOps/s | |
| test_tc_init_nested | 0.1233ms | 88.1582μs | 11.3432 KOps/s | 11.1883 KOps/s | |
| test_tc_init_many_fields | 39.4500μs | 16.3144μs | 61.2957 KOps/s | 60.8914 KOps/s | |
| test_tc_first_layer_tensor | 31.5700μs | 1.8372μs | 544.3064 KOps/s | 551.8987 KOps/s | |
| test_tc_first_layer_tensor_only | 1.5495μs | 0.3963μs | 2.5231 MOps/s | 2.5109 MOps/s | |
| test_tc_first_layer_tensor_set | 27.0400μs | 3.9351μs | 254.1210 KOps/s | 254.2313 KOps/s | |
| test_tc_first_layer_tensor_only_set | 24.5600μs | 3.2652μs | 306.2606 KOps/s | 300.6727 KOps/s | |
| test_tc_first_layer_nontensor | 26.6900μs | 6.2300μs | 160.5142 KOps/s | 160.7150 KOps/s | |
| test_tc_second_layer_tensor | 34.4210μs | 4.4901μs | 222.7108 KOps/s | 223.7364 KOps/s | |
| test_tc_second_layer_nontensor | 42.8510μs | 8.8142μs | 113.4534 KOps/s | 115.1649 KOps/s | |
| test_unbind | 0.2649s | 17.9828ms | 55.6088 Ops/s | 66.5562 Ops/s | |
| test_full_like | 7.5270ms | 4.3436ms | 230.2237 Ops/s | 59.2888 Ops/s | |
| test_zeros_like | 5.0215ms | 4.3789ms | 228.3676 Ops/s | 59.3357 Ops/s | |
| test_ones_like | 4.5911ms | 4.3751ms | 228.5672 Ops/s | 59.5702 Ops/s | |
| test_clone | 6.5930ms | 6.4375ms | 155.3397 Ops/s | 56.2241 Ops/s | |
| test_squeeze | 79.2910μs | 13.9319μs | 71.7776 KOps/s | 69.9429 KOps/s | |
| test_unsqueeze | 0.1745ms | 0.1116ms | 8.9595 KOps/s | 8.9747 KOps/s | |
| test_split | 0.2935ms | 0.1828ms | 5.4715 KOps/s | 5.3034 KOps/s | |
| test_permute | 0.3204ms | 0.2027ms | 4.9329 KOps/s | 4.6784 KOps/s | |
| test_stack | 55.0164ms | 54.2037ms | 18.4489 Ops/s | 19.3471 Ops/s | |
| test_cat | 54.1943ms | 51.6324ms | 19.3677 Ops/s | 19.4073 Ops/s | |
| test_sequential_tensordict | 0.6281ms | 0.2215ms | 4.5150 KOps/s | 4.5973 KOps/s | |
| test_sequential_graph_module | 0.1588ms | 0.1263ms | 7.9204 KOps/s | 8.1941 KOps/s | |
| test_nested_tensordict | 0.3384ms | 0.2841ms | 3.5201 KOps/s | 3.5288 KOps/s | |
| test_nested_graph_module | 0.5495ms | 0.1359ms | 7.3568 KOps/s | 7.3948 KOps/s |
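The Change column is empty in the rows above, so the comparison between the Ops and Ops on Repo HEAD columns has to be worked out by hand. The following is a minimal sketch of that arithmetic, not the benchmark bot's own method: the helper names `parse_ops` and `relative_change` are hypothetical, and it assumes the only units in play are the three that appear in the table (`Ops/s`, `KOps/s`, `MOps/s`).

```python
# Hypothetical helpers for reading the table; not part of the benchmark tooling.

# Multipliers for the throughput units that appear in the table.
_UNIT = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '65.3018 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * _UNIT[unit]


def relative_change(ops_pr: str, ops_head: str) -> float:
    """Fractional change of the PR throughput relative to repo HEAD."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head


# test_full_like row: 230.2237 Ops/s on the PR vs 59.2888 Ops/s on HEAD.
print(f"{relative_change('230.2237 Ops/s', '59.2888 Ops/s'):+.1%}")
# test_vmap_func_call_cm_runtime[compile] row: the two cells use different units.
print(f"{relative_change('1.0193 KOps/s', '991.3028 Ops/s'):+.1%}")
```

Converting both cells to a common unit first matters for rows where the two columns are reported in different units. Differences of a few percent are most likely within run-to-run noise for microbenchmarks of this kind.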
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 29.9510μs | 14.4746μs | 69.0863 KOps/s | 70.3572 KOps/s | |
| test_plain_set_stack_nested | 38.4310μs | 14.8071μs | 67.5354 KOps/s | 68.4167 KOps/s | |
| test_plain_set_nested_inplace | 37.3010μs | 16.2131μs | 61.6785 KOps/s | 61.8554 KOps/s | |
| test_plain_set_stack_nested_inplace | 40.6910μs | 16.2810μs | 61.4213 KOps/s | 62.6643 KOps/s | |
| test_items | 33.8000μs | 5.5689μs | 179.5691 KOps/s | 179.9665 KOps/s | |
| test_items_nested | 0.4881ms | 0.4490ms | 2.2271 KOps/s | 2.2359 KOps/s | |
| test_items_nested_locked | 0.5027ms | 0.4532ms | 2.2064 KOps/s | 2.2107 KOps/s | |
| test_items_nested_leaf | 0.1512ms | 91.9447μs | 10.8761 KOps/s | 10.8080 KOps/s | |
| test_items_stack_nested | 0.4797ms | 0.4505ms | 2.2195 KOps/s | 2.2440 KOps/s | |
| test_items_stack_nested_leaf | 0.1541ms | 94.4595μs | 10.5865 KOps/s | 10.6554 KOps/s | |
| test_items_stack_nested_locked | 0.5449ms | 0.4538ms | 2.2038 KOps/s | 2.2231 KOps/s | |
| test_keys | 28.0310μs | 4.1360μs | 241.7796 KOps/s | 240.9997 KOps/s | |
| test_keys_nested | 0.1816ms | 0.1283ms | 7.7942 KOps/s | 7.8401 KOps/s | |
| test_keys_nested_locked | 0.7720ms | 0.1357ms | 7.3682 KOps/s | 7.3580 KOps/s | |
| test_keys_nested_leaf | 0.1639ms | 0.1186ms | 8.4295 KOps/s | 8.5272 KOps/s | |
| test_keys_stack_nested | 0.1691ms | 0.1290ms | 7.7523 KOps/s | 7.8091 KOps/s | |
| test_keys_stack_nested_leaf | 0.1495ms | 0.1183ms | 8.4512 KOps/s | 8.4457 KOps/s | |
| test_keys_stack_nested_locked | 0.1639ms | 0.1363ms | 7.3342 KOps/s | 7.3446 KOps/s | |
| test_values | 6.1502μs | 1.0115μs | 988.5902 KOps/s | 998.6009 KOps/s | |
| test_values_nested | 75.9710μs | 51.5790μs | 19.3877 KOps/s | 19.6293 KOps/s | |
| test_values_nested_locked | 90.5910μs | 55.1383μs | 18.1362 KOps/s | 18.4124 KOps/s | |
| test_values_nested_leaf | 0.1064ms | 59.1945μs | 16.8935 KOps/s | 17.0319 KOps/s | |
| test_values_stack_nested | 85.0210μs | 51.7311μs | 19.3307 KOps/s | 19.5398 KOps/s | |
| test_values_stack_nested_leaf | 93.0620μs | 59.2799μs | 16.8691 KOps/s | 17.1487 KOps/s | |
| test_values_stack_nested_locked | 0.1002ms | 55.2764μs | 18.0909 KOps/s | 18.5954 KOps/s | |
| test_membership | 4.8418μs | 0.8146μs | 1.2275 MOps/s | 1.2204 MOps/s | |
| test_membership_nested | 32.6310μs | 2.7648μs | 361.6950 KOps/s | 362.3219 KOps/s | |
| test_membership_nested_leaf | 19.3455μs | 2.6404μs | 378.7293 KOps/s | 360.8127 KOps/s | |
| test_membership_stacked_nested | 31.9000μs | 2.7177μs | 367.9635 KOps/s | 362.2735 KOps/s | |
| test_membership_stacked_nested_leaf | 28.9600μs | 2.7408μs | 364.8513 KOps/s | 366.4649 KOps/s | |
| test_membership_nested_last | 32.3000μs | 4.1465μs | 241.1668 KOps/s | 241.7224 KOps/s | |
| test_membership_nested_leaf_last | 34.4110μs | 4.1278μs | 242.2603 KOps/s | 241.9704 KOps/s | |
| test_membership_stacked_nested_last | 28.0000μs | 4.1215μs | 242.6305 KOps/s | 244.2299 KOps/s | |
| test_membership_stacked_nested_leaf_last | 39.4810μs | 4.1062μs | 243.5338 KOps/s | 243.0951 KOps/s | |
| test_nested_getleaf | 50.1810μs | 21.0310μs | 47.5489 KOps/s | 49.6247 KOps/s | |
| test_nested_get | 53.8210μs | 19.4881μs | 51.3133 KOps/s | 52.6842 KOps/s | |
| test_stacked_getleaf | 80.2220μs | 20.3114μs | 49.2334 KOps/s | 50.1236 KOps/s | |
| test_stacked_get | 42.6910μs | 19.5531μs | 51.1427 KOps/s | 52.5698 KOps/s | |
| test_nested_getitemleaf | 65.5010μs | 20.8844μs | 47.8826 KOps/s | 49.0730 KOps/s | |
| test_nested_getitem | 46.5310μs | 19.7303μs | 50.6834 KOps/s | 51.4573 KOps/s | |
| test_stacked_getitemleaf | 45.9010μs | 20.8309μs | 48.0056 KOps/s | 48.9079 KOps/s | |
| test_stacked_getitem | 62.2820μs | 20.0357μs | 49.9110 KOps/s | 51.1194 KOps/s | |
| test_lock_nested | 8.0686ms | 0.4650ms | 2.1506 KOps/s | 2.1924 KOps/s | |
| test_lock_stack_nested | 0.4932ms | 0.4570ms | 2.1883 KOps/s | 2.1611 KOps/s | |
| test_unlock_nested | 0.4447ms | 0.3723ms | 2.6862 KOps/s | 2.6995 KOps/s | |
| test_unlock_stack_nested | 0.4151ms | 0.3709ms | 2.6964 KOps/s | 2.6729 KOps/s | |
| test_flatten_speed | 0.1597ms | 0.1174ms | 8.5172 KOps/s | 8.5876 KOps/s | |
| test_unflatten_speed | 0.5794ms | 0.5477ms | 1.8257 KOps/s | 1.8296 KOps/s | |
| test_common_ops | 0.8562ms | 0.6781ms | 1.4747 KOps/s | 1.4584 KOps/s | |
| test_creation | 70.7510μs | 2.9560μs | 338.2925 KOps/s | 337.5566 KOps/s | |
| test_creation_empty | 45.1310μs | 6.6304μs | 150.8195 KOps/s | 152.5791 KOps/s | |
| test_creation_nested_1 | 44.8910μs | 11.0896μs | 90.1742 KOps/s | 91.5865 KOps/s | |
| test_creation_nested_2 | 38.1510μs | 12.5533μs | 79.6606 KOps/s | 79.5836 KOps/s | |
| test_creation_many_keys[10] | 40.3900μs | 19.7750μs | 50.5690 KOps/s | 50.4786 KOps/s | |
| test_creation_many_keys[50] | 0.1159ms | 85.7428μs | 11.6628 KOps/s | 11.9174 KOps/s | |
| test_creation_many_keys[100] | 0.2247ms | 0.1703ms | 5.8727 KOps/s | 5.9365 KOps/s | |
| test_creation_nested_many_keys[10] | 72.4320μs | 42.7515μs | 23.3910 KOps/s | 23.4469 KOps/s | |
| test_creation_nested_many_keys[50] | 0.2127ms | 0.1751ms | 5.7115 KOps/s | 5.7755 KOps/s | |
| test_clone | 39.7000μs | 12.4730μs | 80.1731 KOps/s | 75.7557 KOps/s | |
| test_getitem[int] | 1.6508ms | 14.5103μs | 68.9166 KOps/s | 62.1008 KOps/s | |
| test_getitem[slice_int] | 0.1313ms | 23.1169μs | 43.2583 KOps/s | 43.0199 KOps/s | |
| test_getitem[range] | 0.1716ms | 60.3854μs | 16.5603 KOps/s | 16.3056 KOps/s | |
| test_getitem[tuple] | 0.1389ms | 22.6100μs | 44.2281 KOps/s | 43.6674 KOps/s | |
| test_getitem[list] | 0.1736ms | 57.1289μs | 17.5043 KOps/s | 17.6820 KOps/s | |
| test_setitem_dim[int] | 45.2610μs | 25.3471μs | 39.4522 KOps/s | 39.5695 KOps/s | |
| test_setitem_dim[slice_int] | 74.6610μs | 41.7705μs | 23.9403 KOps/s | 23.8929 KOps/s | |
| test_setitem_dim[range] | 0.1276ms | 91.3920μs | 10.9419 KOps/s | 10.4145 KOps/s | |
| test_setitem_dim[tuple] | 59.9810μs | 38.7164μs | 25.8289 KOps/s | 24.4909 KOps/s | |
| test_setitem | 54.5310μs | 16.9992μs | 58.8265 KOps/s | 55.2165 KOps/s | |
| test_set | 0.5076ms | 16.0838μs | 62.1744 KOps/s | 59.7936 KOps/s | |
| test_set_shared | 0.6236ms | 0.2054ms | 4.8674 KOps/s | 4.7702 KOps/s | |
| test_update | 0.1869ms | 21.0581μs | 47.4876 KOps/s | 43.9517 KOps/s | |
| test_update_nested | 64.7120μs | 31.6188μs | 31.6268 KOps/s | 29.4382 KOps/s | |
| test_update__nested | 0.5234ms | 32.8994μs | 30.3957 KOps/s | 27.7926 KOps/s | |
| test_set_nested | 54.6810μs | 19.1531μs | 52.2109 KOps/s | 49.6973 KOps/s | |
| test_set_nested_new | 59.6210μs | 23.0374μs | 43.4077 KOps/s | 40.0664 KOps/s | |
| test_select | 68.8410μs | 38.4854μs | 25.9839 KOps/s | 24.8204 KOps/s | |
| test_select_nested | 99.9020μs | 71.1734μs | 14.0502 KOps/s | 14.0214 KOps/s | |
| test_exclude_nested | 0.1148ms | 87.2090μs | 11.4667 KOps/s | 11.3100 KOps/s | |
| test_empty[True] | 0.4556ms | 0.3842ms | 2.6025 KOps/s | 2.5994 KOps/s | |
| test_empty[False] | 6.3427μs | 1.2565μs | 795.8755 KOps/s | 793.4727 KOps/s | |
| test_to | 0.1027ms | 71.6340μs | 13.9599 KOps/s | 14.0452 KOps/s | |
| test_to_nonblocking | 0.1110ms | 67.7509μs | 14.7599 KOps/s | 15.6195 KOps/s | |
| test_unbind_speed | 0.3511ms | 0.3199ms | 3.1262 KOps/s | 3.1617 KOps/s | |
| test_unbind_speed_stack0 | 0.3829ms | 0.3173ms | 3.1515 KOps/s | 3.1851 KOps/s | |
| test_unbind_speed_stack1 | 0.1040s | 0.8081ms | 1.2375 KOps/s | 1.2369 KOps/s | |
| test_split | 0.1040s | 1.2188ms | 820.4528 Ops/s | 827.3493 Ops/s | |
| test_chunk | 0.1043s | 1.1565ms | 864.6781 Ops/s | 968.2886 Ops/s | |
| test_to_cpu_blocking | 19.5850ms | 19.5258ms | 51.2142 Ops/s | 46.2068 Ops/s | |
| test_to_cpu_global_sync | 11.3958ms | 11.3046ms | 88.4599 Ops/s | 88.5779 Ops/s | |
| test_to_cpu_event_sync | 12.6012ms | 12.3321ms | 81.0890 Ops/s | 81.2191 Ops/s | |
| test_to_cpu_default | 12.6064ms | 12.3354ms | 81.0675 Ops/s | 81.1779 Ops/s | |
| test_consolidate[False-None] | 4.0282ms | 3.9598ms | 252.5374 Ops/s | 222.9650 Ops/s | |
| test_consolidate[default-None] | 2.0184ms | 1.9238ms | 519.8022 Ops/s | 502.6801 Ops/s | |
| test_consolidate[reduce-overhead-None] | 1.9170ms | 1.8431ms | 542.5663 Ops/s | 520.1939 Ops/s | |
| test_consolidate_njt[False-None] | 8.4179ms | 8.1817ms | 122.2241 Ops/s | 120.7256 Ops/s | |
| test_to[False-False-None] | 2.1522ms | 2.0648ms | 484.2979 Ops/s | 479.8149 Ops/s | |
| test_to[True-False-None] | 2.1716ms | 1.8522ms | 539.8931 Ops/s | 525.5844 Ops/s | |
| test_to[within-False-None] | 6.2031ms | 5.8644ms | 170.5191 Ops/s | 165.6574 Ops/s | |
| test_to[True-default-None] | 8.8218ms | 8.6810ms | 115.1945 Ops/s | 113.4335 Ops/s | |
| test_to_njt[False-False-None] | 8.4674ms | 8.2635ms | 121.0147 Ops/s | 120.3563 Ops/s | |
| test_to_njt[True-False-None] | 6.8743ms | 6.7520ms | 148.1041 Ops/s | 148.1472 Ops/s | |
| test_to_njt[within-False-None] | 15.2683ms | 15.1747ms | 65.8990 Ops/s | 66.3607 Ops/s | |
| test_creation[device0] | 0.3525ms | 0.1117ms | 8.9509 KOps/s | 8.4630 KOps/s | |
| test_creation_from_tensor | 0.3916ms | 0.1091ms | 9.1635 KOps/s | 8.6482 KOps/s | |
| test_add_one[memmap_tensor0] | 0.2254ms | 6.1031μs | 163.8518 KOps/s | 156.8092 KOps/s | |
| test_contiguous[memmap_tensor0] | 17.3100μs | 0.6025μs | 1.6596 MOps/s | 2.2522 MOps/s | |
| test_stack[memmap_tensor0] | 29.8100μs | 4.3685μs | 228.9096 KOps/s | 215.2068 KOps/s | |
| test_memmaptd_index | 1.0549ms | 0.2620ms | 3.8165 KOps/s | 3.8182 KOps/s | |
| test_memmaptd_index_astensor | 0.5168ms | 0.3636ms | 2.7501 KOps/s | 2.7762 KOps/s | |
| test_memmaptd_index_op | 0.8357ms | 0.5952ms | 1.6800 KOps/s | 1.6445 KOps/s | |
| test_serialize_model | 0.1401s | 0.1364s | 7.3334 Ops/s | 5.8654 Ops/s | |
| test_serialize_model_pickle | 1.3480s | 1.2102s | 0.8263 Ops/s | 0.8384 Ops/s | |
| test_serialize_weights | 0.1374s | 0.1355s | 7.3786 Ops/s | 7.3286 Ops/s | |
| test_serialize_weights_returnearly | 0.4333s | 91.4515ms | 10.9348 Ops/s | 10.5539 Ops/s | |
| test_serialize_weights_pickle | 1.3507s | 1.2109s | 0.8258 Ops/s | 0.8227 Ops/s | |
| test_reshape_pytree | 0.2225ms | 30.8765μs | 32.3871 KOps/s | 31.2228 KOps/s | |
| test_reshape_td | 68.3210μs | 43.9953μs | 22.7297 KOps/s | 22.8735 KOps/s | |
| test_view_pytree | 0.2150ms | 30.5850μs | 32.6958 KOps/s | 31.6981 KOps/s | |
| test_view_td | 76.1610μs | 50.3693μs | 19.8534 KOps/s | 20.0181 KOps/s | |
| test_unbind_pytree | 0.2325ms | 34.4356μs | 29.0397 KOps/s | 28.1408 KOps/s | |
| test_unbind_td | 0.1075ms | 47.5965μs | 21.0099 KOps/s | 21.1483 KOps/s | |
| test_split_pytree | 0.2554ms | 39.7371μs | 25.1654 KOps/s | 24.4558 KOps/s | |
| test_split_td | 0.1485ms | 58.7680μs | 17.0161 KOps/s | 16.3856 KOps/s | |
| test_add_pytree | 0.2255ms | 39.7526μs | 25.1556 KOps/s | 24.4978 KOps/s | |
| test_add_td | 0.1045ms | 52.0055μs | 19.2287 KOps/s | 18.8809 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.2125ms | 0.1397ms | 7.1602 KOps/s | 7.0295 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.4245ms | 0.2070ms | 4.8309 KOps/s | 5.1782 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.6752ms | 0.1082ms | 9.2421 KOps/s | 8.7872 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.6522ms | 0.1865ms | 5.3630 KOps/s | 5.6410 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.2503ms | 10.3265μs | 96.8384 KOps/s | 103.5455 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 77.2220μs | 50.9487μs | 19.6276 KOps/s | 19.4398 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 46.2610μs | 9.5292μs | 104.9408 KOps/s | 104.7371 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.4473ms | 64.2487μs | 15.5645 KOps/s | 15.2384 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.2376ms | 0.1742ms | 5.7404 KOps/s | 5.5242 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.4061ms | 0.2757ms | 3.6270 KOps/s | 3.6383 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.1889ms | 0.1167ms | 8.5696 KOps/s | 8.4864 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1186ms | 72.5933μs | 13.7754 KOps/s | 13.5592 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.2202ms | 0.1565ms | 6.3899 KOps/s | 6.2737 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.9155ms | 0.5268ms | 1.8981 KOps/s | 1.8948 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.3842ms | 0.3282ms | 3.0469 KOps/s | 3.0455 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.2272ms | 0.1773ms | 5.6417 KOps/s | 3.3697 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1349ms | 91.3949μs | 10.9415 KOps/s | 11.0864 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.2070ms | 0.1201ms | 8.3282 KOps/s | 7.8547 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.6685ms | 0.4339ms | 2.3048 KOps/s | 2.2176 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.1944ms | 0.1554ms | 6.4341 KOps/s | 6.2204 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 43.3010μs | 13.1670μs | 75.9477 KOps/s | 74.8361 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 74.3010μs | 40.6168μs | 24.6204 KOps/s | 24.9185 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 66.2010μs | 10.3497μs | 96.6216 KOps/s | 94.5111 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.4259ms | 50.5396μs | 19.7865 KOps/s | 19.5230 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 1.9381ms | 0.1700ms | 5.8830 KOps/s | 5.4184 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.3482ms | 3.2374ms | 308.8885 Ops/s | 302.0445 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 1.9173ms | 0.1567ms | 6.3829 KOps/s | 6.2232 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 2.8839ms | 2.7564ms | 362.7860 Ops/s | 356.4926 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.1446ms | 0.1065ms | 9.3867 KOps/s | 8.8786 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.3095ms | 73.8427μs | 13.5423 KOps/s | 13.9160 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2027ms | 96.2995μs | 10.3843 KOps/s | 10.4544 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2503ms | 43.2035μs | 23.1463 KOps/s | 21.6090 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1444ms | 98.7164μs | 10.1300 KOps/s | 10.3920 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2698ms | 45.1328μs | 22.1568 KOps/s | 22.9686 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.1099ms | 57.2495μs | 17.4674 KOps/s | 17.1955 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2254ms | 27.0528μs | 36.9647 KOps/s | 38.0890 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 96.9520μs | 42.8970μs | 23.3116 KOps/s | 22.4267 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2668ms | 21.3862μs | 46.7591 KOps/s | 46.2092 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 75.8610μs | 44.5869μs | 22.4281 KOps/s | 22.2481 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.2604ms | 21.5860μs | 46.3264 KOps/s | 46.5913 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.1069ms | 57.7712μs | 17.3097 KOps/s | 17.1301 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.2233ms | 26.7918μs | 37.3248 KOps/s | 37.8390 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 82.7920μs | 44.1563μs | 22.6468 KOps/s | 22.3488 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2647ms | 21.3704μs | 46.7936 KOps/s | 46.3613 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 80.3020μs | 44.3870μs | 22.5291 KOps/s | 22.3233 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.2498ms | 21.5562μs | 46.3904 KOps/s | 46.7378 KOps/s | |
| test_compile_replace[single-eager] | 0.1005ms | 48.2132μs | 20.7412 KOps/s | 21.3571 KOps/s | |
| test_compile_replace[single-compile] | 0.1780ms | 0.1026ms | 9.7509 KOps/s | 9.4054 KOps/s | |
| test_compile_replace[multi-eager] | 0.6414ms | 0.5654ms | 1.7686 KOps/s | 1.8008 KOps/s | |
| test_compile_replace[multi-compile] | 0.1669ms | 0.1126ms | 8.8821 KOps/s | 8.9205 KOps/s | |
| test_compile_tc_getattr_20[eager] | 0.2085ms | 0.1683ms | 5.9420 KOps/s | 5.8394 KOps/s | |
| test_compile_tc_getattr_20[compile] | 0.1738ms | 0.1192ms | 8.3917 KOps/s | 8.4679 KOps/s | |
| test_compile_clone_shallow[20-eager] | 58.8520μs | 18.6122μs | 53.7282 KOps/s | 54.3742 KOps/s | |
| test_compile_clone_shallow[20-compile] | 39.9200μs | 10.8501μs | 92.1649 KOps/s | 89.7936 KOps/s | |
| test_compile_clone_shallow[40-eager] | 69.0320μs | 32.8805μs | 30.4131 KOps/s | 30.6981 KOps/s | |
| test_compile_clone_shallow[40-compile] | 40.7010μs | 12.1000μs | 82.6446 KOps/s | 82.2381 KOps/s | |
| test_compile_clone_shallow[80-eager] | 96.6420μs | 61.9429μs | 16.1439 KOps/s | 16.4497 KOps/s | |
| test_compile_clone_shallow[80-compile] | 50.1710μs | 14.6803μs | 68.1184 KOps/s | 67.4496 KOps/s | |
| test_compile_update_inplace[eager] | 0.1045ms | 57.8300μs | 17.2921 KOps/s | 17.3509 KOps/s | |
| test_compile_update_inplace[compile] | 0.1896ms | 0.1354ms | 7.3872 KOps/s | 7.1918 KOps/s | |
| test_mod_add[eager] | 93.5420μs | 47.4959μs | 21.0544 KOps/s | 20.7287 KOps/s | |
| test_mod_add[compile] | 0.1569ms | 0.1020ms | 9.8038 KOps/s | 9.1179 KOps/s | |
| test_mod_add[compile-overhead] | 0.2541ms | 0.1460ms | 6.8513 KOps/s | 6.6825 KOps/s | |
| test_mod_wrap[eager] | 0.3559ms | 0.2838ms | 3.5240 KOps/s | 3.3969 KOps/s | |
| test_mod_wrap[compile] | 0.4017ms | 0.3399ms | 2.9417 KOps/s | 2.9073 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.2408ms | 3.9803ms | 251.2402 Ops/s | 248.7134 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6520ms | 1.4635ms | 683.2720 Ops/s | 659.7152 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.5053ms | 1.4172ms | 705.6383 Ops/s | 702.0643 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.2463ms | 0.8654ms | 1.1555 KOps/s | 1.1289 KOps/s | |
| test_seq_add[eager] | 0.2142ms | 0.1560ms | 6.4091 KOps/s | 6.4102 KOps/s | |
| test_seq_add[compile] | 0.5438ms | 0.1115ms | 8.9684 KOps/s | 8.6403 KOps/s | |
| test_seq_add[compile-overhead] | 0.1953ms | 0.1509ms | 6.6279 KOps/s | 6.4133 KOps/s | |
| test_seq_wrap[eager] | 0.5923ms | 0.5235ms | 1.9102 KOps/s | 1.9489 KOps/s | |
| test_seq_wrap[compile] | 0.4722ms | 0.3733ms | 2.6785 KOps/s | 2.7610 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.3763ms | 0.2570ms | 3.8913 KOps/s | 3.8327 KOps/s | |
| test_func_call_runtime[False-eager] | 0.9040ms | 0.8082ms | 1.2374 KOps/s | 1.2070 KOps/s | |
| test_func_call_runtime[False-compile] | 0.9300ms | 0.8836ms | 1.1317 KOps/s | 1.1039 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.4870ms | 0.4431ms | 2.2567 KOps/s | 2.2417 KOps/s | |
| test_func_call_runtime[True-eager] | 1.1410ms | 1.0417ms | 959.9346 Ops/s | 940.9236 Ops/s | |
| test_func_call_runtime[True-compile] | 0.9705ms | 0.8895ms | 1.1243 KOps/s | 1.1109 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.5042ms | 0.4565ms | 2.1907 KOps/s | 2.1678 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 1.5462ms | 0.8113ms | 1.2327 KOps/s | 1.2001 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 1.1085ms | 0.8823ms | 1.1334 KOps/s | 1.1226 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.4966ms | 0.4449ms | 2.2478 KOps/s | 2.2250 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.2856ms | 1.1809ms | 846.7858 Ops/s | 830.1097 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 0.9874ms | 0.9282ms | 1.0773 KOps/s | 1.0536 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5276ms | 0.4893ms | 2.0436 KOps/s | 2.0194 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.8286ms | 2.3211ms | 430.8296 Ops/s | 428.5711 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.0199ms | 0.9534ms | 1.0489 KOps/s | 1.0426 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5469ms | 0.4985ms | 2.0059 KOps/s | 2.0041 KOps/s | |
| test_distributed | 0.5721ms | 0.1511ms | 6.6186 KOps/s | 5.7106 KOps/s | |
| test_tdmodule | 0.4250ms | 27.2310μs | 36.7229 KOps/s | 37.5028 KOps/s | |
| test_tdmodule_dispatch | 73.5220μs | 44.2255μs | 22.6114 KOps/s | 22.8906 KOps/s | |
| test_tdseq | 46.2900μs | 26.2316μs | 38.1219 KOps/s | 38.1126 KOps/s | |
| test_tdseq_dispatch | 66.2820μs | 46.3619μs | 21.5695 KOps/s | 21.7397 KOps/s | |
| test_instantiation_functorch | 2.0758ms | 1.9834ms | 504.1743 Ops/s | 503.7004 Ops/s | |
| test_exec_functorch | 0.2286ms | 0.1733ms | 5.7690 KOps/s | 5.7484 KOps/s | |
| test_exec_functional_call | 0.2239ms | 0.1550ms | 6.4528 KOps/s | 6.4715 KOps/s | |
| test_exec_td_decorator | 0.4360ms | 0.2247ms | 4.4495 KOps/s | 4.3638 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 0.9777ms | 0.8018ms | 1.2471 KOps/s | 1.2273 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 0.9848ms | 0.7996ms | 1.2507 KOps/s | 1.2277 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.8945ms | 0.6921ms | 1.4449 KOps/s | 1.4310 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.8729ms | 0.6909ms | 1.4475 KOps/s | 1.4257 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 20.2735ms | 20.1165ms | 49.7105 Ops/s | 49.5519 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 20.7495ms | 20.1821ms | 49.5489 Ops/s | 49.4994 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 20.6017ms | 19.9651ms | 50.0874 Ops/s | 50.0068 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.6242ms | 20.0127ms | 49.9682 Ops/s | 49.9142 Ops/s | |
| test_to_module_speed[True] | 1.9627ms | 1.3813ms | 723.9311 Ops/s | 714.4193 Ops/s | |
| test_to_module_speed[False] | 2.1921ms | 1.3663ms | 731.9232 Ops/s | 722.7401 Ops/s | |
| test_tc_init | 87.2420μs | 43.1205μs | 23.1908 KOps/s | 22.5554 KOps/s | |
| test_tc_init_tensor_only | 38.1310μs | 9.2560μs | 108.0383 KOps/s | 107.7774 KOps/s | |
| test_tc_init_nested | 0.3794ms | 85.8077μs | 11.6540 KOps/s | 11.6720 KOps/s | |
| test_tc_init_many_fields | 41.2910μs | 15.5260μs | 64.4082 KOps/s | 64.4895 KOps/s | |
| test_tc_first_layer_tensor | 24.3910μs | 1.7087μs | 585.2442 KOps/s | 587.5501 KOps/s | |
| test_tc_first_layer_tensor_only | 1.6695μs | 0.3842μs | 2.6025 MOps/s | 2.6049 MOps/s | |
| test_tc_first_layer_tensor_set | 28.0500μs | 3.6653μs | 272.8292 KOps/s | 273.0340 KOps/s | |
| test_tc_first_layer_tensor_only_set | 23.7100μs | 3.1102μs | 321.5250 KOps/s | 318.2200 KOps/s | |
| test_tc_first_layer_nontensor | 29.3500μs | 5.8894μs | 169.7978 KOps/s | 166.7028 KOps/s | |
| test_tc_second_layer_tensor | 27.2110μs | 4.1721μs | 239.6897 KOps/s | 246.1866 KOps/s | |
| test_tc_second_layer_nontensor | 34.5010μs | 8.2797μs | 120.7772 KOps/s | 119.9295 KOps/s | |
| test_unbind | 0.2633s | 17.1216ms | 58.4057 Ops/s | 57.7874 Ops/s | |
| test_full_like | 5.0753ms | 4.3787ms | 228.3776 Ops/s | 227.0288 Ops/s | |
| test_zeros_like | 4.9252ms | 4.3642ms | 229.1378 Ops/s | 228.9288 Ops/s | |
| test_ones_like | 5.0137ms | 4.3782ms | 228.4049 Ops/s | 229.1025 Ops/s | |
| test_clone | 6.9092ms | 6.5122ms | 153.5571 Ops/s | 153.4009 Ops/s | |
| test_squeeze | 0.1803ms | 13.5758μs | 73.6604 KOps/s | 73.6824 KOps/s | |
| test_unsqueeze | 0.2799ms | 0.1092ms | 9.1575 KOps/s | 9.1813 KOps/s | |
| test_split | 0.2498ms | 0.1791ms | 5.5830 KOps/s | 5.5412 KOps/s | |
| test_permute | 0.2597ms | 0.2067ms | 4.8387 KOps/s | 4.9896 KOps/s | |
| test_stack | 44.3271ms | 43.1766ms | 23.1607 Ops/s | 23.1353 Ops/s | |
| test_cat | 43.4310ms | 43.1665ms | 23.1661 Ops/s | 23.1944 Ops/s | |
| test_sequential_tensordict | 0.3209ms | 0.2127ms | 4.7010 KOps/s | 4.7183 KOps/s | |
| test_sequential_graph_module | 0.1664ms | 0.1166ms | 8.5776 KOps/s | 8.7058 KOps/s | |
| test_nested_tensordict | 0.3800ms | 0.2735ms | 3.6569 KOps/s | 3.6183 KOps/s | |
| test_nested_graph_module | 0.1727ms | 0.1257ms | 7.9527 KOps/s | 8.0026 KOps/s |
Stack from ghstack (oldest at bottom):
Add tensordict/_dtensor.py with:
- transfer plan computation: the minimal set of P2P transfers (which byte ranges go from which src rank to which dst rank)
- _TorchDistributedBackend and _UCXXBackend implementations
The transfer plan is pure computation (no GPU, no distributed runtime)
and can be tested in isolation. It supports Shard, Replicate, and Partial
placements on arbitrary n-D meshes, uneven sharding, and custom rank maps.
All new classes are private (underscore-prefixed).
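
To illustrate what a transfer plan is, here is a minimal sketch for the simplest case: a 1-D tensor sharded along one dimension on both sides, with a different number of ranks on the source and destination meshes. It is not the code in tensordict/_dtensor.py. The names `Transfer`, `shard_ranges` and `plan_shard_to_shard` are hypothetical, ranges are element offsets along the sharded dimension rather than byte offsets, and the ceil-division chunking is an assumption about how uneven shards are laid out.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Transfer(NamedTuple):
    """One P2P copy: elements [start, stop) go from src_rank to dst_rank."""

    src_rank: int
    dst_rank: int
    start: int  # inclusive offset along the sharded dimension
    stop: int  # exclusive offset along the sharded dimension


def shard_ranges(length: int, num_shards: int) -> List[Tuple[int, int]]:
    """Split [0, length) into num_shards contiguous ranges.

    Assumes ceil-division chunking, so trailing shards may be shorter
    (or empty) when length is not divisible by num_shards.
    """
    chunk = -(-length // num_shards)  # ceil(length / num_shards)
    return [
        (min(i * chunk, length), min((i + 1) * chunk, length))
        for i in range(num_shards)
    ]


def plan_shard_to_shard(
    length: int, src_ranks: Sequence[int], dst_ranks: Sequence[int]
) -> List[Transfer]:
    """Intersect every source shard with every destination shard.

    src_ranks / dst_ranks play the role of rank maps: position i in the
    list owns shard i, and the value is the global rank that holds it.
    """
    plan = []
    src = shard_ranges(length, len(src_ranks))
    dst = shard_ranges(length, len(dst_ranks))
    for s_rank, (s0, s1) in zip(src_ranks, src):
        for d_rank, (d0, d1) in zip(dst_ranks, dst):
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:  # non-empty overlap means data must move
                plan.append(Transfer(s_rank, d_rank, lo, hi))
    return plan


if __name__ == "__main__":
    # 10 elements, 2 source ranks (0, 1) -> 3 destination ranks (2, 3, 4).
    for t in plan_shard_to_shard(10, [0, 1], [2, 3, 4]):
        print(t)
```

With 10 elements going from ranks `[0, 1]` to ranks `[2, 3, 4]`, the sketch yields four transfers: `0→2 [0, 4)`, `0→3 [4, 5)`, `1→3 [5, 8)` and `1→4 [8, 10)`. Because this is plain interval arithmetic, it runs without a GPU or a process group. Replicate and Partial placements and n-D meshes would need additional logic that is not shown here.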
Made-with: Cursor