[DTensor] Add Strategy B (local-shard transfer + redistribute on receiver) #1641
Open
vmoens wants to merge 1 commit into gh/vmoens/82/base from
Contributor
PR Title Label Error
Unknown or invalid prefix
Current title:
Supported Prefixes
Your PR title must start with exactly one of these prefixes (case-insensitive):
Note: Matching is case-insensitive. Common variations (singular/plural) are supported.
This was referenced Mar 6, 2026
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 32.1810μs | 15.1468μs | 66.0206 KOps/s | 66.9536 KOps/s | |
| test_plain_set_stack_nested | 42.3010μs | 15.3918μs | 64.9696 KOps/s | 65.5792 KOps/s | |
| test_plain_set_nested_inplace | 44.7110μs | 16.9114μs | 59.1316 KOps/s | 59.2037 KOps/s | |
| test_plain_set_stack_nested_inplace | 58.8020μs | 16.8795μs | 59.2433 KOps/s | 59.6818 KOps/s | |
| test_items | 39.0500μs | 6.1571μs | 162.4146 KOps/s | 167.5811 KOps/s | |
| test_items_nested | 0.5356ms | 0.4720ms | 2.1185 KOps/s | 2.1215 KOps/s | |
| test_items_nested_locked | 0.5183ms | 0.4754ms | 2.1036 KOps/s | 2.1158 KOps/s | |
| test_items_nested_leaf | 0.1278ms | 99.2144μs | 10.0792 KOps/s | 10.1834 KOps/s | |
| test_items_stack_nested | 0.5347ms | 0.4698ms | 2.1284 KOps/s | 2.1366 KOps/s | |
| test_items_stack_nested_leaf | 0.1346ms | 98.7403μs | 10.1276 KOps/s | 10.1144 KOps/s | |
| test_items_stack_nested_locked | 0.5065ms | 0.4765ms | 2.0986 KOps/s | 2.1011 KOps/s | |
| test_keys | 27.2800μs | 4.2660μs | 234.4097 KOps/s | 236.4689 KOps/s | |
| test_keys_nested | 0.1980ms | 0.1320ms | 7.5765 KOps/s | 7.5985 KOps/s | |
| test_keys_nested_locked | 1.9953ms | 0.1416ms | 7.0636 KOps/s | 7.1051 KOps/s | |
| test_keys_nested_leaf | 0.1839ms | 0.1221ms | 8.1902 KOps/s | 8.2239 KOps/s | |
| test_keys_stack_nested | 0.1670ms | 0.1312ms | 7.6229 KOps/s | 7.6003 KOps/s | |
| test_keys_stack_nested_leaf | 0.1499ms | 0.1221ms | 8.1922 KOps/s | 8.1470 KOps/s | |
| test_keys_stack_nested_locked | 0.1688ms | 0.1405ms | 7.1153 KOps/s | 7.1142 KOps/s | |
| test_values | 5.8280μs | 1.0374μs | 963.9323 KOps/s | 970.7805 KOps/s | |
| test_values_nested | 83.9210μs | 54.1491μs | 18.4675 KOps/s | 18.6893 KOps/s | |
| test_values_nested_locked | 80.0620μs | 57.2992μs | 17.4523 KOps/s | 17.6347 KOps/s | |
| test_values_nested_leaf | 85.8010μs | 61.1878μs | 16.3431 KOps/s | 16.4695 KOps/s | |
| test_values_stack_nested | 93.8110μs | 53.8462μs | 18.5714 KOps/s | 18.7211 KOps/s | |
| test_values_stack_nested_leaf | 0.1419ms | 61.5948μs | 16.2351 KOps/s | 16.3494 KOps/s | |
| test_values_stack_nested_locked | 96.5720μs | 57.3054μs | 17.4504 KOps/s | 17.5155 KOps/s | |
| test_membership | 6.3635μs | 0.8662μs | 1.1544 MOps/s | 1.1847 MOps/s | |
| test_membership_nested | 32.4810μs | 2.9825μs | 335.2856 KOps/s | 343.8662 KOps/s | |
| test_membership_nested_leaf | 66.5720μs | 2.9871μs | 334.7706 KOps/s | 343.1503 KOps/s | |
| test_membership_stacked_nested | 34.6400μs | 2.9246μs | 341.9248 KOps/s | 341.6643 KOps/s | |
| test_membership_stacked_nested_leaf | 21.3810μs | 2.9498μs | 339.0114 KOps/s | 344.3124 KOps/s | |
| test_membership_nested_last | 40.3510μs | 4.4311μs | 225.6768 KOps/s | 227.0283 KOps/s | |
| test_membership_nested_leaf_last | 35.4610μs | 4.4412μs | 225.1644 KOps/s | 229.0837 KOps/s | |
| test_membership_stacked_nested_last | 43.2110μs | 4.4026μs | 227.1365 KOps/s | 229.2578 KOps/s | |
| test_membership_stacked_nested_leaf_last | 26.4600μs | 4.3903μs | 227.7729 KOps/s | 231.3810 KOps/s | |
| test_nested_getleaf | 56.5110μs | 21.7565μs | 45.9633 KOps/s | 46.9593 KOps/s | |
| test_nested_get | 46.7210μs | 20.7477μs | 48.1981 KOps/s | 48.0656 KOps/s | |
| test_stacked_getleaf | 54.2710μs | 21.5849μs | 46.3287 KOps/s | 46.7389 KOps/s | |
| test_stacked_get | 54.4910μs | 20.7693μs | 48.1480 KOps/s | 48.8391 KOps/s | |
| test_nested_getitemleaf | 59.1720μs | 22.4359μs | 44.5715 KOps/s | 45.7240 KOps/s | |
| test_nested_getitem | 49.9310μs | 20.9883μs | 47.6455 KOps/s | 48.2107 KOps/s | |
| test_stacked_getitemleaf | 48.7710μs | 22.2896μs | 44.8641 KOps/s | 45.4545 KOps/s | |
| test_stacked_getitem | 52.4020μs | 21.0130μs | 47.5895 KOps/s | 48.7947 KOps/s | |
| test_lock_nested | 0.5463ms | 0.4827ms | 2.0716 KOps/s | 2.0980 KOps/s | |
| test_lock_stack_nested | 0.5796ms | 0.4838ms | 2.0669 KOps/s | 2.0649 KOps/s | |
| test_unlock_nested | 0.4572ms | 0.3930ms | 2.5443 KOps/s | 2.5637 KOps/s | |
| test_unlock_stack_nested | 0.4292ms | 0.3917ms | 2.5528 KOps/s | 2.5376 KOps/s | |
| test_flatten_speed | 0.1531ms | 0.1232ms | 8.1190 KOps/s | 8.1090 KOps/s | |
| test_unflatten_speed | 0.6326ms | 0.5855ms | 1.7080 KOps/s | 1.7285 KOps/s | |
| test_common_ops | 0.8364ms | 0.7028ms | 1.4228 KOps/s | 1.4201 KOps/s | |
| test_creation | 92.7220μs | 3.0396μs | 328.9925 KOps/s | 319.0697 KOps/s | |
| test_creation_empty | 27.8100μs | 6.9935μs | 142.9908 KOps/s | 144.0987 KOps/s | |
| test_creation_nested_1 | 34.5510μs | 11.6727μs | 85.6699 KOps/s | 86.5848 KOps/s | |
| test_creation_nested_2 | 38.8000μs | 13.3228μs | 75.0594 KOps/s | 74.6021 KOps/s | |
| test_creation_many_keys[10] | 50.7310μs | 21.0395μs | 47.5297 KOps/s | 48.0968 KOps/s | |
| test_creation_many_keys[50] | 0.1209ms | 90.2223μs | 11.0837 KOps/s | 11.1251 KOps/s | |
| test_creation_many_keys[100] | 0.2061ms | 0.1750ms | 5.7149 KOps/s | 5.6420 KOps/s | |
| test_creation_nested_many_keys[10] | 84.0720μs | 44.9793μs | 22.2325 KOps/s | 22.3232 KOps/s | |
| test_creation_nested_many_keys[50] | 0.2261ms | 0.1834ms | 5.4528 KOps/s | 5.4002 KOps/s | |
| test_clone | 43.7610μs | 13.5151μs | 73.9914 KOps/s | 74.1943 KOps/s | |
| test_getitem[int] | 1.5612ms | 15.0222μs | 66.5680 KOps/s | 59.3741 KOps/s | |
| test_getitem[slice_int] | 0.1327ms | 24.1744μs | 41.3660 KOps/s | 41.1236 KOps/s | |
| test_getitem[range] | 0.1753ms | 64.0474μs | 15.6134 KOps/s | 15.6341 KOps/s | |
| test_getitem[tuple] | 0.1383ms | 23.7951μs | 42.0254 KOps/s | 41.4073 KOps/s | |
| test_getitem[list] | 0.1849ms | 59.9424μs | 16.6827 KOps/s | 17.0034 KOps/s | |
| test_setitem_dim[int] | 48.2710μs | 26.0197μs | 38.4324 KOps/s | 38.0077 KOps/s | |
| test_setitem_dim[slice_int] | 65.5510μs | 43.0457μs | 23.2311 KOps/s | 22.7716 KOps/s | |
| test_setitem_dim[range] | 0.1222ms | 95.8264μs | 10.4355 KOps/s | 10.5173 KOps/s | |
| test_setitem_dim[tuple] | 62.9110μs | 39.7554μs | 25.1538 KOps/s | 24.2820 KOps/s | |
| test_setitem | 49.3610μs | 17.8660μs | 55.9723 KOps/s | 55.7887 KOps/s | |
| test_set | 43.8210μs | 17.0892μs | 58.5164 KOps/s | 58.2053 KOps/s | |
| test_set_shared | 0.5002ms | 0.2051ms | 4.8745 KOps/s | 4.9014 KOps/s | |
| test_update | 0.3330ms | 21.7580μs | 45.9602 KOps/s | 45.9180 KOps/s | |
| test_update_nested | 66.3510μs | 33.5549μs | 29.8019 KOps/s | 29.6310 KOps/s | |
| test_update__nested | 0.4519ms | 34.3032μs | 29.1518 KOps/s | 28.7383 KOps/s | |
| test_set_nested | 49.4410μs | 18.9229μs | 52.8459 KOps/s | 52.6048 KOps/s | |
| test_set_nested_new | 63.8120μs | 24.2971μs | 41.1572 KOps/s | 41.4499 KOps/s | |
| test_select | 78.9120μs | 41.3308μs | 24.1951 KOps/s | 24.2355 KOps/s | |
| test_select_nested | 0.1113ms | 74.4233μs | 13.4367 KOps/s | 13.2320 KOps/s | |
| test_exclude_nested | 0.1283ms | 92.7921μs | 10.7768 KOps/s | 10.7313 KOps/s | |
| test_empty[True] | 0.4570ms | 0.4023ms | 2.4855 KOps/s | 2.4852 KOps/s | |
| test_empty[False] | 9.5377μs | 1.3160μs | 759.8712 KOps/s | 758.8738 KOps/s | |
| test_to | 0.1057ms | 71.4464μs | 13.9965 KOps/s | 13.5615 KOps/s | |
| test_to_nonblocking | 0.1039ms | 64.8198μs | 15.4274 KOps/s | 15.3954 KOps/s | |
| test_unbind_speed | 0.3708ms | 0.3330ms | 3.0026 KOps/s | 2.9906 KOps/s | |
| test_unbind_speed_stack0 | 0.4152ms | 0.3331ms | 3.0024 KOps/s | 3.0098 KOps/s | |
| test_unbind_speed_stack1 | 0.1038s | 0.8458ms | 1.1823 KOps/s | 1.1741 KOps/s | |
| test_split | 0.1033s | 1.2698ms | 787.5393 Ops/s | 782.3303 Ops/s | |
| test_chunk | 0.1036s | 1.2154ms | 822.7732 Ops/s | 922.4359 Ops/s | |
| test_to_cpu_blocking | 19.4731ms | 19.3896ms | 51.5740 Ops/s | 35.1867 Ops/s | |
| test_to_cpu_global_sync | 11.5389ms | 11.4543ms | 87.3032 Ops/s | 78.9137 Ops/s | |
| test_to_cpu_event_sync | 12.7311ms | 12.4365ms | 80.4082 Ops/s | 80.6696 Ops/s | |
| test_to_cpu_default | 0.1155s | 13.7389ms | 72.7861 Ops/s | 80.6910 Ops/s | |
| test_consolidate[False-None] | 4.2299ms | 4.1610ms | 240.3281 Ops/s | 216.8407 Ops/s | |
| test_consolidate[default-None] | 2.1527ms | 2.0072ms | 498.2046 Ops/s | 487.1938 Ops/s | |
| test_consolidate[reduce-overhead-None] | 2.0092ms | 1.9125ms | 522.8664 Ops/s | 507.9723 Ops/s | |
| test_consolidate_njt[False-None] | 8.8629ms | 8.6190ms | 116.0224 Ops/s | 117.6604 Ops/s | |
| test_to[False-False-None] | 2.2964ms | 2.1104ms | 473.8402 Ops/s | 470.6205 Ops/s | |
| test_to[True-False-None] | 2.2513ms | 1.9376ms | 516.1085 Ops/s | 511.7222 Ops/s | |
| test_to[within-False-None] | 6.2786ms | 6.1682ms | 162.1229 Ops/s | 162.6628 Ops/s | |
| test_to[True-default-None] | 9.3031ms | 8.9433ms | 111.8152 Ops/s | 110.5601 Ops/s | |
| test_to_njt[False-False-None] | 8.6230ms | 8.5094ms | 117.5173 Ops/s | 117.8506 Ops/s | |
| test_to_njt[True-False-None] | 7.3353ms | 6.9771ms | 143.3266 Ops/s | 143.6131 Ops/s | |
| test_to_njt[within-False-None] | 15.9085ms | 15.6804ms | 63.7737 Ops/s | 64.1588 Ops/s | |
| test_creation[device0] | 0.3839ms | 0.1139ms | 8.7815 KOps/s | 8.7002 KOps/s | |
| test_creation_from_tensor | 0.4129ms | 0.1115ms | 8.9667 KOps/s | 8.5047 KOps/s | |
| test_add_one[memmap_tensor0] | 0.2939ms | 6.6013μs | 151.4864 KOps/s | 150.9784 KOps/s | |
| test_contiguous[memmap_tensor0] | 26.3200μs | 0.6761μs | 1.4791 MOps/s | 1.9012 MOps/s | |
| test_stack[memmap_tensor0] | 33.3110μs | 4.6485μs | 215.1223 KOps/s | 220.8156 KOps/s | |
| test_memmaptd_index | 1.1468ms | 0.2720ms | 3.6771 KOps/s | 3.7740 KOps/s | |
| test_memmaptd_index_astensor | 0.5318ms | 0.3775ms | 2.6489 KOps/s | 2.7067 KOps/s | |
| test_memmaptd_index_op | 0.8914ms | 0.6246ms | 1.6009 KOps/s | 1.6053 KOps/s | |
| test_serialize_model | 0.3161s | 0.1657s | 6.0333 Ops/s | 7.3102 Ops/s | |
| test_serialize_model_pickle | 2.1095s | 1.3626s | 0.7339 Ops/s | 0.8254 Ops/s | |
| test_serialize_weights | 0.1391s | 0.1368s | 7.3085 Ops/s | 7.3401 Ops/s | |
| test_serialize_weights_returnearly | 0.4557s | 93.6088ms | 10.6828 Ops/s | 10.3950 Ops/s | |
| test_serialize_weights_pickle | 1.3775s | 1.2180s | 0.8210 Ops/s | 0.8186 Ops/s | |
| test_reshape_pytree | 0.2076ms | 33.0853μs | 30.2249 KOps/s | 30.7492 KOps/s | |
| test_reshape_td | 0.2161ms | 45.9692μs | 21.7537 KOps/s | 22.2743 KOps/s | |
| test_view_pytree | 0.2207ms | 32.7049μs | 30.5764 KOps/s | 30.8383 KOps/s | |
| test_view_td | 93.1220μs | 53.6069μs | 18.6543 KOps/s | 18.0148 KOps/s | |
| test_unbind_pytree | 0.2440ms | 36.7841μs | 27.1856 KOps/s | 27.6223 KOps/s | |
| test_unbind_td | 0.1026ms | 50.3707μs | 19.8528 KOps/s | 19.8542 KOps/s | |
| test_split_pytree | 0.2494ms | 42.7075μs | 23.4151 KOps/s | 23.3278 KOps/s | |
| test_split_td | 0.1764ms | 64.2355μs | 15.5677 KOps/s | 15.3542 KOps/s | |
| test_add_pytree | 0.2346ms | 44.7264μs | 22.3582 KOps/s | 23.4353 KOps/s | |
| test_add_td | 97.5120μs | 59.8405μs | 16.7111 KOps/s | 17.6940 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.2298ms | 0.1456ms | 6.8683 KOps/s | 6.6940 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.3070ms | 0.2037ms | 4.9100 KOps/s | 5.0094 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.1809ms | 0.1085ms | 9.2195 KOps/s | 9.1408 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4350ms | 0.1806ms | 5.5378 KOps/s | 5.5728 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.5438ms | 10.7770μs | 92.7906 KOps/s | 97.0721 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 0.1389ms | 54.8017μs | 18.2476 KOps/s | 18.5604 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.1468ms | 9.7741μs | 102.3110 KOps/s | 102.5887 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.4648ms | 70.2915μs | 14.2265 KOps/s | 14.6184 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.2396ms | 0.1748ms | 5.7217 KOps/s | 5.4352 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.4062ms | 0.2825ms | 3.5401 KOps/s | 3.5559 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.2572ms | 0.1163ms | 8.5982 KOps/s | 8.3570 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1186ms | 73.0184μs | 13.6952 KOps/s | 13.3137 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.2058ms | 0.1570ms | 6.3705 KOps/s | 6.1541 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.8704ms | 0.5222ms | 1.9149 KOps/s | 1.8938 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.5103ms | 0.3364ms | 2.9724 KOps/s | 2.9605 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.3392ms | 0.1784ms | 5.6057 KOps/s | 5.2271 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.5512ms | 91.3722μs | 10.9442 KOps/s | 11.0511 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.6031ms | 0.1180ms | 8.4780 KOps/s | 8.0530 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.9203ms | 0.4292ms | 2.3300 KOps/s | 2.2828 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.2153ms | 0.1637ms | 6.1095 KOps/s | 6.1582 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 76.2110μs | 13.6038μs | 73.5087 KOps/s | 75.0175 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 79.3510μs | 42.2465μs | 23.6706 KOps/s | 23.6349 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 89.9720μs | 10.8265μs | 92.3661 KOps/s | 91.2390 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.4120ms | 52.5714μs | 19.0217 KOps/s | 18.9061 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 2.0444ms | 0.1744ms | 5.7353 KOps/s | 5.4847 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.7626ms | 3.3156ms | 301.6008 Ops/s | 298.7962 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 2.0568ms | 0.1637ms | 6.1105 KOps/s | 6.0612 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 3.0112ms | 2.7957ms | 357.6930 Ops/s | 358.2589 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.2316ms | 0.1083ms | 9.2368 KOps/s | 8.7653 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.5292ms | 73.8997μs | 13.5318 KOps/s | 13.4443 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1357ms | 95.3919μs | 10.4831 KOps/s | 10.2594 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2564ms | 44.5166μs | 22.4635 KOps/s | 22.4871 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.2646ms | 96.0234μs | 10.4141 KOps/s | 10.2069 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2468ms | 44.2655μs | 22.5909 KOps/s | 22.3824 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.2152ms | 56.6530μs | 17.6513 KOps/s | 17.3836 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2217ms | 27.5752μs | 36.2645 KOps/s | 35.3967 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 0.1529ms | 44.4304μs | 22.5071 KOps/s | 22.0552 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2389ms | 22.6116μs | 44.2251 KOps/s | 44.3341 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 83.3710μs | 45.6824μs | 21.8903 KOps/s | 21.7985 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.2678ms | 22.5748μs | 44.2971 KOps/s | 44.3017 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 96.6020μs | 57.6537μs | 17.3450 KOps/s | 17.2775 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.2404ms | 27.5627μs | 36.2809 KOps/s | 36.0228 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 87.0420μs | 45.3031μs | 22.0736 KOps/s | 22.0020 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2612ms | 22.7525μs | 43.9513 KOps/s | 44.4379 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 76.8710μs | 45.5995μs | 21.9301 KOps/s | 21.8727 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.1950ms | 22.4864μs | 44.4713 KOps/s | 44.4217 KOps/s | |
| test_compile_replace[single-eager] | 89.6120μs | 47.0910μs | 21.2355 KOps/s | 20.9408 KOps/s | |
| test_compile_replace[single-compile] | 0.1845ms | 0.1045ms | 9.5686 KOps/s | 9.3573 KOps/s | |
| test_compile_replace[multi-eager] | 0.6523ms | 0.5659ms | 1.7670 KOps/s | 1.7610 KOps/s | |
| test_compile_replace[multi-compile] | 0.2462ms | 0.1150ms | 8.6924 KOps/s | 8.7640 KOps/s | |
| test_compile_tc_getattr_20[eager] | 0.2114ms | 0.1658ms | 6.0321 KOps/s | 6.0138 KOps/s | |
| test_compile_tc_getattr_20[compile] | 0.4606ms | 0.1183ms | 8.4502 KOps/s | 8.3246 KOps/s | |
| test_compile_clone_shallow[20-eager] | 40.8600μs | 19.9150μs | 50.2134 KOps/s | 51.2542 KOps/s | |
| test_compile_clone_shallow[20-compile] | 0.1131ms | 11.5656μs | 86.4629 KOps/s | 88.0743 KOps/s | |
| test_compile_clone_shallow[40-eager] | 55.3110μs | 34.9478μs | 28.6141 KOps/s | 29.1445 KOps/s | |
| test_compile_clone_shallow[40-compile] | 67.4310μs | 12.9507μs | 77.2158 KOps/s | 65.5660 KOps/s | |
| test_compile_clone_shallow[80-eager] | 93.1620μs | 64.7994μs | 15.4322 KOps/s | 15.8322 KOps/s | |
| test_compile_clone_shallow[80-compile] | 0.1338ms | 15.1225μs | 66.1264 KOps/s | 67.2489 KOps/s | |
| test_compile_update_inplace[eager] | 95.8420μs | 60.4709μs | 16.5369 KOps/s | 16.6286 KOps/s | |
| test_compile_update_inplace[compile] | 0.2629ms | 0.1387ms | 7.2076 KOps/s | 7.0666 KOps/s | |
| test_mod_add[eager] | 91.3710μs | 48.7276μs | 20.5222 KOps/s | 20.3061 KOps/s | |
| test_mod_add[compile] | 0.2022ms | 0.1033ms | 9.6842 KOps/s | 9.2906 KOps/s | |
| test_mod_add[compile-overhead] | 0.3413ms | 0.1493ms | 6.6989 KOps/s | 6.5525 KOps/s | |
| test_mod_wrap[eager] | 0.3880ms | 0.2872ms | 3.4825 KOps/s | 3.4268 KOps/s | |
| test_mod_wrap[compile] | 0.4396ms | 0.3537ms | 2.8274 KOps/s | 2.7086 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.5076ms | 4.0493ms | 246.9581 Ops/s | 250.2105 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6183ms | 1.4827ms | 674.4373 Ops/s | 669.6368 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.6568ms | 1.4443ms | 692.3622 Ops/s | 688.1573 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.2718ms | 0.8830ms | 1.1326 KOps/s | 1.1043 KOps/s | |
| test_seq_add[eager] | 0.2408ms | 0.1602ms | 6.2418 KOps/s | 6.4741 KOps/s | |
| test_seq_add[compile] | 0.5721ms | 0.1135ms | 8.8126 KOps/s | 8.4787 KOps/s | |
| test_seq_add[compile-overhead] | 0.2295ms | 0.1593ms | 6.2758 KOps/s | 6.3060 KOps/s | |
| test_seq_wrap[eager] | 0.6669ms | 0.5364ms | 1.8644 KOps/s | 1.9247 KOps/s | |
| test_seq_wrap[compile] | 0.4691ms | 0.3648ms | 2.7409 KOps/s | 2.7263 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.3259ms | 0.2657ms | 3.7630 KOps/s | 3.6363 KOps/s | |
| test_func_call_runtime[False-eager] | 0.9708ms | 0.8361ms | 1.1961 KOps/s | 1.1656 KOps/s | |
| test_func_call_runtime[False-compile] | 1.1015ms | 0.9067ms | 1.1029 KOps/s | 1.0932 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.5274ms | 0.4650ms | 2.1505 KOps/s | 2.1470 KOps/s | |
| test_func_call_runtime[True-eager] | 1.2911ms | 1.0692ms | 935.2462 Ops/s | 932.8066 Ops/s | |
| test_func_call_runtime[True-compile] | 1.0133ms | 0.9228ms | 1.0837 KOps/s | 1.0442 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.5513ms | 0.4758ms | 2.1016 KOps/s | 2.0688 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 1.0179ms | 0.8808ms | 1.1354 KOps/s | 1.1948 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 1.0342ms | 0.9261ms | 1.0798 KOps/s | 1.0873 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.5199ms | 0.4644ms | 2.1532 KOps/s | 2.1385 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.3037ms | 1.2114ms | 825.5106 Ops/s | 811.2490 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 1.3048ms | 0.9904ms | 1.0097 KOps/s | 1.0356 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5713ms | 0.5107ms | 1.9581 KOps/s | 1.9458 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.8511ms | 2.3599ms | 423.7469 Ops/s | 421.5673 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.1068ms | 0.9785ms | 1.0220 KOps/s | 1.0099 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.6429ms | 0.5161ms | 1.9376 KOps/s | 1.9212 KOps/s | |
| test_distributed | 0.6426ms | 0.1524ms | 6.5618 KOps/s | 6.4797 KOps/s | |
| test_tdmodule | 0.5126ms | 29.5166μs | 33.8792 KOps/s | 35.6232 KOps/s | |
| test_tdmodule_dispatch | 83.2020μs | 48.2500μs | 20.7254 KOps/s | 21.7412 KOps/s | |
| test_tdseq | 42.0910μs | 29.0220μs | 34.4567 KOps/s | 36.9620 KOps/s | |
| test_tdseq_dispatch | 71.2420μs | 50.8562μs | 19.6633 KOps/s | 20.6630 KOps/s | |
| test_instantiation_functorch | 2.1724ms | 2.0915ms | 478.1368 Ops/s | 474.1385 Ops/s | |
| test_exec_functorch | 0.3045ms | 0.1793ms | 5.5758 KOps/s | 5.5400 KOps/s | |
| test_exec_functional_call | 0.6110ms | 0.1617ms | 6.1851 KOps/s | 6.2255 KOps/s | |
| test_exec_td_decorator | 0.4468ms | 0.2398ms | 4.1704 KOps/s | 4.2321 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.2508ms | 0.8288ms | 1.2065 KOps/s | 1.2046 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 1.2691ms | 0.8228ms | 1.2154 KOps/s | 1.2000 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 1.1922ms | 0.7082ms | 1.4120 KOps/s | 1.3905 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 1.1442ms | 0.7079ms | 1.4126 KOps/s | 1.3877 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 21.0638ms | 20.3833ms | 49.0597 Ops/s | 48.6559 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 21.0182ms | 20.4122ms | 48.9904 Ops/s | 48.7575 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 20.5852ms | 20.1750ms | 49.5663 Ops/s | 49.1742 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.6188ms | 20.2062ms | 49.4897 Ops/s | 49.3036 Ops/s | |
| test_to_module_speed[True] | 1.9559ms | 1.4829ms | 674.3381 Ops/s | 666.6685 Ops/s | |
| test_to_module_speed[False] | 1.9000ms | 1.4656ms | 682.3000 Ops/s | 677.0204 Ops/s | |
| test_tc_init | 78.4720μs | 44.1731μs | 22.6382 KOps/s | 21.6726 KOps/s | |
| test_tc_init_tensor_only | 37.6610μs | 9.8583μs | 101.4375 KOps/s | 101.1959 KOps/s | |
| test_tc_init_nested | 0.1322ms | 88.9981μs | 11.2362 KOps/s | 11.0806 KOps/s | |
| test_tc_init_many_fields | 44.3810μs | 16.5169μs | 60.5440 KOps/s | 60.6681 KOps/s | |
| test_tc_first_layer_tensor | 26.2910μs | 1.7887μs | 559.0527 KOps/s | 547.0724 KOps/s | |
| test_tc_first_layer_tensor_only | 1.6551μs | 0.3980μs | 2.5123 MOps/s | 2.5551 MOps/s | |
| test_tc_first_layer_tensor_set | 38.2400μs | 3.9215μs | 255.0037 KOps/s | 256.8133 KOps/s | |
| test_tc_first_layer_tensor_only_set | 29.4000μs | 3.2961μs | 303.3904 KOps/s | 308.6595 KOps/s | |
| test_tc_first_layer_nontensor | 70.2310μs | 6.1509μs | 162.5782 KOps/s | 162.6540 KOps/s | |
| test_tc_second_layer_tensor | 39.1210μs | 4.3450μs | 230.1505 KOps/s | 229.2418 KOps/s | |
| test_tc_second_layer_nontensor | 36.3710μs | 8.6864μs | 115.1230 KOps/s | 115.0659 KOps/s | |
| test_unbind | 0.2489s | 14.2096ms | 70.3751 Ops/s | 65.2251 Ops/s | |
| test_full_like | 7.4904ms | 4.4217ms | 226.1598 Ops/s | 135.7072 Ops/s | |
| test_zeros_like | 4.9139ms | 4.3789ms | 228.3656 Ops/s | 227.7485 Ops/s | |
| test_ones_like | 5.0021ms | 4.3989ms | 227.3313 Ops/s | 227.4915 Ops/s | |
| test_clone | 6.8466ms | 6.5937ms | 151.6608 Ops/s | 151.8710 Ops/s | |
| test_squeeze | 0.1593ms | 14.5028μs | 68.9521 KOps/s | 66.9965 KOps/s | |
| test_unsqueeze | 0.2695ms | 0.1120ms | 8.9259 KOps/s | 8.5408 KOps/s | |
| test_split | 0.2919ms | 0.1861ms | 5.3748 KOps/s | 5.0122 KOps/s | |
| test_permute | 0.6461ms | 0.2048ms | 4.8820 KOps/s | 4.7573 KOps/s | |
| test_stack | 52.5094ms | 51.6754ms | 19.3516 Ops/s | 19.3453 Ops/s | |
| test_cat | 52.1261ms | 51.5872ms | 19.3846 Ops/s | 19.3896 Ops/s | |
| test_sequential_tensordict | 0.2946ms | 0.2239ms | 4.4658 KOps/s | 4.5153 KOps/s | |
| test_sequential_graph_module | 0.2683ms | 0.1208ms | 8.2798 KOps/s | 8.1561 KOps/s | |
| test_nested_tensordict | 0.4667ms | 0.3001ms | 3.3321 KOps/s | 3.4455 KOps/s | |
| test_nested_graph_module | 0.1942ms | 0.1361ms | 7.3455 KOps/s | 7.7342 KOps/s |
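The Change column is empty in the table above. As a rough aid for reading it, the sketch below computes a relative change between the Ops and Ops on Repo HEAD cells. It assumes that "change" means `(ops - ops_head) / ops_head` and that the only units are `Ops/s`, `KOps/s` and `MOps/s`, as they appear in the table; the helper names are hypothetical and this is not necessarily how the benchmark bot derives the column.

```python
# Multipliers for the throughput units that appear in the benchmark table.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a cell such as '66.0206 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Relative throughput change of this PR against repo HEAD (assumed definition)."""
    head = parse_ops(ops_head)
    return (parse_ops(ops) - head) / head


if __name__ == "__main__":
    # Values copied from the test_to_cpu_blocking and test_chunk rows above.
    rows = {
        "test_to_cpu_blocking": ("51.5740 Ops/s", "35.1867 Ops/s"),
        "test_chunk": ("822.7732 Ops/s", "922.4359 Ops/s"),
    }
    for name, (ops, head) in rows.items():
        print(f"{name}: {relative_change(ops, head):+.1%}")
```

Most rows differ from HEAD by only a few percent in either direction, so a single run like this one may not be enough to tell a real regression from noise.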
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 70.0210μs | 14.9025μs | 67.1029 KOps/s | 67.4927 KOps/s | |
| test_plain_set_stack_nested | 35.7600μs | 15.3172μs | 65.2860 KOps/s | 65.5820 KOps/s | |
| test_plain_set_nested_inplace | 42.9400μs | 16.7802μs | 59.5942 KOps/s | 59.9319 KOps/s | |
| test_plain_set_stack_nested_inplace | 49.2200μs | 16.6127μs | 60.1948 KOps/s | 59.7499 KOps/s | |
| test_items | 36.6400μs | 5.9482μs | 168.1174 KOps/s | 164.9846 KOps/s | |
| test_items_nested | 0.5273ms | 0.4653ms | 2.1492 KOps/s | 2.1365 KOps/s | |
| test_items_nested_locked | 0.5309ms | 0.4690ms | 2.1324 KOps/s | 2.1475 KOps/s | |
| test_items_nested_leaf | 0.1198ms | 97.6912μs | 10.2363 KOps/s | 10.1974 KOps/s | |
| test_items_stack_nested | 0.5011ms | 0.4633ms | 2.1585 KOps/s | 2.1546 KOps/s | |
| test_items_stack_nested_leaf | 0.1353ms | 99.2045μs | 10.0802 KOps/s | 10.1370 KOps/s | |
| test_items_stack_nested_locked | 0.5471ms | 0.4699ms | 2.1283 KOps/s | 2.1427 KOps/s | |
| test_keys | 29.4000μs | 4.2232μs | 236.7860 KOps/s | 236.3948 KOps/s | |
| test_keys_nested | 0.1645ms | 0.1299ms | 7.6958 KOps/s | 7.7237 KOps/s | |
| test_keys_nested_locked | 1.9065ms | 0.1405ms | 7.1157 KOps/s | 7.2193 KOps/s | |
| test_keys_nested_leaf | 0.1595ms | 0.1214ms | 8.2347 KOps/s | 8.3253 KOps/s | |
| test_keys_stack_nested | 0.1953ms | 0.1306ms | 7.6550 KOps/s | 7.6775 KOps/s | |
| test_keys_stack_nested_leaf | 0.1549ms | 0.1211ms | 8.2568 KOps/s | 8.3460 KOps/s | |
| test_keys_stack_nested_locked | 0.1754ms | 0.1384ms | 7.2253 KOps/s | 7.2502 KOps/s | |
| test_values | 3.7371μs | 0.9972μs | 1.0028 MOps/s | 976.3076 KOps/s | |
| test_values_nested | 88.9110μs | 52.8237μs | 18.9309 KOps/s | 19.1762 KOps/s | |
| test_values_nested_locked | 82.3510μs | 56.4141μs | 17.7261 KOps/s | 17.9614 KOps/s | |
| test_values_nested_leaf | 87.0210μs | 60.5944μs | 16.5032 KOps/s | 16.7238 KOps/s | |
| test_values_stack_nested | 84.9110μs | 53.1956μs | 18.7985 KOps/s | 19.0615 KOps/s | |
| test_values_stack_nested_leaf | 92.9910μs | 60.4435μs | 16.5444 KOps/s | 16.7065 KOps/s | |
| test_values_stack_nested_locked | 0.1267ms | 56.4845μs | 17.7040 KOps/s | 18.0458 KOps/s | |
| test_membership | 5.3500μs | 0.8638μs | 1.1577 MOps/s | 1.1816 MOps/s | |
| test_membership_nested | 32.9900μs | 2.9360μs | 340.6020 KOps/s | 347.6101 KOps/s | |
| test_membership_nested_leaf | 30.2800μs | 2.9432μs | 339.7703 KOps/s | 350.1143 KOps/s | |
| test_membership_stacked_nested | 39.6500μs | 2.9388μs | 340.2754 KOps/s | 346.6530 KOps/s | |
| test_membership_stacked_nested_leaf | 31.5010μs | 2.9112μs | 343.4991 KOps/s | 347.2106 KOps/s | |
| test_membership_nested_last | 34.7700μs | 4.4077μs | 226.8756 KOps/s | 231.3285 KOps/s | |
| test_membership_nested_leaf_last | 39.6810μs | 4.3454μs | 230.1261 KOps/s | 229.0440 KOps/s | |
| test_membership_stacked_nested_last | 35.4610μs | 4.4229μs | 226.0983 KOps/s | 229.5250 KOps/s | |
| test_membership_stacked_nested_leaf_last | 40.9700μs | 4.3812μs | 228.2475 KOps/s | 229.9962 KOps/s | |
| test_nested_getleaf | 48.5410μs | 21.9044μs | 45.6530 KOps/s | 46.1097 KOps/s | |
| test_nested_get | 71.6010μs | 20.6680μs | 48.3839 KOps/s | 48.9018 KOps/s | |
| test_stacked_getleaf | 51.0000μs | 21.6846μs | 46.1157 KOps/s | 46.2680 KOps/s | |
| test_stacked_get | 51.6800μs | 20.4729μs | 48.8451 KOps/s | 48.7964 KOps/s | |
| test_nested_getitemleaf | 53.3500μs | 22.1884μs | 45.0685 KOps/s | 45.5723 KOps/s | |
| test_nested_getitem | 48.5310μs | 20.9507μs | 47.7310 KOps/s | 47.9075 KOps/s | |
| test_stacked_getitemleaf | 41.2310μs | 22.1553μs | 45.1359 KOps/s | 45.6772 KOps/s | |
| test_stacked_getitem | 62.3610μs | 20.9663μs | 47.6957 KOps/s | 47.9379 KOps/s | |
| test_lock_nested | 0.5826ms | 0.4784ms | 2.0904 KOps/s | 2.0770 KOps/s | |
| test_lock_stack_nested | 0.5827ms | 0.4833ms | 2.0693 KOps/s | 2.0499 KOps/s | |
| test_unlock_nested | 0.5241ms | 0.3898ms | 2.5654 KOps/s | 2.5701 KOps/s | |
| test_unlock_stack_nested | 0.4621ms | 0.3908ms | 2.5590 KOps/s | 2.5341 KOps/s | |
| test_flatten_speed | 0.1616ms | 0.1213ms | 8.2447 KOps/s | 8.1240 KOps/s | |
| test_unflatten_speed | 0.6181ms | 0.5737ms | 1.7430 KOps/s | 1.7592 KOps/s | |
| test_common_ops | 0.8883ms | 0.7084ms | 1.4116 KOps/s | 1.4271 KOps/s | |
| test_creation | 0.1058ms | 3.1745μs | 315.0125 KOps/s | 317.0173 KOps/s | |
| test_creation_empty | 32.8300μs | 6.9840μs | 143.1842 KOps/s | 141.7945 KOps/s | |
| test_creation_nested_1 | 40.4100μs | 11.5269μs | 86.7536 KOps/s | 86.4803 KOps/s | |
| test_creation_nested_2 | 74.1310μs | 13.2103μs | 75.6983 KOps/s | 74.3462 KOps/s | |
| test_creation_many_keys[10] | 56.0700μs | 21.1531μs | 47.2744 KOps/s | 47.4640 KOps/s | |
| test_creation_many_keys[50] | 0.1241ms | 89.8037μs | 11.1354 KOps/s | 11.0934 KOps/s | |
| test_creation_many_keys[100] | 0.2303ms | 0.1768ms | 5.6566 KOps/s | 5.6100 KOps/s | |
| test_creation_nested_many_keys[10] | 89.4110μs | 45.0526μs | 22.1963 KOps/s | 22.2149 KOps/s | |
| test_creation_nested_many_keys[50] | 0.2543ms | 0.1840ms | 5.4334 KOps/s | 5.4032 KOps/s | |
| test_clone | 51.9510μs | 13.0777μs | 76.4659 KOps/s | 76.0663 KOps/s | |
| test_getitem[int] | 1.4897ms | 15.1676μs | 65.9300 KOps/s | 60.5236 KOps/s | |
| test_getitem[slice_int] | 0.1473ms | 24.0143μs | 41.6419 KOps/s | 41.5276 KOps/s | |
| test_getitem[range] | 0.1873ms | 64.3114μs | 15.5493 KOps/s | 15.8168 KOps/s | |
| test_getitem[tuple] | 0.1416ms | 23.6764μs | 42.2362 KOps/s | 42.4855 KOps/s | |
| test_getitem[list] | 0.2104ms | 57.2951μs | 17.4535 KOps/s | 17.1695 KOps/s | |
| test_setitem_dim[int] | 39.7810μs | 24.8517μs | 40.2387 KOps/s | 38.0571 KOps/s | |
| test_setitem_dim[slice_int] | 65.0310μs | 42.8803μs | 23.3207 KOps/s | 22.9625 KOps/s | |
| test_setitem_dim[range] | 0.1376ms | 95.9416μs | 10.4230 KOps/s | 10.4985 KOps/s | |
| test_setitem_dim[tuple] | 79.1010μs | 39.8204μs | 25.1128 KOps/s | 24.4569 KOps/s | |
| test_setitem | 51.8710μs | 17.5950μs | 56.8344 KOps/s | 56.1191 KOps/s | |
| test_set | 46.9900μs | 16.9739μs | 58.9138 KOps/s | 59.3895 KOps/s | |
| test_set_shared | 0.5571ms | 0.2102ms | 4.7578 KOps/s | 4.9094 KOps/s | |
| test_update | 0.4067ms | 21.3782μs | 46.7767 KOps/s | 46.5369 KOps/s | |
| test_update_nested | 76.3210μs | 32.3924μs | 30.8714 KOps/s | 30.0669 KOps/s | |
| test_update__nested | 0.5057ms | 33.7980μs | 29.5876 KOps/s | 29.0195 KOps/s | |
| test_set_nested | 48.5610μs | 18.7228μs | 53.4107 KOps/s | 53.5866 KOps/s | |
| test_set_nested_new | 64.4310μs | 23.5900μs | 42.3908 KOps/s | 41.5842 KOps/s | |
| test_select | 78.1110μs | 40.2547μs | 24.8418 KOps/s | 24.4331 KOps/s | |
| test_select_nested | 0.1314ms | 75.6211μs | 13.2238 KOps/s | 13.3544 KOps/s | |
| test_exclude_nested | 0.1245ms | 93.2843μs | 10.7199 KOps/s | 10.8564 KOps/s | |
| test_empty[True] | 0.4726ms | 0.4005ms | 2.4969 KOps/s | 2.4986 KOps/s | |
| test_empty[False] | 10.9367μs | 1.3441μs | 743.9899 KOps/s | 758.7407 KOps/s | |
| test_to | 0.1045ms | 74.7460μs | 13.3786 KOps/s | 13.4407 KOps/s | |
| test_to_nonblocking | 0.1105ms | 64.7264μs | 15.4497 KOps/s | 15.4278 KOps/s | |
| test_unbind_speed | 0.3883ms | 0.3352ms | 2.9832 KOps/s | 3.0104 KOps/s | |
| test_unbind_speed_stack0 | 0.3800ms | 0.3326ms | 3.0063 KOps/s | 3.0418 KOps/s | |
| test_unbind_speed_stack1 | 0.1035s | 1.0576ms | 945.5808 Ops/s | 1.1912 KOps/s | |
| test_split | 1.2514ms | 1.1486ms | 870.6000 Ops/s | 786.3578 Ops/s | |
| test_chunk | 0.1038s | 1.2097ms | 826.6805 Ops/s | 916.7735 Ops/s | |
| test_to_cpu_blocking | 19.3758ms | 19.2917ms | 51.8357 Ops/s | 51.3026 Ops/s | |
| test_to_cpu_global_sync | 11.3359ms | 11.2012ms | 89.2765 Ops/s | 78.9689 Ops/s | |
| test_to_cpu_event_sync | 0.1152s | 13.4113ms | 74.5641 Ops/s | 80.2244 Ops/s | |
| test_to_cpu_default | 12.4567ms | 12.1935ms | 82.0109 Ops/s | 80.6764 Ops/s | |
| test_consolidate[False-None] | 4.2616ms | 4.1606ms | 240.3515 Ops/s | 216.9879 Ops/s | |
| test_consolidate[default-None] | 2.1458ms | 2.0100ms | 497.5119 Ops/s | 487.1495 Ops/s | |
| test_consolidate[reduce-overhead-None] | 2.0355ms | 1.9435ms | 514.5298 Ops/s | 508.6437 Ops/s | |
| test_consolidate_njt[False-None] | 8.6913ms | 8.4440ms | 118.4276 Ops/s | 117.7129 Ops/s | |
| test_to[False-False-None] | 2.3227ms | 2.0744ms | 482.0637 Ops/s | 473.9723 Ops/s | |
| test_to[True-False-None] | 2.1310ms | 1.8945ms | 527.8518 Ops/s | 520.0717 Ops/s | |
| test_to[within-False-None] | 6.3861ms | 6.1350ms | 162.9991 Ops/s | 163.1155 Ops/s | |
| test_to[True-default-None] | 8.9004ms | 8.6410ms | 115.7279 Ops/s | 111.2757 Ops/s | |
| test_to_njt[False-False-None] | 8.5305ms | 8.4515ms | 118.3218 Ops/s | 117.1108 Ops/s | |
| test_to_njt[True-False-None] | 7.0806ms | 6.8804ms | 145.3407 Ops/s | 142.1915 Ops/s | |
| test_to_njt[within-False-None] | 15.6386ms | 15.3588ms | 65.1092 Ops/s | 63.7603 Ops/s | |
| test_creation[device0] | 0.4141ms | 0.1144ms | 8.7447 KOps/s | 8.6670 KOps/s | |
| test_creation_from_tensor | 0.3957ms | 0.1118ms | 8.9442 KOps/s | 8.8637 KOps/s | |
| test_add_one[memmap_tensor0] | 0.3490ms | 6.4542μs | 154.9385 KOps/s | 155.9302 KOps/s | |
| test_contiguous[memmap_tensor0] | 14.2600μs | 0.6840μs | 1.4620 MOps/s | 2.1148 MOps/s | |
| test_stack[memmap_tensor0] | 46.9600μs | 4.7189μs | 211.9120 KOps/s | 212.4636 KOps/s | |
| test_memmaptd_index | 0.9804ms | 0.2749ms | 3.6372 KOps/s | 3.7386 KOps/s | |
| test_memmaptd_index_astensor | 0.5560ms | 0.3765ms | 2.6561 KOps/s | 2.7436 KOps/s | |
| test_memmaptd_index_op | 0.7807ms | 0.6222ms | 1.6071 KOps/s | 1.6353 KOps/s | |
| test_serialize_model | 0.1389s | 0.1364s | 7.3322 Ops/s | 7.2803 Ops/s | |
| test_serialize_model_pickle | 1.3680s | 1.1894s | 0.8407 Ops/s | 0.8261 Ops/s | |
| test_serialize_weights | 0.1386s | 0.1360s | 7.3520 Ops/s | 7.3839 Ops/s | |
| test_serialize_weights_returnearly | 0.4390s | 93.1447ms | 10.7360 Ops/s | 11.2691 Ops/s | |
| test_serialize_weights_pickle | 1.3506s | 1.2143s | 0.8235 Ops/s | 0.8231 Ops/s | |
| test_reshape_pytree | 0.2052ms | 34.0810μs | 29.3419 KOps/s | 30.7136 KOps/s | |
| test_reshape_td | 83.1210μs | 47.2263μs | 21.1746 KOps/s | 22.4973 KOps/s | |
| test_view_pytree | 0.2160ms | 33.8747μs | 29.5206 KOps/s | 30.9972 KOps/s | |
| test_view_td | 87.9710μs | 53.2281μs | 18.7871 KOps/s | 18.9527 KOps/s | |
| test_unbind_pytree | 0.2512ms | 36.8672μs | 27.1244 KOps/s | 27.7626 KOps/s | |
| test_unbind_td | 82.3310μs | 50.3740μs | 19.8515 KOps/s | 20.2738 KOps/s | |
| test_split_pytree | 0.2184ms | 44.6286μs | 22.4072 KOps/s | 23.8758 KOps/s | |
| test_split_td | 0.1164ms | 68.2950μs | 14.6424 KOps/s | 15.5342 KOps/s | |
| test_add_pytree | 0.2286ms | 44.3322μs | 22.5570 KOps/s | 23.9562 KOps/s | |
| test_add_td | 91.0720μs | 57.5619μs | 17.3726 KOps/s | 18.3801 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.3247ms | 0.1445ms | 6.9205 KOps/s | 6.8813 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.2902ms | 0.2056ms | 4.8646 KOps/s | 4.9825 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.1966ms | 0.1080ms | 9.2579 KOps/s | 9.0051 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4271ms | 0.1812ms | 5.5190 KOps/s | 5.5632 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.3082ms | 10.5469μs | 94.8150 KOps/s | 96.9610 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 82.1910μs | 53.8457μs | 18.5716 KOps/s | 18.2623 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 45.5700μs | 9.6465μs | 103.6646 KOps/s | 99.7691 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.4265ms | 69.4631μs | 14.3961 KOps/s | 14.6397 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.2278ms | 0.1776ms | 5.6313 KOps/s | 5.2680 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.3283ms | 0.2777ms | 3.6005 KOps/s | 3.4991 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.1684ms | 0.1176ms | 8.5047 KOps/s | 8.0644 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1190ms | 73.5225μs | 13.6013 KOps/s | 13.4587 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.1936ms | 0.1587ms | 6.2996 KOps/s | 6.0994 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.8014ms | 0.5285ms | 1.8920 KOps/s | 1.7904 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.5086ms | 0.3324ms | 3.0083 KOps/s | 2.9570 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.2306ms | 0.1795ms | 5.5720 KOps/s | 5.2719 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1357ms | 90.1068μs | 11.0979 KOps/s | 11.1199 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.1620ms | 0.1198ms | 8.3468 KOps/s | 7.9888 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.6583ms | 0.4368ms | 2.2893 KOps/s | 2.2565 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.2022ms | 0.1586ms | 6.3037 KOps/s | 6.1288 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 42.8700μs | 13.3910μs | 74.6770 KOps/s | 70.5391 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 76.6210μs | 41.9615μs | 23.8314 KOps/s | 24.2200 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 48.4910μs | 10.9713μs | 91.1468 KOps/s | 91.6971 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.4054ms | 52.1982μs | 19.1578 KOps/s | 19.0813 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 2.0150ms | 0.1744ms | 5.7334 KOps/s | 5.5068 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.3722ms | 3.2867ms | 304.2594 Ops/s | 299.9706 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 2.0316ms | 0.1622ms | 6.1651 KOps/s | 6.0482 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 2.9207ms | 2.7932ms | 358.0111 Ops/s | 356.5978 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.1720ms | 0.1103ms | 9.0649 KOps/s | 8.8330 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.3163ms | 73.2233μs | 13.6569 KOps/s | 13.3764 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1519ms | 96.5276μs | 10.3597 KOps/s | 10.0810 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2491ms | 44.4916μs | 22.4761 KOps/s | 22.2730 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1393ms | 97.6363μs | 10.2421 KOps/s | 10.1081 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2769ms | 44.4980μs | 22.4729 KOps/s | 22.1346 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.1009ms | 56.8707μs | 17.5837 KOps/s | 17.4205 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2186ms | 26.7467μs | 37.3877 KOps/s | 36.2341 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 81.4210μs | 43.9204μs | 22.7684 KOps/s | 22.2723 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2514ms | 22.3896μs | 44.6635 KOps/s | 44.4624 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 79.3010μs | 45.6584μs | 21.9018 KOps/s | 22.0406 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.2863ms | 22.5522μs | 44.3415 KOps/s | 44.9063 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.1157ms | 57.1069μs | 17.5110 KOps/s | 17.0801 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.2773ms | 26.9246μs | 37.1408 KOps/s | 36.3254 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 89.2210μs | 44.3599μs | 22.5429 KOps/s | 22.4137 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2589ms | 22.4277μs | 44.5878 KOps/s | 44.9856 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 75.8210μs | 44.1036μs | 22.6739 KOps/s | 22.3387 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.2519ms | 22.2362μs | 44.9717 KOps/s | 45.1486 KOps/s | |
| test_compile_replace[single-eager] | 84.0110μs | 46.5068μs | 21.5022 KOps/s | 21.0591 KOps/s | |
| test_compile_replace[single-compile] | 0.1851ms | 0.1048ms | 9.5437 KOps/s | 9.2366 KOps/s | |
| test_compile_replace[multi-eager] | 0.6060ms | 0.5601ms | 1.7855 KOps/s | 1.7670 KOps/s | |
| test_compile_replace[multi-compile] | 0.1772ms | 0.1114ms | 8.9776 KOps/s | 8.4086 KOps/s | |
| test_compile_tc_getattr_20[eager] | 0.2417ms | 0.1783ms | 5.6086 KOps/s | 5.9433 KOps/s | |
| test_compile_tc_getattr_20[compile] | 0.1789ms | 0.1190ms | 8.4046 KOps/s | 8.2019 KOps/s | |
| test_compile_clone_shallow[20-eager] | 44.8400μs | 19.3473μs | 51.6869 KOps/s | 51.1154 KOps/s | |
| test_compile_clone_shallow[20-compile] | 81.5410μs | 11.2698μs | 88.7328 KOps/s | 88.0581 KOps/s | |
| test_compile_clone_shallow[40-eager] | 66.3810μs | 34.3775μs | 29.0888 KOps/s | 29.1795 KOps/s | |
| test_compile_clone_shallow[40-compile] | 44.8300μs | 12.5829μs | 79.4726 KOps/s | 78.3976 KOps/s | |
| test_compile_clone_shallow[80-eager] | 0.1018ms | 62.9000μs | 15.8982 KOps/s | 15.6393 KOps/s | |
| test_compile_clone_shallow[80-compile] | 50.5600μs | 14.5533μs | 68.7128 KOps/s | 65.9108 KOps/s | |
| test_compile_update_inplace[eager] | 97.3220μs | 59.7122μs | 16.7470 KOps/s | 16.7158 KOps/s | |
| test_compile_update_inplace[compile] | 0.2106ms | 0.1394ms | 7.1734 KOps/s | 6.7007 KOps/s | |
| test_mod_add[eager] | 99.2610μs | 50.3944μs | 19.8435 KOps/s | 19.6742 KOps/s | |
| test_mod_add[compile] | 0.1551ms | 0.1050ms | 9.5281 KOps/s | 9.3658 KOps/s | |
| test_mod_add[compile-overhead] | 0.2324ms | 0.1470ms | 6.8043 KOps/s | 6.5235 KOps/s | |
| test_mod_wrap[eager] | 0.3800ms | 0.3026ms | 3.3043 KOps/s | 3.3755 KOps/s | |
| test_mod_wrap[compile] | 0.3991ms | 0.3460ms | 2.8902 KOps/s | 2.8114 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.3809ms | 4.0795ms | 245.1299 Ops/s | 253.4924 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6236ms | 1.5002ms | 666.5711 Ops/s | 658.1628 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.6361ms | 1.5553ms | 642.9525 Ops/s | 683.0620 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.4647ms | 1.0000ms | 1.0000 KOps/s | 1.0961 KOps/s | |
| test_seq_add[eager] | 0.2233ms | 0.1516ms | 6.5957 KOps/s | 6.2680 KOps/s | |
| test_seq_add[compile] | 0.1725ms | 0.1119ms | 8.9332 KOps/s | 8.4925 KOps/s | |
| test_seq_add[compile-overhead] | 0.2260ms | 0.1658ms | 6.0324 KOps/s | 6.1969 KOps/s | |
| test_seq_wrap[eager] | 0.6172ms | 0.5523ms | 1.8108 KOps/s | 1.8757 KOps/s | |
| test_seq_wrap[compile] | 0.4690ms | 0.3869ms | 2.5847 KOps/s | 2.6826 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.3371ms | 0.2662ms | 3.7564 KOps/s | 3.7167 KOps/s | |
| test_func_call_runtime[False-eager] | 0.9022ms | 0.8331ms | 1.2003 KOps/s | 1.1648 KOps/s | |
| test_func_call_runtime[False-compile] | 0.9706ms | 0.9072ms | 1.1023 KOps/s | 1.0866 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.5613ms | 0.4642ms | 2.1541 KOps/s | 2.1384 KOps/s | |
| test_func_call_runtime[True-eager] | 1.1363ms | 1.0781ms | 927.5463 Ops/s | 920.0990 Ops/s | |
| test_func_call_runtime[True-compile] | 0.9818ms | 0.9222ms | 1.0844 KOps/s | 1.0648 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.5368ms | 0.4772ms | 2.0954 KOps/s | 2.0762 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 0.9468ms | 0.8861ms | 1.1286 KOps/s | 1.1690 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 1.0012ms | 0.9159ms | 1.0918 KOps/s | 1.0788 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.5330ms | 0.4672ms | 2.1404 KOps/s | 2.1303 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.3060ms | 1.2272ms | 814.8518 Ops/s | 804.3518 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 1.0109ms | 0.9572ms | 1.0447 KOps/s | 1.0276 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5798ms | 0.5107ms | 1.9580 KOps/s | 1.9258 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.8496ms | 2.3629ms | 423.2044 Ops/s | 416.3101 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.0564ms | 0.9777ms | 1.0228 KOps/s | 1.0082 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5682ms | 0.5167ms | 1.9355 KOps/s | 1.9150 KOps/s | |
| test_distributed | 2.7916ms | 0.1749ms | 5.7161 KOps/s | 6.5135 KOps/s | |
| test_tdmodule | 0.1779ms | 28.3807μs | 35.2352 KOps/s | 35.8465 KOps/s | |
| test_tdmodule_dispatch | 72.7610μs | 45.2216μs | 22.1133 KOps/s | 21.8889 KOps/s | |
| test_tdseq | 66.3010μs | 27.1554μs | 36.8251 KOps/s | 36.8426 KOps/s | |
| test_tdseq_dispatch | 75.4710μs | 47.5791μs | 21.0176 KOps/s | 20.9630 KOps/s | |
| test_instantiation_functorch | 2.1943ms | 2.0794ms | 480.9151 Ops/s | 480.1015 Ops/s | |
| test_exec_functorch | 0.2304ms | 0.1803ms | 5.5463 KOps/s | 5.5594 KOps/s | |
| test_exec_functional_call | 0.2231ms | 0.1637ms | 6.1091 KOps/s | 6.2130 KOps/s | |
| test_exec_td_decorator | 0.4400ms | 0.2390ms | 4.1849 KOps/s | 4.2503 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 0.9979ms | 0.8215ms | 1.2173 KOps/s | 1.2050 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 1.0183ms | 0.8413ms | 1.1887 KOps/s | 1.1695 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.9378ms | 0.7199ms | 1.3891 KOps/s | 1.3472 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.9199ms | 0.7166ms | 1.3954 KOps/s | 1.3562 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 20.9544ms | 20.4908ms | 48.8025 Ops/s | 47.1333 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 21.0121ms | 20.5128ms | 48.7500 Ops/s | 48.1325 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 20.8740ms | 20.2702ms | 49.3336 Ops/s | 48.9000 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.8591ms | 20.2718ms | 49.3295 Ops/s | 48.9145 Ops/s | |
| test_to_module_speed[True] | 2.0504ms | 1.4758ms | 677.5965 Ops/s | 673.4486 Ops/s | |
| test_to_module_speed[False] | 1.9318ms | 1.4536ms | 687.9563 Ops/s | 680.8005 Ops/s | |
| test_tc_init | 81.0210μs | 45.1692μs | 22.1390 KOps/s | 22.1994 KOps/s | |
| test_tc_init_tensor_only | 37.4010μs | 9.6840μs | 103.2635 KOps/s | 101.9391 KOps/s | |
| test_tc_init_nested | 0.1191ms | 87.8365μs | 11.3848 KOps/s | 11.3506 KOps/s | |
| test_tc_init_many_fields | 48.1200μs | 16.3521μs | 61.1542 KOps/s | 60.8158 KOps/s | |
| test_tc_first_layer_tensor | 28.5110μs | 1.8326μs | 545.6834 KOps/s | 540.4270 KOps/s | |
| test_tc_first_layer_tensor_only | 3.5200μs | 0.4102μs | 2.4380 MOps/s | 2.4594 MOps/s | |
| test_tc_first_layer_tensor_set | 28.4000μs | 3.9700μs | 251.8920 KOps/s | 252.9086 KOps/s | |
| test_tc_first_layer_tensor_only_set | 23.2000μs | 3.2942μs | 303.5661 KOps/s | 305.0497 KOps/s | |
| test_tc_first_layer_nontensor | 48.2000μs | 6.1494μs | 162.6178 KOps/s | 160.6697 KOps/s | |
| test_tc_second_layer_tensor | 33.8600μs | 4.4201μs | 226.2390 KOps/s | 227.4721 KOps/s | |
| test_tc_second_layer_nontensor | 49.2210μs | 8.6439μs | 115.6881 KOps/s | 115.0925 KOps/s | |
| test_unbind | 0.2628s | 16.5450ms | 60.4412 Ops/s | 53.0614 Ops/s | |
| test_full_like | 5.4464ms | 4.3942ms | 227.5740 Ops/s | 58.3779 Ops/s | |
| test_zeros_like | 4.9649ms | 4.3873ms | 227.9285 Ops/s | 59.6096 Ops/s | |
| test_ones_like | 4.6448ms | 4.4049ms | 227.0183 Ops/s | 59.5299 Ops/s | |
| test_clone | 6.9395ms | 6.5257ms | 153.2391 Ops/s | 56.4190 Ops/s | |
| test_squeeze | 60.4210μs | 14.5311μs | 68.8180 KOps/s | 70.9669 KOps/s | |
| test_unsqueeze | 0.1594ms | 0.1107ms | 9.0347 KOps/s | 8.9854 KOps/s | |
| test_split | 0.2438ms | 0.1847ms | 5.4140 KOps/s | 5.3905 KOps/s | |
| test_permute | 0.2746ms | 0.2108ms | 4.7446 KOps/s | 4.8579 KOps/s | |
| test_stack | 53.3136ms | 51.9025ms | 19.2669 Ops/s | 19.3997 Ops/s | |
| test_cat | 52.4766ms | 51.6763ms | 19.3512 Ops/s | 19.4082 Ops/s | |
| test_sequential_tensordict | 0.3276ms | 0.2262ms | 4.4207 KOps/s | 4.5361 KOps/s | |
| test_sequential_graph_module | 0.5419ms | 0.1218ms | 8.2132 KOps/s | 8.2197 KOps/s | |
| test_nested_tensordict | 0.3323ms | 0.2888ms | 3.4629 KOps/s | 3.4701 KOps/s | |
| test_nested_graph_module | 0.2501ms | 0.1336ms | 7.4847 KOps/s | 7.7406 KOps/s |
Stack from ghstack (oldest at bottom):
Implement _dtensor_send_redistribute and _dtensor_recv_redistribute.
Each sender rank transfers its DTensor metadata (placements, mesh topology, mesh dim names) together with its local tensor data,
for the caller to reconstruct as DTensors via from_local() + redistribute().
This avoids materializing full tensors (no memory spike on the sender)
and transfers only the data each sender rank actually holds.
Made-with: Cursor
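
For reviewers, here is a toy model of the local-shard transfer described above. It is not the PR's implementation and does not use torch: `ShardPayload`, `pack_local_shard`, and `reassemble` are hypothetical names, plain Python lists stand in for tensors, and a single `Shard(0)` placement on a 1-D mesh is assumed. It only illustrates that each sender ships its own rows plus metadata, and that the receiver can rebuild the layout from that metadata alone.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ShardPayload:
    """Hypothetical wire format: sharding metadata plus one rank's local data."""
    placements: Tuple[str, ...]      # assumed string form, e.g. ("Shard(0)",)
    mesh_shape: Tuple[int, ...]      # mesh topology
    mesh_dim_names: Tuple[str, ...]
    rank: int
    local_rows: List[List[float]]    # only the rows this rank holds


def pack_local_shard(local_rows, rank, world_size):
    """Sender side: ship the local shard as-is, without gathering the full tensor."""
    return ShardPayload(
        placements=("Shard(0)",),
        mesh_shape=(world_size,),
        mesh_dim_names=("dp",),
        rank=rank,
        local_rows=local_rows,
    )


def reassemble(payloads):
    """Receiver side: order shards by rank and concatenate along dim 0."""
    ordered = sorted(payloads, key=lambda p: p.rank)
    # This toy only understands row-wise sharding.
    assert all(p.placements == ("Shard(0)",) for p in ordered)
    return [row for p in ordered for row in p.local_rows]


# An 8x4 "tensor" sharded row-wise over 4 sender ranks.
full = [[float(4 * r + c) for c in range(4)] for r in range(8)]
world_size = 4
rows_per_rank = len(full) // world_size
payloads = [
    pack_local_shard(full[r * rows_per_rank:(r + 1) * rows_per_rank], r, world_size)
    for r in range(world_size)
]

# Arrival order should not matter to the receiver.
restored = reassemble(reversed(payloads))
```

Each payload carries `rows_per_rank` rows rather than all eight, which is the memory argument made in the description. In the real code path the receiver would presumably hand the local data to `from_local()` and then call `redistribute()` rather than concatenating by hand.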