[DTensor] Add Strategy C (optimal P2P transfer using transfer plan) #1642
Open
vmoens wants to merge 1 commit into gh/vmoens/83/base
vmoens added a commit that referenced this pull request on Mar 6, 2026
Implement _dtensor_send_optimal and _dtensor_recv_optimal:
- Sender: computes transfer plan from src/dst meshes and placements, extracts only the needed slices from local shards, and sends via P2P
- Receiver: computes same plan, receives slices into the right positions of the local buffer, wraps as DTensor via from_local()
- Both torch.distributed and UCXX transports supported

Update "auto" strategy resolution to pick "optimal" when dst_mesh/src_mesh and dst_placements/src_placements are provided, falling back to "materialize" otherwise.

Add _mesh_to_rank_map and _mesh_all_ranks helpers to _dtensor.py.

Made-with: Cursor
ghstack-source-id: faa4852
Pull-Request: #1642
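The commit message above describes two pieces of logic: resolving the "auto" strategy and computing a transfer plan that both sender and receiver derive independently. The following is a minimal pure-Python sketch of that idea, not the code in this PR. It assumes a one-dimensional tensor sharded evenly along dim 0 on both sides, with ceiling-sized chunks, and it assumes each side is given only its peer's mesh and placements. The names `Transfer`, `resolve_strategy`, `shard_bounds`, and `transfer_plan` are hypothetical and do not appear in the PR.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Transfer(NamedTuple):
    """One P2P message: a global index range sent from src_rank to dst_rank."""
    src_rank: int
    dst_rank: int
    start: int  # global start index (inclusive)
    stop: int   # global stop index (exclusive)


def resolve_strategy(
    strategy: str,
    peer_mesh: Optional[object] = None,
    peer_placements: Optional[Sequence[object]] = None,
) -> str:
    """Hypothetical 'auto' resolution: use 'optimal' only when the peer's
    mesh and placements are both known, otherwise fall back to 'materialize'."""
    if strategy != "auto":
        return strategy
    if peer_mesh is not None and peer_placements is not None:
        return "optimal"
    return "materialize"


def shard_bounds(length: int, num_shards: int) -> List[Tuple[int, int]]:
    """Global [start, stop) range of each shard, assuming ceiling-sized chunks."""
    chunk = -(-length // num_shards)  # ceiling division
    return [
        (min(i * chunk, length), min((i + 1) * chunk, length))
        for i in range(num_shards)
    ]


def transfer_plan(
    length: int, src_ranks: Sequence[int], dst_ranks: Sequence[int]
) -> List[Transfer]:
    """Intersect every source shard with every destination shard and keep
    the non-empty overlaps. The result depends only on the arguments, so
    sender and receiver can compute the same plan without communicating."""
    plan = []
    src_bounds = shard_bounds(length, len(src_ranks))
    dst_bounds = shard_bounds(length, len(dst_ranks))
    for s_rank, (s0, s1) in zip(src_ranks, src_bounds):
        for d_rank, (d0, d1) in zip(dst_ranks, dst_bounds):
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:  # skip pairs whose ranges do not overlap
                plan.append(Transfer(s_rank, d_rank, lo, hi))
    return plan


if __name__ == "__main__":
    # Two source ranks resharded onto four destination ranks.
    for step in transfer_plan(8, src_ranks=[0, 1], dst_ranks=[2, 3, 4, 5]):
        print(step)
```

In this sketch each element crosses the wire exactly once, which is the property the "optimal" strategy is after: no rank has to materialize the full tensor before sending or after receiving.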
Contributor
PR Title Label Error: Unknown or invalid prefix
Current title:
Supported Prefixes: Your PR title must start with exactly one of these prefixes (case-insensitive):
Note: Matching is case-insensitive. Common variations (singular/plural) are supported.
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 63.5410μs | 14.8422μs | 67.3756 KOps/s | 67.4779 KOps/s | |
| test_plain_set_stack_nested | 39.9410μs | 15.5317μs | 64.3845 KOps/s | 66.7030 KOps/s | |
| test_plain_set_nested_inplace | 41.0610μs | 16.8204μs | 59.4516 KOps/s | 60.3784 KOps/s | |
| test_plain_set_stack_nested_inplace | 55.5010μs | 16.7946μs | 59.5431 KOps/s | 60.1751 KOps/s | |
| test_items | 33.3110μs | 6.1059μs | 163.7771 KOps/s | 165.9065 KOps/s | |
| test_items_nested | 0.5294ms | 0.4744ms | 2.1080 KOps/s | 2.1435 KOps/s | |
| test_items_nested_locked | 0.5328ms | 0.4783ms | 2.0907 KOps/s | 2.1279 KOps/s | |
| test_items_nested_leaf | 0.1352ms | 98.9065μs | 10.1106 KOps/s | 10.0339 KOps/s | |
| test_items_stack_nested | 0.5133ms | 0.4714ms | 2.1213 KOps/s | 2.1330 KOps/s | |
| test_items_stack_nested_leaf | 0.1972ms | 96.1778μs | 10.3974 KOps/s | 10.1982 KOps/s | |
| test_items_stack_nested_locked | 0.5114ms | 0.4751ms | 2.1050 KOps/s | 2.1110 KOps/s | |
| test_keys | 35.0210μs | 4.2124μs | 237.3921 KOps/s | 236.6578 KOps/s | |
| test_keys_nested | 0.1674ms | 0.1323ms | 7.5579 KOps/s | 7.7783 KOps/s | |
| test_keys_nested_locked | 1.8076ms | 0.1405ms | 7.1154 KOps/s | 7.2287 KOps/s | |
| test_keys_nested_leaf | 0.1710ms | 0.1237ms | 8.0838 KOps/s | 8.3452 KOps/s | |
| test_keys_stack_nested | 0.1716ms | 0.1333ms | 7.5030 KOps/s | 7.7013 KOps/s | |
| test_keys_stack_nested_leaf | 0.1614ms | 0.1234ms | 8.1043 KOps/s | 8.2831 KOps/s | |
| test_keys_stack_nested_locked | 0.1795ms | 0.1414ms | 7.0705 KOps/s | 7.3056 KOps/s | |
| test_values | 13.2150μs | 1.0860μs | 920.8045 KOps/s | 983.7982 KOps/s | |
| test_values_nested | 78.8810μs | 53.8739μs | 18.5619 KOps/s | 19.1392 KOps/s | |
| test_values_nested_locked | 90.6310μs | 57.4366μs | 17.4105 KOps/s | 18.2288 KOps/s | |
| test_values_nested_leaf | 85.7920μs | 62.4178μs | 16.0211 KOps/s | 16.5140 KOps/s | |
| test_values_stack_nested | 79.2020μs | 53.9612μs | 18.5318 KOps/s | 18.9697 KOps/s | |
| test_values_stack_nested_leaf | 91.1420μs | 62.1072μs | 16.1012 KOps/s | 16.5799 KOps/s | |
| test_values_stack_nested_locked | 85.7520μs | 57.2675μs | 17.4619 KOps/s | 17.8699 KOps/s | |
| test_membership | 12.4355μs | 0.9354μs | 1.0691 MOps/s | 1.1831 MOps/s | |
| test_membership_nested | 33.0910μs | 2.9652μs | 337.2423 KOps/s | 352.0521 KOps/s | |
| test_membership_nested_leaf | 34.2800μs | 2.9364μs | 340.5509 KOps/s | 361.7451 KOps/s | |
| test_membership_stacked_nested | 36.3910μs | 2.9660μs | 337.1579 KOps/s | 347.7078 KOps/s | |
| test_membership_stacked_nested_leaf | 32.9300μs | 2.9136μs | 343.2191 KOps/s | 350.3915 KOps/s | |
| test_membership_nested_last | 31.9400μs | 4.4765μs | 223.3887 KOps/s | 230.3215 KOps/s | |
| test_membership_nested_leaf_last | 24.6700μs | 4.4678μs | 223.8229 KOps/s | 241.3661 KOps/s | |
| test_membership_stacked_nested_last | 35.3610μs | 4.4425μs | 225.0969 KOps/s | 233.6988 KOps/s | |
| test_membership_stacked_nested_leaf_last | 34.8710μs | 4.4149μs | 226.5074 KOps/s | 232.8786 KOps/s | |
| test_nested_getleaf | 46.2210μs | 22.1779μs | 45.0900 KOps/s | 46.3626 KOps/s | |
| test_nested_get | 46.7000μs | 21.0231μs | 47.5667 KOps/s | 48.6833 KOps/s | |
| test_stacked_getleaf | 60.6510μs | 21.9236μs | 45.6130 KOps/s | 46.8377 KOps/s | |
| test_stacked_get | 43.3110μs | 20.8411μs | 47.9822 KOps/s | 48.7104 KOps/s | |
| test_nested_getitemleaf | 48.3010μs | 22.4891μs | 44.4659 KOps/s | 45.3645 KOps/s | |
| test_nested_getitem | 49.2110μs | 21.2888μs | 46.9730 KOps/s | 47.8138 KOps/s | |
| test_stacked_getitemleaf | 46.4400μs | 22.3134μs | 44.8161 KOps/s | 45.4303 KOps/s | |
| test_stacked_getitem | 56.3310μs | 21.4489μs | 46.6224 KOps/s | 47.4663 KOps/s | |
| test_lock_nested | 0.5827ms | 0.4830ms | 2.0704 KOps/s | 2.0969 KOps/s | |
| test_lock_stack_nested | 0.5400ms | 0.4842ms | 2.0651 KOps/s | 2.0589 KOps/s | |
| test_unlock_nested | 0.4785ms | 0.3951ms | 2.5313 KOps/s | 2.5649 KOps/s | |
| test_unlock_stack_nested | 0.4460ms | 0.3928ms | 2.5459 KOps/s | 2.5287 KOps/s | |
| test_flatten_speed | 0.1901ms | 0.1226ms | 8.1564 KOps/s | 8.1743 KOps/s | |
| test_unflatten_speed | 0.6447ms | 0.5780ms | 1.7301 KOps/s | 1.7577 KOps/s | |
| test_common_ops | 0.8491ms | 0.7025ms | 1.4235 KOps/s | 1.4297 KOps/s | |
| test_creation | 69.3620μs | 3.1856μs | 313.9112 KOps/s | 317.8948 KOps/s | |
| test_creation_empty | 34.2710μs | 7.0478μs | 141.8875 KOps/s | 144.2943 KOps/s | |
| test_creation_nested_1 | 39.6010μs | 11.5792μs | 86.3617 KOps/s | 86.9181 KOps/s | |
| test_creation_nested_2 | 37.8010μs | 13.3786μs | 74.7462 KOps/s | 75.9618 KOps/s | |
| test_creation_many_keys[10] | 47.6200μs | 21.0876μs | 47.4211 KOps/s | 47.8188 KOps/s | |
| test_creation_many_keys[50] | 0.1172ms | 89.7460μs | 11.1426 KOps/s | 11.1709 KOps/s | |
| test_creation_many_keys[100] | 0.2226ms | 0.1769ms | 5.6544 KOps/s | 5.6927 KOps/s | |
| test_creation_nested_many_keys[10] | 77.0310μs | 44.8885μs | 22.2774 KOps/s | 22.5533 KOps/s | |
| test_creation_nested_many_keys[50] | 0.2271ms | 0.1829ms | 5.4680 KOps/s | 5.4915 KOps/s | |
| test_clone | 52.2510μs | 13.7557μs | 72.6972 KOps/s | 73.8053 KOps/s | |
| test_getitem[int] | 1.5721ms | 15.1546μs | 65.9865 KOps/s | 61.2086 KOps/s | |
| test_getitem[slice_int] | 0.1374ms | 24.4245μs | 40.9425 KOps/s | 38.8856 KOps/s | |
| test_getitem[range] | 0.1762ms | 64.2365μs | 15.5675 KOps/s | 14.5892 KOps/s | |
| test_getitem[tuple] | 0.1431ms | 24.2605μs | 41.2193 KOps/s | 40.2311 KOps/s | |
| test_getitem[list] | 0.1839ms | 59.5900μs | 16.7813 KOps/s | 15.8133 KOps/s | |
| test_setitem_dim[int] | 45.3100μs | 26.7171μs | 37.4292 KOps/s | 36.2321 KOps/s | |
| test_setitem_dim[slice_int] | 65.9610μs | 43.8871μs | 22.7858 KOps/s | 22.9661 KOps/s | |
| test_setitem_dim[range] | 0.1205ms | 95.9528μs | 10.4218 KOps/s | 10.0420 KOps/s | |
| test_setitem_dim[tuple] | 63.5010μs | 40.8413μs | 24.4850 KOps/s | 23.2801 KOps/s | |
| test_setitem | 49.8910μs | 18.2063μs | 54.9259 KOps/s | 55.6095 KOps/s | |
| test_set | 48.0910μs | 17.4474μs | 57.3152 KOps/s | 58.3236 KOps/s | |
| test_set_shared | 0.4937ms | 0.2037ms | 4.9101 KOps/s | 4.9339 KOps/s | |
| test_update | 0.3562ms | 22.1055μs | 45.2376 KOps/s | 45.9553 KOps/s | |
| test_update_nested | 68.5520μs | 34.1211μs | 29.3073 KOps/s | 30.1087 KOps/s | |
| test_update__nested | 0.4458ms | 34.4798μs | 29.0025 KOps/s | 28.8121 KOps/s | |
| test_set_nested | 57.4710μs | 19.4805μs | 51.3334 KOps/s | 52.3842 KOps/s | |
| test_set_nested_new | 61.7910μs | 25.9744μs | 38.4994 KOps/s | 41.4958 KOps/s | |
| test_select | 74.0610μs | 41.5471μs | 24.0691 KOps/s | 24.3320 KOps/s | |
| test_select_nested | 0.1073ms | 74.1962μs | 13.4778 KOps/s | 13.3557 KOps/s | |
| test_exclude_nested | 0.1259ms | 91.2545μs | 10.9584 KOps/s | 10.9641 KOps/s | |
| test_empty[True] | 0.4594ms | 0.4036ms | 2.4777 KOps/s | 2.5115 KOps/s | |
| test_empty[False] | 9.1552μs | 1.3101μs | 763.2897 KOps/s | 770.1656 KOps/s | |
| test_to | 0.1039ms | 71.8261μs | 13.9225 KOps/s | 13.7625 KOps/s | |
| test_to_nonblocking | 0.1172ms | 66.2219μs | 15.1007 KOps/s | 15.4876 KOps/s | |
| test_unbind_speed | 0.3640ms | 0.3355ms | 2.9803 KOps/s | 2.9981 KOps/s | |
| test_unbind_speed_stack0 | 0.3895ms | 0.3352ms | 2.9830 KOps/s | 3.0068 KOps/s | |
| test_unbind_speed_stack1 | 0.1040s | 0.8402ms | 1.1902 KOps/s | 1.1853 KOps/s | |
| test_split | 0.1038s | 1.2729ms | 785.5784 Ops/s | 788.7417 Ops/s | |
| test_chunk | 0.1036s | 1.2137ms | 823.9101 Ops/s | 927.7292 Ops/s | |
| test_to_cpu_blocking | 28.6590ms | 28.5435ms | 35.0342 Ops/s | 35.0003 Ops/s | |
| test_to_cpu_global_sync | 11.7856ms | 11.6732ms | 85.6662 Ops/s | 77.3851 Ops/s | |
| test_to_cpu_event_sync | 12.8666ms | 12.6640ms | 78.9640 Ops/s | 79.3636 Ops/s | |
| test_to_cpu_default | 0.1151s | 13.8692ms | 72.1021 Ops/s | 78.9556 Ops/s | |
| test_consolidate[False-None] | 4.4088ms | 4.1830ms | 239.0609 Ops/s | 217.1123 Ops/s | |
| test_consolidate[default-None] | 2.1884ms | 2.0631ms | 484.7183 Ops/s | 480.0511 Ops/s | |
| test_consolidate[reduce-overhead-None] | 2.0473ms | 1.9635ms | 509.2894 Ops/s | 497.8494 Ops/s | |
| test_consolidate_njt[False-None] | 8.6958ms | 8.4975ms | 117.6816 Ops/s | 116.2700 Ops/s | |
| test_to[False-False-None] | 2.2200ms | 2.1294ms | 469.6216 Ops/s | 471.4270 Ops/s | |
| test_to[True-False-None] | 2.2191ms | 1.9626ms | 509.5318 Ops/s | 516.5224 Ops/s | |
| test_to[within-False-None] | 6.3161ms | 6.1877ms | 161.6104 Ops/s | 161.8663 Ops/s | |
| test_to[True-default-None] | 8.9547ms | 8.8417ms | 113.1002 Ops/s | 111.6604 Ops/s | |
| test_to_njt[False-False-None] | 8.5572ms | 8.4922ms | 117.7551 Ops/s | 115.4276 Ops/s | |
| test_to_njt[True-False-None] | 7.1220ms | 6.9308ms | 144.2843 Ops/s | 141.1588 Ops/s | |
| test_to_njt[within-False-None] | 16.0942ms | 15.6255ms | 63.9981 Ops/s | 63.8558 Ops/s | |
| test_creation[device0] | 0.3450ms | 0.1131ms | 8.8448 KOps/s | 8.4564 KOps/s | |
| test_creation_from_tensor | 0.3600ms | 0.1107ms | 9.0365 KOps/s | 8.7000 KOps/s | |
| test_add_one[memmap_tensor0] | 0.1620ms | 6.6722μs | 149.8745 KOps/s | 148.2786 KOps/s | |
| test_contiguous[memmap_tensor0] | 13.3300μs | 0.6811μs | 1.4681 MOps/s | 2.1327 MOps/s | |
| test_stack[memmap_tensor0] | 30.9100μs | 4.6728μs | 214.0046 KOps/s | 212.5163 KOps/s | |
| test_memmaptd_index | 0.1709s | 0.3556ms | 2.8122 KOps/s | 3.7513 KOps/s | |
| test_memmaptd_index_astensor | 0.5231ms | 0.3748ms | 2.6684 KOps/s | 2.6842 KOps/s | |
| test_memmaptd_index_op | 0.9374ms | 0.6293ms | 1.5892 KOps/s | 1.5961 KOps/s | |
| test_serialize_model | 0.1390s | 0.1374s | 7.2788 Ops/s | 7.3184 Ops/s | |
| test_serialize_model_pickle | 1.3476s | 1.1921s | 0.8389 Ops/s | 0.8243 Ops/s | |
| test_serialize_weights | 0.1370s | 0.1351s | 7.4039 Ops/s | 7.2912 Ops/s | |
| test_serialize_weights_returnearly | 0.4709s | 93.3962ms | 10.7071 Ops/s | 14.4919 Ops/s | |
| test_serialize_weights_pickle | 1.3645s | 1.2134s | 0.8241 Ops/s | 0.8228 Ops/s | |
| test_reshape_pytree | 0.2027ms | 32.6470μs | 30.6307 KOps/s | 30.1148 KOps/s | |
| test_reshape_td | 68.6010μs | 46.2106μs | 21.6400 KOps/s | 21.8391 KOps/s | |
| test_view_pytree | 0.2216ms | 32.9330μs | 30.3647 KOps/s | 30.4559 KOps/s | |
| test_view_td | 93.4820μs | 53.7874μs | 18.5917 KOps/s | 18.3468 KOps/s | |
| test_unbind_pytree | 0.2415ms | 37.0172μs | 27.0145 KOps/s | 26.9044 KOps/s | |
| test_unbind_td | 0.1575ms | 51.0250μs | 19.5982 KOps/s | 19.8895 KOps/s | |
| test_split_pytree | 0.2001ms | 42.9738μs | 23.2700 KOps/s | 23.3676 KOps/s | |
| test_split_td | 88.5720μs | 66.0593μs | 15.1379 KOps/s | 15.2168 KOps/s | |
| test_add_pytree | 0.2110ms | 43.6594μs | 22.9046 KOps/s | 22.5916 KOps/s | |
| test_add_td | 91.9210μs | 57.5167μs | 17.3862 KOps/s | 17.6338 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.2066ms | 0.1412ms | 7.0817 KOps/s | 6.9361 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.3096ms | 0.2025ms | 4.9387 KOps/s | 4.9827 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.1622ms | 0.1085ms | 9.2140 KOps/s | 9.0786 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4331ms | 0.1892ms | 5.2852 KOps/s | 5.3851 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.2398ms | 11.6822μs | 85.6007 KOps/s | 97.2557 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 0.1159ms | 55.1620μs | 18.1284 KOps/s | 18.4383 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.1184ms | 9.8696μs | 101.3208 KOps/s | 102.0708 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.4674ms | 69.2978μs | 14.4305 KOps/s | 14.3771 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.3162ms | 0.1771ms | 5.6460 KOps/s | 5.1893 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.3399ms | 0.2825ms | 3.5393 KOps/s | 3.5154 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.2021ms | 0.1169ms | 8.5576 KOps/s | 8.0986 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1279ms | 72.8292μs | 13.7308 KOps/s | 13.5797 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.2282ms | 0.1585ms | 6.3088 KOps/s | 6.1213 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.8152ms | 0.5299ms | 1.8870 KOps/s | 1.8391 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.4868ms | 0.3377ms | 2.9614 KOps/s | 2.9804 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.2262ms | 0.1785ms | 5.6023 KOps/s | 5.1117 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1321ms | 88.9259μs | 11.2453 KOps/s | 11.3486 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.3623ms | 0.1195ms | 8.3717 KOps/s | 7.8411 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.6661ms | 0.4396ms | 2.2746 KOps/s | 2.2122 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.3084ms | 0.1590ms | 6.2900 KOps/s | 6.1737 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 0.1243ms | 13.4474μs | 74.3637 KOps/s | 72.1188 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 75.4110μs | 41.0877μs | 24.3382 KOps/s | 23.9412 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 0.1211ms | 10.8629μs | 92.0563 KOps/s | 91.6614 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.4111ms | 52.8936μs | 18.9059 KOps/s | 19.1496 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 2.0176ms | 0.1736ms | 5.7588 KOps/s | 5.4704 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.4142ms | 3.2996ms | 303.0689 Ops/s | 301.7799 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 1.9750ms | 0.1621ms | 6.1701 KOps/s | 6.1006 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 3.0238ms | 2.8237ms | 354.1427 Ops/s | 353.4710 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.2026ms | 0.1088ms | 9.1896 KOps/s | 8.6482 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.3436ms | 74.3737μs | 13.4456 KOps/s | 12.5038 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2254ms | 96.4756μs | 10.3653 KOps/s | 10.0652 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2522ms | 44.7224μs | 22.3601 KOps/s | 20.7754 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1360ms | 96.7560μs | 10.3353 KOps/s | 10.0377 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2698ms | 44.6036μs | 22.4197 KOps/s | 20.7390 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.1504ms | 56.9202μs | 17.5685 KOps/s | 16.4943 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2149ms | 27.6934μs | 36.1096 KOps/s | 33.5170 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 0.1515ms | 44.6358μs | 22.4035 KOps/s | 22.2835 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2687ms | 22.6201μs | 44.2084 KOps/s | 44.2949 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 91.9220μs | 45.7958μs | 21.8361 KOps/s | 21.3554 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.2798ms | 22.6490μs | 44.1521 KOps/s | 44.5135 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.1471ms | 58.5126μs | 17.0903 KOps/s | 16.4705 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.3319ms | 27.3882μs | 36.5120 KOps/s | 34.3392 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 0.1912ms | 45.5433μs | 21.9571 KOps/s | 21.8155 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2640ms | 22.6081μs | 44.2319 KOps/s | 44.0454 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 86.3520μs | 45.9030μs | 21.7851 KOps/s | 21.5393 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.2731ms | 22.6135μs | 44.2213 KOps/s | 44.4977 KOps/s | |
| test_compile_replace[single-eager] | 91.0510μs | 47.8190μs | 20.9122 KOps/s | 19.9907 KOps/s | |
| test_compile_replace[single-compile] | 0.2162ms | 0.1048ms | 9.5380 KOps/s | 9.3932 KOps/s | |
| test_compile_replace[multi-eager] | 0.6176ms | 0.5611ms | 1.7823 KOps/s | 1.7567 KOps/s | |
| test_compile_replace[multi-compile] | 0.2630ms | 0.1119ms | 8.9344 KOps/s | 8.6976 KOps/s | |
| test_compile_tc_getattr_20[eager] | 0.3141ms | 0.1692ms | 5.9092 KOps/s | 5.8152 KOps/s | |
| test_compile_tc_getattr_20[compile] | 0.2657ms | 0.1237ms | 8.0860 KOps/s | 7.9582 KOps/s | |
| test_compile_clone_shallow[20-eager] | 53.8910μs | 19.6056μs | 51.0058 KOps/s | 52.4525 KOps/s | |
| test_compile_clone_shallow[20-compile] | 61.2910μs | 11.6813μs | 85.6069 KOps/s | 79.9701 KOps/s | |
| test_compile_clone_shallow[40-eager] | 61.3620μs | 34.3717μs | 29.0937 KOps/s | 29.8226 KOps/s | |
| test_compile_clone_shallow[40-compile] | 0.1134ms | 12.5638μs | 79.5936 KOps/s | 79.4856 KOps/s | |
| test_compile_clone_shallow[80-eager] | 0.2315ms | 64.3340μs | 15.5439 KOps/s | 15.9375 KOps/s | |
| test_compile_clone_shallow[80-compile] | 48.8110μs | 15.1449μs | 66.0290 KOps/s | 67.0180 KOps/s | |
| test_compile_update_inplace[eager] | 0.2551ms | 59.8946μs | 16.6960 KOps/s | 17.0785 KOps/s | |
| test_compile_update_inplace[compile] | 0.2709ms | 0.1392ms | 7.1818 KOps/s | 6.7086 KOps/s | |
| test_mod_add[eager] | 96.7020μs | 50.6818μs | 19.7310 KOps/s | 19.3515 KOps/s | |
| test_mod_add[compile] | 0.3262ms | 0.1044ms | 9.5775 KOps/s | 9.0296 KOps/s | |
| test_mod_add[compile-overhead] | 0.2751ms | 0.1489ms | 6.7167 KOps/s | 6.5807 KOps/s | |
| test_mod_wrap[eager] | 0.3719ms | 0.2895ms | 3.4545 KOps/s | 3.2564 KOps/s | |
| test_mod_wrap[compile] | 0.4902ms | 0.3487ms | 2.8682 KOps/s | 2.7444 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.1621ms | 3.9850ms | 250.9409 Ops/s | 248.8311 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6146ms | 1.4963ms | 668.3271 Ops/s | 654.5776 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.5608ms | 1.4549ms | 687.3145 Ops/s | 632.0698 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.2595ms | 0.8902ms | 1.1234 KOps/s | 1.1035 KOps/s | |
| test_seq_add[eager] | 0.2739ms | 0.1533ms | 6.5241 KOps/s | 6.4744 KOps/s | |
| test_seq_add[compile] | 0.5028ms | 0.1135ms | 8.8133 KOps/s | 8.4993 KOps/s | |
| test_seq_add[compile-overhead] | 0.2958ms | 0.1527ms | 6.5480 KOps/s | 6.2288 KOps/s | |
| test_seq_wrap[eager] | 0.5919ms | 0.5180ms | 1.9305 KOps/s | 1.9074 KOps/s | |
| test_seq_wrap[compile] | 0.5218ms | 0.3686ms | 2.7133 KOps/s | 2.6368 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.3440ms | 0.2637ms | 3.7929 KOps/s | 3.7292 KOps/s | |
| test_func_call_runtime[False-eager] | 0.9195ms | 0.8423ms | 1.1872 KOps/s | 1.1901 KOps/s | |
| test_func_call_runtime[False-compile] | 0.9906ms | 0.9088ms | 1.1004 KOps/s | 1.0384 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.7294ms | 0.4595ms | 2.1764 KOps/s | 2.1363 KOps/s | |
| test_func_call_runtime[True-eager] | 1.1310ms | 1.0665ms | 937.6522 Ops/s | 917.5741 Ops/s | |
| test_func_call_runtime[True-compile] | 0.9769ms | 0.9233ms | 1.0830 KOps/s | 1.0691 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.7372ms | 0.4798ms | 2.0842 KOps/s | 2.0730 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 0.9177ms | 0.8435ms | 1.1855 KOps/s | 1.1853 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 1.1350ms | 0.9335ms | 1.0712 KOps/s | 1.0812 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.5821ms | 0.4618ms | 2.1656 KOps/s | 2.1238 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.6368ms | 1.2276ms | 814.5876 Ops/s | 814.9435 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 1.4737ms | 0.9583ms | 1.0435 KOps/s | 1.0328 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5930ms | 0.5091ms | 1.9641 KOps/s | 1.9335 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.9345ms | 2.3832ms | 419.6015 Ops/s | 419.4754 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.0814ms | 0.9724ms | 1.0284 KOps/s | 963.3089 Ops/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.6461ms | 0.5162ms | 1.9371 KOps/s | 1.9054 KOps/s | |
| test_distributed | 3.2129ms | 0.1695ms | 5.8994 KOps/s | 6.5102 KOps/s | |
| test_tdmodule | 0.2703ms | 27.4591μs | 36.4179 KOps/s | 37.1385 KOps/s | |
| test_tdmodule_dispatch | 90.0910μs | 45.4110μs | 22.0211 KOps/s | 22.2808 KOps/s | |
| test_tdseq | 46.3010μs | 26.7053μs | 37.4458 KOps/s | 37.4146 KOps/s | |
| test_tdseq_dispatch | 0.1364ms | 47.0997μs | 21.2315 KOps/s | 21.2377 KOps/s | |
| test_instantiation_functorch | 2.2369ms | 2.0935ms | 477.6617 Ops/s | 479.5420 Ops/s | |
| test_exec_functorch | 0.2308ms | 0.1800ms | 5.5571 KOps/s | 5.5496 KOps/s | |
| test_exec_functional_call | 0.2176ms | 0.1596ms | 6.2649 KOps/s | 6.2435 KOps/s | |
| test_exec_td_decorator | 0.4671ms | 0.2372ms | 4.2162 KOps/s | 4.2160 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.0075ms | 0.8240ms | 1.2136 KOps/s | 1.2060 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 1.0221ms | 0.8225ms | 1.2158 KOps/s | 1.2088 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.9003ms | 0.7113ms | 1.4059 KOps/s | 1.3988 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.8811ms | 0.7111ms | 1.4062 KOps/s | 1.3963 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 21.3825ms | 20.5767ms | 48.5987 Ops/s | 48.4373 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 20.7464ms | 20.5913ms | 48.5641 Ops/s | 48.4412 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 20.5477ms | 20.3827ms | 49.0613 Ops/s | 48.9332 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.4996ms | 20.3878ms | 49.0489 Ops/s | 48.9159 Ops/s | |
| test_to_module_speed[True] | 1.5864ms | 1.4756ms | 677.7009 Ops/s | 669.4012 Ops/s | |
| test_to_module_speed[False] | 1.6326ms | 1.4411ms | 693.8908 Ops/s | 686.1291 Ops/s | |
| test_tc_init | 0.2100ms | 45.0815μs | 22.1821 KOps/s | 22.5393 KOps/s | |
| test_tc_init_tensor_only | 36.8500μs | 9.7658μs | 102.3981 KOps/s | 101.7992 KOps/s | |
| test_tc_init_nested | 0.5260ms | 88.8865μs | 11.2503 KOps/s | 11.3275 KOps/s | |
| test_tc_init_many_fields | 48.5200μs | 16.3637μs | 61.1109 KOps/s | 61.0956 KOps/s | |
| test_tc_first_layer_tensor | 0.4269ms | 1.8179μs | 550.0940 KOps/s | 546.6224 KOps/s | |
| test_tc_first_layer_tensor_only | 1.7320μs | 0.3964μs | 2.5227 MOps/s | 2.4689 MOps/s | |
| test_tc_first_layer_tensor_set | 0.4406ms | 3.9148μs | 255.4414 KOps/s | 255.1185 KOps/s | |
| test_tc_first_layer_tensor_only_set | 25.8300μs | 3.2698μs | 305.8329 KOps/s | 306.2298 KOps/s | |
| test_tc_first_layer_nontensor | 2.4663ms | 6.2800μs | 159.2350 KOps/s | 162.3029 KOps/s | |
| test_tc_second_layer_tensor | 0.4281ms | 4.4691μs | 223.7572 KOps/s | 227.2128 KOps/s | |
| test_tc_second_layer_nontensor | 43.5210μs | 8.8416μs | 113.1019 KOps/s | 115.8964 KOps/s | |
| test_unbind | 0.2496s | 16.3745ms | 61.0706 Ops/s | 54.9962 Ops/s | |
| test_full_like | 7.5608ms | 4.3848ms | 228.0602 Ops/s | 228.6130 Ops/s | |
| test_zeros_like | 5.0299ms | 4.3715ms | 228.7548 Ops/s | 229.1119 Ops/s | |
| test_ones_like | 4.5415ms | 4.3706ms | 228.7991 Ops/s | 229.0394 Ops/s | |
| test_clone | 6.8644ms | 6.4402ms | 155.2754 Ops/s | 155.5356 Ops/s | |
| test_squeeze | 66.8110μs | 14.0007μs | 71.4250 KOps/s | 70.5515 KOps/s | |
| test_unsqueeze | 0.2163ms | 0.1109ms | 9.0197 KOps/s | 8.9653 KOps/s | |
| test_split | 0.6225ms | 0.1860ms | 5.3759 KOps/s | 5.3609 KOps/s | |
| test_permute | 0.6474ms | 0.2126ms | 4.7043 KOps/s | 4.8513 KOps/s | |
| test_stack | 51.7995ms | 51.0945ms | 19.5716 Ops/s | 19.5461 Ops/s | |
| test_cat | 51.4962ms | 51.1000ms | 19.5695 Ops/s | 19.6933 Ops/s | |
| test_sequential_tensordict | 0.3103ms | 0.2256ms | 4.4325 KOps/s | 4.6270 KOps/s | |
| test_sequential_graph_module | 0.5733ms | 0.1232ms | 8.1200 KOps/s | 8.4949 KOps/s | |
| test_nested_tensordict | 0.3469ms | 0.2867ms | 3.4874 KOps/s | 3.5459 KOps/s | |
| test_nested_graph_module | 0.5769ms | 0.1292ms | 7.7420 KOps/s | 7.8311 KOps/s |
Contributor
| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 31.3110μs | 14.9589μs | 66.8497 KOps/s | 66.9739 KOps/s | |
| test_plain_set_stack_nested | 36.9200μs | 15.2725μs | 65.4774 KOps/s | 65.7984 KOps/s | |
| test_plain_set_nested_inplace | 49.3400μs | 16.7306μs | 59.7708 KOps/s | 59.5671 KOps/s | |
| test_plain_set_stack_nested_inplace | 45.4710μs | 16.5006μs | 60.6037 KOps/s | 59.7506 KOps/s | |
| test_items | 59.6610μs | 6.1481μs | 162.6522 KOps/s | 168.5419 KOps/s | |
| test_items_nested | 0.7017ms | 0.4607ms | 2.1704 KOps/s | 2.1456 KOps/s | |
| test_items_nested_locked | 0.5949ms | 0.4578ms | 2.1845 KOps/s | 2.1450 KOps/s | |
| test_items_nested_leaf | 0.1391ms | 97.4976μs | 10.2567 KOps/s | 10.1355 KOps/s | |
| test_items_stack_nested | 0.5118ms | 0.4657ms | 2.1473 KOps/s | 2.1443 KOps/s | |
| test_items_stack_nested_leaf | 0.1428ms | 98.7465μs | 10.1269 KOps/s | 10.1307 KOps/s | |
| test_items_stack_nested_locked | 0.5204ms | 0.4697ms | 2.1290 KOps/s | 2.1371 KOps/s | |
| test_keys | 31.0810μs | 4.2653μs | 234.4474 KOps/s | 235.6464 KOps/s | |
| test_keys_nested | 0.1757ms | 0.1297ms | 7.7121 KOps/s | 7.6793 KOps/s | |
| test_keys_nested_locked | 2.1215ms | 0.1388ms | 7.2048 KOps/s | 7.2458 KOps/s | |
| test_keys_nested_leaf | 0.1558ms | 0.1207ms | 8.2859 KOps/s | 8.3375 KOps/s | |
| test_keys_stack_nested | 0.1806ms | 0.1308ms | 7.6426 KOps/s | 7.6784 KOps/s | |
| test_keys_stack_nested_leaf | 0.1766ms | 0.1206ms | 8.2927 KOps/s | 8.3237 KOps/s | |
| test_keys_stack_nested_locked | 0.4874ms | 0.1376ms | 7.2671 KOps/s | 7.2767 KOps/s | |
| test_values | 6.2600μs | 1.0228μs | 977.6608 KOps/s | 970.9442 KOps/s | |
| test_values_nested | 86.0110μs | 52.6117μs | 19.0072 KOps/s | 18.9158 KOps/s | |
| test_values_nested_locked | 96.4810μs | 55.9524μs | 17.8723 KOps/s | 17.8080 KOps/s | |
| test_values_nested_leaf | 88.5410μs | 60.2512μs | 16.5972 KOps/s | 16.6423 KOps/s | |
| test_values_stack_nested | 0.1192ms | 52.9216μs | 18.8959 KOps/s | 18.9446 KOps/s | |
| test_values_stack_nested_leaf | 96.1210μs | 60.3939μs | 16.5580 KOps/s | 16.5248 KOps/s | |
| test_values_stack_nested_locked | 86.6010μs | 56.1280μs | 17.8164 KOps/s | 17.7084 KOps/s | |
| test_membership | 4.9352μs | 0.8461μs | 1.1819 MOps/s | 1.1661 MOps/s | |
| test_membership_nested | 28.5100μs | 2.8817μs | 347.0200 KOps/s | 342.5253 KOps/s | |
| test_membership_nested_leaf | 69.4610μs | 2.8757μs | 347.7432 KOps/s | 343.9320 KOps/s | |
| test_membership_stacked_nested | 28.7500μs | 2.9044μs | 344.3086 KOps/s | 343.0598 KOps/s | |
| test_membership_stacked_nested_leaf | 34.4110μs | 2.9045μs | 344.2876 KOps/s | 344.3498 KOps/s | |
| test_membership_nested_last | 33.1110μs | 4.3591μs | 229.4052 KOps/s | 228.2860 KOps/s | |
| test_membership_nested_leaf_last | 33.0310μs | 4.3392μs | 230.4546 KOps/s | 227.0248 KOps/s | |
| test_membership_stacked_nested_last | 23.7500μs | 4.3538μs | 229.6823 KOps/s | 228.2042 KOps/s | |
| test_membership_stacked_nested_leaf_last | 35.6000μs | 4.3500μs | 229.8854 KOps/s | 229.2143 KOps/s | |
| test_nested_getleaf | 49.3910μs | 21.5490μs | 46.4058 KOps/s | 46.0766 KOps/s | |
| test_nested_get | 87.7620μs | 20.5254μs | 48.7200 KOps/s | 49.1354 KOps/s | |
| test_stacked_getleaf | 50.2800μs | 21.7439μs | 45.9899 KOps/s | 46.7903 KOps/s | |
| test_stacked_get | 53.0010μs | 20.4945μs | 48.7935 KOps/s | 48.7607 KOps/s | |
| test_nested_getitemleaf | 59.6510μs | 22.3346μs | 44.7735 KOps/s | 45.3222 KOps/s | |
| test_nested_getitem | 50.1900μs | 21.2031μs | 47.1630 KOps/s | 47.6541 KOps/s | |
| test_stacked_getitemleaf | 81.4210μs | 22.1912μs | 45.0628 KOps/s | 45.3895 KOps/s | |
| test_stacked_getitem | 52.2300μs | 20.9385μs | 47.7590 KOps/s | 47.4580 KOps/s | |
| test_lock_nested | 0.5715ms | 0.4782ms | 2.0913 KOps/s | 2.0939 KOps/s | |
| test_lock_stack_nested | 0.5339ms | 0.4830ms | 2.0705 KOps/s | 2.0592 KOps/s | |
| test_unlock_nested | 0.4740ms | 0.3913ms | 2.5555 KOps/s | 2.5862 KOps/s | |
| test_unlock_stack_nested | 0.4358ms | 0.3905ms | 2.5608 KOps/s | 2.5332 KOps/s | |
| test_flatten_speed | 0.1856ms | 0.1227ms | 8.1529 KOps/s | 8.2302 KOps/s | |
| test_unflatten_speed | 0.6675ms | 0.5625ms | 1.7778 KOps/s | 1.7453 KOps/s | |
| test_common_ops | 0.8381ms | 0.6919ms | 1.4452 KOps/s | 1.4258 KOps/s | |
| test_creation | 0.1152ms | 3.1716μs | 315.3020 KOps/s | 315.8970 KOps/s | |
| test_creation_empty | 35.8000μs | 7.0239μs | 142.3705 KOps/s | 143.2856 KOps/s | |
| test_creation_nested_1 | 33.5010μs | 11.5968μs | 86.2310 KOps/s | 86.7691 KOps/s | |
| test_creation_nested_2 | 71.1910μs | 12.9987μs | 76.9305 KOps/s | 74.6518 KOps/s | |
| test_creation_many_keys[10] | 89.5910μs | 20.8677μs | 47.9209 KOps/s | 47.0632 KOps/s | |
| test_creation_many_keys[50] | 0.1268ms | 90.1610μs | 11.0913 KOps/s | 11.0571 KOps/s | |
| test_creation_many_keys[100] | 0.2169ms | 0.1787ms | 5.5950 KOps/s | 5.5534 KOps/s | |
| test_creation_nested_many_keys[10] | 82.2720μs | 44.9926μs | 22.2259 KOps/s | 22.2481 KOps/s | |
| test_creation_nested_many_keys[50] | 0.2371ms | 0.1846ms | 5.4162 KOps/s | 5.3792 KOps/s | |
| test_clone | 45.7210μs | 13.0367μs | 76.7065 KOps/s | 74.4932 KOps/s | |
| test_getitem[int] | 1.5701ms | 15.0049μs | 66.6447 KOps/s | 59.2168 KOps/s | |
| test_getitem[slice_int] | 0.1867ms | 24.0843μs | 41.5208 KOps/s | 41.4808 KOps/s | |
| test_getitem[range] | 0.1763ms | 62.7427μs | 15.9381 KOps/s | 15.7272 KOps/s | |
| test_getitem[tuple] | 0.1561ms | 23.9904μs | 41.6833 KOps/s | 41.7991 KOps/s | |
| test_getitem[list] | 0.1873ms | 57.8321μs | 17.2914 KOps/s | 16.9304 KOps/s | |
| test_setitem_dim[int] | 46.4110μs | 25.0069μs | 39.9890 KOps/s | 38.1343 KOps/s | |
| test_setitem_dim[slice_int] | 65.3400μs | 42.5058μs | 23.5262 KOps/s | 23.1396 KOps/s | |
| test_setitem_dim[range] | 0.1302ms | 94.6489μs | 10.5654 KOps/s | 10.4794 KOps/s | |
| test_setitem_dim[tuple] | 61.3710μs | 38.7480μs | 25.8078 KOps/s | 25.4353 KOps/s | |
| test_setitem | 63.1110μs | 17.5591μs | 56.9507 KOps/s | 56.4609 KOps/s | |
| test_set | 48.0410μs | 16.9195μs | 59.1035 KOps/s | 59.1777 KOps/s | |
| test_set_shared | 0.5176ms | 0.2046ms | 4.8882 KOps/s | 4.9501 KOps/s | |
| test_update | 0.2142ms | 21.5381μs | 46.4294 KOps/s | 46.0206 KOps/s | |
| test_update_nested | 77.2310μs | 33.1792μs | 30.1394 KOps/s | 30.0595 KOps/s | |
| test_update__nested | 0.4767ms | 33.3850μs | 29.9536 KOps/s | 28.4934 KOps/s | |
| test_set_nested | 46.5900μs | 18.8781μs | 52.9714 KOps/s | 52.7685 KOps/s | |
| test_set_nested_new | 76.7010μs | 23.6654μs | 42.2557 KOps/s | 41.9693 KOps/s | |
| test_select | 87.0210μs | 40.2810μs | 24.8256 KOps/s | 24.0524 KOps/s | |
| test_select_nested | 0.1109ms | 76.1037μs | 13.1400 KOps/s | 13.4474 KOps/s | |
| test_exclude_nested | 0.1312ms | 92.3902μs | 10.8237 KOps/s | 10.9120 KOps/s | |
| test_empty[True] | 0.4720ms | 0.4005ms | 2.4966 KOps/s | 2.5196 KOps/s | |
| test_empty[False] | 7.8850μs | 1.3308μs | 751.4357 KOps/s | 755.1165 KOps/s | |
| test_to | 0.1024ms | 70.5132μs | 14.1817 KOps/s | 13.7785 KOps/s | |
| test_to_nonblocking | 0.1056ms | 64.4452μs | 15.5171 KOps/s | 15.0035 KOps/s | |
| test_unbind_speed | 0.3810ms | 0.3339ms | 2.9951 KOps/s | 3.0172 KOps/s | |
| test_unbind_speed_stack0 | 0.3818ms | 0.3328ms | 3.0046 KOps/s | 3.0619 KOps/s | |
| test_unbind_speed_stack1 | 0.1083s | 1.0435ms | 958.3441 Ops/s | 1.1873 KOps/s | |
| test_split | 1.2174ms | 1.1481ms | 871.0328 Ops/s | 784.3383 Ops/s | |
| test_chunk | 0.1068s | 1.2092ms | 826.9837 Ops/s | 923.6165 Ops/s | |
| test_to_cpu_blocking | 19.5535ms | 18.8242ms | 53.1230 Ops/s | 35.6530 Ops/s | |
| test_to_cpu_global_sync | 11.6320ms | 11.3391ms | 88.1905 Ops/s | 78.6527 Ops/s | |
| test_to_cpu_event_sync | 0.1188s | 13.5140ms | 73.9974 Ops/s | 81.2933 Ops/s | |
| test_to_cpu_default | 12.5201ms | 12.2166ms | 81.8555 Ops/s | 81.1669 Ops/s | |
| test_consolidate[False-None] | 4.2309ms | 4.1581ms | 240.4940 Ops/s | 217.2359 Ops/s | |
| test_consolidate[default-None] | 2.1224ms | 2.0222ms | 494.4995 Ops/s | 486.6645 Ops/s | |
| test_consolidate[reduce-overhead-None] | 2.0230ms | 1.9362ms | 516.4633 Ops/s | 506.1639 Ops/s | |
| test_consolidate_njt[False-None] | 8.7913ms | 8.5495ms | 116.9656 Ops/s | 116.9459 Ops/s | |
| test_to[False-False-None] | 2.1704ms | 2.0648ms | 484.3051 Ops/s | 476.2822 Ops/s | |
| test_to[True-False-None] | 2.1649ms | 1.9182ms | 521.3289 Ops/s | 514.9413 Ops/s | |
| test_to[within-False-None] | 6.3173ms | 6.2139ms | 160.9291 Ops/s | 163.2282 Ops/s | |
| test_to[True-default-None] | 9.2819ms | 9.0577ms | 110.4027 Ops/s | 106.9845 Ops/s | |
| test_to_njt[False-False-None] | 8.7890ms | 8.5020ms | 117.6190 Ops/s | 117.0489 Ops/s | |
| test_to_njt[True-False-None] | 7.1543ms | 6.9454ms | 143.9793 Ops/s | 143.7610 Ops/s | |
| test_to_njt[within-False-None] | 16.2207ms | 15.7313ms | 63.5675 Ops/s | 62.9926 Ops/s | |
| test_creation[device0] | 0.4174ms | 0.1164ms | 8.5909 KOps/s | 8.7385 KOps/s | |
| test_creation_from_tensor | 0.4110ms | 0.1172ms | 8.5350 KOps/s | 8.9076 KOps/s | |
| test_add_one[memmap_tensor0] | 0.4032ms | 6.6726μs | 149.8655 KOps/s | 141.7120 KOps/s | |
| test_contiguous[memmap_tensor0] | 34.3600μs | 0.7287μs | 1.3723 MOps/s | 1.9614 MOps/s | |
| test_stack[memmap_tensor0] | 37.8310μs | 4.5860μs | 218.0546 KOps/s | 218.3655 KOps/s | |
| test_memmaptd_index | 1.0560ms | 0.2717ms | 3.6806 KOps/s | 3.5824 KOps/s | |
| test_memmaptd_index_astensor | 0.5361ms | 0.3767ms | 2.6548 KOps/s | 2.5427 KOps/s | |
| test_memmaptd_index_op | 0.9632ms | 0.6297ms | 1.5879 KOps/s | 1.4865 KOps/s | |
| test_serialize_model | 0.3123s | 0.1614s | 6.1974 Ops/s | 7.3495 Ops/s | |
| test_serialize_model_pickle | 1.3474s | 1.2113s | 0.8256 Ops/s | 0.8261 Ops/s | |
| test_serialize_weights | 0.1391s | 0.1372s | 7.2876 Ops/s | 7.3700 Ops/s | |
| test_serialize_weights_returnearly | 0.4707s | 92.3803ms | 10.8248 Ops/s | 10.7005 Ops/s | |
| test_serialize_weights_pickle | 1.3656s | 1.2218s | 0.8185 Ops/s | 0.8196 Ops/s | |
| test_reshape_pytree | 0.2071ms | 32.7113μs | 30.5705 KOps/s | 30.3445 KOps/s | |
| test_reshape_td | 77.9410μs | 46.3069μs | 21.5951 KOps/s | 21.7670 KOps/s | |
| test_view_pytree | 0.2140ms | 32.0493μs | 31.2019 KOps/s | 30.7011 KOps/s | |
| test_view_td | 88.2810μs | 53.9406μs | 18.5389 KOps/s | 18.8808 KOps/s | |
| test_unbind_pytree | 0.2246ms | 36.2781μs | 27.5648 KOps/s | 27.6440 KOps/s | |
| test_unbind_td | 0.2168ms | 50.1725μs | 19.9312 KOps/s | 20.2493 KOps/s | |
| test_split_pytree | 0.2468ms | 42.4255μs | 23.5707 KOps/s | 23.5327 KOps/s | |
| test_split_td | 0.1738ms | 65.4171μs | 15.2865 KOps/s | 15.6303 KOps/s | |
| test_add_pytree | 0.1931ms | 42.4455μs | 23.5596 KOps/s | 23.4464 KOps/s | |
| test_add_td | 0.1176ms | 55.3943μs | 18.0524 KOps/s | 18.0783 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.2143ms | 0.1490ms | 6.7095 KOps/s | 6.8361 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.4140ms | 0.2002ms | 4.9944 KOps/s | 4.9959 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.2237ms | 0.1081ms | 9.2518 KOps/s | 9.2070 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4312ms | 0.1790ms | 5.5851 KOps/s | 5.5564 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.3259ms | 10.2884μs | 97.1969 KOps/s | 99.1380 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 0.1265ms | 54.1213μs | 18.4770 KOps/s | 18.4361 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.1283ms | 9.8629μs | 101.3902 KOps/s | 103.3057 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.4368ms | 68.1385μs | 14.6760 KOps/s | 14.3637 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.2378ms | 0.1803ms | 5.5461 KOps/s | 5.2925 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.3445ms | 0.2808ms | 3.5617 KOps/s | 3.5125 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.3399ms | 0.1206ms | 8.2912 KOps/s | 8.1740 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1291ms | 74.3801μs | 13.4445 KOps/s | 13.6176 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.2047ms | 0.1585ms | 6.3094 KOps/s | 6.2229 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.8081ms | 0.5270ms | 1.8977 KOps/s | 1.9131 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.4639ms | 0.3342ms | 2.9919 KOps/s | 2.9731 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.2702ms | 0.1838ms | 5.4402 KOps/s | 5.3150 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1546ms | 93.8667μs | 10.6534 KOps/s | 11.2171 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.3998ms | 0.1236ms | 8.0910 KOps/s | 8.0098 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.7109ms | 0.4422ms | 2.2614 KOps/s | 2.2754 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.3243ms | 0.1589ms | 6.2938 KOps/s | 6.2212 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 0.1153ms | 14.2303μs | 70.2727 KOps/s | 75.0994 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 83.5010μs | 41.1625μs | 24.2940 KOps/s | 23.8189 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 0.1300ms | 10.6103μs | 94.2480 KOps/s | 93.7626 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.4105ms | 52.7329μs | 18.9635 KOps/s | 18.9901 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 2.0134ms | 0.1729ms | 5.7849 KOps/s | 5.4381 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.4170ms | 3.2885ms | 304.0886 Ops/s | 287.8791 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 1.9758ms | 0.1616ms | 6.1892 KOps/s | 6.0822 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 2.9197ms | 2.7737ms | 360.5231 Ops/s | 343.5056 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.1573ms | 0.1096ms | 9.1224 KOps/s | 8.6739 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.3177ms | 73.4552μs | 13.6137 KOps/s | 13.3368 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1656ms | 97.2952μs | 10.2780 KOps/s | 10.4201 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2528ms | 43.9939μs | 22.7304 KOps/s | 22.4954 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1544ms | 98.1532μs | 10.1882 KOps/s | 10.3182 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2625ms | 43.9583μs | 22.7488 KOps/s | 21.2364 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.2269ms | 56.2791μs | 17.7686 KOps/s | 17.0230 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2139ms | 27.1671μs | 36.8092 KOps/s | 36.2176 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 82.1310μs | 44.4330μs | 22.5058 KOps/s | 22.3215 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2792ms | 22.3219μs | 44.7991 KOps/s | 44.1125 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 83.0810μs | 44.0181μs | 22.7179 KOps/s | 22.3506 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.2769ms | 22.3985μs | 44.6459 KOps/s | 43.9830 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.1040ms | 55.9407μs | 17.8761 KOps/s | 17.4092 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.2055ms | 26.3326μs | 37.9758 KOps/s | 36.4648 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 88.8910μs | 44.8750μs | 22.2841 KOps/s | 22.0333 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2924ms | 22.3266μs | 44.7896 KOps/s | 44.3433 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 80.3510μs | 44.5609μs | 22.4412 KOps/s | 21.9488 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.2574ms | 22.1480μs | 45.1507 KOps/s | 44.3996 KOps/s | |
| test_compile_replace[single-eager] | 0.1199ms | 46.9380μs | 21.3047 KOps/s | 21.1875 KOps/s | |
| test_compile_replace[single-compile] | 0.2415ms | 0.1054ms | 9.4888 KOps/s | 9.0967 KOps/s | |
| test_compile_replace[multi-eager] | 0.6090ms | 0.5615ms | 1.7808 KOps/s | 1.7133 KOps/s | |
| test_compile_replace[multi-compile] | 0.1813ms | 0.1120ms | 8.9299 KOps/s | 8.5644 KOps/s | |
| test_compile_tc_getattr_20[eager] | 0.2131ms | 0.1669ms | 5.9924 KOps/s | 5.9983 KOps/s | |
| test_compile_tc_getattr_20[compile] | 0.4133ms | 0.1200ms | 8.3349 KOps/s | 8.0781 KOps/s | |
| test_compile_clone_shallow[20-eager] | 84.4510μs | 19.2458μs | 51.9594 KOps/s | 53.0138 KOps/s | |
| test_compile_clone_shallow[20-compile] | 62.4510μs | 11.5981μs | 86.2207 KOps/s | 92.0748 KOps/s | |
| test_compile_clone_shallow[40-eager] | 65.2510μs | 34.1628μs | 29.2716 KOps/s | 29.5080 KOps/s | |
| test_compile_clone_shallow[40-compile] | 61.6010μs | 12.7534μs | 78.4103 KOps/s | 81.1723 KOps/s | |
| test_compile_clone_shallow[80-eager] | 0.1371ms | 62.4622μs | 16.0097 KOps/s | 15.6902 KOps/s | |
| test_compile_clone_shallow[80-compile] | 59.8100μs | 15.2559μs | 65.5485 KOps/s | 67.4377 KOps/s | |
| test_compile_update_inplace[eager] | 94.3320μs | 59.1258μs | 16.9131 KOps/s | 16.9138 KOps/s | |
| test_compile_update_inplace[compile] | 0.2871ms | 0.1394ms | 7.1736 KOps/s | 6.7247 KOps/s | |
| test_mod_add[eager] | 93.1410μs | 51.0641μs | 19.5832 KOps/s | 20.4551 KOps/s | |
| test_mod_add[compile] | 0.1416ms | 0.1033ms | 9.6768 KOps/s | 9.4041 KOps/s | |
| test_mod_add[compile-overhead] | 0.2328ms | 0.1482ms | 6.7492 KOps/s | 6.6125 KOps/s | |
| test_mod_wrap[eager] | 0.3649ms | 0.2886ms | 3.4652 KOps/s | 3.3460 KOps/s | |
| test_mod_wrap[compile] | 0.5026ms | 0.3529ms | 2.8334 KOps/s | 2.7931 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.3260ms | 4.0345ms | 247.8591 Ops/s | 248.0903 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6036ms | 1.5041ms | 664.8375 Ops/s | 661.6995 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.5641ms | 1.4390ms | 694.9172 Ops/s | 683.5378 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.2508ms | 0.8900ms | 1.1236 KOps/s | 994.3950 Ops/s | |
| test_seq_add[eager] | 0.2093ms | 0.1543ms | 6.4801 KOps/s | 6.1472 KOps/s | |
| test_seq_add[compile] | 0.1896ms | 0.1125ms | 8.8875 KOps/s | 8.1445 KOps/s | |
| test_seq_add[compile-overhead] | 0.1952ms | 0.1536ms | 6.5113 KOps/s | 6.3386 KOps/s | |
| test_seq_wrap[eager] | 0.5908ms | 0.5234ms | 1.9105 KOps/s | 1.8614 KOps/s | |
| test_seq_wrap[compile] | 0.4424ms | 0.3629ms | 2.7558 KOps/s | 2.5605 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.3258ms | 0.2672ms | 3.7429 KOps/s | 3.5936 KOps/s | |
| test_func_call_runtime[False-eager] | 0.9033ms | 0.8358ms | 1.1965 KOps/s | 1.1122 KOps/s | |
| test_func_call_runtime[False-compile] | 1.0746ms | 0.9016ms | 1.1092 KOps/s | 1.0415 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.5206ms | 0.4613ms | 2.1678 KOps/s | 2.1216 KOps/s | |
| test_func_call_runtime[True-eager] | 1.2476ms | 1.0747ms | 930.4935 Ops/s | 917.6295 Ops/s | |
| test_func_call_runtime[True-compile] | 0.9805ms | 0.9156ms | 1.0922 KOps/s | 1.0602 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.5170ms | 0.4759ms | 2.1012 KOps/s | 2.0611 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 0.9154ms | 0.8302ms | 1.2045 KOps/s | 1.1064 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 0.9902ms | 0.9031ms | 1.1073 KOps/s | 1.0430 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.5367ms | 0.4640ms | 2.1553 KOps/s | 2.1265 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.3229ms | 1.2215ms | 818.6780 Ops/s | 815.4817 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 1.0981ms | 0.9499ms | 1.0528 KOps/s | 987.0900 Ops/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5595ms | 0.5079ms | 1.9691 KOps/s | 1.9283 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.8774ms | 2.3801ms | 420.1436 Ops/s | 417.1371 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.0576ms | 0.9729ms | 1.0278 KOps/s | 1.0129 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.6079ms | 0.5153ms | 1.9407 KOps/s | 1.9280 KOps/s | |
| test_distributed | 2.6095ms | 0.1675ms | 5.9688 KOps/s | 6.4998 KOps/s | |
| test_tdmodule | 46.6800μs | 27.1856μs | 36.7841 KOps/s | 34.8266 KOps/s | |
| test_tdmodule_dispatch | 74.7310μs | 45.1485μs | 22.1491 KOps/s | 21.0183 KOps/s | |
| test_tdseq | 48.7010μs | 26.9747μs | 37.0717 KOps/s | 36.1905 KOps/s | |
| test_tdseq_dispatch | 74.3710μs | 47.1955μs | 21.1885 KOps/s | 20.7823 KOps/s | |
| test_instantiation_functorch | 2.3041ms | 2.0650ms | 484.2536 Ops/s | 477.4496 Ops/s | |
| test_exec_functorch | 0.2232ms | 0.1788ms | 5.5939 KOps/s | 5.5186 KOps/s | |
| test_exec_functional_call | 0.2129ms | 0.1582ms | 6.3204 KOps/s | 6.1701 KOps/s | |
| test_exec_td_decorator | 0.4416ms | 0.2365ms | 4.2282 KOps/s | 4.1899 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.0467ms | 0.8268ms | 1.2095 KOps/s | 1.2079 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 1.0089ms | 0.8253ms | 1.2117 KOps/s | 1.2080 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.8920ms | 0.7125ms | 1.4034 KOps/s | 1.3985 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.8953ms | 0.7107ms | 1.4070 KOps/s | 1.4015 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 20.9473ms | 20.5104ms | 48.7557 Ops/s | 48.5871 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 21.1469ms | 20.5193ms | 48.7347 Ops/s | 48.5294 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 21.0960ms | 20.3112ms | 49.2339 Ops/s | 49.2114 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 21.0325ms | 20.3180ms | 49.2174 Ops/s | 49.0513 Ops/s | |
| test_to_module_speed[True] | 1.6893ms | 1.4837ms | 674.0051 Ops/s | 677.8840 Ops/s | |
| test_to_module_speed[False] | 1.9447ms | 1.4651ms | 682.5291 Ops/s | 698.7708 Ops/s | |
| test_tc_init | 0.1013ms | 44.6288μs | 22.4070 KOps/s | 22.0344 KOps/s | |
| test_tc_init_tensor_only | 33.8710μs | 9.6712μs | 103.3998 KOps/s | 102.7695 KOps/s | |
| test_tc_init_nested | 0.1449ms | 87.0039μs | 11.4937 KOps/s | 11.0915 KOps/s | |
| test_tc_init_many_fields | 41.7610μs | 16.2410μs | 61.5725 KOps/s | 60.7864 KOps/s | |
| test_tc_first_layer_tensor | 31.5800μs | 1.8166μs | 550.4796 KOps/s | 555.1157 KOps/s | |
| test_tc_first_layer_tensor_only | 1.4185μs | 0.3959μs | 2.5259 MOps/s | 2.5589 MOps/s | |
| test_tc_first_layer_tensor_set | 45.7200μs | 3.9761μs | 251.5050 KOps/s | 255.2568 KOps/s | |
| test_tc_first_layer_tensor_only_set | 87.1210μs | 3.2762μs | 305.2276 KOps/s | 303.9209 KOps/s | |
| test_tc_first_layer_nontensor | 49.9410μs | 6.1513μs | 162.5670 KOps/s | 161.6222 KOps/s | |
| test_tc_second_layer_tensor | 25.7900μs | 4.4161μs | 226.4434 KOps/s | 229.1020 KOps/s | |
| test_tc_second_layer_nontensor | 0.1193ms | 8.6894μs | 115.0831 KOps/s | 115.7815 KOps/s | |
| test_unbind | 0.2697s | 16.7863ms | 59.5724 Ops/s | 53.7523 Ops/s | |
| test_full_like | 4.9549ms | 4.4719ms | 223.6176 Ops/s | 214.6760 Ops/s | |
| test_zeros_like | 5.0079ms | 4.4323ms | 225.6165 Ops/s | 133.7640 Ops/s | |
| test_ones_like | 4.6806ms | 4.4426ms | 225.0947 Ops/s | 230.3493 Ops/s | |
| test_clone | 7.7022ms | 6.8505ms | 145.9746 Ops/s | 147.1398 Ops/s | |
| test_squeeze | 0.1757ms | 14.5722μs | 68.6238 KOps/s | 71.8232 KOps/s | |
| test_unsqueeze | 0.1655ms | 0.1106ms | 9.0425 KOps/s | 8.8943 KOps/s | |
| test_split | 0.2709ms | 0.1855ms | 5.3921 KOps/s | 5.4091 KOps/s | |
| test_permute | 0.2590ms | 0.2052ms | 4.8745 KOps/s | 4.8998 KOps/s | |
| test_stack | 36.7458ms | 35.8936ms | 27.8601 Ops/s | 19.1144 Ops/s | |
| test_cat | 36.4606ms | 35.7396ms | 27.9801 Ops/s | 19.1662 Ops/s | |
| test_sequential_tensordict | 0.6191ms | 0.2198ms | 4.5502 KOps/s | 4.4543 KOps/s | |
| test_sequential_graph_module | 0.1698ms | 0.1175ms | 8.5097 KOps/s | 8.3660 KOps/s | |
| test_nested_tensordict | 0.5583ms | 0.2800ms | 3.5711 KOps/s | 3.4116 KOps/s | |
| test_nested_graph_module | 0.1762ms | 0.1274ms | 7.8486 KOps/s | 7.7544 KOps/s |
Stack from ghstack (oldest at bottom):
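Implement _dtensor_send_optimal and _dtensor_recv_optimal:
- Sender: computes transfer plan from src/dst meshes and placements, extracts only the needed slices from local shards, and sends via P2P
- Receiver: computes same plan, receives slices into the right positions of the local buffer, wraps as DTensor via from_local()
- Both torch.distributed and UCXX transports supported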
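
To illustrate the transfer-plan idea, here is a minimal sketch, not the code in this PR. It assumes the simplest case: a single tensor dimension sharded over a 1-D source mesh and a 1-D destination mesh, with ceil-sized chunks. The names `Transfer`, `shard_bounds` and `transfer_plan` are hypothetical. Both sides can run the same function and arrive at the same list of slices.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Transfer(NamedTuple):
    src_rank: int
    dst_rank: int
    src_slice: Tuple[int, int]  # [start, stop) inside the sender's local shard
    dst_slice: Tuple[int, int]  # [start, stop) inside the receiver's local buffer


def shard_bounds(size: int, num_shards: int) -> List[Tuple[int, int]]:
    """Global [start, stop) of each shard, assuming ceil-sized chunks."""
    chunk = -(-size // num_shards)  # ceil division
    return [
        (min(i * chunk, size), min((i + 1) * chunk, size))
        for i in range(num_shards)
    ]


def transfer_plan(
    size: int, src_ranks: Sequence[int], dst_ranks: Sequence[int]
) -> List[Transfer]:
    """List every (sender, receiver) pair whose shards overlap, with local offsets."""
    plan = []
    src_bounds = shard_bounds(size, len(src_ranks))
    dst_bounds = shard_bounds(size, len(dst_ranks))
    for src_rank, (s0, s1) in zip(src_ranks, src_bounds):
        for dst_rank, (d0, d1) in zip(dst_ranks, dst_bounds):
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:  # the two shards share the global range [lo, hi)
                plan.append(
                    Transfer(
                        src_rank,
                        dst_rank,
                        (lo - s0, hi - s0),
                        (lo - d0, hi - d0),
                    )
                )
    return plan


# 10 rows sharded over 4 sender ranks, resharded over 2 receiver ranks.
plan = transfer_plan(10, src_ranks=[0, 1, 2, 3], dst_ranks=[4, 5])
```

In this sketch each element is sent exactly once, and a sender only touches the slice of its shard that a given receiver needs. Multi-dimensional meshes and replicated placements would need more bookkeeping than is shown here.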
Update "auto" strategy resolution to pick "optimal" when dst_mesh/src_mesh
and dst_placements/src_placements are provided, falling back to "materialize"
otherwise.
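
The "auto" resolution rule described above could look roughly like the following sketch. The function name `resolve_strategy` and the parameters `peer_mesh` and `peer_placements` are assumptions made for illustration (they stand for dst_mesh/dst_placements on the sender and src_mesh/src_placements on the receiver); the actual signature in the PR may differ.

```python
from typing import Any, Optional


def resolve_strategy(
    strategy: str,
    peer_mesh: Optional[Any] = None,
    peer_placements: Optional[Any] = None,
) -> str:
    """Resolve "auto" to a concrete strategy; pass explicit choices through."""
    if strategy != "auto":
        return strategy
    # A transfer plan needs the layout on both ends, so "optimal" is only
    # possible when the peer's mesh and placements are both known.
    if peer_mesh is not None and peer_placements is not None:
        return "optimal"
    return "materialize"
```

With this shape, callers that omit the peer layout keep the previous "materialize" behaviour without changing their code.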
Add _mesh_to_rank_map and _mesh_all_ranks helpers to _dtensor.py.
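
The helper signatures are not shown in this description. As one plausible reading, assuming the mesh is given as a nested list of global ranks, the helpers might compute something like the following. The names below mirror the PR's helpers without the leading underscore, and the behaviour (rank to mesh coordinate, and a flat rank list) is a guess, not the PR's implementation.

```python
from typing import Dict, List, Sequence, Tuple, Union

NestedRanks = Union[int, Sequence["NestedRanks"]]


def mesh_all_ranks(mesh: NestedRanks) -> List[int]:
    """Flatten a nested-list mesh into its global ranks, in row-major order."""
    if isinstance(mesh, int):
        return [mesh]
    ranks: List[int] = []
    for sub in mesh:
        ranks.extend(mesh_all_ranks(sub))
    return ranks


def mesh_to_rank_map(
    mesh: NestedRanks, prefix: Tuple[int, ...] = ()
) -> Dict[int, Tuple[int, ...]]:
    """Map each global rank to its coordinate in the mesh."""
    if isinstance(mesh, int):
        return {mesh: prefix}
    out: Dict[int, Tuple[int, ...]] = {}
    for i, sub in enumerate(mesh):
        out.update(mesh_to_rank_map(sub, prefix + (i,)))
    return out


# A 2x2 mesh over global ranks 0..3.
example_mesh = [[0, 1], [2, 3]]
```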
Made-with: Cursor