[DTensor] Add example scripts for cross-mesh DTensor transfer #1647
vmoens wants to merge 6 commits into gh/vmoens/88/base
- dtensor_transfer_plan_test.py: CPU-only test for shard algebra and transfer plan computation (no GPUs needed)
- dtensor_transfer_distributed_test.py: Multi-GPU test for strategies A and B using torchrun with real DTensors on NCCL
- minimal_p2p_test.py: Minimal NCCL P2P test for JSON metadata serialization over CUDA byte tensors

Made-with: Cursor
ghstack-source-id: ccca9b7
Pull-Request: #1647
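
To make "shard algebra and transfer plan computation" concrete, here is a minimal CPU-only sketch of one way such a plan could be computed for a 1-D tensor resharded between two meshes of different sizes. It is not the code from `dtensor_transfer_plan_test.py`: the names `Transfer`, `shard_bounds`, and `transfer_plan` are hypothetical, and the ceil-division chunking is an assumption made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Transfer(NamedTuple):
    """One point-to-point copy in the plan (hypothetical structure)."""
    src_rank: int
    dst_rank: int
    start: int  # global start index, inclusive
    stop: int   # global stop index, exclusive


def shard_bounds(length: int, num_shards: int) -> List[Tuple[int, int]]:
    """Half-open [start, stop) range owned by each rank.

    Assumes ceil-division chunks, so trailing ranks may own fewer
    elements (or none) when length is not divisible by num_shards.
    """
    chunk = -(-length // num_shards)  # ceil(length / num_shards)
    return [
        (min(r * chunk, length), min((r + 1) * chunk, length))
        for r in range(num_shards)
    ]


def transfer_plan(length: int, src_shards: int, dst_shards: int) -> List[Transfer]:
    """Intersect every source shard with every destination shard.

    Each non-empty intersection becomes one transfer from the source
    rank that owns the slice to the destination rank that needs it.
    """
    plan = []
    for s, (s0, s1) in enumerate(shard_bounds(length, src_shards)):
        for d, (d0, d1) in enumerate(shard_bounds(length, dst_shards)):
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:  # the two ranges overlap
                plan.append(Transfer(s, d, lo, hi))
    return plan


if __name__ == "__main__":
    # 10 elements, sharded over 2 source ranks and 3 destination ranks.
    for step in transfer_plan(10, 2, 3):
        print(step)
```

Because the plan depends only on shapes and mesh sizes, it can be checked without GPUs or a process group, which is presumably why the first script is described as CPU-only.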
PR Title Label Error: Unknown or invalid prefix

Current title:

Supported Prefixes: Your PR title must start with exactly one of these prefixes (case-insensitive):

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 32.7620μs | 14.9071μs | 67.0821 KOps/s | 68.2113 KOps/s | |
| test_plain_set_stack_nested | 46.8830μs | 15.0607μs | 66.3982 KOps/s | 67.0347 KOps/s | |
| test_plain_set_nested_inplace | 82.4740μs | 16.8517μs | 59.3412 KOps/s | 59.8231 KOps/s | |
| test_plain_set_stack_nested_inplace | 54.1630μs | 16.7361μs | 59.7511 KOps/s | 60.7876 KOps/s | |
| test_items | 41.3320μs | 6.0066μs | 166.4824 KOps/s | 165.5070 KOps/s | |
| test_items_nested | 0.5295ms | 0.4691ms | 2.1317 KOps/s | 2.1466 KOps/s | |
| test_items_nested_locked | 0.5572ms | 0.4745ms | 2.1074 KOps/s | 2.1087 KOps/s | |
| test_items_nested_leaf | 0.1430ms | 97.5314μs | 10.2531 KOps/s | 10.0942 KOps/s | |
| test_items_stack_nested | 0.5126ms | 0.4683ms | 2.1355 KOps/s | 2.1456 KOps/s | |
| test_items_stack_nested_leaf | 0.1424ms | 97.3412μs | 10.2731 KOps/s | 10.2684 KOps/s | |
| test_items_stack_nested_locked | 0.5816ms | 0.4704ms | 2.1257 KOps/s | 2.1419 KOps/s | |
| test_keys | 32.0120μs | 4.2166μs | 237.1579 KOps/s | 237.3731 KOps/s | |
| test_keys_nested | 0.2037ms | 0.1304ms | 7.6678 KOps/s | 7.7280 KOps/s | |
| test_keys_nested_locked | 0.8101ms | 0.1391ms | 7.1916 KOps/s | 7.3062 KOps/s | |
| test_keys_nested_leaf | 0.1687ms | 0.1213ms | 8.2419 KOps/s | 8.3646 KOps/s | |
| test_keys_stack_nested | 0.1907ms | 0.1320ms | 7.5764 KOps/s | 7.7104 KOps/s | |
| test_keys_stack_nested_leaf | 0.2459ms | 0.1195ms | 8.3704 KOps/s | 8.3584 KOps/s | |
| test_keys_stack_nested_locked | 0.1910ms | 0.1403ms | 7.1287 KOps/s | 7.2762 KOps/s | |
| test_values | 3.2142μs | 0.9982μs | 1.0019 MOps/s | 978.1402 KOps/s | |
| test_values_nested | 76.5050μs | 53.3537μs | 18.7428 KOps/s | 19.0143 KOps/s | |
| test_values_nested_locked | 80.4750μs | 56.3460μs | 17.7475 KOps/s | 17.7475 KOps/s | |
| test_values_nested_leaf | 92.2950μs | 61.1283μs | 16.3590 KOps/s | 16.4785 KOps/s | |
| test_values_stack_nested | 80.7850μs | 53.1079μs | 18.8296 KOps/s | 18.9219 KOps/s | |
| test_values_stack_nested_leaf | 87.7250μs | 60.9494μs | 16.4071 KOps/s | 16.5656 KOps/s | |
| test_values_stack_nested_locked | 91.3450μs | 56.3993μs | 17.7307 KOps/s | 17.7737 KOps/s | |
| test_membership | 12.1492μs | 0.8568μs | 1.1672 MOps/s | 1.1628 MOps/s | |
| test_membership_nested | 39.8520μs | 2.8983μs | 345.0264 KOps/s | 344.9239 KOps/s | |
| test_membership_nested_leaf | 16.6210μs | 2.8177μs | 354.8987 KOps/s | 366.6981 KOps/s | |
| test_membership_stacked_nested | 38.1030μs | 2.9111μs | 343.5111 KOps/s | 341.3714 KOps/s | |
| test_membership_stacked_nested_leaf | 23.7410μs | 2.9047μs | 344.2730 KOps/s | 341.7357 KOps/s | |
| test_membership_nested_last | 30.9410μs | 4.3560μs | 229.5695 KOps/s | 230.3162 KOps/s | |
| test_membership_nested_leaf_last | 69.5230μs | 4.3497μs | 229.9005 KOps/s | 230.9201 KOps/s | |
| test_membership_stacked_nested_last | 33.3120μs | 4.3831μs | 228.1466 KOps/s | 229.6074 KOps/s | |
| test_membership_stacked_nested_leaf_last | 38.1330μs | 4.3820μs | 228.2062 KOps/s | 231.7446 KOps/s | |
| test_nested_getleaf | 50.0930μs | 21.5932μs | 46.3110 KOps/s | 45.9547 KOps/s | |
| test_nested_get | 52.1230μs | 20.2285μs | 49.4352 KOps/s | 48.7827 KOps/s | |
| test_stacked_getleaf | 74.5840μs | 21.7054μs | 46.0715 KOps/s | 46.4866 KOps/s | |
| test_stacked_get | 88.8050μs | 20.6203μs | 48.4960 KOps/s | 49.0583 KOps/s | |
| test_nested_getitemleaf | 45.1930μs | 22.2006μs | 45.0438 KOps/s | 44.7158 KOps/s | |
| test_nested_getitem | 60.5230μs | 20.7631μs | 48.1624 KOps/s | 47.0783 KOps/s | |
| test_stacked_getitemleaf | 50.8730μs | 22.4037μs | 44.6355 KOps/s | 44.7283 KOps/s | |
| test_stacked_getitem | 66.3230μs | 21.1720μs | 47.2321 KOps/s | 47.2206 KOps/s | |
| test_lock_nested | 7.9511ms | 0.4876ms | 2.0507 KOps/s | 2.1032 KOps/s | |
| test_lock_stack_nested | 0.5268ms | 0.4818ms | 2.0758 KOps/s | 2.0582 KOps/s | |
| test_unlock_nested | 0.5105ms | 0.3903ms | 2.5620 KOps/s | 2.5637 KOps/s | |
| test_unlock_stack_nested | 0.4593ms | 0.3904ms | 2.5617 KOps/s | 2.5473 KOps/s | |
| test_flatten_speed | 0.1658ms | 0.1225ms | 8.1656 KOps/s | 8.1488 KOps/s | |
| test_unflatten_speed | 0.6359ms | 0.5716ms | 1.7494 KOps/s | 1.7408 KOps/s | |
| test_common_ops | 0.8521ms | 0.6961ms | 1.4365 KOps/s | 1.4446 KOps/s | |
| test_creation | 0.1040ms | 3.1415μs | 318.3188 KOps/s | 316.3879 KOps/s | |
| test_creation_empty | 25.7110μs | 6.9950μs | 142.9585 KOps/s | 143.3961 KOps/s | |
| test_creation_nested_1 | 44.0120μs | 11.5629μs | 86.4838 KOps/s | 86.6581 KOps/s | |
| test_creation_nested_2 | 39.5720μs | 13.2744μs | 75.3331 KOps/s | 75.3493 KOps/s | |
| test_creation_many_keys[10] | 56.4530μs | 20.7855μs | 48.1105 KOps/s | 47.6855 KOps/s | |
| test_creation_many_keys[50] | 0.1674ms | 88.8020μs | 11.2610 KOps/s | 11.1003 KOps/s | |
| test_creation_many_keys[100] | 0.2303ms | 0.1744ms | 5.7346 KOps/s | 5.6634 KOps/s | |
| test_creation_nested_many_keys[10] | 74.7050μs | 44.8096μs | 22.3166 KOps/s | 22.2210 KOps/s | |
| test_creation_nested_many_keys[50] | 0.2531ms | 0.1827ms | 5.4726 KOps/s | 5.4632 KOps/s | |
| test_clone | 46.7430μs | 13.5529μs | 73.7851 KOps/s | 74.9668 KOps/s | |
| test_getitem[int] | 1.6075ms | 15.2963μs | 65.3754 KOps/s | 59.4654 KOps/s | |
| test_getitem[slice_int] | 0.1301ms | 24.0213μs | 41.6298 KOps/s | 41.1801 KOps/s | |
| test_getitem[range] | 0.1732ms | 63.1144μs | 15.8442 KOps/s | 15.6346 KOps/s | |
| test_getitem[tuple] | 0.1392ms | 23.7913μs | 42.0323 KOps/s | 41.8334 KOps/s | |
| test_getitem[list] | 0.1833ms | 57.5763μs | 17.3683 KOps/s | 17.0285 KOps/s | |
| test_setitem_dim[int] | 48.8130μs | 26.0244μs | 38.4256 KOps/s | 38.6101 KOps/s | |
| test_setitem_dim[slice_int] | 69.5340μs | 42.7845μs | 23.3730 KOps/s | 22.5483 KOps/s | |
| test_setitem_dim[range] | 0.1181ms | 95.1224μs | 10.5128 KOps/s | 10.5149 KOps/s | |
| test_setitem_dim[tuple] | 68.5140μs | 40.0086μs | 24.9946 KOps/s | 24.6258 KOps/s | |
| test_setitem | 59.9230μs | 17.8708μs | 55.9573 KOps/s | 56.0619 KOps/s | |
| test_set | 66.6630μs | 17.1212μs | 58.4072 KOps/s | 58.4590 KOps/s | |
| test_set_shared | 0.4991ms | 0.2035ms | 4.9136 KOps/s | 4.9090 KOps/s | |
| test_update | 0.1929ms | 21.3824μs | 46.7673 KOps/s | 46.4811 KOps/s | |
| test_update_nested | 80.3950μs | 33.4508μs | 29.8947 KOps/s | 30.4686 KOps/s | |
| test_update__nested | 0.4627ms | 34.5011μs | 28.9846 KOps/s | 28.9432 KOps/s | |
| test_set_nested | 55.6230μs | 18.7284μs | 53.3949 KOps/s | 52.9745 KOps/s | |
| test_set_nested_new | 62.0330μs | 23.8249μs | 41.9729 KOps/s | 41.6260 KOps/s | |
| test_select | 77.6350μs | 39.7909μs | 25.1314 KOps/s | 24.7940 KOps/s | |
| test_select_nested | 0.1100ms | 74.3721μs | 13.4459 KOps/s | 13.5215 KOps/s | |
| test_exclude_nested | 0.1452ms | 91.8047μs | 10.8927 KOps/s | 10.8416 KOps/s | |
| test_empty[True] | 0.4810ms | 0.3984ms | 2.5097 KOps/s | 2.5060 KOps/s | |
| test_empty[False] | 7.1355μs | 1.3164μs | 759.6587 KOps/s | 756.2817 KOps/s | |
| test_to | 0.1123ms | 75.6481μs | 13.2191 KOps/s | 13.3468 KOps/s | |
| test_to_nonblocking | 0.1212ms | 65.1480μs | 15.3497 KOps/s | 15.4466 KOps/s | |
| test_unbind_speed | 0.4016ms | 0.3353ms | 2.9820 KOps/s | 2.9960 KOps/s | |
| test_unbind_speed_stack0 | 0.4023ms | 0.3301ms | 3.0292 KOps/s | 3.0112 KOps/s | |
| test_unbind_speed_stack1 | 0.1044s | 0.8391ms | 1.1918 KOps/s | 1.1907 KOps/s | |
| test_split | 0.1044s | 1.2636ms | 791.3955 Ops/s | 786.3632 Ops/s | |
| test_chunk | 0.1034s | 1.2105ms | 826.1307 Ops/s | 924.8730 Ops/s | |
| test_to_cpu_blocking | 19.8872ms | 19.6675ms | 50.8454 Ops/s | 45.7915 Ops/s | |
| test_to_cpu_global_sync | 11.6345ms | 11.4833ms | 87.0832 Ops/s | 87.2906 Ops/s | |
| test_to_cpu_event_sync | 12.7610ms | 12.4772ms | 80.1462 Ops/s | 80.0242 Ops/s | |
| test_to_cpu_default | 0.1166s | 13.8117ms | 72.4022 Ops/s | 80.0225 Ops/s | |
| test_consolidate[False-None] | 4.2663ms | 4.1716ms | 239.7176 Ops/s | 240.1108 Ops/s | |
| test_consolidate[default-None] | 2.1389ms | 2.0302ms | 492.5573 Ops/s | 492.0238 Ops/s | |
| test_consolidate[reduce-overhead-None] | 2.0409ms | 1.9476ms | 513.4413 Ops/s | 499.7398 Ops/s | |
| test_consolidate_njt[False-None] | 0.1902s | 10.0536ms | 99.4670 Ops/s | 117.6362 Ops/s | |
| test_to[False-False-None] | 2.2119ms | 2.1100ms | 473.9228 Ops/s | 474.0539 Ops/s | |
| test_to[True-False-None] | 2.1576ms | 1.9144ms | 522.3512 Ops/s | 521.0134 Ops/s | |
| test_to[within-False-None] | 6.3037ms | 6.1250ms | 163.2646 Ops/s | 163.0031 Ops/s | |
| test_to[True-default-None] | 9.0195ms | 8.8511ms | 112.9797 Ops/s | 112.0771 Ops/s | |
| test_to_njt[False-False-None] | 8.5835ms | 8.4551ms | 118.2713 Ops/s | 116.7587 Ops/s | |
| test_to_njt[True-False-None] | 7.4474ms | 7.2478ms | 137.9734 Ops/s | 143.0337 Ops/s | |
| test_to_njt[within-False-None] | 15.7145ms | 15.5068ms | 64.4877 Ops/s | 63.5187 Ops/s | |
| test_creation[device0] | 0.3873ms | 0.1151ms | 8.6887 KOps/s | 8.8156 KOps/s | |
| test_creation_from_tensor | 0.3976ms | 0.1136ms | 8.8055 KOps/s | 8.9914 KOps/s | |
| test_add_one[memmap_tensor0] | 0.1954ms | 6.7278μs | 148.6369 KOps/s | 148.0037 KOps/s | |
| test_contiguous[memmap_tensor0] | 30.7020μs | 0.6736μs | 1.4846 MOps/s | 2.0982 MOps/s | |
| test_stack[memmap_tensor0] | 30.4320μs | 4.6357μs | 215.7176 KOps/s | 215.9403 KOps/s | |
| test_memmaptd_index | 1.0309ms | 0.2715ms | 3.6829 KOps/s | 3.6793 KOps/s | |
| test_memmaptd_index_astensor | 0.5338ms | 0.3744ms | 2.6710 KOps/s | 2.6843 KOps/s | |
| test_memmaptd_index_op | 0.8067ms | 0.6277ms | 1.5932 KOps/s | 1.5946 KOps/s | |
| test_serialize_model | 0.1402s | 0.1371s | 7.2918 Ops/s | 7.3467 Ops/s | |
| test_serialize_model_pickle | 1.3496s | 1.2101s | 0.8263 Ops/s | 0.8237 Ops/s | |
| test_serialize_weights | 0.1358s | 0.1345s | 7.4328 Ops/s | 7.3485 Ops/s | |
| test_serialize_weights_returnearly | 0.4325s | 88.0997ms | 11.3508 Ops/s | 6.2217 Ops/s | |
| test_serialize_weights_pickle | 1.3655s | 1.2154s | 0.8227 Ops/s | 0.8232 Ops/s | |
| test_reshape_pytree | 0.1994ms | 32.6023μs | 30.6727 KOps/s | 30.7853 KOps/s | |
| test_reshape_td | 86.8050μs | 45.1900μs | 22.1288 KOps/s | 22.4250 KOps/s | |
| test_view_pytree | 0.2630ms | 32.4040μs | 30.8604 KOps/s | 30.9125 KOps/s | |
| test_view_td | 0.1160ms | 52.2101μs | 19.1534 KOps/s | 18.8022 KOps/s | |
| test_unbind_pytree | 0.2224ms | 35.9378μs | 27.8258 KOps/s | 27.5802 KOps/s | |
| test_unbind_td | 93.7450μs | 49.4315μs | 20.2300 KOps/s | 20.2535 KOps/s | |
| test_split_pytree | 0.1925ms | 41.8823μs | 23.8764 KOps/s | 23.6216 KOps/s | |
| test_split_td | 0.1926ms | 64.9130μs | 15.4052 KOps/s | 15.4880 KOps/s | |
| test_add_pytree | 0.1860ms | 42.1834μs | 23.7060 KOps/s | 23.5030 KOps/s | |
| test_add_td | 0.2070ms | 56.5202μs | 17.6928 KOps/s | 17.8340 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.1985ms | 0.1384ms | 7.2266 KOps/s | 6.5670 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.3394ms | 0.2031ms | 4.9233 KOps/s | 5.0276 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.1684ms | 0.1069ms | 9.3562 KOps/s | 8.9585 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4367ms | 0.1784ms | 5.6057 KOps/s | 5.6296 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.3500ms | 10.3094μs | 96.9988 KOps/s | 97.1638 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 87.7750μs | 53.9746μs | 18.5272 KOps/s | 18.2952 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.1426ms | 9.8568μs | 101.4524 KOps/s | 99.8565 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.4448ms | 67.8412μs | 14.7403 KOps/s | 14.6288 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.2349ms | 0.1745ms | 5.7322 KOps/s | 5.3896 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.4574ms | 0.2807ms | 3.5621 KOps/s | 3.5646 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.2633ms | 0.1142ms | 8.7593 KOps/s | 8.1637 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1303ms | 72.4686μs | 13.7991 KOps/s | 13.5846 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.2245ms | 0.1566ms | 6.3869 KOps/s | 6.2155 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.8188ms | 0.5173ms | 1.9333 KOps/s | 1.9356 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.4364ms | 0.3311ms | 3.0203 KOps/s | 2.9983 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.2684ms | 0.1763ms | 5.6734 KOps/s | 5.2746 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1298ms | 87.9860μs | 11.3654 KOps/s | 11.2661 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.2471ms | 0.1168ms | 8.5642 KOps/s | 8.0179 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.6376ms | 0.4250ms | 2.3531 KOps/s | 2.3385 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.2721ms | 0.1573ms | 6.3590 KOps/s | 6.0764 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 0.1250ms | 13.3899μs | 74.6834 KOps/s | 69.4132 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 70.7240μs | 41.4486μs | 24.1263 KOps/s | 23.9871 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 75.6440μs | 10.9371μs | 91.4318 KOps/s | 92.6244 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.1798s | 62.1683μs | 16.0854 KOps/s | 19.0020 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 2.0275ms | 0.1743ms | 5.7375 KOps/s | 5.3132 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.4296ms | 3.3004ms | 302.9945 Ops/s | 300.5842 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 2.0258ms | 0.1614ms | 6.1964 KOps/s | 6.0567 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 2.9518ms | 2.7870ms | 358.8103 Ops/s | 358.8147 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.2210ms | 0.1071ms | 9.3351 KOps/s | 8.7222 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.3104ms | 72.6576μs | 13.7632 KOps/s | 13.3499 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2721ms | 93.5751μs | 10.6866 KOps/s | 10.2058 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2680ms | 43.8897μs | 22.7844 KOps/s | 23.0074 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1380ms | 94.2027μs | 10.6154 KOps/s | 10.1262 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2883ms | 43.9228μs | 22.7672 KOps/s | 23.1606 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.2255ms | 56.8596μs | 17.5872 KOps/s | 17.0273 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2193ms | 27.4435μs | 36.4385 KOps/s | 36.3167 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 0.1963ms | 44.6619μs | 22.3904 KOps/s | 22.7289 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2466ms | 22.5003μs | 44.4438 KOps/s | 45.0084 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 89.6750μs | 45.7987μs | 21.8347 KOps/s | 21.9363 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.2936ms | 22.3073μs | 44.8283 KOps/s | 44.5626 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 0.1021ms | 58.0702μs | 17.2205 KOps/s | 16.9482 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.2135ms | 27.6138μs | 36.2138 KOps/s | 36.2511 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 87.3050μs | 45.1306μs | 22.1579 KOps/s | 22.1109 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2777ms | 22.4021μs | 44.6387 KOps/s | 44.6400 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 83.6940μs | 45.3270μs | 22.0619 KOps/s | 22.8034 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.2917ms | 22.4333μs | 44.5766 KOps/s | 44.4241 KOps/s | |
| test_compile_replace[single-eager] | 0.1046ms | 46.4289μs | 21.5383 KOps/s | 21.4916 KOps/s | |
| test_compile_replace[single-compile] | 0.1734ms | 0.1022ms | 9.7817 KOps/s | 9.3506 KOps/s | |
| test_compile_replace[multi-eager] | 0.6459ms | 0.5491ms | 1.8213 KOps/s | 1.8168 KOps/s | |
| test_compile_replace[multi-compile] | 0.2077ms | 0.1100ms | 9.0923 KOps/s | 8.8806 KOps/s | |
| test_compile_tc_getattr_20[eager] | 0.3491ms | 0.1681ms | 5.9500 KOps/s | 6.1187 KOps/s | |
| test_compile_tc_getattr_20[compile] | 0.2549ms | 0.1180ms | 8.4746 KOps/s | 8.2958 KOps/s | |
| test_compile_clone_shallow[20-eager] | 49.5230μs | 19.3468μs | 51.6882 KOps/s | 52.1049 KOps/s | |
| test_compile_clone_shallow[20-compile] | 60.2830μs | 11.4928μs | 87.0110 KOps/s | 87.5346 KOps/s | |
| test_compile_clone_shallow[40-eager] | 0.1013ms | 33.8168μs | 29.5711 KOps/s | 29.4014 KOps/s | |
| test_compile_clone_shallow[40-compile] | 68.4340μs | 12.5365μs | 79.7670 KOps/s | 81.0391 KOps/s | |
| test_compile_clone_shallow[80-eager] | 0.1070ms | 63.4082μs | 15.7708 KOps/s | 15.9685 KOps/s | |
| test_compile_clone_shallow[80-compile] | 73.0840μs | 15.0154μs | 66.5981 KOps/s | 66.1131 KOps/s | |
| test_compile_update_inplace[eager] | 0.1687ms | 60.1080μs | 16.6367 KOps/s | 17.2503 KOps/s | |
| test_compile_update_inplace[compile] | 0.1841ms | 0.1399ms | 7.1482 KOps/s | 6.9362 KOps/s | |
| test_mod_add[eager] | 79.2440μs | 48.1530μs | 20.7671 KOps/s | 20.6610 KOps/s | |
| test_mod_add[compile] | 0.2492ms | 0.1043ms | 9.5845 KOps/s | 9.1157 KOps/s | |
| test_mod_add[compile-overhead] | 0.2335ms | 0.1459ms | 6.8555 KOps/s | 6.6085 KOps/s | |
| test_mod_wrap[eager] | 0.4260ms | 0.2881ms | 3.4715 KOps/s | 3.3217 KOps/s | |
| test_mod_wrap[compile] | 0.4106ms | 0.3453ms | 2.8957 KOps/s | 2.7682 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.3190ms | 4.0406ms | 247.4889 Ops/s | 244.9846 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.6686ms | 1.4836ms | 674.0503 Ops/s | 670.7712 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.7051ms | 1.4318ms | 698.4286 Ops/s | 692.6691 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.2696ms | 0.8761ms | 1.1415 KOps/s | 1.1121 KOps/s | |
| test_seq_add[eager] | 0.2073ms | 0.1516ms | 6.5977 KOps/s | 6.2443 KOps/s | |
| test_seq_add[compile] | 0.2681ms | 0.1168ms | 8.5631 KOps/s | 8.5893 KOps/s | |
| test_seq_add[compile-overhead] | 0.2906ms | 0.1525ms | 6.5590 KOps/s | 6.1391 KOps/s | |
| test_seq_wrap[eager] | 0.5799ms | 0.5127ms | 1.9506 KOps/s | 1.8407 KOps/s | |
| test_seq_wrap[compile] | 0.4580ms | 0.3625ms | 2.7588 KOps/s | 2.6278 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.4167ms | 0.2610ms | 3.8314 KOps/s | 3.5988 KOps/s | |
| test_func_call_runtime[False-eager] | 1.0149ms | 0.8317ms | 1.2023 KOps/s | 1.2094 KOps/s | |
| test_func_call_runtime[False-compile] | 1.0678ms | 0.9024ms | 1.1082 KOps/s | 1.1000 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.5789ms | 0.4591ms | 2.1782 KOps/s | 2.1420 KOps/s | |
| test_func_call_runtime[True-eager] | 1.2244ms | 1.0764ms | 929.0068 Ops/s | 927.7616 Ops/s | |
| test_func_call_runtime[True-compile] | 1.0476ms | 0.9189ms | 1.0882 KOps/s | 1.0898 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.5852ms | 0.4712ms | 2.1221 KOps/s | 2.0803 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 0.9401ms | 0.8827ms | 1.1328 KOps/s | 1.2129 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 1.0815ms | 0.9122ms | 1.0963 KOps/s | 1.0731 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.5472ms | 0.4602ms | 2.1732 KOps/s | 2.1461 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.3132ms | 1.2261ms | 815.5628 Ops/s | 825.7118 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 1.0910ms | 0.9547ms | 1.0474 KOps/s | 1.0454 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5682ms | 0.5033ms | 1.9870 KOps/s | 1.9402 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.8593ms | 2.3648ms | 422.8672 Ops/s | 420.0239 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.1499ms | 0.9764ms | 1.0242 KOps/s | 1.0242 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5704ms | 0.5111ms | 1.9565 KOps/s | 1.9181 KOps/s | |
| test_distributed | 2.5804ms | 0.1657ms | 6.0353 KOps/s | 6.4867 KOps/s | |
| test_tdmodule | 0.1823ms | 27.2352μs | 36.7171 KOps/s | 36.9860 KOps/s | |
| test_tdmodule_dispatch | 74.5340μs | 45.1079μs | 22.1691 KOps/s | 22.0212 KOps/s | |
| test_tdseq | 45.8020μs | 26.9193μs | 37.1481 KOps/s | 37.4587 KOps/s | |
| test_tdseq_dispatch | 68.6330μs | 47.1442μs | 21.2115 KOps/s | 21.0266 KOps/s | |
| test_instantiation_functorch | 2.2065ms | 2.0845ms | 479.7245 Ops/s | 479.3529 Ops/s | |
| test_exec_functorch | 0.2307ms | 0.1805ms | 5.5395 KOps/s | 5.5334 KOps/s | |
| test_exec_functional_call | 0.2076ms | 0.1612ms | 6.2039 KOps/s | 6.2350 KOps/s | |
| test_exec_td_decorator | 0.4389ms | 0.2379ms | 4.2033 KOps/s | 4.1900 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.0084ms | 0.8214ms | 1.2174 KOps/s | 1.2135 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 0.9898ms | 0.8212ms | 1.2178 KOps/s | 1.2216 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 1.0519ms | 0.7095ms | 1.4095 KOps/s | 1.4126 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.9033ms | 0.7097ms | 1.4090 KOps/s | 1.4150 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 21.0807ms | 20.5196ms | 48.7338 Ops/s | 48.8156 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 21.2047ms | 20.5128ms | 48.7501 Ops/s | 48.8195 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 20.9939ms | 20.3139ms | 49.2273 Ops/s | 49.3533 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.4709ms | 20.2914ms | 49.2820 Ops/s | 49.2905 Ops/s | |
| test_to_module_speed[True] | 1.5680ms | 1.4762ms | 677.4127 Ops/s | 675.2574 Ops/s | |
| test_to_module_speed[False] | 1.5542ms | 1.4656ms | 682.3068 Ops/s | 686.5310 Ops/s | |
| test_tc_init | 82.9240μs | 44.5656μs | 22.4388 KOps/s | 22.5802 KOps/s | |
| test_tc_init_tensor_only | 45.4320μs | 9.8552μs | 101.4694 KOps/s | 103.5141 KOps/s | |
| test_tc_init_nested | 0.1211ms | 87.5143μs | 11.4267 KOps/s | 11.3469 KOps/s | |
| test_tc_init_many_fields | 0.1922ms | 16.5558μs | 60.4016 KOps/s | 60.5822 KOps/s | |
| test_tc_first_layer_tensor | 32.8110μs | 1.8244μs | 548.1347 KOps/s | 545.6558 KOps/s | |
| test_tc_first_layer_tensor_only | 16.0092μs | 0.3965μs | 2.5219 MOps/s | 2.5047 MOps/s | |
| test_tc_first_layer_tensor_set | 32.7920μs | 3.9174μs | 255.2732 KOps/s | 251.9657 KOps/s | |
| test_tc_first_layer_tensor_only_set | 18.7910μs | 3.2225μs | 310.3215 KOps/s | 307.2540 KOps/s | |
| test_tc_first_layer_nontensor | 38.4120μs | 6.1610μs | 162.3110 KOps/s | 161.2248 KOps/s | |
| test_tc_second_layer_tensor | 33.8620μs | 4.4888μs | 222.7745 KOps/s | 221.6370 KOps/s | |
| test_tc_second_layer_nontensor | 40.7820μs | 8.6498μs | 115.6094 KOps/s | 114.3294 KOps/s | |
| test_unbind | 0.2726s | 14.2189ms | 70.3287 Ops/s | 55.4462 Ops/s | |
| test_full_like | 17.5233ms | 16.7333ms | 59.7612 Ops/s | 228.3937 Ops/s | |
| test_zeros_like | 16.9181ms | 16.6386ms | 60.1011 Ops/s | 229.0686 Ops/s | |
| test_ones_like | 17.9179ms | 16.8269ms | 59.4288 Ops/s | 228.7501 Ops/s | |
| test_clone | 17.9909ms | 17.6214ms | 56.7491 Ops/s | 154.8207 Ops/s | |
| test_squeeze | 0.1131ms | 14.0855μs | 70.9950 KOps/s | 70.6839 KOps/s | |
| test_unsqueeze | 0.1661ms | 0.1102ms | 9.0767 KOps/s | 9.1294 KOps/s | |
| test_split | 0.3086ms | 0.1858ms | 5.3826 KOps/s | 5.3585 KOps/s | |
| test_permute | 0.3907ms | 0.2094ms | 4.7766 KOps/s | 4.8651 KOps/s | |
| test_stack | 51.3717ms | 51.1379ms | 19.5550 Ops/s | 19.6335 Ops/s | |
| test_cat | 51.4311ms | 51.1387ms | 19.5547 Ops/s | 19.6397 Ops/s | |
| test_sequential_tensordict | 0.2778ms | 0.2246ms | 4.4524 KOps/s | 4.5217 KOps/s | |
| test_sequential_graph_module | 0.5045ms | 0.1234ms | 8.1030 KOps/s | 8.3751 KOps/s | |
| test_nested_tensordict | 0.4038ms | 0.2922ms | 3.4227 KOps/s | 3.3871 KOps/s | |
| test_nested_graph_module | 0.5413ms | 0.1329ms | 7.5226 KOps/s | 7.4625 KOps/s |

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 42.0710μs | 14.9180μs | 67.0332 KOps/s | 67.3474 KOps/s | |
| test_plain_set_stack_nested | 30.9710μs | 15.2450μs | 65.5951 KOps/s | 65.8057 KOps/s | |
| test_plain_set_nested_inplace | 46.5810μs | 16.5486μs | 60.4279 KOps/s | 58.9094 KOps/s | |
| test_plain_set_stack_nested_inplace | 50.9410μs | 16.2491μs | 61.5418 KOps/s | 58.9226 KOps/s | |
| test_items | 25.1210μs | 5.9825μs | 167.1538 KOps/s | 165.1938 KOps/s | |
| test_items_nested | 0.5431ms | 0.4728ms | 2.1149 KOps/s | 2.1106 KOps/s | |
| test_items_nested_locked | 0.5318ms | 0.4740ms | 2.1095 KOps/s | 2.1072 KOps/s | |
| test_items_nested_leaf | 0.1240ms | 97.5225μs | 10.2540 KOps/s | 10.0995 KOps/s | |
| test_items_stack_nested | 0.5782ms | 0.4724ms | 2.1167 KOps/s | 2.1506 KOps/s | |
| test_items_stack_nested_leaf | 0.1419ms | 98.3139μs | 10.1715 KOps/s | 10.1023 KOps/s | |
| test_items_stack_nested_locked | 0.6063ms | 0.4787ms | 2.0889 KOps/s | 2.1116 KOps/s | |
| test_keys | 30.4700μs | 4.2439μs | 235.6317 KOps/s | 233.4440 KOps/s | |
| test_keys_nested | 0.1742ms | 0.1312ms | 7.6205 KOps/s | 7.5805 KOps/s | |
| test_keys_nested_locked | 0.7738ms | 0.1390ms | 7.1934 KOps/s | 7.1637 KOps/s | |
| test_keys_nested_leaf | 0.1854ms | 0.1211ms | 8.2545 KOps/s | 8.3565 KOps/s | |
| test_keys_stack_nested | 0.1958ms | 0.1323ms | 7.5595 KOps/s | 7.7086 KOps/s | |
| test_keys_stack_nested_leaf | 0.1545ms | 0.1218ms | 8.2132 KOps/s | 8.3063 KOps/s | |
| test_keys_stack_nested_locked | 0.1891ms | 0.1395ms | 7.1691 KOps/s | 7.2896 KOps/s | |
| test_values | 5.1420μs | 1.0228μs | 977.6771 KOps/s | 986.8799 KOps/s | |
| test_values_nested | 0.1248ms | 53.4864μs | 18.6963 KOps/s | 19.1284 KOps/s | |
| test_values_nested_locked | 84.0020μs | 56.5877μs | 17.6717 KOps/s | 18.0814 KOps/s | |
| test_values_nested_leaf | 80.0520μs | 61.1356μs | 16.3571 KOps/s | 16.7643 KOps/s | |
| test_values_stack_nested | 80.4020μs | 53.6596μs | 18.6360 KOps/s | 19.0495 KOps/s | |
| test_values_stack_nested_leaf | 85.6410μs | 61.1154μs | 16.3625 KOps/s | 16.7227 KOps/s | |
| test_values_stack_nested_locked | 95.1620μs | 56.4184μs | 17.7247 KOps/s | 17.8949 KOps/s | |
| test_membership | 4.4900μs | 0.8524μs | 1.1732 MOps/s | 1.1853 MOps/s | |
| test_membership_nested | 39.4400μs | 2.8850μs | 346.6171 KOps/s | 346.6596 KOps/s | |
| test_membership_nested_leaf | 30.6810μs | 2.9161μs | 342.9272 KOps/s | 363.1182 KOps/s | |
| test_membership_stacked_nested | 39.4810μs | 2.8849μs | 346.6298 KOps/s | 343.4730 KOps/s | |
| test_membership_stacked_nested_leaf | 31.2300μs | 2.8915μs | 345.8373 KOps/s | 343.5342 KOps/s | |
| test_membership_nested_last | 32.5910μs | 4.3881μs | 227.8870 KOps/s | 226.8980 KOps/s | |
| test_membership_nested_leaf_last | 30.7300μs | 4.3336μs | 230.7559 KOps/s | 227.7853 KOps/s | |
| test_membership_stacked_nested_last | 28.1710μs | 4.3335μs | 230.7578 KOps/s | 227.4615 KOps/s | |
| test_membership_stacked_nested_leaf_last | 25.0200μs | 4.3570μs | 229.5171 KOps/s | 228.2747 KOps/s | |
| test_nested_getleaf | 53.7810μs | 22.0824μs | 45.2850 KOps/s | 46.1742 KOps/s | |
| test_nested_get | 43.8110μs | 20.9910μs | 47.6395 KOps/s | 48.3219 KOps/s | |
| test_stacked_getleaf | 53.0510μs | 21.9898μs | 45.4757 KOps/s | 45.9234 KOps/s | |
| test_stacked_get | 57.1410μs | 21.0963μs | 47.4018 KOps/s | 48.7480 KOps/s | |
| test_nested_getitemleaf | 42.0200μs | 22.4806μs | 44.4827 KOps/s | 45.1290 KOps/s | |
| test_nested_getitem | 51.7710μs | 21.0942μs | 47.4064 KOps/s | 47.4615 KOps/s | |
| test_stacked_getitemleaf | 53.3610μs | 22.0897μs | 45.2699 KOps/s | 45.1662 KOps/s | |
| test_stacked_getitem | 50.5310μs | 21.2767μs | 46.9998 KOps/s | 47.5850 KOps/s | |
| test_lock_nested | 8.2496ms | 0.4892ms | 2.0443 KOps/s | 2.0831 KOps/s | |
| test_lock_stack_nested | 0.5457ms | 0.4830ms | 2.0703 KOps/s | 2.0566 KOps/s | |
| test_unlock_nested | 0.4558ms | 0.3881ms | 2.5767 KOps/s | 2.5569 KOps/s | |
| test_unlock_stack_nested | 0.4460ms | 0.3884ms | 2.5748 KOps/s | 2.5310 KOps/s | |
| test_flatten_speed | 0.1701ms | 0.1227ms | 8.1525 KOps/s | 8.1266 KOps/s | |
| test_unflatten_speed | 0.6447ms | 0.5762ms | 1.7356 KOps/s | 1.7302 KOps/s | |
| test_common_ops | 0.7992ms | 0.6875ms | 1.4545 KOps/s | 1.4444 KOps/s | |
| test_creation | 71.8520μs | 3.1342μs | 319.0588 KOps/s | 313.2180 KOps/s | |
| test_creation_empty | 42.4210μs | 6.9670μs | 143.5341 KOps/s | 142.0070 KOps/s | |
| test_creation_nested_1 | 43.3710μs | 11.5485μs | 86.5915 KOps/s | 85.6462 KOps/s | |
| test_creation_nested_2 | 40.2710μs | 13.3423μs | 74.9496 KOps/s | 73.4269 KOps/s | |
| test_creation_many_keys[10] | 51.6710μs | 20.8493μs | 47.9631 KOps/s | 47.1871 KOps/s | |
| test_creation_many_keys[50] | 0.1244ms | 89.2815μs | 11.2005 KOps/s | 10.9768 KOps/s | |
| test_creation_many_keys[100] | 0.2523ms | 0.1762ms | 5.6766 KOps/s | 5.6096 KOps/s | |
| test_creation_nested_many_keys[10] | 82.7710μs | 44.6311μs | 22.4059 KOps/s | 21.9413 KOps/s | |
| test_creation_nested_many_keys[50] | 0.2501ms | 0.1834ms | 5.4537 KOps/s | 5.4372 KOps/s | |
| test_clone | 33.7110μs | 13.0644μs | 76.5438 KOps/s | 76.5801 KOps/s | |
| test_getitem[int] | 1.6013ms | 15.2432μs | 65.6031 KOps/s | 58.2579 KOps/s | |
| test_getitem[slice_int] | 0.1416ms | 24.4709μs | 40.8649 KOps/s | 41.8557 KOps/s | |
| test_getitem[range] | 0.1710ms | 62.4385μs | 16.0158 KOps/s | 15.1007 KOps/s | |
| test_getitem[tuple] | 0.1415ms | 24.1846μs | 41.3487 KOps/s | 42.0461 KOps/s | |
| test_getitem[list] | 0.1960ms | 57.5926μs | 17.3633 KOps/s | 17.2625 KOps/s | |
| test_setitem_dim[int] | 55.0610μs | 25.6101μs | 39.0471 KOps/s | 38.0990 KOps/s | |
| test_setitem_dim[slice_int] | 64.0420μs | 42.9926μs | 23.2598 KOps/s | 22.2491 KOps/s | |
| test_setitem_dim[range] | 0.1282ms | 94.4013μs | 10.5931 KOps/s | 10.0714 KOps/s | |
| test_setitem_dim[tuple] | 62.1210μs | 40.1761μs | 24.8904 KOps/s | 25.8570 KOps/s | |
| test_setitem | 62.3920μs | 17.3092μs | 57.7726 KOps/s | 56.2954 KOps/s | |
| test_set | 43.3910μs | 16.6134μs | 60.1925 KOps/s | 55.2133 KOps/s | |
| test_set_shared | 0.5686ms | 0.2024ms | 4.9415 KOps/s | 4.6980 KOps/s | |
| test_update | 0.3530ms | 21.5799μs | 46.3394 KOps/s | 43.5403 KOps/s | |
| test_update_nested | 69.0210μs | 33.1892μs | 30.1303 KOps/s | 29.2267 KOps/s | |
| test_update__nested | 0.4677ms | 34.0562μs | 29.3632 KOps/s | 27.4330 KOps/s | |
| test_set_nested | 45.2410μs | 18.6718μs | 53.5566 KOps/s | 49.0382 KOps/s | |
| test_set_nested_new | 55.7210μs | 23.6919μs | 42.2085 KOps/s | 39.1873 KOps/s | |
| test_select | 72.2710μs | 39.3959μs | 25.3834 KOps/s | 23.0962 KOps/s | |
| test_select_nested | 0.1130ms | 75.3765μs | 13.2667 KOps/s | 13.3543 KOps/s | |
| test_exclude_nested | 0.1243ms | 92.7378μs | 10.7831 KOps/s | 10.7033 KOps/s | |
| test_empty[True] | 0.4712ms | 0.4018ms | 2.4889 KOps/s | 2.4829 KOps/s | |
| test_empty[False] | 9.5527μs | 1.3378μs | 747.5209 KOps/s | 750.4203 KOps/s | |
| test_to | 0.1092ms | 71.7100μs | 13.9451 KOps/s | 13.4432 KOps/s | |
| test_to_nonblocking | 0.1090ms | 64.0354μs | 15.6164 KOps/s | 15.4873 KOps/s | |
| test_unbind_speed | 0.3892ms | 0.3349ms | 2.9859 KOps/s | 2.9794 KOps/s | |
| test_unbind_speed_stack0 | 0.3895ms | 0.3346ms | 2.9886 KOps/s | 3.0008 KOps/s | |
| test_unbind_speed_stack1 | 0.1036s | 0.8420ms | 1.1877 KOps/s | 1.1789 KOps/s | |
| test_split | 0.1036s | 1.2657ms | 790.0677 Ops/s | 782.9867 Ops/s | |
| test_chunk | 0.1034s | 1.2122ms | 824.9145 Ops/s | 926.8018 Ops/s | |
| test_to_cpu_blocking | 28.5713ms | 28.3796ms | 35.2366 Ops/s | 34.7918 Ops/s | |
| test_to_cpu_global_sync | 11.6395ms | 11.2422ms | 88.9506 Ops/s | 79.7074 Ops/s | |
| test_to_cpu_event_sync | 12.4154ms | 12.2266ms | 81.7887 Ops/s | 81.0594 Ops/s | |
| test_to_cpu_default | 12.5117ms | 12.2165ms | 81.8566 Ops/s | 81.1905 Ops/s | |
| test_consolidate[False-None] | 4.2819ms | 4.1623ms | 240.2540 Ops/s | 238.0236 Ops/s | |
| test_consolidate[default-None] | 2.1320ms | 2.0111ms | 497.2509 Ops/s | 486.3302 Ops/s | |
| test_consolidate[reduce-overhead-None] | 2.0329ms | 1.9334ms | 517.2132 Ops/s | 505.0451 Ops/s | |
| test_consolidate_njt[False-None] | 8.7086ms | 8.5139ms | 117.4550 Ops/s | 117.6327 Ops/s | |
| test_to[False-False-None] | 2.2002ms | 2.0917ms | 478.0786 Ops/s | 477.1367 Ops/s | |
| test_to[True-False-None] | 0.1832s | 2.3080ms | 433.2736 Ops/s | 508.0849 Ops/s | |
| test_to[within-False-None] | 6.3311ms | 6.1576ms | 162.4008 Ops/s | 162.9049 Ops/s | |
| test_to[True-default-None] | 9.2638ms | 8.8867ms | 112.5281 Ops/s | 113.5419 Ops/s | |
| test_to_njt[False-False-None] | 10.1115ms | 8.4787ms | 117.9423 Ops/s | 116.8988 Ops/s | |
| test_to_njt[True-False-None] | 7.2462ms | 6.9614ms | 143.6497 Ops/s | 142.4749 Ops/s | |
| test_to_njt[within-False-None] | 16.3069ms | 15.6682ms | 63.8236 Ops/s | 63.5162 Ops/s | |
| test_creation[device0] | 0.3933ms | 0.1157ms | 8.6433 KOps/s | 8.8086 KOps/s | |
| test_creation_from_tensor | 0.4650ms | 0.1162ms | 8.6076 KOps/s | 8.7645 KOps/s | |
| test_add_one[memmap_tensor0] | 0.3244ms | 6.2441μs | 160.1522 KOps/s | 157.4978 KOps/s | |
| test_contiguous[memmap_tensor0] | 13.3600μs | 0.6644μs | 1.5051 MOps/s | 2.1818 MOps/s | |
| test_stack[memmap_tensor0] | 25.5400μs | 4.6812μs | 213.6197 KOps/s | 216.3756 KOps/s | |
| test_memmaptd_index | 1.0431ms | 0.2657ms | 3.7632 KOps/s | 3.7351 KOps/s | |
| test_memmaptd_index_astensor | 0.5276ms | 0.3705ms | 2.6991 KOps/s | 2.6682 KOps/s | |
| test_memmaptd_index_op | 0.7497ms | 0.6110ms | 1.6368 KOps/s | 1.6362 KOps/s | |
| test_serialize_model | 0.3031s | 0.1652s | 6.0517 Ops/s | 7.3706 Ops/s | |
| test_serialize_model_pickle | 2.1436s | 1.4136s | 0.7074 Ops/s | 0.8378 Ops/s | |
| test_serialize_weights | 0.1365s | 0.1344s | 7.4417 Ops/s | 7.4209 Ops/s | |
| test_serialize_weights_returnearly | 0.4448s | 87.8446ms | 11.3837 Ops/s | 6.9172 Ops/s | |
| test_serialize_weights_pickle | 1.3704s | 1.1983s | 0.8345 Ops/s | 0.8177 Ops/s | |
| test_reshape_pytree | 0.2061ms | 32.7806μs | 30.5058 KOps/s | 30.2417 KOps/s | |
| test_reshape_td | 89.1520μs | 46.1038μs | 21.6902 KOps/s | 21.2034 KOps/s | |
| test_view_pytree | 0.2044ms | 32.4517μs | 30.8150 KOps/s | 30.3135 KOps/s | |
| test_view_td | 87.6120μs | 54.0155μs | 18.5132 KOps/s | 18.3058 KOps/s | |
| test_unbind_pytree | 0.2278ms | 36.4438μs | 27.4395 KOps/s | 27.1070 KOps/s | |
| test_unbind_td | 0.1274ms | 50.2156μs | 19.9141 KOps/s | 19.1927 KOps/s | |
| test_split_pytree | 0.2417ms | 42.5328μs | 23.5112 KOps/s | 23.3026 KOps/s | |
| test_split_td | 0.1540ms | 63.9976μs | 15.6256 KOps/s | 14.8211 KOps/s | |
| test_add_pytree | 0.2257ms | 42.0484μs | 23.7821 KOps/s | 23.9392 KOps/s | |
| test_add_td | 99.4020μs | 55.4687μs | 18.0282 KOps/s | 17.1424 KOps/s | |
| test_compile_add_one_nested[tensordict-compile] | 0.1931ms | 0.1396ms | 7.1631 KOps/s | 6.7776 KOps/s | |
| test_compile_add_one_nested[tensordict-eager] | 0.3058ms | 0.2033ms | 4.9197 KOps/s | 4.9736 KOps/s | |
| test_compile_add_one_nested[pytree-compile] | 0.1389ms | 0.1067ms | 9.3741 KOps/s | 8.8824 KOps/s | |
| test_compile_add_one_nested[pytree-eager] | 0.4429ms | 0.1772ms | 5.6441 KOps/s | 5.3493 KOps/s | |
| test_compile_copy_nested[tensordict-compile] | 0.3884ms | 10.3441μs | 96.6733 KOps/s | 96.6014 KOps/s | |
| test_compile_copy_nested[tensordict-eager] | 0.1124ms | 54.2766μs | 18.4241 KOps/s | 18.2947 KOps/s | |
| test_compile_copy_nested[pytree-compile] | 0.1341ms | 10.2826μs | 97.2513 KOps/s | 99.6265 KOps/s | |
| test_compile_copy_nested[pytree-eager] | 0.4652ms | 69.6925μs | 14.3487 KOps/s | 14.1082 KOps/s | |
| test_compile_add_one_flat[tensordict-compile] | 0.2729ms | 0.1780ms | 5.6165 KOps/s | 5.4252 KOps/s | |
| test_compile_add_one_flat[tensordict-eager] | 0.3739ms | 0.2809ms | 3.5596 KOps/s | 3.5456 KOps/s | |
| test_compile_add_one_flat[tensorclass-compile] | 0.3369ms | 0.1164ms | 8.5941 KOps/s | 8.3126 KOps/s | |
| test_compile_add_one_flat[tensorclass-eager] | 0.1326ms | 73.3080μs | 13.6411 KOps/s | 13.4278 KOps/s | |
| test_compile_add_one_flat[pytree-compile] | 0.2495ms | 0.1582ms | 6.3193 KOps/s | 6.0871 KOps/s | |
| test_compile_add_one_flat[pytree-eager] | 0.7936ms | 0.5096ms | 1.9623 KOps/s | 1.8328 KOps/s | |
| test_compile_add_self_flat[tensordict-eager] | 0.3951ms | 0.3340ms | 2.9943 KOps/s | 2.9325 KOps/s | |
| test_compile_add_self_flat[tensordict-compile] | 0.3238ms | 0.1811ms | 5.5221 KOps/s | 4.8610 KOps/s | |
| test_compile_add_self_flat[tensorclass-eager] | 0.1607ms | 88.8590μs | 11.2538 KOps/s | 10.7678 KOps/s | |
| test_compile_add_self_flat[tensorclass-compile] | 0.1720ms | 0.1190ms | 8.4067 KOps/s | 7.8571 KOps/s | |
| test_compile_add_self_flat[pytree-eager] | 0.6429ms | 0.4236ms | 2.3607 KOps/s | 2.3348 KOps/s | |
| test_compile_add_self_flat[pytree-compile] | 0.3159ms | 0.1566ms | 6.3839 KOps/s | 6.1132 KOps/s | |
| test_compile_copy_flat[tensordict-compile] | 0.1247ms | 13.7758μs | 72.5909 KOps/s | 74.3684 KOps/s | |
| test_compile_copy_flat[tensordict-eager] | 99.2620μs | 41.6178μs | 24.0282 KOps/s | 24.1222 KOps/s | |
| test_compile_copy_flat[pytree-compile] | 0.1575ms | 10.8676μs | 92.0163 KOps/s | 92.3849 KOps/s | |
| test_compile_copy_flat[pytree-eager] | 0.4065ms | 52.5068μs | 19.0452 KOps/s | 18.8458 KOps/s | |
| test_compile_assign_and_add[tensordict-compile] | 2.0094ms | 0.1748ms | 5.7204 KOps/s | 5.0564 KOps/s | |
| test_compile_assign_and_add[tensordict-eager] | 3.5183ms | 3.2790ms | 304.9706 Ops/s | 289.8388 Ops/s | |
| test_compile_assign_and_add[pytree-compile] | 1.9810ms | 0.1632ms | 6.1277 KOps/s | 6.0721 KOps/s | |
| test_compile_assign_and_add[pytree-eager] | 3.0312ms | 2.8417ms | 351.8976 Ops/s | 361.9744 Ops/s | |
| test_compile_indexing[tensor-tensordict-compile] | 0.1853ms | 0.1104ms | 9.0540 KOps/s | 8.8331 KOps/s | |
| test_compile_indexing[tensor-tensordict-eager] | 0.3111ms | 73.3177μs | 13.6393 KOps/s | 13.6510 KOps/s | |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2282ms | 97.4562μs | 10.2610 KOps/s | 10.2880 KOps/s | |
| test_compile_indexing[tensor-tensorclass-eager] | 0.2640ms | 45.9399μs | 21.7676 KOps/s | 22.6711 KOps/s | |
| test_compile_indexing[tensor-pytree-compile] | 0.1607ms | 98.1897μs | 10.1844 KOps/s | 10.2128 KOps/s | |
| test_compile_indexing[tensor-pytree-eager] | 0.2958ms | 45.2773μs | 22.0861 KOps/s | 22.3108 KOps/s | |
| test_compile_indexing[slice-tensordict-compile] | 0.2547ms | 58.5180μs | 17.0888 KOps/s | 17.1097 KOps/s | |
| test_compile_indexing[slice-tensordict-eager] | 0.2231ms | 28.0417μs | 35.6612 KOps/s | 35.4221 KOps/s | |
| test_compile_indexing[slice-tensorclass-compile] | 0.1244ms | 46.3225μs | 21.5878 KOps/s | 22.2344 KOps/s | |
| test_compile_indexing[slice-tensorclass-eager] | 0.2536ms | 22.8043μs | 43.8513 KOps/s | 43.6411 KOps/s | |
| test_compile_indexing[slice-pytree-compile] | 95.5420μs | 45.5510μs | 21.9534 KOps/s | 21.8626 KOps/s | |
| test_compile_indexing[slice-pytree-eager] | 0.2720ms | 22.7249μs | 44.0047 KOps/s | 44.0261 KOps/s | |
| test_compile_indexing[int-tensordict-compile] | 98.1410μs | 58.3675μs | 17.1328 KOps/s | 16.8481 KOps/s | |
| test_compile_indexing[int-tensordict-eager] | 0.2595ms | 27.9745μs | 35.7468 KOps/s | 35.8679 KOps/s | |
| test_compile_indexing[int-tensorclass-compile] | 89.1010μs | 45.5547μs | 21.9516 KOps/s | 20.7362 KOps/s | |
| test_compile_indexing[int-tensorclass-eager] | 0.2899ms | 22.6837μs | 44.0845 KOps/s | 44.0093 KOps/s | |
| test_compile_indexing[int-pytree-compile] | 93.3510μs | 46.0404μs | 21.7200 KOps/s | 20.8381 KOps/s | |
| test_compile_indexing[int-pytree-eager] | 0.2910ms | 22.5921μs | 44.2633 KOps/s | 44.1242 KOps/s | |
| test_compile_replace[single-eager] | 93.9920μs | 48.0192μs | 20.8250 KOps/s | 21.1689 KOps/s | |
| test_compile_replace[single-compile] | 0.1975ms | 0.1060ms | 9.4364 KOps/s | 9.3107 KOps/s | |
| test_compile_replace[multi-eager] | 0.6352ms | 0.5589ms | 1.7892 KOps/s | 1.7367 KOps/s | |
| test_compile_replace[multi-compile] | 0.2897ms | 0.1112ms | 8.9964 KOps/s | 8.8393 KOps/s | |
| test_compile_tc_getattr_20[eager] | 0.2233ms | 0.1631ms | 6.1330 KOps/s | 6.0967 KOps/s | |
| test_compile_tc_getattr_20[compile] | 0.2990ms | 0.1177ms | 8.4991 KOps/s | 8.3184 KOps/s | |
| test_compile_clone_shallow[20-eager] | 45.9010μs | 19.5541μs | 51.1402 KOps/s | 51.3913 KOps/s | |
| test_compile_clone_shallow[20-compile] | 66.1410μs | 11.5954μs | 86.2411 KOps/s | 87.2528 KOps/s | |
| test_compile_clone_shallow[40-eager] | 71.5220μs | 34.4699μs | 29.0108 KOps/s | 29.2104 KOps/s | |
| test_compile_clone_shallow[40-compile] | 68.4510μs | 12.8655μs | 77.7271 KOps/s | 75.4515 KOps/s | |
| test_compile_clone_shallow[80-eager] | 0.1059ms | 63.3198μs | 15.7929 KOps/s | 15.8062 KOps/s | |
| test_compile_clone_shallow[80-compile] | 62.7520μs | 15.1719μs | 65.9115 KOps/s | 64.6836 KOps/s | |
| test_compile_update_inplace[eager] | 0.1007ms | 58.8271μs | 16.9990 KOps/s | 16.8574 KOps/s | |
| test_compile_update_inplace[compile] | 0.3170ms | 0.1401ms | 7.1391 KOps/s | 7.0642 KOps/s | |
| test_mod_add[eager] | 91.4420μs | 50.0761μs | 19.9696 KOps/s | 19.9761 KOps/s | |
| test_mod_add[compile] | 0.4735ms | 0.1059ms | 9.4440 KOps/s | 9.0342 KOps/s | |
| test_mod_add[compile-overhead] | 0.4730ms | 0.1489ms | 6.7166 KOps/s | 6.5241 KOps/s | |
| test_mod_wrap[eager] | 0.3797ms | 0.2862ms | 3.4945 KOps/s | 3.4306 KOps/s | |
| test_mod_wrap[compile] | 0.4558ms | 0.3475ms | 2.8773 KOps/s | 2.8572 KOps/s | |
| test_mod_wrap[compile-overhead] | 7.3346ms | 4.0398ms | 247.5372 Ops/s | 251.1565 Ops/s | |
| test_mod_wrap_and_backward[eager] | 1.9904ms | 1.4907ms | 670.8384 Ops/s | 671.9618 Ops/s | |
| test_mod_wrap_and_backward[compile] | 1.6236ms | 1.4443ms | 692.3786 Ops/s | 692.7247 Ops/s | |
| test_mod_wrap_and_backward[compile-overhead] | 1.2351ms | 0.8839ms | 1.1314 KOps/s | 1.1047 KOps/s | |
| test_seq_add[eager] | 0.7298ms | 0.1523ms | 6.5644 KOps/s | 6.3080 KOps/s | |
| test_seq_add[compile] | 0.5539ms | 0.1138ms | 8.7896 KOps/s | 8.4746 KOps/s | |
| test_seq_add[compile-overhead] | 0.6142ms | 0.1527ms | 6.5475 KOps/s | 6.2353 KOps/s | |
| test_seq_wrap[eager] | 0.9740ms | 0.5167ms | 1.9354 KOps/s | 1.8502 KOps/s | |
| test_seq_wrap[compile] | 0.8096ms | 0.3660ms | 2.7320 KOps/s | 2.6809 KOps/s | |
| test_seq_wrap[compile-overhead] | 0.7053ms | 0.2641ms | 3.7866 KOps/s | 3.7392 KOps/s | |
| test_func_call_runtime[False-eager] | 1.2940ms | 0.8258ms | 1.2109 KOps/s | 1.2169 KOps/s | |
| test_func_call_runtime[False-compile] | 1.3402ms | 0.9080ms | 1.1013 KOps/s | 1.1031 KOps/s | |
| test_func_call_runtime[False-compile-overhead] | 0.9085ms | 0.4594ms | 2.1770 KOps/s | 2.1389 KOps/s | |
| test_func_call_runtime[True-eager] | 1.4897ms | 1.0621ms | 941.5373 Ops/s | 916.3803 Ops/s | |
| test_func_call_runtime[True-compile] | 1.4889ms | 0.9199ms | 1.0871 KOps/s | 1.0840 KOps/s | |
| test_func_call_runtime[True-compile-overhead] | 0.9303ms | 0.4747ms | 2.1064 KOps/s | 2.0805 KOps/s | |
| test_func_call_cm_runtime[False-eager] | 1.2711ms | 0.8409ms | 1.1891 KOps/s | 1.1542 KOps/s | |
| test_func_call_cm_runtime[False-compile] | 1.3815ms | 0.9141ms | 1.0939 KOps/s | 1.0935 KOps/s | |
| test_func_call_cm_runtime[False-compile-overhead] | 0.6263ms | 0.4646ms | 2.1525 KOps/s | 2.1305 KOps/s | |
| test_func_call_cm_runtime[True-eager] | 1.2907ms | 1.2084ms | 827.5217 Ops/s | 813.8276 Ops/s | |
| test_func_call_cm_runtime[True-compile] | 1.0275ms | 0.9516ms | 1.0508 KOps/s | 1.0430 KOps/s | |
| test_func_call_cm_runtime[True-compile-overhead] | 0.5913ms | 0.5096ms | 1.9622 KOps/s | 1.9415 KOps/s | |
| test_vmap_func_call_cm_runtime[eager] | 2.8129ms | 2.3395ms | 427.4490 Ops/s | 422.1919 Ops/s | |
| test_vmap_func_call_cm_runtime[compile] | 1.0588ms | 0.9666ms | 1.0346 KOps/s | 1.0213 KOps/s | |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.5914ms | 0.5143ms | 1.9443 KOps/s | 1.9154 KOps/s | |
| test_distributed | 0.5956ms | 0.1528ms | 6.5426 KOps/s | 6.4816 KOps/s | |
| test_tdmodule | 46.6500μs | 27.0571μs | 36.9588 KOps/s | 35.3447 KOps/s | |
| test_tdmodule_dispatch | 76.6810μs | 47.0119μs | 21.2712 KOps/s | 21.7871 KOps/s | |
| test_tdseq | 51.7300μs | 26.6298μs | 37.5519 KOps/s | 37.1272 KOps/s | |
| test_tdseq_dispatch | 73.3110μs | 47.0368μs | 21.2600 KOps/s | 20.9612 KOps/s | |
| test_instantiation_functorch | 2.1856ms | 2.0787ms | 481.0700 Ops/s | 480.3541 Ops/s | |
| test_exec_functorch | 0.2627ms | 0.1766ms | 5.6628 KOps/s | 5.6086 KOps/s | |
| test_exec_functional_call | 0.2227ms | 0.1586ms | 6.3065 KOps/s | 6.3584 KOps/s | |
| test_exec_td_decorator | 0.4576ms | 0.2415ms | 4.1406 KOps/s | 4.2647 KOps/s | |
| test_vmap_mlp_speed_decorator[True-True] | 1.0094ms | 0.8235ms | 1.2143 KOps/s | 1.2149 KOps/s | |
| test_vmap_mlp_speed_decorator[True-False] | 1.0090ms | 0.8194ms | 1.2204 KOps/s | 1.2114 KOps/s | |
| test_vmap_mlp_speed_decorator[False-True] | 0.8928ms | 0.7012ms | 1.4261 KOps/s | 1.3992 KOps/s | |
| test_vmap_mlp_speed_decorator[False-False] | 0.8843ms | 0.7064ms | 1.4156 KOps/s | 1.4046 KOps/s | |
| test_vmap_transformer_speed_decorator[True-True] | 21.0082ms | 20.2262ms | 49.4409 Ops/s | 49.0093 Ops/s | |
| test_vmap_transformer_speed_decorator[True-False] | 20.8568ms | 20.2145ms | 49.4695 Ops/s | 48.9888 Ops/s | |
| test_vmap_transformer_speed_decorator[False-True] | 20.2682ms | 20.0001ms | 49.9998 Ops/s | 49.5342 Ops/s | |
| test_vmap_transformer_speed_decorator[False-False] | 20.9755ms | 20.1473ms | 49.6345 Ops/s | 49.4848 Ops/s | |
| test_to_module_speed[True] | 1.6217ms | 1.4770ms | 677.0316 Ops/s | 669.4962 Ops/s | |
| test_to_module_speed[False] | 1.5810ms | 1.4573ms | 686.2185 Ops/s | 704.1430 Ops/s | |
| test_tc_init | 66.1910μs | 44.6124μs | 22.4153 KOps/s | 22.1826 KOps/s | |
| test_tc_init_tensor_only | 39.6410μs | 9.8733μs | 101.2833 KOps/s | 102.0890 KOps/s | |
| test_tc_init_nested | 0.1320ms | 89.3917μs | 11.1867 KOps/s | 11.0895 KOps/s | |
| test_tc_init_many_fields | 54.0710μs | 16.7253μs | 59.7895 KOps/s | 60.5583 KOps/s | |
| test_tc_first_layer_tensor | 33.4510μs | 1.8344μs | 545.1400 KOps/s | 547.7682 KOps/s | |
| test_tc_first_layer_tensor_only | 4.8631μs | 0.4070μs | 2.4571 MOps/s | 2.5394 MOps/s | |
| test_tc_first_layer_tensor_set | 38.5300μs | 3.9005μs | 256.3745 KOps/s | 252.7019 KOps/s | |
| test_tc_first_layer_tensor_only_set | 21.6510μs | 3.2773μs | 305.1337 KOps/s | 279.0046 KOps/s | |
| test_tc_first_layer_nontensor | 25.7700μs | 6.1836μs | 161.7168 KOps/s | 160.9177 KOps/s | |
| test_tc_second_layer_tensor | 27.3200μs | 4.4349μs | 225.4858 KOps/s | 220.9134 KOps/s | |
| test_tc_second_layer_nontensor | 39.7700μs | 8.7249μs | 114.6140 KOps/s | 112.8631 KOps/s | |
| test_unbind | 0.2502s | 16.6085ms | 60.2100 Ops/s | 54.9548 Ops/s | |
| test_full_like | 17.4438ms | 16.6885ms | 59.9215 Ops/s | 60.0561 Ops/s | |
| test_zeros_like | 16.9464ms | 16.6053ms | 60.2217 Ops/s | 60.1109 Ops/s | |
| test_ones_like | 17.7853ms | 16.6414ms | 60.0913 Ops/s | 60.1619 Ops/s | |
| test_clone | 17.8815ms | 17.5404ms | 57.0113 Ops/s | 57.2243 Ops/s | |
| test_squeeze | 89.4810μs | 14.3097μs | 69.8829 KOps/s | 69.5865 KOps/s | |
| test_unsqueeze | 0.1660ms | 0.1114ms | 8.9746 KOps/s | 8.7756 KOps/s | |
| test_split | 0.3486ms | 0.1866ms | 5.3587 KOps/s | 5.4191 KOps/s | |
| test_permute | 0.2684ms | 0.2056ms | 4.8647 KOps/s | 4.9185 KOps/s | |
| test_stack | 51.3004ms | 50.8306ms | 19.6732 Ops/s | 19.4941 Ops/s | |
| test_cat | 51.6318ms | 50.9089ms | 19.6429 Ops/s | 19.6603 Ops/s | |
| test_sequential_tensordict | 0.2737ms | 0.2181ms | 4.5849 KOps/s | 4.6541 KOps/s | |
| test_sequential_graph_module | 0.2551ms | 0.1181ms | 8.4685 KOps/s | 8.5767 KOps/s | |
| test_nested_tensordict | 0.3827ms | 0.2938ms | 3.4033 KOps/s | 3.4880 KOps/s | |
| test_nested_graph_module | 0.1933ms | 0.1325ms | 7.5499 KOps/s | 7.7694 KOps/s |

- dtensor_transfer_plan_test.py: CPU-only test for shard algebra and transfer plan computation (no GPUs needed)
- dtensor_transfer_distributed_test.py: Multi-GPU test for strategies A and B using torchrun with real DTensors on NCCL
- minimal_p2p_test.py: Minimal NCCL P2P test for JSON metadata serialization over CUDA byte tensors

Made-with: Cursor
ghstack-source-id: 04408b0
Pull-Request: #1647
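Stack from ghstack (oldest at bottom):

- dtensor_transfer_plan_test.py: CPU-only test for shard algebra and transfer plan computation (no GPUs needed)
- dtensor_transfer_distributed_test.py: Multi-GPU test for strategies A and B using torchrun with real DTensors on NCCL
- minimal_p2p_test.py: Minimal NCCL P2P test for JSON metadata serialization over CUDA byte tensors

Made-with: Cursor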
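
For readers unfamiliar with what a "transfer plan computation" between two meshes involves, here is a minimal CPU-only sketch. It is not the code from this PR: the names `shard_bounds`, `transfer_plan`, and `Transfer` are hypothetical, and it assumes a 1-D tensor sharded contiguously along dim 0 with ceil-division chunk sizes on both the source and destination side. The plan is simply the list of non-empty overlaps between source shards and destination shards.

```python
from typing import List, NamedTuple, Tuple


class Transfer(NamedTuple):
    """One point-to-point copy in the plan (hypothetical structure)."""
    src_rank: int
    dst_rank: int
    start: int  # global start index along dim 0 (inclusive)
    stop: int   # global stop index along dim 0 (exclusive)


def shard_bounds(length: int, num_shards: int) -> List[Tuple[int, int]]:
    """Return the [start, stop) range owned by each shard.

    Assumption: contiguous chunks of size ceil(length / num_shards);
    trailing shards may be shorter or empty.
    """
    chunk = -(-length // num_shards)  # ceil division
    return [
        (min(r * chunk, length), min((r + 1) * chunk, length))
        for r in range(num_shards)
    ]


def transfer_plan(length: int, src_shards: int, dst_shards: int) -> List[Transfer]:
    """Intersect every source shard with every destination shard."""
    plan = []
    for s, (s0, s1) in enumerate(shard_bounds(length, src_shards)):
        for d, (d0, d1) in enumerate(shard_bounds(length, dst_shards)):
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:  # keep only non-empty overlaps
                plan.append(Transfer(s, d, lo, hi))
    return plan


# Example: 10 rows moving from a 2-rank mesh to a 3-rank mesh.
plan = transfer_plan(10, src_shards=2, dst_shards=3)
for t in plan:
    print(f"src {t.src_rank} -> dst {t.dst_rank}: rows [{t.start}, {t.stop})")
```

Because the computation only manipulates index ranges, it can be checked without GPUs or a process group, which is presumably why the plan test in this PR is CPU-only.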