
[DTensor] Add example scripts for cross-mesh DTensor transfer#1647

Open
vmoens wants to merge 6 commits into gh/vmoens/88/base from gh/vmoens/88/head

vmoens (Collaborator) commented Mar 9, 2026

Stack from ghstack (oldest at bottom):


  • dtensor_transfer_plan_test.py: CPU-only test for shard algebra
    and transfer plan computation (no GPUs needed)
  • dtensor_transfer_distributed_test.py: Multi-GPU test for strategies
    A and B using torchrun with real DTensors on NCCL
  • minimal_p2p_test.py: Minimal NCCL P2P test for JSON metadata
    serialization over CUDA byte tensors
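The shard algebra behind a cross-mesh transfer plan can be sketched in a few lines. This is a minimal illustration, not the code in `dtensor_transfer_plan_test.py`: it assumes one tensor dimension placed with `Shard(0)` on 1-D source and destination meshes, and torch.chunk-style splitting (each shard gets ceil(length / num_shards) rows, trailing shards may be short or empty), which is the convention DTensor's `Shard` placement is understood to follow. `shard_range` and `transfer_plan` are hypothetical names. Each plan entry says which source rank sends which row range to which destination rank.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration; not the functions used in this PR.


def shard_range(length: int, num_shards: int, index: int) -> Tuple[int, int]:
    """Return the half-open row range [start, stop) owned by shard `index`.

    Assumes torch.chunk-style splitting along one dimension: each shard gets
    ceil(length / num_shards) rows, so trailing shards may be shorter or empty.
    """
    chunk = -(-length // num_shards)  # ceiling division
    start = min(index * chunk, length)
    stop = min(start + chunk, length)
    return start, stop


def transfer_plan(
    length: int, src_shards: int, dst_shards: int
) -> List[Tuple[int, int, int, int]]:
    """List (src_rank, dst_rank, start, stop) for every non-empty overlap
    between a source shard and a destination shard of the same dimension."""
    plan = []
    for src in range(src_shards):
        s_lo, s_hi = shard_range(length, src_shards, src)
        for dst in range(dst_shards):
            d_lo, d_hi = shard_range(length, dst_shards, dst)
            lo, hi = max(s_lo, d_lo), min(s_hi, d_hi)
            if lo < hi:  # skip pairs whose row ranges do not intersect
                plan.append((src, dst, lo, hi))
    return plan


if __name__ == "__main__":
    # 10 rows moving from a 4-rank mesh to a 3-rank mesh, both Shard(0).
    for step in transfer_plan(10, src_shards=4, dst_shards=3):
        print(step)
```

For 10 rows moving from 4 ranks to 3 ranks this yields six sends: (0, 0, 0, 3), (1, 0, 3, 4), (1, 1, 4, 6), (2, 1, 6, 8), (2, 2, 8, 9), (3, 2, 9, 10). Since both meshes partition the same rows, the overlaps cover every row exactly once, which is the kind of invariant a CPU-only plan test can check without GPUs.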

Made-with: Cursor

[ghstack-poisoned]
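Sending JSON metadata over NCCL point-to-point needs a small framing step: `torch.distributed.recv` fills a preallocated tensor, so the receiver must learn the payload size first. One common pattern, sketched here as an assumption rather than as what `minimal_p2p_test.py` does, is two messages: a fixed-size length header, then the payload. The sketch keeps only the byte-level encoding in pure Python; in a torch version each byte string would be copied into a CUDA `torch.uint8` tensor and exchanged with `torch.distributed.send`/`recv`, which is omitted here. `encode_metadata`, `payload_size`, and `decode_metadata` are hypothetical names.

```python
import json
import struct
from typing import Any, Tuple

# Hypothetical framing for illustration: an 8-byte big-endian length header
# followed by a UTF-8 JSON payload.
LENGTH = struct.Struct(">Q")


def encode_metadata(meta: Any) -> Tuple[bytes, bytes]:
    """Serialize `meta` to compact JSON and return (header, payload) bytes."""
    payload = json.dumps(meta, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return LENGTH.pack(len(payload)), payload


def payload_size(header: bytes) -> int:
    """Read the payload length the receiver must allocate before the second recv."""
    (size,) = LENGTH.unpack(header)
    return size


def decode_metadata(payload: bytes) -> Any:
    """Parse the received payload bytes back into Python objects."""
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    meta = {"shape": [8, 4], "dtype": "float32", "placements": ["Shard(0)"]}
    header, payload = encode_metadata(meta)
    assert payload_size(header) == len(payload)
    print(decode_metadata(payload) == meta)  # True
```

A fixed-width header keeps the first message the same size on every rank, so both sides can allocate it without any prior negotiation.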
vmoens added a commit that referenced this pull request Mar 9, 2026
- dtensor_transfer_plan_test.py: CPU-only test for shard algebra
  and transfer plan computation (no GPUs needed)
- dtensor_transfer_distributed_test.py: Multi-GPU test for strategies
  A and B using torchrun with real DTensors on NCCL
- minimal_p2p_test.py: Minimal NCCL P2P test for JSON metadata
  serialization over CUDA byte tensors

Made-with: Cursor
ghstack-source-id: ccca9b7
Pull-Request: #1647
github-actions bot (Contributor) commented Mar 9, 2026

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add example scripts for cross-mesh DTensor transfer

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

| Prefix | Label Applied | Example |
|---|---|---|
| [BugFix] or [Fix] | bug | [BugFix] Fix memory leak in TensorDict |
| [Feature] | Feature | [Feature] Add new storage backend |
| [Doc] or [Docs] | documentation | [Doc] Update installation guide |
| [Refactor] | Refactor | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Test | [Test] Add unit tests for nn module |
| [Compile] | Compile | [Compile] Fix torch.compile issue |
| [Performance] or [Perf] | Performance | [Perf] Optimize tensor operations |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Setup] | setup | [Setup] Update build configuration |
| [Distributed] or [Dist] | Distributed | [Distributed] Add scatter collective |
| [Benchmark] or [Bench] | Benchmarks | [Benchmark] Add compile benchmark |
| [Typing] or [Type] | Typing | [Typing] Add type stubs |
| [BC-breaking] or [BC] | BC-breaking | [BC-breaking] Remove deprecated API |
| [Formatting] or [Format] | Formatting | [Format] Fix code style |
| [Quality] | Quality | [Quality] Improve error messages |

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.

meta-cla bot added the CLA Signed label Mar 9, 2026
github-actions bot (Contributor) commented Mar 9, 2026

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 261. Improved: 17. Worsened: 11.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 32.7620μs 14.9071μs 67.0821 KOps/s 68.2113 KOps/s $\color{#d91a1a}-1.66\%$
test_plain_set_stack_nested 46.8830μs 15.0607μs 66.3982 KOps/s 67.0347 KOps/s $\color{#d91a1a}-0.95\%$
test_plain_set_nested_inplace 82.4740μs 16.8517μs 59.3412 KOps/s 59.8231 KOps/s $\color{#d91a1a}-0.81\%$
test_plain_set_stack_nested_inplace 54.1630μs 16.7361μs 59.7511 KOps/s 60.7876 KOps/s $\color{#d91a1a}-1.71\%$
test_items 41.3320μs 6.0066μs 166.4824 KOps/s 165.5070 KOps/s $\color{#35bf28}+0.59\%$
test_items_nested 0.5295ms 0.4691ms 2.1317 KOps/s 2.1466 KOps/s $\color{#d91a1a}-0.70\%$
test_items_nested_locked 0.5572ms 0.4745ms 2.1074 KOps/s 2.1087 KOps/s $\color{#d91a1a}-0.06\%$
test_items_nested_leaf 0.1430ms 97.5314μs 10.2531 KOps/s 10.0942 KOps/s $\color{#35bf28}+1.57\%$
test_items_stack_nested 0.5126ms 0.4683ms 2.1355 KOps/s 2.1456 KOps/s $\color{#d91a1a}-0.47\%$
test_items_stack_nested_leaf 0.1424ms 97.3412μs 10.2731 KOps/s 10.2684 KOps/s $\color{#35bf28}+0.05\%$
test_items_stack_nested_locked 0.5816ms 0.4704ms 2.1257 KOps/s 2.1419 KOps/s $\color{#d91a1a}-0.76\%$
test_keys 32.0120μs 4.2166μs 237.1579 KOps/s 237.3731 KOps/s $\color{#d91a1a}-0.09\%$
test_keys_nested 0.2037ms 0.1304ms 7.6678 KOps/s 7.7280 KOps/s $\color{#d91a1a}-0.78\%$
test_keys_nested_locked 0.8101ms 0.1391ms 7.1916 KOps/s 7.3062 KOps/s $\color{#d91a1a}-1.57\%$
test_keys_nested_leaf 0.1687ms 0.1213ms 8.2419 KOps/s 8.3646 KOps/s $\color{#d91a1a}-1.47\%$
test_keys_stack_nested 0.1907ms 0.1320ms 7.5764 KOps/s 7.7104 KOps/s $\color{#d91a1a}-1.74\%$
test_keys_stack_nested_leaf 0.2459ms 0.1195ms 8.3704 KOps/s 8.3584 KOps/s $\color{#35bf28}+0.14\%$
test_keys_stack_nested_locked 0.1910ms 0.1403ms 7.1287 KOps/s 7.2762 KOps/s $\color{#d91a1a}-2.03\%$
test_values 3.2142μs 0.9982μs 1.0019 MOps/s 978.1402 KOps/s $\color{#35bf28}+2.42\%$
test_values_nested 76.5050μs 53.3537μs 18.7428 KOps/s 19.0143 KOps/s $\color{#d91a1a}-1.43\%$
test_values_nested_locked 80.4750μs 56.3460μs 17.7475 KOps/s 17.7475 KOps/s $-0.00\%$
test_values_nested_leaf 92.2950μs 61.1283μs 16.3590 KOps/s 16.4785 KOps/s $\color{#d91a1a}-0.72\%$
test_values_stack_nested 80.7850μs 53.1079μs 18.8296 KOps/s 18.9219 KOps/s $\color{#d91a1a}-0.49\%$
test_values_stack_nested_leaf 87.7250μs 60.9494μs 16.4071 KOps/s 16.5656 KOps/s $\color{#d91a1a}-0.96\%$
test_values_stack_nested_locked 91.3450μs 56.3993μs 17.7307 KOps/s 17.7737 KOps/s $\color{#d91a1a}-0.24\%$
test_membership 12.1492μs 0.8568μs 1.1672 MOps/s 1.1628 MOps/s $\color{#35bf28}+0.38\%$
test_membership_nested 39.8520μs 2.8983μs 345.0264 KOps/s 344.9239 KOps/s $\color{#35bf28}+0.03\%$
test_membership_nested_leaf 16.6210μs 2.8177μs 354.8987 KOps/s 366.6981 KOps/s $\color{#d91a1a}-3.22\%$
test_membership_stacked_nested 38.1030μs 2.9111μs 343.5111 KOps/s 341.3714 KOps/s $\color{#35bf28}+0.63\%$
test_membership_stacked_nested_leaf 23.7410μs 2.9047μs 344.2730 KOps/s 341.7357 KOps/s $\color{#35bf28}+0.74\%$
test_membership_nested_last 30.9410μs 4.3560μs 229.5695 KOps/s 230.3162 KOps/s $\color{#d91a1a}-0.32\%$
test_membership_nested_leaf_last 69.5230μs 4.3497μs 229.9005 KOps/s 230.9201 KOps/s $\color{#d91a1a}-0.44\%$
test_membership_stacked_nested_last 33.3120μs 4.3831μs 228.1466 KOps/s 229.6074 KOps/s $\color{#d91a1a}-0.64\%$
test_membership_stacked_nested_leaf_last 38.1330μs 4.3820μs 228.2062 KOps/s 231.7446 KOps/s $\color{#d91a1a}-1.53\%$
test_nested_getleaf 50.0930μs 21.5932μs 46.3110 KOps/s 45.9547 KOps/s $\color{#35bf28}+0.78\%$
test_nested_get 52.1230μs 20.2285μs 49.4352 KOps/s 48.7827 KOps/s $\color{#35bf28}+1.34\%$
test_stacked_getleaf 74.5840μs 21.7054μs 46.0715 KOps/s 46.4866 KOps/s $\color{#d91a1a}-0.89\%$
test_stacked_get 88.8050μs 20.6203μs 48.4960 KOps/s 49.0583 KOps/s $\color{#d91a1a}-1.15\%$
test_nested_getitemleaf 45.1930μs 22.2006μs 45.0438 KOps/s 44.7158 KOps/s $\color{#35bf28}+0.73\%$
test_nested_getitem 60.5230μs 20.7631μs 48.1624 KOps/s 47.0783 KOps/s $\color{#35bf28}+2.30\%$
test_stacked_getitemleaf 50.8730μs 22.4037μs 44.6355 KOps/s 44.7283 KOps/s $\color{#d91a1a}-0.21\%$
test_stacked_getitem 66.3230μs 21.1720μs 47.2321 KOps/s 47.2206 KOps/s $\color{#35bf28}+0.02\%$
test_lock_nested 7.9511ms 0.4876ms 2.0507 KOps/s 2.1032 KOps/s $\color{#d91a1a}-2.50\%$
test_lock_stack_nested 0.5268ms 0.4818ms 2.0758 KOps/s 2.0582 KOps/s $\color{#35bf28}+0.85\%$
test_unlock_nested 0.5105ms 0.3903ms 2.5620 KOps/s 2.5637 KOps/s $\color{#d91a1a}-0.06\%$
test_unlock_stack_nested 0.4593ms 0.3904ms 2.5617 KOps/s 2.5473 KOps/s $\color{#35bf28}+0.56\%$
test_flatten_speed 0.1658ms 0.1225ms 8.1656 KOps/s 8.1488 KOps/s $\color{#35bf28}+0.21\%$
test_unflatten_speed 0.6359ms 0.5716ms 1.7494 KOps/s 1.7408 KOps/s $\color{#35bf28}+0.49\%$
test_common_ops 0.8521ms 0.6961ms 1.4365 KOps/s 1.4446 KOps/s $\color{#d91a1a}-0.56\%$
test_creation 0.1040ms 3.1415μs 318.3188 KOps/s 316.3879 KOps/s $\color{#35bf28}+0.61\%$
test_creation_empty 25.7110μs 6.9950μs 142.9585 KOps/s 143.3961 KOps/s $\color{#d91a1a}-0.31\%$
test_creation_nested_1 44.0120μs 11.5629μs 86.4838 KOps/s 86.6581 KOps/s $\color{#d91a1a}-0.20\%$
test_creation_nested_2 39.5720μs 13.2744μs 75.3331 KOps/s 75.3493 KOps/s $\color{#d91a1a}-0.02\%$
test_creation_many_keys[10] 56.4530μs 20.7855μs 48.1105 KOps/s 47.6855 KOps/s $\color{#35bf28}+0.89\%$
test_creation_many_keys[50] 0.1674ms 88.8020μs 11.2610 KOps/s 11.1003 KOps/s $\color{#35bf28}+1.45\%$
test_creation_many_keys[100] 0.2303ms 0.1744ms 5.7346 KOps/s 5.6634 KOps/s $\color{#35bf28}+1.26\%$
test_creation_nested_many_keys[10] 74.7050μs 44.8096μs 22.3166 KOps/s 22.2210 KOps/s $\color{#35bf28}+0.43\%$
test_creation_nested_many_keys[50] 0.2531ms 0.1827ms 5.4726 KOps/s 5.4632 KOps/s $\color{#35bf28}+0.17\%$
test_clone 46.7430μs 13.5529μs 73.7851 KOps/s 74.9668 KOps/s $\color{#d91a1a}-1.58\%$
test_getitem[int] 1.6075ms 15.2963μs 65.3754 KOps/s 59.4654 KOps/s $\textbf{\color{#35bf28}+9.94\%}$
test_getitem[slice_int] 0.1301ms 24.0213μs 41.6298 KOps/s 41.1801 KOps/s $\color{#35bf28}+1.09\%$
test_getitem[range] 0.1732ms 63.1144μs 15.8442 KOps/s 15.6346 KOps/s $\color{#35bf28}+1.34\%$
test_getitem[tuple] 0.1392ms 23.7913μs 42.0323 KOps/s 41.8334 KOps/s $\color{#35bf28}+0.48\%$
test_getitem[list] 0.1833ms 57.5763μs 17.3683 KOps/s 17.0285 KOps/s $\color{#35bf28}+2.00\%$
test_setitem_dim[int] 48.8130μs 26.0244μs 38.4256 KOps/s 38.6101 KOps/s $\color{#d91a1a}-0.48\%$
test_setitem_dim[slice_int] 69.5340μs 42.7845μs 23.3730 KOps/s 22.5483 KOps/s $\color{#35bf28}+3.66\%$
test_setitem_dim[range] 0.1181ms 95.1224μs 10.5128 KOps/s 10.5149 KOps/s $\color{#d91a1a}-0.02\%$
test_setitem_dim[tuple] 68.5140μs 40.0086μs 24.9946 KOps/s 24.6258 KOps/s $\color{#35bf28}+1.50\%$
test_setitem 59.9230μs 17.8708μs 55.9573 KOps/s 56.0619 KOps/s $\color{#d91a1a}-0.19\%$
test_set 66.6630μs 17.1212μs 58.4072 KOps/s 58.4590 KOps/s $\color{#d91a1a}-0.09\%$
test_set_shared 0.4991ms 0.2035ms 4.9136 KOps/s 4.9090 KOps/s $\color{#35bf28}+0.09\%$
test_update 0.1929ms 21.3824μs 46.7673 KOps/s 46.4811 KOps/s $\color{#35bf28}+0.62\%$
test_update_nested 80.3950μs 33.4508μs 29.8947 KOps/s 30.4686 KOps/s $\color{#d91a1a}-1.88\%$
test_update__nested 0.4627ms 34.5011μs 28.9846 KOps/s 28.9432 KOps/s $\color{#35bf28}+0.14\%$
test_set_nested 55.6230μs 18.7284μs 53.3949 KOps/s 52.9745 KOps/s $\color{#35bf28}+0.79\%$
test_set_nested_new 62.0330μs 23.8249μs 41.9729 KOps/s 41.6260 KOps/s $\color{#35bf28}+0.83\%$
test_select 77.6350μs 39.7909μs 25.1314 KOps/s 24.7940 KOps/s $\color{#35bf28}+1.36\%$
test_select_nested 0.1100ms 74.3721μs 13.4459 KOps/s 13.5215 KOps/s $\color{#d91a1a}-0.56\%$
test_exclude_nested 0.1452ms 91.8047μs 10.8927 KOps/s 10.8416 KOps/s $\color{#35bf28}+0.47\%$
test_empty[True] 0.4810ms 0.3984ms 2.5097 KOps/s 2.5060 KOps/s $\color{#35bf28}+0.15\%$
test_empty[False] 7.1355μs 1.3164μs 759.6587 KOps/s 756.2817 KOps/s $\color{#35bf28}+0.45\%$
test_to 0.1123ms 75.6481μs 13.2191 KOps/s 13.3468 KOps/s $\color{#d91a1a}-0.96\%$
test_to_nonblocking 0.1212ms 65.1480μs 15.3497 KOps/s 15.4466 KOps/s $\color{#d91a1a}-0.63\%$
test_unbind_speed 0.4016ms 0.3353ms 2.9820 KOps/s 2.9960 KOps/s $\color{#d91a1a}-0.47\%$
test_unbind_speed_stack0 0.4023ms 0.3301ms 3.0292 KOps/s 3.0112 KOps/s $\color{#35bf28}+0.60\%$
test_unbind_speed_stack1 0.1044s 0.8391ms 1.1918 KOps/s 1.1907 KOps/s $\color{#35bf28}+0.09\%$
test_split 0.1044s 1.2636ms 791.3955 Ops/s 786.3632 Ops/s $\color{#35bf28}+0.64\%$
test_chunk 0.1034s 1.2105ms 826.1307 Ops/s 924.8730 Ops/s $\textbf{\color{#d91a1a}-10.68\%}$
test_to_cpu_blocking 19.8872ms 19.6675ms 50.8454 Ops/s 45.7915 Ops/s $\textbf{\color{#35bf28}+11.04\%}$
test_to_cpu_global_sync 11.6345ms 11.4833ms 87.0832 Ops/s 87.2906 Ops/s $\color{#d91a1a}-0.24\%$
test_to_cpu_event_sync 12.7610ms 12.4772ms 80.1462 Ops/s 80.0242 Ops/s $\color{#35bf28}+0.15\%$
test_to_cpu_default 0.1166s 13.8117ms 72.4022 Ops/s 80.0225 Ops/s $\textbf{\color{#d91a1a}-9.52\%}$
test_consolidate[False-None] 4.2663ms 4.1716ms 239.7176 Ops/s 240.1108 Ops/s $\color{#d91a1a}-0.16\%$
test_consolidate[default-None] 2.1389ms 2.0302ms 492.5573 Ops/s 492.0238 Ops/s $\color{#35bf28}+0.11\%$
test_consolidate[reduce-overhead-None] 2.0409ms 1.9476ms 513.4413 Ops/s 499.7398 Ops/s $\color{#35bf28}+2.74\%$
test_consolidate_njt[False-None] 0.1902s 10.0536ms 99.4670 Ops/s 117.6362 Ops/s $\textbf{\color{#d91a1a}-15.45\%}$
test_to[False-False-None] 2.2119ms 2.1100ms 473.9228 Ops/s 474.0539 Ops/s $\color{#d91a1a}-0.03\%$
test_to[True-False-None] 2.1576ms 1.9144ms 522.3512 Ops/s 521.0134 Ops/s $\color{#35bf28}+0.26\%$
test_to[within-False-None] 6.3037ms 6.1250ms 163.2646 Ops/s 163.0031 Ops/s $\color{#35bf28}+0.16\%$
test_to[True-default-None] 9.0195ms 8.8511ms 112.9797 Ops/s 112.0771 Ops/s $\color{#35bf28}+0.81\%$
test_to_njt[False-False-None] 8.5835ms 8.4551ms 118.2713 Ops/s 116.7587 Ops/s $\color{#35bf28}+1.30\%$
test_to_njt[True-False-None] 7.4474ms 7.2478ms 137.9734 Ops/s 143.0337 Ops/s $\color{#d91a1a}-3.54\%$
test_to_njt[within-False-None] 15.7145ms 15.5068ms 64.4877 Ops/s 63.5187 Ops/s $\color{#35bf28}+1.53\%$
test_creation[device0] 0.3873ms 0.1151ms 8.6887 KOps/s 8.8156 KOps/s $\color{#d91a1a}-1.44\%$
test_creation_from_tensor 0.3976ms 0.1136ms 8.8055 KOps/s 8.9914 KOps/s $\color{#d91a1a}-2.07\%$
test_add_one[memmap_tensor0] 0.1954ms 6.7278μs 148.6369 KOps/s 148.0037 KOps/s $\color{#35bf28}+0.43\%$
test_contiguous[memmap_tensor0] 30.7020μs 0.6736μs 1.4846 MOps/s 2.0982 MOps/s $\textbf{\color{#d91a1a}-29.24\%}$
test_stack[memmap_tensor0] 30.4320μs 4.6357μs 215.7176 KOps/s 215.9403 KOps/s $\color{#d91a1a}-0.10\%$
test_memmaptd_index 1.0309ms 0.2715ms 3.6829 KOps/s 3.6793 KOps/s $\color{#35bf28}+0.10\%$
test_memmaptd_index_astensor 0.5338ms 0.3744ms 2.6710 KOps/s 2.6843 KOps/s $\color{#d91a1a}-0.50\%$
test_memmaptd_index_op 0.8067ms 0.6277ms 1.5932 KOps/s 1.5946 KOps/s $\color{#d91a1a}-0.09\%$
test_serialize_model 0.1402s 0.1371s 7.2918 Ops/s 7.3467 Ops/s $\color{#d91a1a}-0.75\%$
test_serialize_model_pickle 1.3496s 1.2101s 0.8263 Ops/s 0.8237 Ops/s $\color{#35bf28}+0.32\%$
test_serialize_weights 0.1358s 0.1345s 7.4328 Ops/s 7.3485 Ops/s $\color{#35bf28}+1.15\%$
test_serialize_weights_returnearly 0.4325s 88.0997ms 11.3508 Ops/s 6.2217 Ops/s $\textbf{\color{#35bf28}+82.44\%}$
test_serialize_weights_pickle 1.3655s 1.2154s 0.8227 Ops/s 0.8232 Ops/s $\color{#d91a1a}-0.06\%$
test_reshape_pytree 0.1994ms 32.6023μs 30.6727 KOps/s 30.7853 KOps/s $\color{#d91a1a}-0.37\%$
test_reshape_td 86.8050μs 45.1900μs 22.1288 KOps/s 22.4250 KOps/s $\color{#d91a1a}-1.32\%$
test_view_pytree 0.2630ms 32.4040μs 30.8604 KOps/s 30.9125 KOps/s $\color{#d91a1a}-0.17\%$
test_view_td 0.1160ms 52.2101μs 19.1534 KOps/s 18.8022 KOps/s $\color{#35bf28}+1.87\%$
test_unbind_pytree 0.2224ms 35.9378μs 27.8258 KOps/s 27.5802 KOps/s $\color{#35bf28}+0.89\%$
test_unbind_td 93.7450μs 49.4315μs 20.2300 KOps/s 20.2535 KOps/s $\color{#d91a1a}-0.12\%$
test_split_pytree 0.1925ms 41.8823μs 23.8764 KOps/s 23.6216 KOps/s $\color{#35bf28}+1.08\%$
test_split_td 0.1926ms 64.9130μs 15.4052 KOps/s 15.4880 KOps/s $\color{#d91a1a}-0.53\%$
test_add_pytree 0.1860ms 42.1834μs 23.7060 KOps/s 23.5030 KOps/s $\color{#35bf28}+0.86\%$
test_add_td 0.2070ms 56.5202μs 17.6928 KOps/s 17.8340 KOps/s $\color{#d91a1a}-0.79\%$
test_compile_add_one_nested[tensordict-compile] 0.1985ms 0.1384ms 7.2266 KOps/s 6.5670 KOps/s $\textbf{\color{#35bf28}+10.04\%}$
test_compile_add_one_nested[tensordict-eager] 0.3394ms 0.2031ms 4.9233 KOps/s 5.0276 KOps/s $\color{#d91a1a}-2.07\%$
test_compile_add_one_nested[pytree-compile] 0.1684ms 0.1069ms 9.3562 KOps/s 8.9585 KOps/s $\color{#35bf28}+4.44\%$
test_compile_add_one_nested[pytree-eager] 0.4367ms 0.1784ms 5.6057 KOps/s 5.6296 KOps/s $\color{#d91a1a}-0.42\%$
test_compile_copy_nested[tensordict-compile] 0.3500ms 10.3094μs 96.9988 KOps/s 97.1638 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_copy_nested[tensordict-eager] 87.7750μs 53.9746μs 18.5272 KOps/s 18.2952 KOps/s $\color{#35bf28}+1.27\%$
test_compile_copy_nested[pytree-compile] 0.1426ms 9.8568μs 101.4524 KOps/s 99.8565 KOps/s $\color{#35bf28}+1.60\%$
test_compile_copy_nested[pytree-eager] 0.4448ms 67.8412μs 14.7403 KOps/s 14.6288 KOps/s $\color{#35bf28}+0.76\%$
test_compile_add_one_flat[tensordict-compile] 0.2349ms 0.1745ms 5.7322 KOps/s 5.3896 KOps/s $\textbf{\color{#35bf28}+6.36\%}$
test_compile_add_one_flat[tensordict-eager] 0.4574ms 0.2807ms 3.5621 KOps/s 3.5646 KOps/s $\color{#d91a1a}-0.07\%$
test_compile_add_one_flat[tensorclass-compile] 0.2633ms 0.1142ms 8.7593 KOps/s 8.1637 KOps/s $\textbf{\color{#35bf28}+7.30\%}$
test_compile_add_one_flat[tensorclass-eager] 0.1303ms 72.4686μs 13.7991 KOps/s 13.5846 KOps/s $\color{#35bf28}+1.58\%$
test_compile_add_one_flat[pytree-compile] 0.2245ms 0.1566ms 6.3869 KOps/s 6.2155 KOps/s $\color{#35bf28}+2.76\%$
test_compile_add_one_flat[pytree-eager] 0.8188ms 0.5173ms 1.9333 KOps/s 1.9356 KOps/s $\color{#d91a1a}-0.12\%$
test_compile_add_self_flat[tensordict-eager] 0.4364ms 0.3311ms 3.0203 KOps/s 2.9983 KOps/s $\color{#35bf28}+0.73\%$
test_compile_add_self_flat[tensordict-compile] 0.2684ms 0.1763ms 5.6734 KOps/s 5.2746 KOps/s $\textbf{\color{#35bf28}+7.56\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1298ms 87.9860μs 11.3654 KOps/s 11.2661 KOps/s $\color{#35bf28}+0.88\%$
test_compile_add_self_flat[tensorclass-compile] 0.2471ms 0.1168ms 8.5642 KOps/s 8.0179 KOps/s $\textbf{\color{#35bf28}+6.81\%}$
test_compile_add_self_flat[pytree-eager] 0.6376ms 0.4250ms 2.3531 KOps/s 2.3385 KOps/s $\color{#35bf28}+0.62\%$
test_compile_add_self_flat[pytree-compile] 0.2721ms 0.1573ms 6.3590 KOps/s 6.0764 KOps/s $\color{#35bf28}+4.65\%$
test_compile_copy_flat[tensordict-compile] 0.1250ms 13.3899μs 74.6834 KOps/s 69.4132 KOps/s $\textbf{\color{#35bf28}+7.59\%}$
test_compile_copy_flat[tensordict-eager] 70.7240μs 41.4486μs 24.1263 KOps/s 23.9871 KOps/s $\color{#35bf28}+0.58\%$
test_compile_copy_flat[pytree-compile] 75.6440μs 10.9371μs 91.4318 KOps/s 92.6244 KOps/s $\color{#d91a1a}-1.29\%$
test_compile_copy_flat[pytree-eager] 0.1798s 62.1683μs 16.0854 KOps/s 19.0020 KOps/s $\textbf{\color{#d91a1a}-15.35\%}$
test_compile_assign_and_add[tensordict-compile] 2.0275ms 0.1743ms 5.7375 KOps/s 5.3132 KOps/s $\textbf{\color{#35bf28}+7.99\%}$
test_compile_assign_and_add[tensordict-eager] 3.4296ms 3.3004ms 302.9945 Ops/s 300.5842 Ops/s $\color{#35bf28}+0.80\%$
test_compile_assign_and_add[pytree-compile] 2.0258ms 0.1614ms 6.1964 KOps/s 6.0567 KOps/s $\color{#35bf28}+2.31\%$
test_compile_assign_and_add[pytree-eager] 2.9518ms 2.7870ms 358.8103 Ops/s 358.8147 Ops/s $-0.00\%$
test_compile_indexing[tensor-tensordict-compile] 0.2210ms 0.1071ms 9.3351 KOps/s 8.7222 KOps/s $\textbf{\color{#35bf28}+7.03\%}$
test_compile_indexing[tensor-tensordict-eager] 0.3104ms 72.6576μs 13.7632 KOps/s 13.3499 KOps/s $\color{#35bf28}+3.10\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2721ms 93.5751μs 10.6866 KOps/s 10.2058 KOps/s $\color{#35bf28}+4.71\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2680ms 43.8897μs 22.7844 KOps/s 23.0074 KOps/s $\color{#d91a1a}-0.97\%$
test_compile_indexing[tensor-pytree-compile] 0.1380ms 94.2027μs 10.6154 KOps/s 10.1262 KOps/s $\color{#35bf28}+4.83\%$
test_compile_indexing[tensor-pytree-eager] 0.2883ms 43.9228μs 22.7672 KOps/s 23.1606 KOps/s $\color{#d91a1a}-1.70\%$
test_compile_indexing[slice-tensordict-compile] 0.2255ms 56.8596μs 17.5872 KOps/s 17.0273 KOps/s $\color{#35bf28}+3.29\%$
test_compile_indexing[slice-tensordict-eager] 0.2193ms 27.4435μs 36.4385 KOps/s 36.3167 KOps/s $\color{#35bf28}+0.34\%$
test_compile_indexing[slice-tensorclass-compile] 0.1963ms 44.6619μs 22.3904 KOps/s 22.7289 KOps/s $\color{#d91a1a}-1.49\%$
test_compile_indexing[slice-tensorclass-eager] 0.2466ms 22.5003μs 44.4438 KOps/s 45.0084 KOps/s $\color{#d91a1a}-1.25\%$
test_compile_indexing[slice-pytree-compile] 89.6750μs 45.7987μs 21.8347 KOps/s 21.9363 KOps/s $\color{#d91a1a}-0.46\%$
test_compile_indexing[slice-pytree-eager] 0.2936ms 22.3073μs 44.8283 KOps/s 44.5626 KOps/s $\color{#35bf28}+0.60\%$
test_compile_indexing[int-tensordict-compile] 0.1021ms 58.0702μs 17.2205 KOps/s 16.9482 KOps/s $\color{#35bf28}+1.61\%$
test_compile_indexing[int-tensordict-eager] 0.2135ms 27.6138μs 36.2138 KOps/s 36.2511 KOps/s $\color{#d91a1a}-0.10\%$
test_compile_indexing[int-tensorclass-compile] 87.3050μs 45.1306μs 22.1579 KOps/s 22.1109 KOps/s $\color{#35bf28}+0.21\%$
test_compile_indexing[int-tensorclass-eager] 0.2777ms 22.4021μs 44.6387 KOps/s 44.6400 KOps/s $-0.00\%$
test_compile_indexing[int-pytree-compile] 83.6940μs 45.3270μs 22.0619 KOps/s 22.8034 KOps/s $\color{#d91a1a}-3.25\%$
test_compile_indexing[int-pytree-eager] 0.2917ms 22.4333μs 44.5766 KOps/s 44.4241 KOps/s $\color{#35bf28}+0.34\%$
test_compile_replace[single-eager] 0.1046ms 46.4289μs 21.5383 KOps/s 21.4916 KOps/s $\color{#35bf28}+0.22\%$
test_compile_replace[single-compile] 0.1734ms 0.1022ms 9.7817 KOps/s 9.3506 KOps/s $\color{#35bf28}+4.61\%$
test_compile_replace[multi-eager] 0.6459ms 0.5491ms 1.8213 KOps/s 1.8168 KOps/s $\color{#35bf28}+0.25\%$
test_compile_replace[multi-compile] 0.2077ms 0.1100ms 9.0923 KOps/s 8.8806 KOps/s $\color{#35bf28}+2.38\%$
test_compile_tc_getattr_20[eager] 0.3491ms 0.1681ms 5.9500 KOps/s 6.1187 KOps/s $\color{#d91a1a}-2.76\%$
test_compile_tc_getattr_20[compile] 0.2549ms 0.1180ms 8.4746 KOps/s 8.2958 KOps/s $\color{#35bf28}+2.15\%$
test_compile_clone_shallow[20-eager] 49.5230μs 19.3468μs 51.6882 KOps/s 52.1049 KOps/s $\color{#d91a1a}-0.80\%$
test_compile_clone_shallow[20-compile] 60.2830μs 11.4928μs 87.0110 KOps/s 87.5346 KOps/s $\color{#d91a1a}-0.60\%$
test_compile_clone_shallow[40-eager] 0.1013ms 33.8168μs 29.5711 KOps/s 29.4014 KOps/s $\color{#35bf28}+0.58\%$
test_compile_clone_shallow[40-compile] 68.4340μs 12.5365μs 79.7670 KOps/s 81.0391 KOps/s $\color{#d91a1a}-1.57\%$
test_compile_clone_shallow[80-eager] 0.1070ms 63.4082μs 15.7708 KOps/s 15.9685 KOps/s $\color{#d91a1a}-1.24\%$
test_compile_clone_shallow[80-compile] 73.0840μs 15.0154μs 66.5981 KOps/s 66.1131 KOps/s $\color{#35bf28}+0.73\%$
test_compile_update_inplace[eager] 0.1687ms 60.1080μs 16.6367 KOps/s 17.2503 KOps/s $\color{#d91a1a}-3.56\%$
test_compile_update_inplace[compile] 0.1841ms 0.1399ms 7.1482 KOps/s 6.9362 KOps/s $\color{#35bf28}+3.06\%$
test_mod_add[eager] 79.2440μs 48.1530μs 20.7671 KOps/s 20.6610 KOps/s $\color{#35bf28}+0.51\%$
test_mod_add[compile] 0.2492ms 0.1043ms 9.5845 KOps/s 9.1157 KOps/s $\textbf{\color{#35bf28}+5.14\%}$
test_mod_add[compile-overhead] 0.2335ms 0.1459ms 6.8555 KOps/s 6.6085 KOps/s $\color{#35bf28}+3.74\%$
test_mod_wrap[eager] 0.4260ms 0.2881ms 3.4715 KOps/s 3.3217 KOps/s $\color{#35bf28}+4.51\%$
test_mod_wrap[compile] 0.4106ms 0.3453ms 2.8957 KOps/s 2.7682 KOps/s $\color{#35bf28}+4.61\%$
test_mod_wrap[compile-overhead] 7.3190ms 4.0406ms 247.4889 Ops/s 244.9846 Ops/s $\color{#35bf28}+1.02\%$
test_mod_wrap_and_backward[eager] 1.6686ms 1.4836ms 674.0503 Ops/s 670.7712 Ops/s $\color{#35bf28}+0.49\%$
test_mod_wrap_and_backward[compile] 1.7051ms 1.4318ms 698.4286 Ops/s 692.6691 Ops/s $\color{#35bf28}+0.83\%$
test_mod_wrap_and_backward[compile-overhead] 1.2696ms 0.8761ms 1.1415 KOps/s 1.1121 KOps/s $\color{#35bf28}+2.64\%$
test_seq_add[eager] 0.2073ms 0.1516ms 6.5977 KOps/s 6.2443 KOps/s $\textbf{\color{#35bf28}+5.66\%}$
test_seq_add[compile] 0.2681ms 0.1168ms 8.5631 KOps/s 8.5893 KOps/s $\color{#d91a1a}-0.30\%$
test_seq_add[compile-overhead] 0.2906ms 0.1525ms 6.5590 KOps/s 6.1391 KOps/s $\textbf{\color{#35bf28}+6.84\%}$
test_seq_wrap[eager] 0.5799ms 0.5127ms 1.9506 KOps/s 1.8407 KOps/s $\textbf{\color{#35bf28}+5.97\%}$
test_seq_wrap[compile] 0.4580ms 0.3625ms 2.7588 KOps/s 2.6278 KOps/s $\color{#35bf28}+4.99\%$
test_seq_wrap[compile-overhead] 0.4167ms 0.2610ms 3.8314 KOps/s 3.5988 KOps/s $\textbf{\color{#35bf28}+6.46\%}$
test_func_call_runtime[False-eager] 1.0149ms 0.8317ms 1.2023 KOps/s 1.2094 KOps/s $\color{#d91a1a}-0.58\%$
test_func_call_runtime[False-compile] 1.0678ms 0.9024ms 1.1082 KOps/s 1.1000 KOps/s $\color{#35bf28}+0.75\%$
test_func_call_runtime[False-compile-overhead] 0.5789ms 0.4591ms 2.1782 KOps/s 2.1420 KOps/s $\color{#35bf28}+1.69\%$
test_func_call_runtime[True-eager] 1.2244ms 1.0764ms 929.0068 Ops/s 927.7616 Ops/s $\color{#35bf28}+0.13\%$
test_func_call_runtime[True-compile] 1.0476ms 0.9189ms 1.0882 KOps/s 1.0898 KOps/s $\color{#d91a1a}-0.14\%$
test_func_call_runtime[True-compile-overhead] 0.5852ms 0.4712ms 2.1221 KOps/s 2.0803 KOps/s $\color{#35bf28}+2.01\%$
test_func_call_cm_runtime[False-eager] 0.9401ms 0.8827ms 1.1328 KOps/s 1.2129 KOps/s $\textbf{\color{#d91a1a}-6.60\%}$
test_func_call_cm_runtime[False-compile] 1.0815ms 0.9122ms 1.0963 KOps/s 1.0731 KOps/s $\color{#35bf28}+2.16\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5472ms 0.4602ms 2.1732 KOps/s 2.1461 KOps/s $\color{#35bf28}+1.26\%$
test_func_call_cm_runtime[True-eager] 1.3132ms 1.2261ms 815.5628 Ops/s 825.7118 Ops/s $\color{#d91a1a}-1.23\%$
test_func_call_cm_runtime[True-compile] 1.0910ms 0.9547ms 1.0474 KOps/s 1.0454 KOps/s $\color{#35bf28}+0.19\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5682ms 0.5033ms 1.9870 KOps/s 1.9402 KOps/s $\color{#35bf28}+2.41\%$
test_vmap_func_call_cm_runtime[eager] 2.8593ms 2.3648ms 422.8672 Ops/s 420.0239 Ops/s $\color{#35bf28}+0.68\%$
test_vmap_func_call_cm_runtime[compile] 1.1499ms 0.9764ms 1.0242 KOps/s 1.0242 KOps/s $-0.00\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5704ms 0.5111ms 1.9565 KOps/s 1.9181 KOps/s $\color{#35bf28}+2.00\%$
test_distributed 2.5804ms 0.1657ms 6.0353 KOps/s 6.4867 KOps/s $\textbf{\color{#d91a1a}-6.96\%}$
test_tdmodule 0.1823ms 27.2352μs 36.7171 KOps/s 36.9860 KOps/s $\color{#d91a1a}-0.73\%$
test_tdmodule_dispatch 74.5340μs 45.1079μs 22.1691 KOps/s 22.0212 KOps/s $\color{#35bf28}+0.67\%$
test_tdseq 45.8020μs 26.9193μs 37.1481 KOps/s 37.4587 KOps/s $\color{#d91a1a}-0.83\%$
test_tdseq_dispatch 68.6330μs 47.1442μs 21.2115 KOps/s 21.0266 KOps/s $\color{#35bf28}+0.88\%$
test_instantiation_functorch 2.2065ms 2.0845ms 479.7245 Ops/s 479.3529 Ops/s $\color{#35bf28}+0.08\%$
test_exec_functorch 0.2307ms 0.1805ms 5.5395 KOps/s 5.5334 KOps/s $\color{#35bf28}+0.11\%$
test_exec_functional_call 0.2076ms 0.1612ms 6.2039 KOps/s 6.2350 KOps/s $\color{#d91a1a}-0.50\%$
test_exec_td_decorator 0.4389ms 0.2379ms 4.2033 KOps/s 4.1900 KOps/s $\color{#35bf28}+0.32\%$
test_vmap_mlp_speed_decorator[True-True] 1.0084ms 0.8214ms 1.2174 KOps/s 1.2135 KOps/s $\color{#35bf28}+0.32\%$
test_vmap_mlp_speed_decorator[True-False] 0.9898ms 0.8212ms 1.2178 KOps/s 1.2216 KOps/s $\color{#d91a1a}-0.31\%$
test_vmap_mlp_speed_decorator[False-True] 1.0519ms 0.7095ms 1.4095 KOps/s 1.4126 KOps/s $\color{#d91a1a}-0.22\%$
test_vmap_mlp_speed_decorator[False-False] 0.9033ms 0.7097ms 1.4090 KOps/s 1.4150 KOps/s $\color{#d91a1a}-0.42\%$
test_vmap_transformer_speed_decorator[True-True] 21.0807ms 20.5196ms 48.7338 Ops/s 48.8156 Ops/s $\color{#d91a1a}-0.17\%$
test_vmap_transformer_speed_decorator[True-False] 21.2047ms 20.5128ms 48.7501 Ops/s 48.8195 Ops/s $\color{#d91a1a}-0.14\%$
test_vmap_transformer_speed_decorator[False-True] 20.9939ms 20.3139ms 49.2273 Ops/s 49.3533 Ops/s $\color{#d91a1a}-0.26\%$
test_vmap_transformer_speed_decorator[False-False] 20.4709ms 20.2914ms 49.2820 Ops/s 49.2905 Ops/s $\color{#d91a1a}-0.02\%$
test_to_module_speed[True] 1.5680ms 1.4762ms 677.4127 Ops/s 675.2574 Ops/s $\color{#35bf28}+0.32\%$
test_to_module_speed[False] 1.5542ms 1.4656ms 682.3068 Ops/s 686.5310 Ops/s $\color{#d91a1a}-0.62\%$
test_tc_init 82.9240μs 44.5656μs 22.4388 KOps/s 22.5802 KOps/s $\color{#d91a1a}-0.63\%$
test_tc_init_tensor_only 45.4320μs 9.8552μs 101.4694 KOps/s 103.5141 KOps/s $\color{#d91a1a}-1.98\%$
test_tc_init_nested 0.1211ms 87.5143μs 11.4267 KOps/s 11.3469 KOps/s $\color{#35bf28}+0.70\%$
test_tc_init_many_fields 0.1922ms 16.5558μs 60.4016 KOps/s 60.5822 KOps/s $\color{#d91a1a}-0.30\%$
test_tc_first_layer_tensor 32.8110μs 1.8244μs 548.1347 KOps/s 545.6558 KOps/s $\color{#35bf28}+0.45\%$
test_tc_first_layer_tensor_only 16.0092μs 0.3965μs 2.5219 MOps/s 2.5047 MOps/s $\color{#35bf28}+0.69\%$
test_tc_first_layer_tensor_set 32.7920μs 3.9174μs 255.2732 KOps/s 251.9657 KOps/s $\color{#35bf28}+1.31\%$
test_tc_first_layer_tensor_only_set 18.7910μs 3.2225μs 310.3215 KOps/s 307.2540 KOps/s $\color{#35bf28}+1.00\%$
test_tc_first_layer_nontensor 38.4120μs 6.1610μs 162.3110 KOps/s 161.2248 KOps/s $\color{#35bf28}+0.67\%$
test_tc_second_layer_tensor 33.8620μs 4.4888μs 222.7745 KOps/s 221.6370 KOps/s $\color{#35bf28}+0.51\%$
test_tc_second_layer_nontensor 40.7820μs 8.6498μs 115.6094 KOps/s 114.3294 KOps/s $\color{#35bf28}+1.12\%$
test_unbind 0.2726s 14.2189ms 70.3287 Ops/s 55.4462 Ops/s $\textbf{\color{#35bf28}+26.84\%}$
test_full_like 17.5233ms 16.7333ms 59.7612 Ops/s 228.3937 Ops/s $\textbf{\color{#d91a1a}-73.83\%}$
test_zeros_like 16.9181ms 16.6386ms 60.1011 Ops/s 229.0686 Ops/s $\textbf{\color{#d91a1a}-73.76\%}$
test_ones_like 17.9179ms 16.8269ms 59.4288 Ops/s 228.7501 Ops/s $\textbf{\color{#d91a1a}-74.02\%}$
test_clone 17.9909ms 17.6214ms 56.7491 Ops/s 154.8207 Ops/s $\textbf{\color{#d91a1a}-63.35\%}$
test_squeeze 0.1131ms 14.0855μs 70.9950 KOps/s 70.6839 KOps/s $\color{#35bf28}+0.44\%$
test_unsqueeze 0.1661ms 0.1102ms 9.0767 KOps/s 9.1294 KOps/s $\color{#d91a1a}-0.58\%$
test_split 0.3086ms 0.1858ms 5.3826 KOps/s 5.3585 KOps/s $\color{#35bf28}+0.45\%$
test_permute 0.3907ms 0.2094ms 4.7766 KOps/s 4.8651 KOps/s $\color{#d91a1a}-1.82\%$
test_stack 51.3717ms 51.1379ms 19.5550 Ops/s 19.6335 Ops/s $\color{#d91a1a}-0.40\%$
test_cat 51.4311ms 51.1387ms 19.5547 Ops/s 19.6397 Ops/s $\color{#d91a1a}-0.43\%$
test_sequential_tensordict 0.2778ms 0.2246ms 4.4524 KOps/s 4.5217 KOps/s $\color{#d91a1a}-1.53\%$
test_sequential_graph_module 0.5045ms 0.1234ms 8.1030 KOps/s 8.3751 KOps/s $\color{#d91a1a}-3.25\%$
test_nested_tensordict 0.4038ms 0.2922ms 3.4227 KOps/s 3.3871 KOps/s $\color{#35bf28}+1.05\%$
test_nested_graph_module 0.5413ms 0.1329ms 7.5226 KOps/s 7.4625 KOps/s $\color{#35bf28}+0.81\%$
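The Change column is the relative difference between the Ops column and the Ops on Repo HEAD column, and Ops is the reciprocal of Mean (for example, 1 / 14.9071 μs ≈ 67.0821 KOps/s). The Improved/Worsened totals appear to count rows whose change exceeds 5% in magnitude: this GPU table has exactly 17 bolded gains and 11 bolded losses beyond that bound, while +4.99% (test_seq_wrap[compile]) is not bolded. The sketch below recomputes the column under that assumption; the helper names and the threshold are inferred from the table, not taken from the benchmark bot. Because displayed throughputs are rounded, a recomputed percentage can differ in the last digit (test_values gives about +2.43% versus the listed +2.42%).

```python
# Hypothetical helpers that recompute the "Change" column from the two
# throughput columns; the 5% threshold is inferred from the table, not documented.
UNITS = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '67.0821 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def percent_change(current: str, baseline: str) -> float:
    """Relative throughput change of this PR versus repo HEAD, in percent."""
    cur, base = parse_ops(current), parse_ops(baseline)
    return 100.0 * (cur - base) / base


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Bucket a change the way the summary counts appear to: |change| > threshold."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


if __name__ == "__main__":
    # test_plain_set_nested row: 67.0821 KOps/s now vs 68.2113 KOps/s on HEAD.
    change = percent_change("67.0821 KOps/s", "68.2113 KOps/s")
    print(f"{change:+.2f}% -> {classify(change)}")  # -1.66% -> unchanged
```

Normalizing units first matters for rows such as test_values, where the two throughput cells use MOps/s and KOps/s respectively.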

github-actions bot (Contributor) commented Mar 9, 2026

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 261. Improved: 26. Worsened: 6.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 42.0710μs 14.9180μs 67.0332 KOps/s 67.3474 KOps/s $\color{#d91a1a}-0.47\%$
test_plain_set_stack_nested 30.9710μs 15.2450μs 65.5951 KOps/s 65.8057 KOps/s $\color{#d91a1a}-0.32\%$
test_plain_set_nested_inplace 46.5810μs 16.5486μs 60.4279 KOps/s 58.9094 KOps/s $\color{#35bf28}+2.58\%$
test_plain_set_stack_nested_inplace 50.9410μs 16.2491μs 61.5418 KOps/s 58.9226 KOps/s $\color{#35bf28}+4.45\%$
test_items 25.1210μs 5.9825μs 167.1538 KOps/s 165.1938 KOps/s $\color{#35bf28}+1.19\%$
test_items_nested 0.5431ms 0.4728ms 2.1149 KOps/s 2.1106 KOps/s $\color{#35bf28}+0.21\%$
test_items_nested_locked 0.5318ms 0.4740ms 2.1095 KOps/s 2.1072 KOps/s $\color{#35bf28}+0.11\%$
test_items_nested_leaf 0.1240ms 97.5225μs 10.2540 KOps/s 10.0995 KOps/s $\color{#35bf28}+1.53\%$
test_items_stack_nested 0.5782ms 0.4724ms 2.1167 KOps/s 2.1506 KOps/s $\color{#d91a1a}-1.57\%$
test_items_stack_nested_leaf 0.1419ms 98.3139μs 10.1715 KOps/s 10.1023 KOps/s $\color{#35bf28}+0.68\%$
test_items_stack_nested_locked 0.6063ms 0.4787ms 2.0889 KOps/s 2.1116 KOps/s $\color{#d91a1a}-1.08\%$
test_keys 30.4700μs 4.2439μs 235.6317 KOps/s 233.4440 KOps/s $\color{#35bf28}+0.94\%$
test_keys_nested 0.1742ms 0.1312ms 7.6205 KOps/s 7.5805 KOps/s $\color{#35bf28}+0.53\%$
test_keys_nested_locked 0.7738ms 0.1390ms 7.1934 KOps/s 7.1637 KOps/s $\color{#35bf28}+0.42\%$
test_keys_nested_leaf 0.1854ms 0.1211ms 8.2545 KOps/s 8.3565 KOps/s $\color{#d91a1a}-1.22\%$
test_keys_stack_nested 0.1958ms 0.1323ms 7.5595 KOps/s 7.7086 KOps/s $\color{#d91a1a}-1.93\%$
test_keys_stack_nested_leaf 0.1545ms 0.1218ms 8.2132 KOps/s 8.3063 KOps/s $\color{#d91a1a}-1.12\%$
test_keys_stack_nested_locked 0.1891ms 0.1395ms 7.1691 KOps/s 7.2896 KOps/s $\color{#d91a1a}-1.65\%$
test_values 5.1420μs 1.0228μs 977.6771 KOps/s 986.8799 KOps/s $\color{#d91a1a}-0.93\%$
test_values_nested 0.1248ms 53.4864μs 18.6963 KOps/s 19.1284 KOps/s $\color{#d91a1a}-2.26\%$
test_values_nested_locked 84.0020μs 56.5877μs 17.6717 KOps/s 18.0814 KOps/s $\color{#d91a1a}-2.27\%$
test_values_nested_leaf 80.0520μs 61.1356μs 16.3571 KOps/s 16.7643 KOps/s $\color{#d91a1a}-2.43\%$
test_values_stack_nested 80.4020μs 53.6596μs 18.6360 KOps/s 19.0495 KOps/s $\color{#d91a1a}-2.17\%$
test_values_stack_nested_leaf 85.6410μs 61.1154μs 16.3625 KOps/s 16.7227 KOps/s $\color{#d91a1a}-2.15\%$
test_values_stack_nested_locked 95.1620μs 56.4184μs 17.7247 KOps/s 17.8949 KOps/s $\color{#d91a1a}-0.95\%$
test_membership 4.4900μs 0.8524μs 1.1732 MOps/s 1.1853 MOps/s $\color{#d91a1a}-1.02\%$
test_membership_nested 39.4400μs 2.8850μs 346.6171 KOps/s 346.6596 KOps/s $\color{#d91a1a}-0.01\%$
test_membership_nested_leaf 30.6810μs 2.9161μs 342.9272 KOps/s 363.1182 KOps/s $\textbf{\color{#d91a1a}-5.56\%}$
test_membership_stacked_nested 39.4810μs 2.8849μs 346.6298 KOps/s 343.4730 KOps/s $\color{#35bf28}+0.92\%$
test_membership_stacked_nested_leaf 31.2300μs 2.8915μs 345.8373 KOps/s 343.5342 KOps/s $\color{#35bf28}+0.67\%$
test_membership_nested_last 32.5910μs 4.3881μs 227.8870 KOps/s 226.8980 KOps/s $\color{#35bf28}+0.44\%$
test_membership_nested_leaf_last 30.7300μs 4.3336μs 230.7559 KOps/s 227.7853 KOps/s $\color{#35bf28}+1.30\%$
test_membership_stacked_nested_last 28.1710μs 4.3335μs 230.7578 KOps/s 227.4615 KOps/s $\color{#35bf28}+1.45\%$
test_membership_stacked_nested_leaf_last 25.0200μs 4.3570μs 229.5171 KOps/s 228.2747 KOps/s $\color{#35bf28}+0.54\%$
test_nested_getleaf 53.7810μs 22.0824μs 45.2850 KOps/s 46.1742 KOps/s $\color{#d91a1a}-1.93\%$
test_nested_get 43.8110μs 20.9910μs 47.6395 KOps/s 48.3219 KOps/s $\color{#d91a1a}-1.41\%$
test_stacked_getleaf 53.0510μs 21.9898μs 45.4757 KOps/s 45.9234 KOps/s $\color{#d91a1a}-0.97\%$
test_stacked_get 57.1410μs 21.0963μs 47.4018 KOps/s 48.7480 KOps/s $\color{#d91a1a}-2.76\%$
test_nested_getitemleaf 42.0200μs 22.4806μs 44.4827 KOps/s 45.1290 KOps/s $\color{#d91a1a}-1.43\%$
test_nested_getitem 51.7710μs 21.0942μs 47.4064 KOps/s 47.4615 KOps/s $\color{#d91a1a}-0.12\%$
test_stacked_getitemleaf 53.3610μs 22.0897μs 45.2699 KOps/s 45.1662 KOps/s $\color{#35bf28}+0.23\%$
test_stacked_getitem 50.5310μs 21.2767μs 46.9998 KOps/s 47.5850 KOps/s $\color{#d91a1a}-1.23\%$
test_lock_nested 8.2496ms 0.4892ms 2.0443 KOps/s 2.0831 KOps/s $\color{#d91a1a}-1.86\%$
test_lock_stack_nested 0.5457ms 0.4830ms 2.0703 KOps/s 2.0566 KOps/s $\color{#35bf28}+0.67\%$
test_unlock_nested 0.4558ms 0.3881ms 2.5767 KOps/s 2.5569 KOps/s $\color{#35bf28}+0.77\%$
test_unlock_stack_nested 0.4460ms 0.3884ms 2.5748 KOps/s 2.5310 KOps/s $\color{#35bf28}+1.73\%$
test_flatten_speed 0.1701ms 0.1227ms 8.1525 KOps/s 8.1266 KOps/s $\color{#35bf28}+0.32\%$
test_unflatten_speed 0.6447ms 0.5762ms 1.7356 KOps/s 1.7302 KOps/s $\color{#35bf28}+0.31\%$
test_common_ops 0.7992ms 0.6875ms 1.4545 KOps/s 1.4444 KOps/s $\color{#35bf28}+0.69\%$
test_creation 71.8520μs 3.1342μs 319.0588 KOps/s 313.2180 KOps/s $\color{#35bf28}+1.86\%$
test_creation_empty 42.4210μs 6.9670μs 143.5341 KOps/s 142.0070 KOps/s $\color{#35bf28}+1.08\%$
test_creation_nested_1 43.3710μs 11.5485μs 86.5915 KOps/s 85.6462 KOps/s $\color{#35bf28}+1.10\%$
test_creation_nested_2 40.2710μs 13.3423μs 74.9496 KOps/s 73.4269 KOps/s $\color{#35bf28}+2.07\%$
test_creation_many_keys[10] 51.6710μs 20.8493μs 47.9631 KOps/s 47.1871 KOps/s $\color{#35bf28}+1.64\%$
test_creation_many_keys[50] 0.1244ms 89.2815μs 11.2005 KOps/s 10.9768 KOps/s $\color{#35bf28}+2.04\%$
test_creation_many_keys[100] 0.2523ms 0.1762ms 5.6766 KOps/s 5.6096 KOps/s $\color{#35bf28}+1.20\%$
test_creation_nested_many_keys[10] 82.7710μs 44.6311μs 22.4059 KOps/s 21.9413 KOps/s $\color{#35bf28}+2.12\%$
test_creation_nested_many_keys[50] 0.2501ms 0.1834ms 5.4537 KOps/s 5.4372 KOps/s $\color{#35bf28}+0.30\%$
test_clone 33.7110μs 13.0644μs 76.5438 KOps/s 76.5801 KOps/s $\color{#d91a1a}-0.05\%$
test_getitem[int] 1.6013ms 15.2432μs 65.6031 KOps/s 58.2579 KOps/s $\textbf{\color{#35bf28}+12.61\%}$
test_getitem[slice_int] 0.1416ms 24.4709μs 40.8649 KOps/s 41.8557 KOps/s $\color{#d91a1a}-2.37\%$
test_getitem[range] 0.1710ms 62.4385μs 16.0158 KOps/s 15.1007 KOps/s $\textbf{\color{#35bf28}+6.06\%}$
test_getitem[tuple] 0.1415ms 24.1846μs 41.3487 KOps/s 42.0461 KOps/s $\color{#d91a1a}-1.66\%$
test_getitem[list] 0.1960ms 57.5926μs 17.3633 KOps/s 17.2625 KOps/s $\color{#35bf28}+0.58\%$
test_setitem_dim[int] 55.0610μs 25.6101μs 39.0471 KOps/s 38.0990 KOps/s $\color{#35bf28}+2.49\%$
test_setitem_dim[slice_int] 64.0420μs 42.9926μs 23.2598 KOps/s 22.2491 KOps/s $\color{#35bf28}+4.54\%$
test_setitem_dim[range] 0.1282ms 94.4013μs 10.5931 KOps/s 10.0714 KOps/s $\textbf{\color{#35bf28}+5.18\%}$
test_setitem_dim[tuple] 62.1210μs 40.1761μs 24.8904 KOps/s 25.8570 KOps/s $\color{#d91a1a}-3.74\%$
test_setitem 62.3920μs 17.3092μs 57.7726 KOps/s 56.2954 KOps/s $\color{#35bf28}+2.62\%$
test_set 43.3910μs 16.6134μs 60.1925 KOps/s 55.2133 KOps/s $\textbf{\color{#35bf28}+9.02\%}$
test_set_shared 0.5686ms 0.2024ms 4.9415 KOps/s 4.6980 KOps/s $\textbf{\color{#35bf28}+5.18\%}$
test_update 0.3530ms 21.5799μs 46.3394 KOps/s 43.5403 KOps/s $\textbf{\color{#35bf28}+6.43\%}$
test_update_nested 69.0210μs 33.1892μs 30.1303 KOps/s 29.2267 KOps/s $\color{#35bf28}+3.09\%$
test_update__nested 0.4677ms 34.0562μs 29.3632 KOps/s 27.4330 KOps/s $\textbf{\color{#35bf28}+7.04\%}$
test_set_nested 45.2410μs 18.6718μs 53.5566 KOps/s 49.0382 KOps/s $\textbf{\color{#35bf28}+9.21\%}$
test_set_nested_new 55.7210μs 23.6919μs 42.2085 KOps/s 39.1873 KOps/s $\textbf{\color{#35bf28}+7.71\%}$
test_select 72.2710μs 39.3959μs 25.3834 KOps/s 23.0962 KOps/s $\textbf{\color{#35bf28}+9.90\%}$
test_select_nested 0.1130ms 75.3765μs 13.2667 KOps/s 13.3543 KOps/s $\color{#d91a1a}-0.66\%$
test_exclude_nested 0.1243ms 92.7378μs 10.7831 KOps/s 10.7033 KOps/s $\color{#35bf28}+0.75\%$
test_empty[True] 0.4712ms 0.4018ms 2.4889 KOps/s 2.4829 KOps/s $\color{#35bf28}+0.24\%$
test_empty[False] 9.5527μs 1.3378μs 747.5209 KOps/s 750.4203 KOps/s $\color{#d91a1a}-0.39\%$
test_to 0.1092ms 71.7100μs 13.9451 KOps/s 13.4432 KOps/s $\color{#35bf28}+3.73\%$
test_to_nonblocking 0.1090ms 64.0354μs 15.6164 KOps/s 15.4873 KOps/s $\color{#35bf28}+0.83\%$
test_unbind_speed 0.3892ms 0.3349ms 2.9859 KOps/s 2.9794 KOps/s $\color{#35bf28}+0.22\%$
test_unbind_speed_stack0 0.3895ms 0.3346ms 2.9886 KOps/s 3.0008 KOps/s $\color{#d91a1a}-0.41\%$
test_unbind_speed_stack1 0.1036s 0.8420ms 1.1877 KOps/s 1.1789 KOps/s $\color{#35bf28}+0.74\%$
test_split 0.1036s 1.2657ms 790.0677 Ops/s 782.9867 Ops/s $\color{#35bf28}+0.90\%$
test_chunk 0.1034s 1.2122ms 824.9145 Ops/s 926.8018 Ops/s $\textbf{\color{#d91a1a}-10.99\%}$
test_to_cpu_blocking 28.5713ms 28.3796ms 35.2366 Ops/s 34.7918 Ops/s $\color{#35bf28}+1.28\%$
test_to_cpu_global_sync 11.6395ms 11.2422ms 88.9506 Ops/s 79.7074 Ops/s $\textbf{\color{#35bf28}+11.60\%}$
test_to_cpu_event_sync 12.4154ms 12.2266ms 81.7887 Ops/s 81.0594 Ops/s $\color{#35bf28}+0.90\%$
test_to_cpu_default 12.5117ms 12.2165ms 81.8566 Ops/s 81.1905 Ops/s $\color{#35bf28}+0.82\%$
test_consolidate[False-None] 4.2819ms 4.1623ms 240.2540 Ops/s 238.0236 Ops/s $\color{#35bf28}+0.94\%$
test_consolidate[default-None] 2.1320ms 2.0111ms 497.2509 Ops/s 486.3302 Ops/s $\color{#35bf28}+2.25\%$
test_consolidate[reduce-overhead-None] 2.0329ms 1.9334ms 517.2132 Ops/s 505.0451 Ops/s $\color{#35bf28}+2.41\%$
test_consolidate_njt[False-None] 8.7086ms 8.5139ms 117.4550 Ops/s 117.6327 Ops/s $\color{#d91a1a}-0.15\%$
test_to[False-False-None] 2.2002ms 2.0917ms 478.0786 Ops/s 477.1367 Ops/s $\color{#35bf28}+0.20\%$
test_to[True-False-None] 0.1832s 2.3080ms 433.2736 Ops/s 508.0849 Ops/s $\textbf{\color{#d91a1a}-14.72\%}$
test_to[within-False-None] 6.3311ms 6.1576ms 162.4008 Ops/s 162.9049 Ops/s $\color{#d91a1a}-0.31\%$
test_to[True-default-None] 9.2638ms 8.8867ms 112.5281 Ops/s 113.5419 Ops/s $\color{#d91a1a}-0.89\%$
test_to_njt[False-False-None] 10.1115ms 8.4787ms 117.9423 Ops/s 116.8988 Ops/s $\color{#35bf28}+0.89\%$
test_to_njt[True-False-None] 7.2462ms 6.9614ms 143.6497 Ops/s 142.4749 Ops/s $\color{#35bf28}+0.82\%$
test_to_njt[within-False-None] 16.3069ms 15.6682ms 63.8236 Ops/s 63.5162 Ops/s $\color{#35bf28}+0.48\%$
test_creation[device0] 0.3933ms 0.1157ms 8.6433 KOps/s 8.8086 KOps/s $\color{#d91a1a}-1.88\%$
test_creation_from_tensor 0.4650ms 0.1162ms 8.6076 KOps/s 8.7645 KOps/s $\color{#d91a1a}-1.79\%$
test_add_one[memmap_tensor0] 0.3244ms 6.2441μs 160.1522 KOps/s 157.4978 KOps/s $\color{#35bf28}+1.69\%$
test_contiguous[memmap_tensor0] 13.3600μs 0.6644μs 1.5051 MOps/s 2.1818 MOps/s $\textbf{\color{#d91a1a}-31.01\%}$
test_stack[memmap_tensor0] 25.5400μs 4.6812μs 213.6197 KOps/s 216.3756 KOps/s $\color{#d91a1a}-1.27\%$
test_memmaptd_index 1.0431ms 0.2657ms 3.7632 KOps/s 3.7351 KOps/s $\color{#35bf28}+0.75\%$
test_memmaptd_index_astensor 0.5276ms 0.3705ms 2.6991 KOps/s 2.6682 KOps/s $\color{#35bf28}+1.16\%$
test_memmaptd_index_op 0.7497ms 0.6110ms 1.6368 KOps/s 1.6362 KOps/s $\color{#35bf28}+0.04\%$
test_serialize_model 0.3031s 0.1652s 6.0517 Ops/s 7.3706 Ops/s $\textbf{\color{#d91a1a}-17.89\%}$
test_serialize_model_pickle 2.1436s 1.4136s 0.7074 Ops/s 0.8378 Ops/s $\textbf{\color{#d91a1a}-15.56\%}$
test_serialize_weights 0.1365s 0.1344s 7.4417 Ops/s 7.4209 Ops/s $\color{#35bf28}+0.28\%$
test_serialize_weights_returnearly 0.4448s 87.8446ms 11.3837 Ops/s 6.9172 Ops/s $\textbf{\color{#35bf28}+64.57\%}$
test_serialize_weights_pickle 1.3704s 1.1983s 0.8345 Ops/s 0.8177 Ops/s $\color{#35bf28}+2.05\%$
test_reshape_pytree 0.2061ms 32.7806μs 30.5058 KOps/s 30.2417 KOps/s $\color{#35bf28}+0.87\%$
test_reshape_td 89.1520μs 46.1038μs 21.6902 KOps/s 21.2034 KOps/s $\color{#35bf28}+2.30\%$
test_view_pytree 0.2044ms 32.4517μs 30.8150 KOps/s 30.3135 KOps/s $\color{#35bf28}+1.65\%$
test_view_td 87.6120μs 54.0155μs 18.5132 KOps/s 18.3058 KOps/s $\color{#35bf28}+1.13\%$
test_unbind_pytree 0.2278ms 36.4438μs 27.4395 KOps/s 27.1070 KOps/s $\color{#35bf28}+1.23\%$
test_unbind_td 0.1274ms 50.2156μs 19.9141 KOps/s 19.1927 KOps/s $\color{#35bf28}+3.76\%$
test_split_pytree 0.2417ms 42.5328μs 23.5112 KOps/s 23.3026 KOps/s $\color{#35bf28}+0.90\%$
test_split_td 0.1540ms 63.9976μs 15.6256 KOps/s 14.8211 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_add_pytree 0.2257ms 42.0484μs 23.7821 KOps/s 23.9392 KOps/s $\color{#d91a1a}-0.66\%$
test_add_td 99.4020μs 55.4687μs 18.0282 KOps/s 17.1424 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_compile_add_one_nested[tensordict-compile] 0.1931ms 0.1396ms 7.1631 KOps/s 6.7776 KOps/s $\textbf{\color{#35bf28}+5.69\%}$
test_compile_add_one_nested[tensordict-eager] 0.3058ms 0.2033ms 4.9197 KOps/s 4.9736 KOps/s $\color{#d91a1a}-1.08\%$
test_compile_add_one_nested[pytree-compile] 0.1389ms 0.1067ms 9.3741 KOps/s 8.8824 KOps/s $\textbf{\color{#35bf28}+5.54\%}$
test_compile_add_one_nested[pytree-eager] 0.4429ms 0.1772ms 5.6441 KOps/s 5.3493 KOps/s $\textbf{\color{#35bf28}+5.51\%}$
test_compile_copy_nested[tensordict-compile] 0.3884ms 10.3441μs 96.6733 KOps/s 96.6014 KOps/s $\color{#35bf28}+0.07\%$
test_compile_copy_nested[tensordict-eager] 0.1124ms 54.2766μs 18.4241 KOps/s 18.2947 KOps/s $\color{#35bf28}+0.71\%$
test_compile_copy_nested[pytree-compile] 0.1341ms 10.2826μs 97.2513 KOps/s 99.6265 KOps/s $\color{#d91a1a}-2.38\%$
test_compile_copy_nested[pytree-eager] 0.4652ms 69.6925μs 14.3487 KOps/s 14.1082 KOps/s $\color{#35bf28}+1.70\%$
test_compile_add_one_flat[tensordict-compile] 0.2729ms 0.1780ms 5.6165 KOps/s 5.4252 KOps/s $\color{#35bf28}+3.53\%$
test_compile_add_one_flat[tensordict-eager] 0.3739ms 0.2809ms 3.5596 KOps/s 3.5456 KOps/s $\color{#35bf28}+0.39\%$
test_compile_add_one_flat[tensorclass-compile] 0.3369ms 0.1164ms 8.5941 KOps/s 8.3126 KOps/s $\color{#35bf28}+3.39\%$
test_compile_add_one_flat[tensorclass-eager] 0.1326ms 73.3080μs 13.6411 KOps/s 13.4278 KOps/s $\color{#35bf28}+1.59\%$
test_compile_add_one_flat[pytree-compile] 0.2495ms 0.1582ms 6.3193 KOps/s 6.0871 KOps/s $\color{#35bf28}+3.81\%$
test_compile_add_one_flat[pytree-eager] 0.7936ms 0.5096ms 1.9623 KOps/s 1.8328 KOps/s $\textbf{\color{#35bf28}+7.07\%}$
test_compile_add_self_flat[tensordict-eager] 0.3951ms 0.3340ms 2.9943 KOps/s 2.9325 KOps/s $\color{#35bf28}+2.11\%$
test_compile_add_self_flat[tensordict-compile] 0.3238ms 0.1811ms 5.5221 KOps/s 4.8610 KOps/s $\textbf{\color{#35bf28}+13.60\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1607ms 88.8590μs 11.2538 KOps/s 10.7678 KOps/s $\color{#35bf28}+4.51\%$
test_compile_add_self_flat[tensorclass-compile] 0.1720ms 0.1190ms 8.4067 KOps/s 7.8571 KOps/s $\textbf{\color{#35bf28}+7.00\%}$
test_compile_add_self_flat[pytree-eager] 0.6429ms 0.4236ms 2.3607 KOps/s 2.3348 KOps/s $\color{#35bf28}+1.11\%$
test_compile_add_self_flat[pytree-compile] 0.3159ms 0.1566ms 6.3839 KOps/s 6.1132 KOps/s $\color{#35bf28}+4.43\%$
test_compile_copy_flat[tensordict-compile] 0.1247ms 13.7758μs 72.5909 KOps/s 74.3684 KOps/s $\color{#d91a1a}-2.39\%$
test_compile_copy_flat[tensordict-eager] 99.2620μs 41.6178μs 24.0282 KOps/s 24.1222 KOps/s $\color{#d91a1a}-0.39\%$
test_compile_copy_flat[pytree-compile] 0.1575ms 10.8676μs 92.0163 KOps/s 92.3849 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_copy_flat[pytree-eager] 0.4065ms 52.5068μs 19.0452 KOps/s 18.8458 KOps/s $\color{#35bf28}+1.06\%$
test_compile_assign_and_add[tensordict-compile] 2.0094ms 0.1748ms 5.7204 KOps/s 5.0564 KOps/s $\textbf{\color{#35bf28}+13.13\%}$
test_compile_assign_and_add[tensordict-eager] 3.5183ms 3.2790ms 304.9706 Ops/s 289.8388 Ops/s $\textbf{\color{#35bf28}+5.22\%}$
test_compile_assign_and_add[pytree-compile] 1.9810ms 0.1632ms 6.1277 KOps/s 6.0721 KOps/s $\color{#35bf28}+0.92\%$
test_compile_assign_and_add[pytree-eager] 3.0312ms 2.8417ms 351.8976 Ops/s 361.9744 Ops/s $\color{#d91a1a}-2.78\%$
test_compile_indexing[tensor-tensordict-compile] 0.1853ms 0.1104ms 9.0540 KOps/s 8.8331 KOps/s $\color{#35bf28}+2.50\%$
test_compile_indexing[tensor-tensordict-eager] 0.3111ms 73.3177μs 13.6393 KOps/s 13.6510 KOps/s $\color{#d91a1a}-0.09\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2282ms 97.4562μs 10.2610 KOps/s 10.2880 KOps/s $\color{#d91a1a}-0.26\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2640ms 45.9399μs 21.7676 KOps/s 22.6711 KOps/s $\color{#d91a1a}-3.99\%$
test_compile_indexing[tensor-pytree-compile] 0.1607ms 98.1897μs 10.1844 KOps/s 10.2128 KOps/s $\color{#d91a1a}-0.28\%$
test_compile_indexing[tensor-pytree-eager] 0.2958ms 45.2773μs 22.0861 KOps/s 22.3108 KOps/s $\color{#d91a1a}-1.01\%$
test_compile_indexing[slice-tensordict-compile] 0.2547ms 58.5180μs 17.0888 KOps/s 17.1097 KOps/s $\color{#d91a1a}-0.12\%$
test_compile_indexing[slice-tensordict-eager] 0.2231ms 28.0417μs 35.6612 KOps/s 35.4221 KOps/s $\color{#35bf28}+0.68\%$
test_compile_indexing[slice-tensorclass-compile] 0.1244ms 46.3225μs 21.5878 KOps/s 22.2344 KOps/s $\color{#d91a1a}-2.91\%$
test_compile_indexing[slice-tensorclass-eager] 0.2536ms 22.8043μs 43.8513 KOps/s 43.6411 KOps/s $\color{#35bf28}+0.48\%$
test_compile_indexing[slice-pytree-compile] 95.5420μs 45.5510μs 21.9534 KOps/s 21.8626 KOps/s $\color{#35bf28}+0.42\%$
test_compile_indexing[slice-pytree-eager] 0.2720ms 22.7249μs 44.0047 KOps/s 44.0261 KOps/s $\color{#d91a1a}-0.05\%$
test_compile_indexing[int-tensordict-compile] 98.1410μs 58.3675μs 17.1328 KOps/s 16.8481 KOps/s $\color{#35bf28}+1.69\%$
test_compile_indexing[int-tensordict-eager] 0.2595ms 27.9745μs 35.7468 KOps/s 35.8679 KOps/s $\color{#d91a1a}-0.34\%$
test_compile_indexing[int-tensorclass-compile] 89.1010μs 45.5547μs 21.9516 KOps/s 20.7362 KOps/s $\textbf{\color{#35bf28}+5.86\%}$
test_compile_indexing[int-tensorclass-eager] 0.2899ms 22.6837μs 44.0845 KOps/s 44.0093 KOps/s $\color{#35bf28}+0.17\%$
test_compile_indexing[int-pytree-compile] 93.3510μs 46.0404μs 21.7200 KOps/s 20.8381 KOps/s $\color{#35bf28}+4.23\%$
test_compile_indexing[int-pytree-eager] 0.2910ms 22.5921μs 44.2633 KOps/s 44.1242 KOps/s $\color{#35bf28}+0.32\%$
test_compile_replace[single-eager] 93.9920μs 48.0192μs 20.8250 KOps/s 21.1689 KOps/s $\color{#d91a1a}-1.62\%$
test_compile_replace[single-compile] 0.1975ms 0.1060ms 9.4364 KOps/s 9.3107 KOps/s $\color{#35bf28}+1.35\%$
test_compile_replace[multi-eager] 0.6352ms 0.5589ms 1.7892 KOps/s 1.7367 KOps/s $\color{#35bf28}+3.02\%$
test_compile_replace[multi-compile] 0.2897ms 0.1112ms 8.9964 KOps/s 8.8393 KOps/s $\color{#35bf28}+1.78\%$
test_compile_tc_getattr_20[eager] 0.2233ms 0.1631ms 6.1330 KOps/s 6.0967 KOps/s $\color{#35bf28}+0.59\%$
test_compile_tc_getattr_20[compile] 0.2990ms 0.1177ms 8.4991 KOps/s 8.3184 KOps/s $\color{#35bf28}+2.17\%$
test_compile_clone_shallow[20-eager] 45.9010μs 19.5541μs 51.1402 KOps/s 51.3913 KOps/s $\color{#d91a1a}-0.49\%$
test_compile_clone_shallow[20-compile] 66.1410μs 11.5954μs 86.2411 KOps/s 87.2528 KOps/s $\color{#d91a1a}-1.16\%$
test_compile_clone_shallow[40-eager] 71.5220μs 34.4699μs 29.0108 KOps/s 29.2104 KOps/s $\color{#d91a1a}-0.68\%$
test_compile_clone_shallow[40-compile] 68.4510μs 12.8655μs 77.7271 KOps/s 75.4515 KOps/s $\color{#35bf28}+3.02\%$
test_compile_clone_shallow[80-eager] 0.1059ms 63.3198μs 15.7929 KOps/s 15.8062 KOps/s $\color{#d91a1a}-0.08\%$
test_compile_clone_shallow[80-compile] 62.7520μs 15.1719μs 65.9115 KOps/s 64.6836 KOps/s $\color{#35bf28}+1.90\%$
test_compile_update_inplace[eager] 0.1007ms 58.8271μs 16.9990 KOps/s 16.8574 KOps/s $\color{#35bf28}+0.84\%$
test_compile_update_inplace[compile] 0.3170ms 0.1401ms 7.1391 KOps/s 7.0642 KOps/s $\color{#35bf28}+1.06\%$
test_mod_add[eager] 91.4420μs 50.0761μs 19.9696 KOps/s 19.9761 KOps/s $\color{#d91a1a}-0.03\%$
test_mod_add[compile] 0.4735ms 0.1059ms 9.4440 KOps/s 9.0342 KOps/s $\color{#35bf28}+4.54\%$
test_mod_add[compile-overhead] 0.4730ms 0.1489ms 6.7166 KOps/s 6.5241 KOps/s $\color{#35bf28}+2.95\%$
test_mod_wrap[eager] 0.3797ms 0.2862ms 3.4945 KOps/s 3.4306 KOps/s $\color{#35bf28}+1.86\%$
test_mod_wrap[compile] 0.4558ms 0.3475ms 2.8773 KOps/s 2.8572 KOps/s $\color{#35bf28}+0.71\%$
test_mod_wrap[compile-overhead] 7.3346ms 4.0398ms 247.5372 Ops/s 251.1565 Ops/s $\color{#d91a1a}-1.44\%$
test_mod_wrap_and_backward[eager] 1.9904ms 1.4907ms 670.8384 Ops/s 671.9618 Ops/s $\color{#d91a1a}-0.17\%$
test_mod_wrap_and_backward[compile] 1.6236ms 1.4443ms 692.3786 Ops/s 692.7247 Ops/s $\color{#d91a1a}-0.05\%$
test_mod_wrap_and_backward[compile-overhead] 1.2351ms 0.8839ms 1.1314 KOps/s 1.1047 KOps/s $\color{#35bf28}+2.42\%$
test_seq_add[eager] 0.7298ms 0.1523ms 6.5644 KOps/s 6.3080 KOps/s $\color{#35bf28}+4.06\%$
test_seq_add[compile] 0.5539ms 0.1138ms 8.7896 KOps/s 8.4746 KOps/s $\color{#35bf28}+3.72\%$
test_seq_add[compile-overhead] 0.6142ms 0.1527ms 6.5475 KOps/s 6.2353 KOps/s $\textbf{\color{#35bf28}+5.01\%}$
test_seq_wrap[eager] 0.9740ms 0.5167ms 1.9354 KOps/s 1.8502 KOps/s $\color{#35bf28}+4.60\%$
test_seq_wrap[compile] 0.8096ms 0.3660ms 2.7320 KOps/s 2.6809 KOps/s $\color{#35bf28}+1.91\%$
test_seq_wrap[compile-overhead] 0.7053ms 0.2641ms 3.7866 KOps/s 3.7392 KOps/s $\color{#35bf28}+1.27\%$
test_func_call_runtime[False-eager] 1.2940ms 0.8258ms 1.2109 KOps/s 1.2169 KOps/s $\color{#d91a1a}-0.49\%$
test_func_call_runtime[False-compile] 1.3402ms 0.9080ms 1.1013 KOps/s 1.1031 KOps/s $\color{#d91a1a}-0.16\%$
test_func_call_runtime[False-compile-overhead] 0.9085ms 0.4594ms 2.1770 KOps/s 2.1389 KOps/s $\color{#35bf28}+1.78\%$
test_func_call_runtime[True-eager] 1.4897ms 1.0621ms 941.5373 Ops/s 916.3803 Ops/s $\color{#35bf28}+2.75\%$
test_func_call_runtime[True-compile] 1.4889ms 0.9199ms 1.0871 KOps/s 1.0840 KOps/s $\color{#35bf28}+0.28\%$
test_func_call_runtime[True-compile-overhead] 0.9303ms 0.4747ms 2.1064 KOps/s 2.0805 KOps/s $\color{#35bf28}+1.25\%$
test_func_call_cm_runtime[False-eager] 1.2711ms 0.8409ms 1.1891 KOps/s 1.1542 KOps/s $\color{#35bf28}+3.03\%$
test_func_call_cm_runtime[False-compile] 1.3815ms 0.9141ms 1.0939 KOps/s 1.0935 KOps/s $\color{#35bf28}+0.04\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6263ms 0.4646ms 2.1525 KOps/s 2.1305 KOps/s $\color{#35bf28}+1.04\%$
test_func_call_cm_runtime[True-eager] 1.2907ms 1.2084ms 827.5217 Ops/s 813.8276 Ops/s $\color{#35bf28}+1.68\%$
test_func_call_cm_runtime[True-compile] 1.0275ms 0.9516ms 1.0508 KOps/s 1.0430 KOps/s $\color{#35bf28}+0.75\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5913ms 0.5096ms 1.9622 KOps/s 1.9415 KOps/s $\color{#35bf28}+1.06\%$
test_vmap_func_call_cm_runtime[eager] 2.8129ms 2.3395ms 427.4490 Ops/s 422.1919 Ops/s $\color{#35bf28}+1.25\%$
test_vmap_func_call_cm_runtime[compile] 1.0588ms 0.9666ms 1.0346 KOps/s 1.0213 KOps/s $\color{#35bf28}+1.30\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5914ms 0.5143ms 1.9443 KOps/s 1.9154 KOps/s $\color{#35bf28}+1.51\%$
test_distributed 0.5956ms 0.1528ms 6.5426 KOps/s 6.4816 KOps/s $\color{#35bf28}+0.94\%$
test_tdmodule 46.6500μs 27.0571μs 36.9588 KOps/s 35.3447 KOps/s $\color{#35bf28}+4.57\%$
test_tdmodule_dispatch 76.6810μs 47.0119μs 21.2712 KOps/s 21.7871 KOps/s $\color{#d91a1a}-2.37\%$
test_tdseq 51.7300μs 26.6298μs 37.5519 KOps/s 37.1272 KOps/s $\color{#35bf28}+1.14\%$
test_tdseq_dispatch 73.3110μs 47.0368μs 21.2600 KOps/s 20.9612 KOps/s $\color{#35bf28}+1.43\%$
test_instantiation_functorch 2.1856ms 2.0787ms 481.0700 Ops/s 480.3541 Ops/s $\color{#35bf28}+0.15\%$
test_exec_functorch 0.2627ms 0.1766ms 5.6628 KOps/s 5.6086 KOps/s $\color{#35bf28}+0.97\%$
test_exec_functional_call 0.2227ms 0.1586ms 6.3065 KOps/s 6.3584 KOps/s $\color{#d91a1a}-0.82\%$
test_exec_td_decorator 0.4576ms 0.2415ms 4.1406 KOps/s 4.2647 KOps/s $\color{#d91a1a}-2.91\%$
test_vmap_mlp_speed_decorator[True-True] 1.0094ms 0.8235ms 1.2143 KOps/s 1.2149 KOps/s $\color{#d91a1a}-0.05\%$
test_vmap_mlp_speed_decorator[True-False] 1.0090ms 0.8194ms 1.2204 KOps/s 1.2114 KOps/s $\color{#35bf28}+0.74\%$
test_vmap_mlp_speed_decorator[False-True] 0.8928ms 0.7012ms 1.4261 KOps/s 1.3992 KOps/s $\color{#35bf28}+1.92\%$
test_vmap_mlp_speed_decorator[False-False] 0.8843ms 0.7064ms 1.4156 KOps/s 1.4046 KOps/s $\color{#35bf28}+0.78\%$
test_vmap_transformer_speed_decorator[True-True] 21.0082ms 20.2262ms 49.4409 Ops/s 49.0093 Ops/s $\color{#35bf28}+0.88\%$
test_vmap_transformer_speed_decorator[True-False] 20.8568ms 20.2145ms 49.4695 Ops/s 48.9888 Ops/s $\color{#35bf28}+0.98\%$
test_vmap_transformer_speed_decorator[False-True] 20.2682ms 20.0001ms 49.9998 Ops/s 49.5342 Ops/s $\color{#35bf28}+0.94\%$
test_vmap_transformer_speed_decorator[False-False] 20.9755ms 20.1473ms 49.6345 Ops/s 49.4848 Ops/s $\color{#35bf28}+0.30\%$
test_to_module_speed[True] 1.6217ms 1.4770ms 677.0316 Ops/s 669.4962 Ops/s $\color{#35bf28}+1.13\%$
test_to_module_speed[False] 1.5810ms 1.4573ms 686.2185 Ops/s 704.1430 Ops/s $\color{#d91a1a}-2.55\%$
test_tc_init 66.1910μs 44.6124μs 22.4153 KOps/s 22.1826 KOps/s $\color{#35bf28}+1.05\%$
test_tc_init_tensor_only 39.6410μs 9.8733μs 101.2833 KOps/s 102.0890 KOps/s $\color{#d91a1a}-0.79\%$
test_tc_init_nested 0.1320ms 89.3917μs 11.1867 KOps/s 11.0895 KOps/s $\color{#35bf28}+0.88\%$
test_tc_init_many_fields 54.0710μs 16.7253μs 59.7895 KOps/s 60.5583 KOps/s $\color{#d91a1a}-1.27\%$
test_tc_first_layer_tensor 33.4510μs 1.8344μs 545.1400 KOps/s 547.7682 KOps/s $\color{#d91a1a}-0.48\%$
test_tc_first_layer_tensor_only 4.8631μs 0.4070μs 2.4571 MOps/s 2.5394 MOps/s $\color{#d91a1a}-3.24\%$
test_tc_first_layer_tensor_set 38.5300μs 3.9005μs 256.3745 KOps/s 252.7019 KOps/s $\color{#35bf28}+1.45\%$
test_tc_first_layer_tensor_only_set 21.6510μs 3.2773μs 305.1337 KOps/s 279.0046 KOps/s $\textbf{\color{#35bf28}+9.37\%}$
test_tc_first_layer_nontensor 25.7700μs 6.1836μs 161.7168 KOps/s 160.9177 KOps/s $\color{#35bf28}+0.50\%$
test_tc_second_layer_tensor 27.3200μs 4.4349μs 225.4858 KOps/s 220.9134 KOps/s $\color{#35bf28}+2.07\%$
test_tc_second_layer_nontensor 39.7700μs 8.7249μs 114.6140 KOps/s 112.8631 KOps/s $\color{#35bf28}+1.55\%$
test_unbind 0.2502s 16.6085ms 60.2100 Ops/s 54.9548 Ops/s $\textbf{\color{#35bf28}+9.56\%}$
test_full_like 17.4438ms 16.6885ms 59.9215 Ops/s 60.0561 Ops/s $\color{#d91a1a}-0.22\%$
test_zeros_like 16.9464ms 16.6053ms 60.2217 Ops/s 60.1109 Ops/s $\color{#35bf28}+0.18\%$
test_ones_like 17.7853ms 16.6414ms 60.0913 Ops/s 60.1619 Ops/s $\color{#d91a1a}-0.12\%$
test_clone 17.8815ms 17.5404ms 57.0113 Ops/s 57.2243 Ops/s $\color{#d91a1a}-0.37\%$
test_squeeze 89.4810μs 14.3097μs 69.8829 KOps/s 69.5865 KOps/s $\color{#35bf28}+0.43\%$
test_unsqueeze 0.1660ms 0.1114ms 8.9746 KOps/s 8.7756 KOps/s $\color{#35bf28}+2.27\%$
test_split 0.3486ms 0.1866ms 5.3587 KOps/s 5.4191 KOps/s $\color{#d91a1a}-1.11\%$
test_permute 0.2684ms 0.2056ms 4.8647 KOps/s 4.9185 KOps/s $\color{#d91a1a}-1.09\%$
test_stack 51.3004ms 50.8306ms 19.6732 Ops/s 19.4941 Ops/s $\color{#35bf28}+0.92\%$
test_cat 51.6318ms 50.9089ms 19.6429 Ops/s 19.6603 Ops/s $\color{#d91a1a}-0.09\%$
test_sequential_tensordict 0.2737ms 0.2181ms 4.5849 KOps/s 4.6541 KOps/s $\color{#d91a1a}-1.49\%$
test_sequential_graph_module 0.2551ms 0.1181ms 8.4685 KOps/s 8.5767 KOps/s $\color{#d91a1a}-1.26\%$
test_nested_tensordict 0.3827ms 0.2938ms 3.4033 KOps/s 3.4880 KOps/s $\color{#d91a1a}-2.43\%$
test_nested_graph_module 0.1933ms 0.1325ms 7.5499 KOps/s 7.7694 KOps/s $\color{#d91a1a}-2.83\%$
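The Change column is consistent with a plain ratio of the two throughput columns, (Ops / Ops on Repo HEAD - 1) × 100. For example, test_chunk gives 824.9145 / 926.8018 - 1 ≈ -10.99%. The bold entries and the Improved/Worsened totals appear to count only rows whose change exceeds ±5%. That threshold is inferred from the table, not taken from the benchmark workflow. The sketch below shows both computations. It assumes the two Ops values in a row are expressed in the same unit (Ops/s, KOps/s or MOps/s), which holds for every row above.

```python
def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of the PR vs. repo HEAD, in percent.

    Positive means the PR runs more operations per second (faster).
    Both arguments must use the same unit (Ops/s, KOps/s or MOps/s).
    """
    if ops_head <= 0:
        raise ValueError("ops_head must be positive")
    return (ops / ops_head - 1.0) * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Bucket a change; the 5% threshold is inferred from the bold rows (assumption)."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "neutral"


# A few (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
rows = {
    "test_chunk": (824.9145, 926.8018),
    "test_getitem[int]": (65.6031, 58.2579),
    "test_squeeze": (69.8829, 69.5865),
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

The table prints rounded Ops values, so a change recomputed from them can occasionally differ from the printed Change in the last digit.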

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 10, 2026
- dtensor_transfer_plan_test.py: CPU-only test for shard algebra
  and transfer plan computation (no GPUs needed)
- dtensor_transfer_distributed_test.py: Multi-GPU test for strategies
  A and B using torchrun with real DTensors on NCCL
- minimal_p2p_test.py: Minimal NCCL P2P test for JSON metadata
  serialization over CUDA byte tensors
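The shard algebra exercised by dtensor_transfer_plan_test.py can be illustrated for the simplest case. One tensor dimension is sharded across a 1-D source mesh and then re-sharded across a 1-D destination mesh of a different size. The sketch assumes torch.chunk-style splitting: ceil-sized shards, with trailing shards shorter or empty. This is assumed here to match how DTensor's Shard(dim) placement splits uneven sizes. shard_bounds and transfer_plan are hypothetical helpers, not functions from the PR. Replicate, multi-dimensional meshes and other placements are out of scope.

```python
from typing import List, Tuple


def shard_bounds(size: int, num_shards: int) -> List[Tuple[int, int]]:
    """[start, end) of each shard along one dim, torch.chunk-style (ceil division).

    Trailing shards may be shorter, or empty, when size is not divisible.
    """
    chunk = -(-size // num_shards)  # ceil(size / num_shards)
    bounds = []
    for i in range(num_shards):
        start = min(i * chunk, size)
        end = min(start + chunk, size)
        bounds.append((start, end))
    return bounds


def transfer_plan(size: int, src_shards: int, dst_shards: int) -> List[Tuple[int, int, int, int]]:
    """(src_rank, dst_rank, start, end) for every non-empty source/destination overlap.

    Ranks are positions along the (hypothetical) 1-D source and destination meshes.
    """
    plan = []
    src = shard_bounds(size, src_shards)
    dst = shard_bounds(size, dst_shards)
    for s, (s_lo, s_hi) in enumerate(src):
        for d, (d_lo, d_hi) in enumerate(dst):
            lo, hi = max(s_lo, d_lo), min(s_hi, d_hi)
            if lo < hi:  # skip empty intersections
                plan.append((s, d, lo, hi))
    return plan


for src_rank, dst_rank, lo, hi in transfer_plan(6, 2, 3):
    print(f"src {src_rank} -> dst {dst_rank}: rows [{lo}, {hi})")
```

For 6 rows moving from 2 to 3 ranks, transfer_plan(6, 2, 3) yields [(0, 0, 0, 2), (0, 1, 2, 3), (1, 1, 3, 4), (1, 2, 4, 6)]. Each tuple is one send of rows [start, end) from a source rank to a destination rank. Together the ranges cover the full dimension exactly once.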

Made-with: Cursor
ghstack-source-id: 04408b0
Pull-Request: #1647
Made-with: Cursor
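One constraint shapes the metadata exchange in minimal_p2p_test.py. A point-to-point torch.distributed.recv writes into a tensor the receiver has already allocated, so the receiver must learn the payload size before the payload arrives. A common pattern is to send a fixed-size length header first and the UTF-8 JSON bytes second. This pattern is assumed here, not taken from the script. The sketch below shows only that framing, using the standard library. In a real run, each part would be wrapped as a uint8 tensor on the GPU before being sent over NCCL, for example with torch.frombuffer(bytearray(...), dtype=torch.uint8).cuda(). The metadata fields are hypothetical.

```python
import json
import struct
from typing import Any, Dict, Tuple

# Fixed-size header: one little-endian signed 64-bit payload length.
LENGTH = struct.Struct("<q")


def encode_metadata(meta: Dict[str, Any]) -> Tuple[bytes, bytes]:
    """Return (header, payload); the sender transmits header first, then payload."""
    payload = json.dumps(meta, sort_keys=True).encode("utf-8")
    return LENGTH.pack(len(payload)), payload


def payload_len(header: bytes) -> int:
    """Decode the header so the receiver can allocate an exact-size buffer."""
    (n,) = LENGTH.unpack(header)
    return n


def decode_metadata(payload: bytes) -> Dict[str, Any]:
    """Parse the UTF-8 JSON payload back into a dict."""
    return json.loads(payload.decode("utf-8"))


# Simulated exchange; the hypothetical fields stand in for real DTensor metadata.
meta = {"shape": [8, 16], "dtype": "float32", "placements": ["Shard(0)"]}
header, payload = encode_metadata(meta)

recv_header = bytearray(LENGTH.size)        # receiver always knows this size
recv_header[:] = header                     # stands in for the first recv
recv_payload = bytearray(payload_len(recv_header))
recv_payload[:] = payload                   # stands in for the second recv
assert decode_metadata(bytes(recv_payload)) == meta
```

Because the header is always 8 bytes, the receiver can pre-allocate it without any earlier exchange. Only the payload buffer depends on the value it receives.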
@github-actions
Contributor

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add example scripts for cross-mesh DTensor transfer

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

Prefix Label Applied Example
[BugFix] or [Fix] bug [BugFix] Fix memory leak in TensorDict
[Feature] Feature [Feature] Add new storage backend
[Doc] or [Docs] documentation [Doc] Update installation guide
[Refactor] Refactor [Refactor] Clean up module imports
[CI] CI [CI] Fix workflow permissions
[Test] or [Tests] Test [Test] Add unit tests for nn module
[Compile] Compile [Compile] Fix torch.compile issue
[Performance] or [Perf] Performance [Perf] Optimize tensor operations
[Deprecation] Deprecation [Deprecation] Mark old function
[Setup] setup [Setup] Update build configuration
[Distributed] or [Dist] Distributed [Distributed] Add scatter collective
[Benchmark] or [Bench] Benchmarks [Benchmark] Add compile benchmark
[Typing] or [Type] Typing [Typing] Add type stubs
[BC-breaking] or [BC] BC-breaking [BC-breaking] Remove deprecated API
[Formatting] or [Format] Formatting [Format] Fix code style
[Quality] Quality [Quality] Improve error messages

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.
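The bot's note describes a simple rule. The bracketed token at the start of the title is matched case-insensitively against a fixed list of aliases, and [DTensor] is not one of them. Below is a minimal sketch of such a check. It assumes only the first bracketed token is inspected. PREFIX_LABELS and label_for_title are illustrative names, not the repository's workflow code, and the alias table is transcribed from the list above.

```python
import re
from typing import Optional

# Alias (lowercased) -> label applied, transcribed from the bot's prefix table.
PREFIX_LABELS = {
    "bugfix": "bug", "fix": "bug",
    "feature": "Feature",
    "doc": "documentation", "docs": "documentation",
    "refactor": "Refactor",
    "ci": "CI",
    "test": "Test", "tests": "Test",
    "compile": "Compile",
    "performance": "Performance", "perf": "Performance",
    "deprecation": "Deprecation",
    "setup": "setup",
    "distributed": "Distributed", "dist": "Distributed",
    "benchmark": "Benchmarks", "bench": "Benchmarks",
    "typing": "Typing", "type": "Typing",
    "bc-breaking": "BC-breaking", "bc": "BC-breaking",
    "formatting": "Formatting", "format": "Formatting",
    "quality": "Quality",
}

# Leading "[...]" token, allowing whitespace before it.
TITLE_PREFIX = re.compile(r"^\s*\[([^\]]+)\]")


def label_for_title(title: str) -> Optional[str]:
    """Return the label for a title's prefix, or None if missing or unknown."""
    match = TITLE_PREFIX.match(title)
    if match is None:
        return None
    return PREFIX_LABELS.get(match.group(1).strip().lower())
```

Under this rule, label_for_title("[DTensor] Add example scripts for cross-mesh DTensor transfer") returns None. A title starting with [Distributed] or [Test] would map to an existing label.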

[ghstack-poisoned]
@github-actions
Copy link
Contributor

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add example scripts for cross-mesh DTensor transfer

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

Prefix Label Applied Example
[BugFix] or [Fix] bug [BugFix] Fix memory leak in TensorDict
[Feature] Feature [Feature] Add new storage backend
[Doc] or [Docs] documentation [Doc] Update installation guide
[Refactor] Refactor [Refactor] Clean up module imports
[CI] CI [CI] Fix workflow permissions
[Test] or [Tests] Test [Test] Add unit tests for nn module
[Compile] Compile [Compile] Fix torch.compile issue
[Performance] or [Perf] Performance [Perf] Optimize tensor operations
[Deprecation] Deprecation [Deprecation] Mark old function
[Setup] setup [Setup] Update build configuration
[Distributed] or [Dist] Distributed [Distributed] Add scatter collective
[Benchmark] or [Bench] Benchmarks [Benchmark] Add compile benchmark
[Typing] or [Type] Typing [Typing] Add type stubs
[BC-breaking] or [BC] BC-breaking [BC-breaking] Remove deprecated API
[Formatting] or [Format] Formatting [Format] Fix code style
[Quality] Quality [Quality] Improve error messages

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.

1 similar comment
@github-actions
Copy link
Contributor

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add example scripts for cross-mesh DTensor transfer

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

Prefix Label Applied Example
[BugFix] or [Fix] bug [BugFix] Fix memory leak in TensorDict
[Feature] Feature [Feature] Add new storage backend
[Doc] or [Docs] documentation [Doc] Update installation guide
[Refactor] Refactor [Refactor] Clean up module imports
[CI] CI [CI] Fix workflow permissions
[Test] or [Tests] Test [Test] Add unit tests for nn module
[Compile] Compile [Compile] Fix torch.compile issue
[Performance] or [Perf] Performance [Perf] Optimize tensor operations
[Deprecation] Deprecation [Deprecation] Mark old function
[Setup] setup [Setup] Update build configuration
[Distributed] or [Dist] Distributed [Distributed] Add scatter collective
[Benchmark] or [Bench] Benchmarks [Benchmark] Add compile benchmark
[Typing] or [Type] Typing [Typing] Add type stubs
[BC-breaking] or [BC] BC-breaking [BC-breaking] Remove deprecated API
[Formatting] or [Format] Formatting [Format] Fix code style
[Quality] Quality [Quality] Improve error messages

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.

[ghstack-poisoned]
@github-actions
Copy link
Contributor

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add example scripts for cross-mesh DTensor transfer

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

Prefix Label Applied Example
[BugFix] or [Fix] bug [BugFix] Fix memory leak in TensorDict
[Feature] Feature [Feature] Add new storage backend
[Doc] or [Docs] documentation [Doc] Update installation guide
[Refactor] Refactor [Refactor] Clean up module imports
[CI] CI [CI] Fix workflow permissions
[Test] or [Tests] Test [Test] Add unit tests for nn module
[Compile] Compile [Compile] Fix torch.compile issue
[Performance] or [Perf] Performance [Perf] Optimize tensor operations
[Deprecation] Deprecation [Deprecation] Mark old function
[Setup] setup [Setup] Update build configuration
[Distributed] or [Dist] Distributed [Distributed] Add scatter collective
[Benchmark] or [Bench] Benchmarks [Benchmark] Add compile benchmark
[Typing] or [Type] Typing [Typing] Add type stubs
[BC-breaking] or [BC] BC-breaking [BC-breaking] Remove deprecated API
[Formatting] or [Format] Formatting [Format] Fix code style
[Quality] Quality [Quality] Improve error messages

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.

[ghstack-poisoned]
@github-actions
Copy link
Contributor

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add example scripts for cross-mesh DTensor transfer

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

Prefix Label Applied Example
[BugFix] or [Fix] bug [BugFix] Fix memory leak in TensorDict
[Feature] Feature [Feature] Add new storage backend
[Doc] or [Docs] documentation [Doc] Update installation guide
[Refactor] Refactor [Refactor] Clean up module imports
[CI] CI [CI] Fix workflow permissions
[Test] or [Tests] Test [Test] Add unit tests for nn module
[Compile] Compile [Compile] Fix torch.compile issue
[Performance] or [Perf] Performance [Perf] Optimize tensor operations
[Deprecation] Deprecation [Deprecation] Mark old function
[Setup] setup [Setup] Update build configuration
[Distributed] or [Dist] Distributed [Distributed] Add scatter collective
[Benchmark] or [Bench] Benchmarks [Benchmark] Add compile benchmark
[Typing] or [Type] Typing [Typing] Add type stubs
[BC-breaking] or [BC] BC-breaking [BC-breaking] Remove deprecated API
[Formatting] or [Format] Formatting [Format] Fix code style
[Quality] Quality [Quality] Improve error messages

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.

1 similar comment
@github-actions
Copy link
Contributor

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add example scripts for cross-mesh DTensor transfer

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

Prefix Label Applied Example
[BugFix] or [Fix] bug [BugFix] Fix memory leak in TensorDict
[Feature] Feature [Feature] Add new storage backend
[Doc] or [Docs] documentation [Doc] Update installation guide
[Refactor] Refactor [Refactor] Clean up module imports
[CI] CI [CI] Fix workflow permissions
[Test] or [Tests] Test [Test] Add unit tests for nn module
[Compile] Compile [Compile] Fix torch.compile issue
[Performance] or [Perf] Performance [Perf] Optimize tensor operations
[Deprecation] Deprecation [Deprecation] Mark old function
[Setup] setup [Setup] Update build configuration
[Distributed] or [Dist] Distributed [Distributed] Add scatter collective
[Benchmark] or [Bench] Benchmarks [Benchmark] Add compile benchmark
[Typing] or [Type] Typing [Typing] Add type stubs
[BC-breaking] or [BC] BC-breaking [BC-breaking] Remove deprecated API
[Formatting] or [Format] Formatting [Format] Fix code style
[Quality] Quality [Quality] Improve error messages

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.

[ghstack-poisoned]
@github-actions
Copy link
Contributor

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add example scripts for cross-mesh DTensor transfer

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

Prefix Label Applied Example
[BugFix] or [Fix] bug [BugFix] Fix memory leak in TensorDict
[Feature] Feature [Feature] Add new storage backend
[Doc] or [Docs] documentation [Doc] Update installation guide
[Refactor] Refactor [Refactor] Clean up module imports
[CI] CI [CI] Fix workflow permissions
[Test] or [Tests] Test [Test] Add unit tests for nn module
[Compile] Compile [Compile] Fix torch.compile issue
[Performance] or [Perf] Performance [Perf] Optimize tensor operations
[Deprecation] Deprecation [Deprecation] Mark old function
[Setup] setup [Setup] Update build configuration
[Distributed] or [Dist] Distributed [Distributed] Add scatter collective
[Benchmark] or [Bench] Benchmarks [Benchmark] Add compile benchmark
[Typing] or [Type] Typing [Typing] Add type stubs
[BC-breaking] or [BC] BC-breaking [BC-breaking] Remove deprecated API
[Formatting] or [Format] Formatting [Format] Fix code style
[Quality] Quality [Quality] Improve error messages

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.

Labels

CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

1 participant