[DTensor] Add transfer plan computation for cross-mesh DTensor redistribution #1639

Open
vmoens wants to merge 1 commit into gh/vmoens/80/base from gh/vmoens/80/head
vmoens (Collaborator) commented Mar 6, 2026

Stack from ghstack (oldest at bottom):

Add tensordict/_dtensor.py with:

  • Shard algebra: compute per-rank local slices from mesh shape + placements
  • Transfer plan computation: given src/dst mesh+placements, compute the
    minimal set of P2P transfers (which byte ranges go from which src rank
    to which dst rank)
  • Transport abstraction: _TransportBackend protocol with
    _TorchDistributedBackend and _UCXXBackend implementations

The transfer plan is pure computation (no GPU, no distributed runtime)
and can be tested in isolation. It supports Shard, Replicate, and Partial
placements on arbitrary n-D meshes, uneven sharding, and custom rank maps.
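
As a rough illustration of the shard algebra and transfer plan described above, here is a minimal 1-D sketch. It is not the code in tensordict/_dtensor.py: the names `Transfer`, `shard_bounds`, and `transfer_plan` are hypothetical, it handles only a single `Shard` dimension (no Replicate, Partial, or n-D meshes), it works in element offsets rather than byte ranges, and it assumes `torch.chunk`-style uneven sharding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One point-to-point copy of the global range [start, stop)."""
    src_rank: int
    dst_rank: int
    start: int  # inclusive global offset along the sharded dimension
    stop: int   # exclusive global offset


def shard_bounds(dim_size, num_shards):
    """Per-shard [start, stop) bounds, assuming torch.chunk-style sharding:
    every shard gets ceil(dim_size / num_shards) elements, except trailing
    shards, which may be smaller or empty."""
    chunk = -(-dim_size // num_shards)  # ceiling division
    return [
        (min(i * chunk, dim_size), min((i + 1) * chunk, dim_size))
        for i in range(num_shards)
    ]


def transfer_plan(dim_size, src_ranks, dst_ranks):
    """Intersect every source shard with every destination shard and keep
    the non-empty overlaps. The rank lists act as a custom rank map from
    mesh position to global rank."""
    src = shard_bounds(dim_size, len(src_ranks))
    dst = shard_bounds(dim_size, len(dst_ranks))
    plan = []
    for dst_rank, (d0, d1) in zip(dst_ranks, dst):
        for src_rank, (s0, s1) in zip(src_ranks, src):
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi:  # skip empty overlaps
                plan.append(Transfer(src_rank, dst_rank, lo, hi))
    return plan


# 10 elements sharded over source ranks 0-2, redistributed to ranks 3-4.
plan = transfer_plan(10, src_ranks=[0, 1, 2], dst_ranks=[3, 4])
for t in plan:
    print(t)
```

Because the plan is plain interval arithmetic, it runs without a GPU or a distributed runtime, which is the property the description relies on for isolated testing.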

All new classes are private (underscore-prefixed).

Made-with: Cursor

[ghstack-poisoned]
meta-cla bot added the CLA Signed label Mar 6, 2026
github-actions bot (Contributor) commented Mar 6, 2026

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add transfer plan computation for cross-mesh DTensor redistribution

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

| Prefix | Label Applied | Example |
| --- | --- | --- |
| [BugFix] or [Fix] | bug | [BugFix] Fix memory leak in TensorDict |
| [Feature] | Feature | [Feature] Add new storage backend |
| [Doc] or [Docs] | documentation | [Doc] Update installation guide |
| [Refactor] | Refactor | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Test | [Test] Add unit tests for nn module |
| [Compile] | Compile | [Compile] Fix torch.compile issue |
| [Performance] or [Perf] | Performance | [Perf] Optimize tensor operations |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Setup] | setup | [Setup] Update build configuration |
| [Distributed] or [Dist] | Distributed | [Distributed] Add scatter collective |
| [Benchmark] or [Bench] | Benchmarks | [Benchmark] Add compile benchmark |
| [Typing] or [Type] | Typing | [Typing] Add type stubs |
| [BC-breaking] or [BC] | BC-breaking | [BC-breaking] Remove deprecated API |
| [Formatting] or [Format] | Formatting | [Format] Fix code style |
| [Quality] | Quality | [Quality] Improve error messages |

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.
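
The rule above can be sketched as a small lookup. This is only an illustration of the stated behaviour (leading bracketed prefix, case-insensitive match), not the workflow's actual implementation; `SUPPORTED_PREFIXES` and `label_for_title` are hypothetical names, and the mapping simply mirrors the prefixes listed in this comment.

```python
import re

# Lower-cased prefix -> label, copied from the "Supported Prefixes" list.
SUPPORTED_PREFIXES = {
    "bugfix": "bug", "fix": "bug",
    "feature": "Feature",
    "doc": "documentation", "docs": "documentation",
    "refactor": "Refactor",
    "ci": "CI",
    "test": "Test", "tests": "Test",
    "compile": "Compile",
    "performance": "Performance", "perf": "Performance",
    "deprecation": "Deprecation",
    "setup": "setup",
    "distributed": "Distributed", "dist": "Distributed",
    "benchmark": "Benchmarks", "bench": "Benchmarks",
    "typing": "Typing", "type": "Typing",
    "bc-breaking": "BC-breaking", "bc": "BC-breaking",
    "formatting": "Formatting", "format": "Formatting",
    "quality": "Quality",
}


def label_for_title(title):
    """Return the label for a PR title, or None if its leading
    [Prefix] is missing or not in the supported list."""
    match = re.match(r"\[([^\]]+)\]", title)
    if match is None:
        return None
    return SUPPORTED_PREFIXES.get(match.group(1).lower())


print(label_for_title("[DTensor] Add transfer plan computation"))
print(label_for_title("[Distributed] Add scatter collective"))
```

Under this sketch the current title is rejected because `dtensor` is not among the listed prefixes, whereas a title starting with `[Distributed]` or `[Feature]` would be accepted.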

github-actions bot (Contributor) commented Mar 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 261. Improved: $\large\color{#35bf28}20$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 40.5100μs 15.3135μs 65.3018 KOps/s 66.6704 KOps/s $\color{#d91a1a}-2.05\%$
test_plain_set_stack_nested 37.0400μs 15.1823μs 65.8661 KOps/s 66.1368 KOps/s $\color{#d91a1a}-0.41\%$
test_plain_set_nested_inplace 44.0410μs 16.6865μs 59.9287 KOps/s 59.9007 KOps/s $\color{#35bf28}+0.05\%$
test_plain_set_stack_nested_inplace 42.1510μs 16.5489μs 60.4270 KOps/s 59.7537 KOps/s $\color{#35bf28}+1.13\%$
test_items 38.4900μs 6.1109μs 163.6408 KOps/s 165.9487 KOps/s $\color{#d91a1a}-1.39\%$
test_items_nested 0.5131ms 0.4636ms 2.1568 KOps/s 2.1567 KOps/s $+0.00\%$
test_items_nested_locked 0.5155ms 0.4675ms 2.1393 KOps/s 2.1354 KOps/s $\color{#35bf28}+0.18\%$
test_items_nested_leaf 0.1661ms 96.7166μs 10.3395 KOps/s 10.1746 KOps/s $\color{#35bf28}+1.62\%$
test_items_stack_nested 0.5120ms 0.4674ms 2.1397 KOps/s 2.1545 KOps/s $\color{#d91a1a}-0.69\%$
test_items_stack_nested_leaf 0.1355ms 98.6486μs 10.1370 KOps/s 10.0290 KOps/s $\color{#35bf28}+1.08\%$
test_items_stack_nested_locked 0.5117ms 0.4687ms 2.1335 KOps/s 2.1468 KOps/s $\color{#d91a1a}-0.62\%$
test_keys 36.4610μs 4.2331μs 236.2341 KOps/s 234.7756 KOps/s $\color{#35bf28}+0.62\%$
test_keys_nested 0.1964ms 0.1296ms 7.7185 KOps/s 7.7226 KOps/s $\color{#d91a1a}-0.05\%$
test_keys_nested_locked 0.7704ms 0.1397ms 7.1581 KOps/s 7.2127 KOps/s $\color{#d91a1a}-0.76\%$
test_keys_nested_leaf 0.1654ms 0.1217ms 8.2157 KOps/s 8.2975 KOps/s $\color{#d91a1a}-0.99\%$
test_keys_stack_nested 0.1769ms 0.1309ms 7.6422 KOps/s 7.6467 KOps/s $\color{#d91a1a}-0.06\%$
test_keys_stack_nested_leaf 0.1556ms 0.1213ms 8.2449 KOps/s 8.2653 KOps/s $\color{#d91a1a}-0.25\%$
test_keys_stack_nested_locked 0.1834ms 0.1399ms 7.1489 KOps/s 7.2519 KOps/s $\color{#d91a1a}-1.42\%$
test_values 11.3022μs 1.0240μs 976.5860 KOps/s 975.2531 KOps/s $\color{#35bf28}+0.14\%$
test_values_nested 83.0610μs 52.7437μs 18.9596 KOps/s 19.0916 KOps/s $\color{#d91a1a}-0.69\%$
test_values_nested_locked 94.1610μs 56.3215μs 17.7552 KOps/s 17.8145 KOps/s $\color{#d91a1a}-0.33\%$
test_values_nested_leaf 96.5820μs 60.6813μs 16.4795 KOps/s 17.0871 KOps/s $\color{#d91a1a}-3.56\%$
test_values_stack_nested 88.0610μs 52.9292μs 18.8932 KOps/s 18.7569 KOps/s $\color{#35bf28}+0.73\%$
test_values_stack_nested_leaf 93.4920μs 60.6546μs 16.4868 KOps/s 16.5513 KOps/s $\color{#d91a1a}-0.39\%$
test_values_stack_nested_locked 87.2910μs 56.4233μs 17.7232 KOps/s 17.8506 KOps/s $\color{#d91a1a}-0.71\%$
test_membership 11.4202μs 0.8372μs 1.1945 MOps/s 1.1639 MOps/s $\color{#35bf28}+2.62\%$
test_membership_nested 30.2100μs 2.8838μs 346.7629 KOps/s 344.6520 KOps/s $\color{#35bf28}+0.61\%$
test_membership_nested_leaf 25.4910μs 2.8968μs 345.2087 KOps/s 341.1969 KOps/s $\color{#35bf28}+1.18\%$
test_membership_stacked_nested 36.3910μs 2.8713μs 348.2736 KOps/s 347.8237 KOps/s $\color{#35bf28}+0.13\%$
test_membership_stacked_nested_leaf 22.2300μs 2.8790μs 347.3448 KOps/s 348.6078 KOps/s $\color{#d91a1a}-0.36\%$
test_membership_nested_last 29.6910μs 4.3460μs 230.0944 KOps/s 228.8973 KOps/s $\color{#35bf28}+0.52\%$
test_membership_nested_leaf_last 30.6700μs 4.3463μs 230.0814 KOps/s 227.5173 KOps/s $\color{#35bf28}+1.13\%$
test_membership_stacked_nested_last 71.0810μs 4.3516μs 229.8011 KOps/s 227.8177 KOps/s $\color{#35bf28}+0.87\%$
test_membership_stacked_nested_leaf_last 31.7900μs 4.3232μs 231.3092 KOps/s 230.6605 KOps/s $\color{#35bf28}+0.28\%$
test_nested_getleaf 51.1410μs 21.6296μs 46.2330 KOps/s 45.8269 KOps/s $\color{#35bf28}+0.89\%$
test_nested_get 58.2610μs 20.7229μs 48.2557 KOps/s 49.1516 KOps/s $\color{#d91a1a}-1.82\%$
test_stacked_getleaf 47.4010μs 21.5452μs 46.4141 KOps/s 45.5110 KOps/s $\color{#35bf28}+1.98\%$
test_stacked_get 57.5610μs 20.5948μs 48.5560 KOps/s 47.8800 KOps/s $\color{#35bf28}+1.41\%$
test_nested_getitemleaf 89.6220μs 22.0759μs 45.2983 KOps/s 45.1702 KOps/s $\color{#35bf28}+0.28\%$
test_nested_getitem 45.3810μs 21.0461μs 47.5147 KOps/s 47.0940 KOps/s $\color{#35bf28}+0.89\%$
test_stacked_getitemleaf 47.8200μs 22.0222μs 45.4087 KOps/s 44.4461 KOps/s $\color{#35bf28}+2.17\%$
test_stacked_getitem 48.8010μs 21.0484μs 47.5096 KOps/s 46.5242 KOps/s $\color{#35bf28}+2.12\%$
test_lock_nested 7.8906ms 0.4856ms 2.0595 KOps/s 2.0952 KOps/s $\color{#d91a1a}-1.71\%$
test_lock_stack_nested 0.5357ms 0.4798ms 2.0842 KOps/s 2.0583 KOps/s $\color{#35bf28}+1.26\%$
test_unlock_nested 0.4606ms 0.3885ms 2.5739 KOps/s 2.5737 KOps/s $+0.01\%$
test_unlock_stack_nested 0.4593ms 0.3900ms 2.5638 KOps/s 2.5179 KOps/s $\color{#35bf28}+1.83\%$
test_flatten_speed 0.1729ms 0.1213ms 8.2414 KOps/s 8.1155 KOps/s $\color{#35bf28}+1.55\%$
test_unflatten_speed 0.6417ms 0.5774ms 1.7319 KOps/s 1.7350 KOps/s $\color{#d91a1a}-0.18\%$
test_common_ops 0.9198ms 0.7015ms 1.4255 KOps/s 1.4382 KOps/s $\color{#d91a1a}-0.88\%$
test_creation 90.0210μs 3.1642μs 316.0374 KOps/s 318.5614 KOps/s $\color{#d91a1a}-0.79\%$
test_creation_empty 51.4610μs 6.9460μs 143.9673 KOps/s 141.6286 KOps/s $\color{#35bf28}+1.65\%$
test_creation_nested_1 49.0510μs 11.5697μs 86.4325 KOps/s 86.5407 KOps/s $\color{#d91a1a}-0.12\%$
test_creation_nested_2 46.6510μs 13.3625μs 74.8364 KOps/s 74.7257 KOps/s $\color{#35bf28}+0.15\%$
test_creation_many_keys[10] 54.0910μs 20.9429μs 47.7490 KOps/s 47.0981 KOps/s $\color{#35bf28}+1.38\%$
test_creation_many_keys[50] 0.1253ms 89.2396μs 11.2058 KOps/s 10.9006 KOps/s $\color{#35bf28}+2.80\%$
test_creation_many_keys[100] 0.2831ms 0.1763ms 5.6725 KOps/s 5.5082 KOps/s $\color{#35bf28}+2.98\%$
test_creation_nested_many_keys[10] 73.2210μs 45.3257μs 22.0625 KOps/s 21.7019 KOps/s $\color{#35bf28}+1.66\%$
test_creation_nested_many_keys[50] 0.2274ms 0.1831ms 5.4623 KOps/s 5.4804 KOps/s $\color{#d91a1a}-0.33\%$
test_clone 95.7920μs 13.5147μs 73.9933 KOps/s 73.9038 KOps/s $\color{#35bf28}+0.12\%$
test_getitem[int] 1.5267ms 15.1437μs 66.0342 KOps/s 59.4539 KOps/s $\textbf{\color{#35bf28}+11.07\%}$
test_getitem[slice_int] 0.1332ms 24.1946μs 41.3316 KOps/s 41.3898 KOps/s $\color{#d91a1a}-0.14\%$
test_getitem[range] 0.1765ms 63.8715μs 15.6564 KOps/s 15.6007 KOps/s $\color{#35bf28}+0.36\%$
test_getitem[tuple] 0.1398ms 24.1541μs 41.4008 KOps/s 42.0665 KOps/s $\color{#d91a1a}-1.58\%$
test_getitem[list] 0.1786ms 58.6637μs 17.0463 KOps/s 16.6128 KOps/s $\color{#35bf28}+2.61\%$
test_setitem_dim[int] 46.8710μs 26.4326μs 37.8320 KOps/s 37.0551 KOps/s $\color{#35bf28}+2.10\%$
test_setitem_dim[slice_int] 65.5210μs 43.1571μs 23.1711 KOps/s 22.1555 KOps/s $\color{#35bf28}+4.58\%$
test_setitem_dim[range] 0.1196ms 95.6660μs 10.4530 KOps/s 9.9936 KOps/s $\color{#35bf28}+4.60\%$
test_setitem_dim[tuple] 69.8410μs 40.5963μs 24.6328 KOps/s 24.2068 KOps/s $\color{#35bf28}+1.76\%$
test_setitem 54.7410μs 18.0037μs 55.5442 KOps/s 55.2244 KOps/s $\color{#35bf28}+0.58\%$
test_set 51.8710μs 17.2381μs 58.0110 KOps/s 58.5751 KOps/s $\color{#d91a1a}-0.96\%$
test_set_shared 0.6216ms 0.2130ms 4.6959 KOps/s 4.8817 KOps/s $\color{#d91a1a}-3.81\%$
test_update 0.2031ms 21.7312μs 46.0168 KOps/s 44.9664 KOps/s $\color{#35bf28}+2.34\%$
test_update_nested 68.6210μs 33.1134μs 30.1993 KOps/s 30.0251 KOps/s $\color{#35bf28}+0.58\%$
test_update__nested 0.4477ms 34.1817μs 29.2554 KOps/s 28.9103 KOps/s $\color{#35bf28}+1.19\%$
test_set_nested 60.7110μs 20.3960μs 49.0291 KOps/s 51.5897 KOps/s $\color{#d91a1a}-4.96\%$
test_set_nested_new 58.2110μs 26.0347μs 38.4103 KOps/s 41.6554 KOps/s $\textbf{\color{#d91a1a}-7.79\%}$
test_select 75.1310μs 43.2440μs 23.1246 KOps/s 24.4331 KOps/s $\textbf{\color{#d91a1a}-5.36\%}$
test_select_nested 0.1050ms 74.9398μs 13.3441 KOps/s 13.5116 KOps/s $\color{#d91a1a}-1.24\%$
test_exclude_nested 0.1314ms 91.9688μs 10.8733 KOps/s 10.9017 KOps/s $\color{#d91a1a}-0.26\%$
test_empty[True] 0.4634ms 0.3995ms 2.5029 KOps/s 2.5096 KOps/s $\color{#d91a1a}-0.27\%$
test_empty[False] 7.8475μs 1.3270μs 753.5974 KOps/s 768.0517 KOps/s $\color{#d91a1a}-1.88\%$
test_to 0.1024ms 71.7361μs 13.9400 KOps/s 13.4951 KOps/s $\color{#35bf28}+3.30\%$
test_to_nonblocking 0.1222ms 65.4454μs 15.2799 KOps/s 15.2487 KOps/s $\color{#35bf28}+0.20\%$
test_unbind_speed 0.3731ms 0.3338ms 2.9956 KOps/s 2.9876 KOps/s $\color{#35bf28}+0.27\%$
test_unbind_speed_stack0 0.4247ms 0.3290ms 3.0398 KOps/s 3.0080 KOps/s $\color{#35bf28}+1.06\%$
test_unbind_speed_stack1 0.1035s 0.8384ms 1.1927 KOps/s 1.1673 KOps/s $\color{#35bf28}+2.17\%$
test_split 0.1035s 1.2668ms 789.4121 Ops/s 782.4156 Ops/s $\color{#35bf28}+0.89\%$
test_chunk 0.1032s 1.2125ms 824.7189 Ops/s 918.6403 Ops/s $\textbf{\color{#d91a1a}-10.22\%}$
test_to_cpu_blocking 19.8678ms 19.7660ms 50.5919 Ops/s 34.1241 Ops/s $\textbf{\color{#35bf28}+48.26\%}$
test_to_cpu_global_sync 11.7691ms 11.6839ms 85.5879 Ops/s 76.8206 Ops/s $\textbf{\color{#35bf28}+11.41\%}$
test_to_cpu_event_sync 13.4181ms 12.6749ms 78.8960 Ops/s 78.7987 Ops/s $\color{#35bf28}+0.12\%$
test_to_cpu_default 0.1161s 14.0025ms 71.4159 Ops/s 78.5336 Ops/s $\textbf{\color{#d91a1a}-9.06\%}$
test_consolidate[False-None] 4.2335ms 4.1546ms 240.6945 Ops/s 214.6916 Ops/s $\textbf{\color{#35bf28}+12.11\%}$
test_consolidate[default-None] 2.6717ms 2.0491ms 488.0300 Ops/s 479.1963 Ops/s $\color{#35bf28}+1.84\%$
test_consolidate[reduce-overhead-None] 2.0520ms 1.9743ms 506.5149 Ops/s 504.8202 Ops/s $\color{#35bf28}+0.34\%$
test_consolidate_njt[False-None] 0.1913s 10.1671ms 98.3560 Ops/s 116.7773 Ops/s $\textbf{\color{#d91a1a}-15.77\%}$
test_to[False-False-None] 2.2287ms 2.1288ms 469.7501 Ops/s 465.5847 Ops/s $\color{#35bf28}+0.89\%$
test_to[True-False-None] 2.1483ms 1.9124ms 522.9123 Ops/s 519.6354 Ops/s $\color{#35bf28}+0.63\%$
test_to[within-False-None] 6.3868ms 6.1781ms 161.8616 Ops/s 162.6492 Ops/s $\color{#d91a1a}-0.48\%$
test_to[True-default-None] 9.0233ms 8.8430ms 113.0842 Ops/s 109.1644 Ops/s $\color{#35bf28}+3.59\%$
test_to_njt[False-False-None] 8.8336ms 8.5131ms 117.4664 Ops/s 116.1406 Ops/s $\color{#35bf28}+1.14\%$
test_to_njt[True-False-None] 7.1127ms 6.9275ms 144.3515 Ops/s 141.6555 Ops/s $\color{#35bf28}+1.90\%$
test_to_njt[within-False-None] 15.8162ms 15.6220ms 64.0124 Ops/s 62.8856 Ops/s $\color{#35bf28}+1.79\%$
test_creation[device0] 0.4486ms 0.1167ms 8.5680 KOps/s 8.6259 KOps/s $\color{#d91a1a}-0.67\%$
test_creation_from_tensor 0.5865ms 0.1142ms 8.7602 KOps/s 8.6896 KOps/s $\color{#35bf28}+0.81\%$
test_add_one[memmap_tensor0] 0.3086ms 6.6452μs 150.4854 KOps/s 146.8191 KOps/s $\color{#35bf28}+2.50\%$
test_contiguous[memmap_tensor0] 26.3900μs 0.6691μs 1.4945 MOps/s 2.1624 MOps/s $\textbf{\color{#d91a1a}-30.89\%}$
test_stack[memmap_tensor0] 32.7000μs 4.6024μs 217.2789 KOps/s 219.1664 KOps/s $\color{#d91a1a}-0.86\%$
test_memmaptd_index 1.0726ms 0.2649ms 3.7755 KOps/s 3.7607 KOps/s $\color{#35bf28}+0.39\%$
test_memmaptd_index_astensor 0.5351ms 0.3686ms 2.7128 KOps/s 2.7020 KOps/s $\color{#35bf28}+0.40\%$
test_memmaptd_index_op 0.1639s 0.7315ms 1.3671 KOps/s 1.5870 KOps/s $\textbf{\color{#d91a1a}-13.86\%}$
test_serialize_model 0.1394s 0.1371s 7.2957 Ops/s 7.2560 Ops/s $\color{#35bf28}+0.55\%$
test_serialize_model_pickle 1.3973s 1.2125s 0.8247 Ops/s 0.8249 Ops/s $\color{#d91a1a}-0.02\%$
test_serialize_weights 0.1375s 0.1359s 7.3585 Ops/s 7.3159 Ops/s $\color{#35bf28}+0.58\%$
test_serialize_weights_returnearly 0.2825s 94.1937ms 10.6164 Ops/s 14.4578 Ops/s $\textbf{\color{#d91a1a}-26.57\%}$
test_serialize_weights_pickle 1.3687s 1.2137s 0.8239 Ops/s 0.8236 Ops/s $\color{#35bf28}+0.04\%$
test_reshape_pytree 0.2005ms 32.9318μs 30.3658 KOps/s 30.1674 KOps/s $\color{#35bf28}+0.66\%$
test_reshape_td 89.1320μs 45.2927μs 22.0786 KOps/s 22.0490 KOps/s $\color{#35bf28}+0.13\%$
test_view_pytree 0.2194ms 32.5197μs 30.7506 KOps/s 30.3671 KOps/s $\color{#35bf28}+1.26\%$
test_view_td 0.1118ms 53.1348μs 18.8200 KOps/s 18.3961 KOps/s $\color{#35bf28}+2.30\%$
test_unbind_pytree 0.2346ms 36.5464μs 27.3625 KOps/s 26.8394 KOps/s $\color{#35bf28}+1.95\%$
test_unbind_td 0.1909ms 49.7230μs 20.1114 KOps/s 19.9336 KOps/s $\color{#35bf28}+0.89\%$
test_split_pytree 0.2483ms 42.7336μs 23.4008 KOps/s 23.2711 KOps/s $\color{#35bf28}+0.56\%$
test_split_td 0.2188ms 63.6834μs 15.7027 KOps/s 15.5271 KOps/s $\color{#35bf28}+1.13\%$
test_add_pytree 0.2045ms 42.5387μs 23.5080 KOps/s 23.1369 KOps/s $\color{#35bf28}+1.60\%$
test_add_td 0.1176ms 55.3437μs 18.0689 KOps/s 17.8144 KOps/s $\color{#35bf28}+1.43\%$
test_compile_add_one_nested[tensordict-compile] 0.1924ms 0.1401ms 7.1368 KOps/s 6.8622 KOps/s $\color{#35bf28}+4.00\%$
test_compile_add_one_nested[tensordict-eager] 0.2816ms 0.2025ms 4.9384 KOps/s 4.9537 KOps/s $\color{#d91a1a}-0.31\%$
test_compile_add_one_nested[pytree-compile] 0.1421ms 0.1074ms 9.3133 KOps/s 8.8475 KOps/s $\textbf{\color{#35bf28}+5.26\%}$
test_compile_add_one_nested[pytree-eager] 0.4575ms 0.1841ms 5.4311 KOps/s 5.3750 KOps/s $\color{#35bf28}+1.05\%$
test_compile_copy_nested[tensordict-compile] 0.3024ms 9.8977μs 101.0339 KOps/s 94.6967 KOps/s $\textbf{\color{#35bf28}+6.69\%}$
test_compile_copy_nested[tensordict-eager] 0.5811ms 53.3188μs 18.7551 KOps/s 18.3925 KOps/s $\color{#35bf28}+1.97\%$
test_compile_copy_nested[pytree-compile] 0.1393ms 9.8219μs 101.8129 KOps/s 102.0449 KOps/s $\color{#d91a1a}-0.23\%$
test_compile_copy_nested[pytree-eager] 0.4914ms 69.3723μs 14.4150 KOps/s 14.0529 KOps/s $\color{#35bf28}+2.58\%$
test_compile_add_one_flat[tensordict-compile] 0.2376ms 0.1770ms 5.6507 KOps/s 3.3181 KOps/s $\textbf{\color{#35bf28}+70.30\%}$
test_compile_add_one_flat[tensordict-eager] 0.3988ms 0.2820ms 3.5464 KOps/s 3.4817 KOps/s $\color{#35bf28}+1.86\%$
test_compile_add_one_flat[tensorclass-compile] 0.2138ms 0.1174ms 8.5144 KOps/s 7.9935 KOps/s $\textbf{\color{#35bf28}+6.52\%}$
test_compile_add_one_flat[tensorclass-eager] 0.1159ms 73.5535μs 13.5955 KOps/s 13.7048 KOps/s $\color{#d91a1a}-0.80\%$
test_compile_add_one_flat[pytree-compile] 0.2050ms 0.1570ms 6.3685 KOps/s 6.0588 KOps/s $\textbf{\color{#35bf28}+5.11\%}$
test_compile_add_one_flat[pytree-eager] 0.8087ms 0.5387ms 1.8563 KOps/s 1.8497 KOps/s $\color{#35bf28}+0.35\%$
test_compile_add_self_flat[tensordict-eager] 0.4379ms 0.3344ms 2.9904 KOps/s 2.9294 KOps/s $\color{#35bf28}+2.08\%$
test_compile_add_self_flat[tensordict-compile] 0.2322ms 0.1786ms 5.5994 KOps/s 5.0613 KOps/s $\textbf{\color{#35bf28}+10.63\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1805ms 90.2627μs 11.0788 KOps/s 11.1964 KOps/s $\color{#d91a1a}-1.05\%$
test_compile_add_self_flat[tensorclass-compile] 0.6740ms 0.1211ms 8.2573 KOps/s 7.7812 KOps/s $\textbf{\color{#35bf28}+6.12\%}$
test_compile_add_self_flat[pytree-eager] 0.8842ms 0.4464ms 2.2401 KOps/s 2.1966 KOps/s $\color{#35bf28}+1.98\%$
test_compile_add_self_flat[pytree-compile] 0.1906ms 0.1573ms 6.3591 KOps/s 6.0389 KOps/s $\textbf{\color{#35bf28}+5.30\%}$
test_compile_copy_flat[tensordict-compile] 0.1201ms 13.6590μs 73.2117 KOps/s 73.3661 KOps/s $\color{#d91a1a}-0.21\%$
test_compile_copy_flat[tensordict-eager] 0.5035ms 42.2612μs 23.6624 KOps/s 23.7952 KOps/s $\color{#d91a1a}-0.56\%$
test_compile_copy_flat[pytree-compile] 0.4519ms 10.8009μs 92.5848 KOps/s 91.8209 KOps/s $\color{#35bf28}+0.83\%$
test_compile_copy_flat[pytree-eager] 0.5001ms 53.0541μs 18.8487 KOps/s 18.7992 KOps/s $\color{#35bf28}+0.26\%$
test_compile_assign_and_add[tensordict-compile] 2.0126ms 0.1778ms 5.6231 KOps/s 5.4066 KOps/s $\color{#35bf28}+4.00\%$
test_compile_assign_and_add[tensordict-eager] 3.8599ms 3.3187ms 301.3217 Ops/s 297.6538 Ops/s $\color{#35bf28}+1.23\%$
test_compile_assign_and_add[pytree-compile] 1.9557ms 0.1609ms 6.2162 KOps/s 5.9751 KOps/s $\color{#35bf28}+4.03\%$
test_compile_assign_and_add[pytree-eager] 3.2885ms 2.8536ms 350.4325 Ops/s 350.5013 Ops/s $\color{#d91a1a}-0.02\%$
test_compile_indexing[tensor-tensordict-compile] 0.2543ms 0.1096ms 9.1201 KOps/s 8.7245 KOps/s $\color{#35bf28}+4.53\%$
test_compile_indexing[tensor-tensordict-eager] 0.5133ms 75.0424μs 13.3258 KOps/s 13.2396 KOps/s $\color{#35bf28}+0.65\%$
test_compile_indexing[tensor-tensorclass-compile] 0.6187ms 95.5601μs 10.4646 KOps/s 10.1551 KOps/s $\color{#35bf28}+3.05\%$
test_compile_indexing[tensor-tensorclass-eager] 0.5210ms 44.9058μs 22.2688 KOps/s 20.7297 KOps/s $\textbf{\color{#35bf28}+7.42\%}$
test_compile_indexing[tensor-pytree-compile] 0.5621ms 96.0202μs 10.4145 KOps/s 10.0847 KOps/s $\color{#35bf28}+3.27\%$
test_compile_indexing[tensor-pytree-eager] 0.2492ms 44.4537μs 22.4953 KOps/s 21.7287 KOps/s $\color{#35bf28}+3.53\%$
test_compile_indexing[slice-tensordict-compile] 0.5253ms 56.5248μs 17.6913 KOps/s 17.0063 KOps/s $\color{#35bf28}+4.03\%$
test_compile_indexing[slice-tensordict-eager] 0.2172ms 27.4556μs 36.4224 KOps/s 35.8536 KOps/s $\color{#35bf28}+1.59\%$
test_compile_indexing[slice-tensorclass-compile] 0.1304ms 44.4844μs 22.4798 KOps/s 22.2755 KOps/s $\color{#35bf28}+0.92\%$
test_compile_indexing[slice-tensorclass-eager] 0.4814ms 22.6257μs 44.1975 KOps/s 44.1138 KOps/s $\color{#35bf28}+0.19\%$
test_compile_indexing[slice-pytree-compile] 0.4725ms 45.3822μs 22.0351 KOps/s 21.5143 KOps/s $\color{#35bf28}+2.42\%$
test_compile_indexing[slice-pytree-eager] 0.4622ms 22.6451μs 44.1596 KOps/s 43.6134 KOps/s $\color{#35bf28}+1.25\%$
test_compile_indexing[int-tensordict-compile] 0.4997ms 57.0164μs 17.5388 KOps/s 16.7974 KOps/s $\color{#35bf28}+4.41\%$
test_compile_indexing[int-tensordict-eager] 0.5809ms 27.5512μs 36.2961 KOps/s 36.0663 KOps/s $\color{#35bf28}+0.64\%$
test_compile_indexing[int-tensorclass-compile] 84.4320μs 44.9726μs 22.2357 KOps/s 21.8972 KOps/s $\color{#35bf28}+1.55\%$
test_compile_indexing[int-tensorclass-eager] 0.4635ms 22.5697μs 44.3072 KOps/s 43.6325 KOps/s $\color{#35bf28}+1.55\%$
test_compile_indexing[int-pytree-compile] 0.4873ms 45.1276μs 22.1594 KOps/s 22.0899 KOps/s $\color{#35bf28}+0.31\%$
test_compile_indexing[int-pytree-eager] 0.4532ms 22.4719μs 44.5001 KOps/s 43.7609 KOps/s $\color{#35bf28}+1.69\%$
test_compile_replace[single-eager] 0.4921ms 47.7454μs 20.9444 KOps/s 21.0288 KOps/s $\color{#d91a1a}-0.40\%$
test_compile_replace[single-compile] 0.1804ms 0.1060ms 9.4366 KOps/s 9.1537 KOps/s $\color{#35bf28}+3.09\%$
test_compile_replace[multi-eager] 1.0405ms 0.5729ms 1.7456 KOps/s 1.7741 KOps/s $\color{#d91a1a}-1.60\%$
test_compile_replace[multi-compile] 0.1438ms 0.1127ms 8.8754 KOps/s 8.6665 KOps/s $\color{#35bf28}+2.41\%$
test_compile_tc_getattr_20[eager] 0.6056ms 0.1731ms 5.7768 KOps/s 5.7892 KOps/s $\color{#d91a1a}-0.21\%$
test_compile_tc_getattr_20[compile] 0.2723ms 0.1196ms 8.3583 KOps/s 8.1217 KOps/s $\color{#35bf28}+2.91\%$
test_compile_clone_shallow[20-eager] 0.4468ms 19.8725μs 50.3207 KOps/s 52.5398 KOps/s $\color{#d91a1a}-4.22\%$
test_compile_clone_shallow[20-compile] 0.4872ms 11.2274μs 89.0676 KOps/s 85.1435 KOps/s $\color{#35bf28}+4.61\%$
test_compile_clone_shallow[40-eager] 0.4535ms 34.5778μs 28.9203 KOps/s 29.2583 KOps/s $\color{#d91a1a}-1.16\%$
test_compile_clone_shallow[40-compile] 67.6610μs 12.7920μs 78.1740 KOps/s 75.8066 KOps/s $\color{#35bf28}+3.12\%$
test_compile_clone_shallow[80-eager] 0.5073ms 63.0851μs 15.8516 KOps/s 15.5398 KOps/s $\color{#35bf28}+2.01\%$
test_compile_clone_shallow[80-compile] 0.4475ms 15.0108μs 66.6186 KOps/s 64.0099 KOps/s $\color{#35bf28}+4.08\%$
test_compile_update_inplace[eager] 0.5093ms 59.5563μs 16.7908 KOps/s 17.0130 KOps/s $\color{#d91a1a}-1.31\%$
test_compile_update_inplace[compile] 0.5708ms 0.1393ms 7.1775 KOps/s 6.9047 KOps/s $\color{#35bf28}+3.95\%$
test_mod_add[eager] 0.4843ms 49.8419μs 20.0635 KOps/s 20.2420 KOps/s $\color{#d91a1a}-0.88\%$
test_mod_add[compile] 0.3205ms 0.1043ms 9.5910 KOps/s 9.4114 KOps/s $\color{#35bf28}+1.91\%$
test_mod_add[compile-overhead] 0.2357ms 0.1495ms 6.6886 KOps/s 6.5218 KOps/s $\color{#35bf28}+2.56\%$
test_mod_wrap[eager] 0.7431ms 0.2907ms 3.4397 KOps/s 3.4074 KOps/s $\color{#35bf28}+0.95\%$
test_mod_wrap[compile] 0.4420ms 0.3607ms 2.7724 KOps/s 2.8072 KOps/s $\color{#d91a1a}-1.24\%$
test_mod_wrap[compile-overhead] 7.0034ms 3.7894ms 263.8942 Ops/s 250.1506 Ops/s $\textbf{\color{#35bf28}+5.49\%}$
test_mod_wrap_and_backward[eager] 1.6084ms 1.4942ms 669.2355 Ops/s 661.8537 Ops/s $\color{#35bf28}+1.12\%$
test_mod_wrap_and_backward[compile] 1.5289ms 1.4413ms 693.8331 Ops/s 678.4257 Ops/s $\color{#35bf28}+2.27\%$
test_mod_wrap_and_backward[compile-overhead] 1.3034ms 0.8939ms 1.1187 KOps/s 1.0945 KOps/s $\color{#35bf28}+2.21\%$
test_seq_add[eager] 0.2372ms 0.1610ms 6.2105 KOps/s 6.4532 KOps/s $\color{#d91a1a}-3.76\%$
test_seq_add[compile] 0.5209ms 0.1187ms 8.4236 KOps/s 7.9138 KOps/s $\textbf{\color{#35bf28}+6.44\%}$
test_seq_add[compile-overhead] 0.2302ms 0.1614ms 6.1952 KOps/s 6.2071 KOps/s $\color{#d91a1a}-0.19\%$
test_seq_wrap[eager] 0.6240ms 0.5464ms 1.8302 KOps/s 1.9115 KOps/s $\color{#d91a1a}-4.25\%$
test_seq_wrap[compile] 0.4235ms 0.3667ms 2.7273 KOps/s 2.6553 KOps/s $\color{#35bf28}+2.71\%$
test_seq_wrap[compile-overhead] 0.3346ms 0.2640ms 3.7884 KOps/s 3.6608 KOps/s $\color{#35bf28}+3.49\%$
test_func_call_runtime[False-eager] 0.9486ms 0.8753ms 1.1424 KOps/s 1.1775 KOps/s $\color{#d91a1a}-2.98\%$
test_func_call_runtime[False-compile] 1.1083ms 0.9222ms 1.0843 KOps/s 1.0664 KOps/s $\color{#35bf28}+1.68\%$
test_func_call_runtime[False-compile-overhead] 0.5218ms 0.4641ms 2.1546 KOps/s 2.1392 KOps/s $\color{#35bf28}+0.72\%$
test_func_call_runtime[True-eager] 1.2159ms 1.0936ms 914.4053 Ops/s 910.2943 Ops/s $\color{#35bf28}+0.45\%$
test_func_call_runtime[True-compile] 1.0116ms 0.9388ms 1.0652 KOps/s 1.0605 KOps/s $\color{#35bf28}+0.44\%$
test_func_call_runtime[True-compile-overhead] 0.5583ms 0.4766ms 2.0982 KOps/s 2.0642 KOps/s $\color{#35bf28}+1.64\%$
test_func_call_cm_runtime[False-eager] 1.0179ms 0.8406ms 1.1897 KOps/s 1.1708 KOps/s $\color{#35bf28}+1.61\%$
test_func_call_cm_runtime[False-compile] 1.0422ms 0.9502ms 1.0524 KOps/s 1.0668 KOps/s $\color{#d91a1a}-1.35\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6329ms 0.4631ms 2.1593 KOps/s 2.1140 KOps/s $\color{#35bf28}+2.15\%$
test_func_call_cm_runtime[True-eager] 1.3953ms 1.2321ms 811.5934 Ops/s 800.7785 Ops/s $\color{#35bf28}+1.35\%$
test_func_call_cm_runtime[True-compile] 1.1294ms 0.9556ms 1.0465 KOps/s 1.0183 KOps/s $\color{#35bf28}+2.77\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5987ms 0.5091ms 1.9642 KOps/s 1.9209 KOps/s $\color{#35bf28}+2.25\%$
test_vmap_func_call_cm_runtime[eager] 2.8791ms 2.3773ms 420.6493 Ops/s 415.8002 Ops/s $\color{#35bf28}+1.17\%$
test_vmap_func_call_cm_runtime[compile] 1.0982ms 0.9811ms 1.0193 KOps/s 991.3028 Ops/s $\color{#35bf28}+2.82\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5636ms 0.5146ms 1.9434 KOps/s 1.9079 KOps/s $\color{#35bf28}+1.86\%$
test_distributed 0.6722ms 0.1534ms 6.5190 KOps/s 6.3736 KOps/s $\color{#35bf28}+2.28\%$
test_tdmodule 0.3947ms 27.6611μs 36.1519 KOps/s 35.5631 KOps/s $\color{#35bf28}+1.66\%$
test_tdmodule_dispatch 78.6320μs 45.1207μs 22.1628 KOps/s 21.9093 KOps/s $\color{#35bf28}+1.16\%$
test_tdseq 77.5110μs 26.7269μs 37.4154 KOps/s 37.3301 KOps/s $\color{#35bf28}+0.23\%$
test_tdseq_dispatch 69.4310μs 46.7983μs 21.3683 KOps/s 20.9876 KOps/s $\color{#35bf28}+1.81\%$
test_instantiation_functorch 2.1454ms 2.0844ms 479.7445 Ops/s 476.9101 Ops/s $\color{#35bf28}+0.59\%$
test_exec_functorch 0.2475ms 0.1777ms 5.6268 KOps/s 5.4605 KOps/s $\color{#35bf28}+3.05\%$
test_exec_functional_call 0.2105ms 0.1615ms 6.1918 KOps/s 6.0892 KOps/s $\color{#35bf28}+1.69\%$
test_exec_td_decorator 0.4380ms 0.2362ms 4.2330 KOps/s 4.0755 KOps/s $\color{#35bf28}+3.86\%$
test_vmap_mlp_speed_decorator[True-True] 1.0419ms 0.8321ms 1.2018 KOps/s 1.1580 KOps/s $\color{#35bf28}+3.78\%$
test_vmap_mlp_speed_decorator[True-False] 1.0400ms 0.8300ms 1.2048 KOps/s 1.1590 KOps/s $\color{#35bf28}+3.95\%$
test_vmap_mlp_speed_decorator[False-True] 1.0039ms 0.7186ms 1.3916 KOps/s 1.3467 KOps/s $\color{#35bf28}+3.34\%$
test_vmap_mlp_speed_decorator[False-False] 0.8929ms 0.7156ms 1.3974 KOps/s 1.3345 KOps/s $\color{#35bf28}+4.71\%$
test_vmap_transformer_speed_decorator[True-True] 21.2028ms 20.6326ms 48.4671 Ops/s 47.9364 Ops/s $\color{#35bf28}+1.11\%$
test_vmap_transformer_speed_decorator[True-False] 21.3554ms 20.6374ms 48.4558 Ops/s 47.9084 Ops/s $\color{#35bf28}+1.14\%$
test_vmap_transformer_speed_decorator[False-True] 21.1218ms 20.4469ms 48.9072 Ops/s 48.3888 Ops/s $\color{#35bf28}+1.07\%$
test_vmap_transformer_speed_decorator[False-False] 20.6631ms 20.4102ms 48.9951 Ops/s 48.3589 Ops/s $\color{#35bf28}+1.32\%$
test_to_module_speed[True] 1.5686ms 1.4869ms 672.5447 Ops/s 675.5950 Ops/s $\color{#d91a1a}-0.45\%$
test_to_module_speed[False] 1.5552ms 1.4690ms 680.7153 Ops/s 687.2723 Ops/s $\color{#d91a1a}-0.95\%$
test_tc_init 73.7910μs 44.3123μs 22.5671 KOps/s 22.3258 KOps/s $\color{#35bf28}+1.08\%$
test_tc_init_tensor_only 33.5810μs 9.7679μs 102.3764 KOps/s 103.3791 KOps/s $\color{#d91a1a}-0.97\%$
test_tc_init_nested 0.1233ms 88.1582μs 11.3432 KOps/s 11.1883 KOps/s $\color{#35bf28}+1.39\%$
test_tc_init_many_fields 39.4500μs 16.3144μs 61.2957 KOps/s 60.8914 KOps/s $\color{#35bf28}+0.66\%$
test_tc_first_layer_tensor 31.5700μs 1.8372μs 544.3064 KOps/s 551.8987 KOps/s $\color{#d91a1a}-1.38\%$
test_tc_first_layer_tensor_only 1.5495μs 0.3963μs 2.5231 MOps/s 2.5109 MOps/s $\color{#35bf28}+0.49\%$
test_tc_first_layer_tensor_set 27.0400μs 3.9351μs 254.1210 KOps/s 254.2313 KOps/s $\color{#d91a1a}-0.04\%$
test_tc_first_layer_tensor_only_set 24.5600μs 3.2652μs 306.2606 KOps/s 300.6727 KOps/s $\color{#35bf28}+1.86\%$
test_tc_first_layer_nontensor 26.6900μs 6.2300μs 160.5142 KOps/s 160.7150 KOps/s $\color{#d91a1a}-0.12\%$
test_tc_second_layer_tensor 34.4210μs 4.4901μs 222.7108 KOps/s 223.7364 KOps/s $\color{#d91a1a}-0.46\%$
test_tc_second_layer_nontensor 42.8510μs 8.8142μs 113.4534 KOps/s 115.1649 KOps/s $\color{#d91a1a}-1.49\%$
test_unbind 0.2649s 17.9828ms 55.6088 Ops/s 66.5562 Ops/s $\textbf{\color{#d91a1a}-16.45\%}$
test_full_like 7.5270ms 4.3436ms 230.2237 Ops/s 59.2888 Ops/s $\textbf{\color{#35bf28}+288.31\%}$
test_zeros_like 5.0215ms 4.3789ms 228.3676 Ops/s 59.3357 Ops/s $\textbf{\color{#35bf28}+284.87\%}$
test_ones_like 4.5911ms 4.3751ms 228.5672 Ops/s 59.5702 Ops/s $\textbf{\color{#35bf28}+283.69\%}$
test_clone 6.5930ms 6.4375ms 155.3397 Ops/s 56.2241 Ops/s $\textbf{\color{#35bf28}+176.29\%}$
test_squeeze 79.2910μs 13.9319μs 71.7776 KOps/s 69.9429 KOps/s $\color{#35bf28}+2.62\%$
test_unsqueeze 0.1745ms 0.1116ms 8.9595 KOps/s 8.9747 KOps/s $\color{#d91a1a}-0.17\%$
test_split 0.2935ms 0.1828ms 5.4715 KOps/s 5.3034 KOps/s $\color{#35bf28}+3.17\%$
test_permute 0.3204ms 0.2027ms 4.9329 KOps/s 4.6784 KOps/s $\textbf{\color{#35bf28}+5.44\%}$
test_stack 55.0164ms 54.2037ms 18.4489 Ops/s 19.3471 Ops/s $\color{#d91a1a}-4.64\%$
test_cat 54.1943ms 51.6324ms 19.3677 Ops/s 19.4073 Ops/s $\color{#d91a1a}-0.20\%$
test_sequential_tensordict 0.6281ms 0.2215ms 4.5150 KOps/s 4.5973 KOps/s $\color{#d91a1a}-1.79\%$
test_sequential_graph_module 0.1588ms 0.1263ms 7.9204 KOps/s 8.1941 KOps/s $\color{#d91a1a}-3.34\%$
test_nested_tensordict 0.3384ms 0.2841ms 3.5201 KOps/s 3.5288 KOps/s $\color{#d91a1a}-0.25\%$
test_nested_graph_module 0.5495ms 0.1359ms 7.3568 KOps/s 7.3948 KOps/s $\color{#d91a1a}-0.51\%$
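
The Change column in the table above is numerically consistent with the relative difference between Ops and Ops on Repo HEAD, and the bold rows (which match the Improved and Worsened counts) all lie beyond roughly ±5%. Both observations are inferred from the reported numbers, not taken from the benchmark workflow itself, so the sketch below is an assumption-laden reading aid; `relative_change` and `classify` are hypothetical helpers.

```python
def relative_change(ops, ops_head):
    """Percent change of this PR's throughput relative to repo HEAD.
    Both values must be in the same unit (convert KOps/s or MOps/s first)."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct, threshold=5.0):
    """Bucket a change; the 5% threshold is inferred from which rows are bold."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# Rows from the CPU table: (name, Ops, Ops on Repo HEAD), same unit per row.
rows = [
    ("test_plain_set_nested", 65.3018, 66.6704),  # KOps/s
    ("test_set_nested", 49.0291, 51.5897),        # KOps/s
    ("test_chunk", 824.7189, 918.6403),           # Ops/s
    ("test_full_like", 230.2237, 59.2888),        # Ops/s
]
for name, ops, ops_head in rows:
    change = relative_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```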

github-actions bot (Contributor) commented Mar 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 261. Improved: $\large\color{#35bf28}24$. Worsened: $\large\color{#d91a1a}5$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 29.9510μs 14.4746μs 69.0863 KOps/s 70.3572 KOps/s $\color{#d91a1a}-1.81\%$
test_plain_set_stack_nested 38.4310μs 14.8071μs 67.5354 KOps/s 68.4167 KOps/s $\color{#d91a1a}-1.29\%$
test_plain_set_nested_inplace 37.3010μs 16.2131μs 61.6785 KOps/s 61.8554 KOps/s $\color{#d91a1a}-0.29\%$
test_plain_set_stack_nested_inplace 40.6910μs 16.2810μs 61.4213 KOps/s 62.6643 KOps/s $\color{#d91a1a}-1.98\%$
test_items 33.8000μs 5.5689μs 179.5691 KOps/s 179.9665 KOps/s $\color{#d91a1a}-0.22\%$
test_items_nested 0.4881ms 0.4490ms 2.2271 KOps/s 2.2359 KOps/s $\color{#d91a1a}-0.39\%$
test_items_nested_locked 0.5027ms 0.4532ms 2.2064 KOps/s 2.2107 KOps/s $\color{#d91a1a}-0.19\%$
test_items_nested_leaf 0.1512ms 91.9447μs 10.8761 KOps/s 10.8080 KOps/s $\color{#35bf28}+0.63\%$
test_items_stack_nested 0.4797ms 0.4505ms 2.2195 KOps/s 2.2440 KOps/s $\color{#d91a1a}-1.09\%$
test_items_stack_nested_leaf 0.1541ms 94.4595μs 10.5865 KOps/s 10.6554 KOps/s $\color{#d91a1a}-0.65\%$
test_items_stack_nested_locked 0.5449ms 0.4538ms 2.2038 KOps/s 2.2231 KOps/s $\color{#d91a1a}-0.87\%$
test_keys 28.0310μs 4.1360μs 241.7796 KOps/s 240.9997 KOps/s $\color{#35bf28}+0.32\%$
test_keys_nested 0.1816ms 0.1283ms 7.7942 KOps/s 7.8401 KOps/s $\color{#d91a1a}-0.58\%$
test_keys_nested_locked 0.7720ms 0.1357ms 7.3682 KOps/s 7.3580 KOps/s $\color{#35bf28}+0.14\%$
test_keys_nested_leaf 0.1639ms 0.1186ms 8.4295 KOps/s 8.5272 KOps/s $\color{#d91a1a}-1.15\%$
test_keys_stack_nested 0.1691ms 0.1290ms 7.7523 KOps/s 7.8091 KOps/s $\color{#d91a1a}-0.73\%$
test_keys_stack_nested_leaf 0.1495ms 0.1183ms 8.4512 KOps/s 8.4457 KOps/s $\color{#35bf28}+0.07\%$
test_keys_stack_nested_locked 0.1639ms 0.1363ms 7.3342 KOps/s 7.3446 KOps/s $\color{#d91a1a}-0.14\%$
test_values 6.1502μs 1.0115μs 988.5902 KOps/s 998.6009 KOps/s $\color{#d91a1a}-1.00\%$
test_values_nested 75.9710μs 51.5790μs 19.3877 KOps/s 19.6293 KOps/s $\color{#d91a1a}-1.23\%$
test_values_nested_locked 90.5910μs 55.1383μs 18.1362 KOps/s 18.4124 KOps/s $\color{#d91a1a}-1.50\%$
test_values_nested_leaf 0.1064ms 59.1945μs 16.8935 KOps/s 17.0319 KOps/s $\color{#d91a1a}-0.81\%$
test_values_stack_nested 85.0210μs 51.7311μs 19.3307 KOps/s 19.5398 KOps/s $\color{#d91a1a}-1.07\%$
test_values_stack_nested_leaf 93.0620μs 59.2799μs 16.8691 KOps/s 17.1487 KOps/s $\color{#d91a1a}-1.63\%$
test_values_stack_nested_locked 0.1002ms 55.2764μs 18.0909 KOps/s 18.5954 KOps/s $\color{#d91a1a}-2.71\%$
test_membership 4.8418μs 0.8146μs 1.2275 MOps/s 1.2204 MOps/s $\color{#35bf28}+0.59\%$
test_membership_nested 32.6310μs 2.7648μs 361.6950 KOps/s 362.3219 KOps/s $\color{#d91a1a}-0.17\%$
test_membership_nested_leaf 19.3455μs 2.6404μs 378.7293 KOps/s 360.8127 KOps/s $\color{#35bf28}+4.97\%$
test_membership_stacked_nested 31.9000μs 2.7177μs 367.9635 KOps/s 362.2735 KOps/s $\color{#35bf28}+1.57\%$
test_membership_stacked_nested_leaf 28.9600μs 2.7408μs 364.8513 KOps/s 366.4649 KOps/s $\color{#d91a1a}-0.44\%$
test_membership_nested_last 32.3000μs 4.1465μs 241.1668 KOps/s 241.7224 KOps/s $\color{#d91a1a}-0.23\%$
test_membership_nested_leaf_last 34.4110μs 4.1278μs 242.2603 KOps/s 241.9704 KOps/s $\color{#35bf28}+0.12\%$
test_membership_stacked_nested_last 28.0000μs 4.1215μs 242.6305 KOps/s 244.2299 KOps/s $\color{#d91a1a}-0.65\%$
test_membership_stacked_nested_leaf_last 39.4810μs 4.1062μs 243.5338 KOps/s 243.0951 KOps/s $\color{#35bf28}+0.18\%$
test_nested_getleaf 50.1810μs 21.0310μs 47.5489 KOps/s 49.6247 KOps/s $\color{#d91a1a}-4.18\%$
test_nested_get 53.8210μs 19.4881μs 51.3133 KOps/s 52.6842 KOps/s $\color{#d91a1a}-2.60\%$
test_stacked_getleaf 80.2220μs 20.3114μs 49.2334 KOps/s 50.1236 KOps/s $\color{#d91a1a}-1.78\%$
test_stacked_get 42.6910μs 19.5531μs 51.1427 KOps/s 52.5698 KOps/s $\color{#d91a1a}-2.71\%$
test_nested_getitemleaf 65.5010μs 20.8844μs 47.8826 KOps/s 49.0730 KOps/s $\color{#d91a1a}-2.43\%$
test_nested_getitem 46.5310μs 19.7303μs 50.6834 KOps/s 51.4573 KOps/s $\color{#d91a1a}-1.50\%$
test_stacked_getitemleaf 45.9010μs 20.8309μs 48.0056 KOps/s 48.9079 KOps/s $\color{#d91a1a}-1.84\%$
test_stacked_getitem 62.2820μs 20.0357μs 49.9110 KOps/s 51.1194 KOps/s $\color{#d91a1a}-2.36\%$
test_lock_nested 8.0686ms 0.4650ms 2.1506 KOps/s 2.1924 KOps/s $\color{#d91a1a}-1.91\%$
test_lock_stack_nested 0.4932ms 0.4570ms 2.1883 KOps/s 2.1611 KOps/s $\color{#35bf28}+1.26\%$
test_unlock_nested 0.4447ms 0.3723ms 2.6862 KOps/s 2.6995 KOps/s $\color{#d91a1a}-0.49\%$
test_unlock_stack_nested 0.4151ms 0.3709ms 2.6964 KOps/s 2.6729 KOps/s $\color{#35bf28}+0.88\%$
test_flatten_speed 0.1597ms 0.1174ms 8.5172 KOps/s 8.5876 KOps/s $\color{#d91a1a}-0.82\%$
test_unflatten_speed 0.5794ms 0.5477ms 1.8257 KOps/s 1.8296 KOps/s $\color{#d91a1a}-0.21\%$
test_common_ops 0.8562ms 0.6781ms 1.4747 KOps/s 1.4584 KOps/s $\color{#35bf28}+1.11\%$
test_creation 70.7510μs 2.9560μs 338.2925 KOps/s 337.5566 KOps/s $\color{#35bf28}+0.22\%$
test_creation_empty 45.1310μs 6.6304μs 150.8195 KOps/s 152.5791 KOps/s $\color{#d91a1a}-1.15\%$
test_creation_nested_1 44.8910μs 11.0896μs 90.1742 KOps/s 91.5865 KOps/s $\color{#d91a1a}-1.54\%$
test_creation_nested_2 38.1510μs 12.5533μs 79.6606 KOps/s 79.5836 KOps/s $\color{#35bf28}+0.10\%$
test_creation_many_keys[10] 40.3900μs 19.7750μs 50.5690 KOps/s 50.4786 KOps/s $\color{#35bf28}+0.18\%$
test_creation_many_keys[50] 0.1159ms 85.7428μs 11.6628 KOps/s 11.9174 KOps/s $\color{#d91a1a}-2.14\%$
test_creation_many_keys[100] 0.2247ms 0.1703ms 5.8727 KOps/s 5.9365 KOps/s $\color{#d91a1a}-1.08\%$
test_creation_nested_many_keys[10] 72.4320μs 42.7515μs 23.3910 KOps/s 23.4469 KOps/s $\color{#d91a1a}-0.24\%$
test_creation_nested_many_keys[50] 0.2127ms 0.1751ms 5.7115 KOps/s 5.7755 KOps/s $\color{#d91a1a}-1.11\%$
test_clone 39.7000μs 12.4730μs 80.1731 KOps/s 75.7557 KOps/s $\textbf{\color{#35bf28}+5.83\%}$
test_getitem[int] 1.6508ms 14.5103μs 68.9166 KOps/s 62.1008 KOps/s $\textbf{\color{#35bf28}+10.98\%}$
test_getitem[slice_int] 0.1313ms 23.1169μs 43.2583 KOps/s 43.0199 KOps/s $\color{#35bf28}+0.55\%$
test_getitem[range] 0.1716ms 60.3854μs 16.5603 KOps/s 16.3056 KOps/s $\color{#35bf28}+1.56\%$
test_getitem[tuple] 0.1389ms 22.6100μs 44.2281 KOps/s 43.6674 KOps/s $\color{#35bf28}+1.28\%$
test_getitem[list] 0.1736ms 57.1289μs 17.5043 KOps/s 17.6820 KOps/s $\color{#d91a1a}-1.01\%$
test_setitem_dim[int] 45.2610μs 25.3471μs 39.4522 KOps/s 39.5695 KOps/s $\color{#d91a1a}-0.30\%$
test_setitem_dim[slice_int] 74.6610μs 41.7705μs 23.9403 KOps/s 23.8929 KOps/s $\color{#35bf28}+0.20\%$
test_setitem_dim[range] 0.1276ms 91.3920μs 10.9419 KOps/s 10.4145 KOps/s $\textbf{\color{#35bf28}+5.06\%}$
test_setitem_dim[tuple] 59.9810μs 38.7164μs 25.8289 KOps/s 24.4909 KOps/s $\textbf{\color{#35bf28}+5.46\%}$
test_setitem 54.5310μs 16.9992μs 58.8265 KOps/s 55.2165 KOps/s $\textbf{\color{#35bf28}+6.54\%}$
test_set 0.5076ms 16.0838μs 62.1744 KOps/s 59.7936 KOps/s $\color{#35bf28}+3.98\%$
test_set_shared 0.6236ms 0.2054ms 4.8674 KOps/s 4.7702 KOps/s $\color{#35bf28}+2.04\%$
test_update 0.1869ms 21.0581μs 47.4876 KOps/s 43.9517 KOps/s $\textbf{\color{#35bf28}+8.05\%}$
test_update_nested 64.7120μs 31.6188μs 31.6268 KOps/s 29.4382 KOps/s $\textbf{\color{#35bf28}+7.43\%}$
test_update__nested 0.5234ms 32.8994μs 30.3957 KOps/s 27.7926 KOps/s $\textbf{\color{#35bf28}+9.37\%}$
test_set_nested 54.6810μs 19.1531μs 52.2109 KOps/s 49.6973 KOps/s $\textbf{\color{#35bf28}+5.06\%}$
test_set_nested_new 59.6210μs 23.0374μs 43.4077 KOps/s 40.0664 KOps/s $\textbf{\color{#35bf28}+8.34\%}$
test_select 68.8410μs 38.4854μs 25.9839 KOps/s 24.8204 KOps/s $\color{#35bf28}+4.69\%$
test_select_nested 99.9020μs 71.1734μs 14.0502 KOps/s 14.0214 KOps/s $\color{#35bf28}+0.21\%$
test_exclude_nested 0.1148ms 87.2090μs 11.4667 KOps/s 11.3100 KOps/s $\color{#35bf28}+1.39\%$
test_empty[True] 0.4556ms 0.3842ms 2.6025 KOps/s 2.5994 KOps/s $\color{#35bf28}+0.12\%$
test_empty[False] 6.3427μs 1.2565μs 795.8755 KOps/s 793.4727 KOps/s $\color{#35bf28}+0.30\%$
test_to 0.1027ms 71.6340μs 13.9599 KOps/s 14.0452 KOps/s $\color{#d91a1a}-0.61\%$
test_to_nonblocking 0.1110ms 67.7509μs 14.7599 KOps/s 15.6195 KOps/s $\textbf{\color{#d91a1a}-5.50\%}$
test_unbind_speed 0.3511ms 0.3199ms 3.1262 KOps/s 3.1617 KOps/s $\color{#d91a1a}-1.12\%$
test_unbind_speed_stack0 0.3829ms 0.3173ms 3.1515 KOps/s 3.1851 KOps/s $\color{#d91a1a}-1.05\%$
test_unbind_speed_stack1 0.1040s 0.8081ms 1.2375 KOps/s 1.2369 KOps/s $\color{#35bf28}+0.05\%$
test_split 0.1040s 1.2188ms 820.4528 Ops/s 827.3493 Ops/s $\color{#d91a1a}-0.83\%$
test_chunk 0.1043s 1.1565ms 864.6781 Ops/s 968.2886 Ops/s $\textbf{\color{#d91a1a}-10.70\%}$
test_to_cpu_blocking 19.5850ms 19.5258ms 51.2142 Ops/s 46.2068 Ops/s $\textbf{\color{#35bf28}+10.84\%}$
test_to_cpu_global_sync 11.3958ms 11.3046ms 88.4599 Ops/s 88.5779 Ops/s $\color{#d91a1a}-0.13\%$
test_to_cpu_event_sync 12.6012ms 12.3321ms 81.0890 Ops/s 81.2191 Ops/s $\color{#d91a1a}-0.16\%$
test_to_cpu_default 12.6064ms 12.3354ms 81.0675 Ops/s 81.1779 Ops/s $\color{#d91a1a}-0.14\%$
test_consolidate[False-None] 4.0282ms 3.9598ms 252.5374 Ops/s 222.9650 Ops/s $\textbf{\color{#35bf28}+13.26\%}$
test_consolidate[default-None] 2.0184ms 1.9238ms 519.8022 Ops/s 502.6801 Ops/s $\color{#35bf28}+3.41\%$
test_consolidate[reduce-overhead-None] 1.9170ms 1.8431ms 542.5663 Ops/s 520.1939 Ops/s $\color{#35bf28}+4.30\%$
test_consolidate_njt[False-None] 8.4179ms 8.1817ms 122.2241 Ops/s 120.7256 Ops/s $\color{#35bf28}+1.24\%$
test_to[False-False-None] 2.1522ms 2.0648ms 484.2979 Ops/s 479.8149 Ops/s $\color{#35bf28}+0.93\%$
test_to[True-False-None] 2.1716ms 1.8522ms 539.8931 Ops/s 525.5844 Ops/s $\color{#35bf28}+2.72\%$
test_to[within-False-None] 6.2031ms 5.8644ms 170.5191 Ops/s 165.6574 Ops/s $\color{#35bf28}+2.93\%$
test_to[True-default-None] 8.8218ms 8.6810ms 115.1945 Ops/s 113.4335 Ops/s $\color{#35bf28}+1.55\%$
test_to_njt[False-False-None] 8.4674ms 8.2635ms 121.0147 Ops/s 120.3563 Ops/s $\color{#35bf28}+0.55\%$
test_to_njt[True-False-None] 6.8743ms 6.7520ms 148.1041 Ops/s 148.1472 Ops/s $\color{#d91a1a}-0.03\%$
test_to_njt[within-False-None] 15.2683ms 15.1747ms 65.8990 Ops/s 66.3607 Ops/s $\color{#d91a1a}-0.70\%$
test_creation[device0] 0.3525ms 0.1117ms 8.9509 KOps/s 8.4630 KOps/s $\textbf{\color{#35bf28}+5.76\%}$
test_creation_from_tensor 0.3916ms 0.1091ms 9.1635 KOps/s 8.6482 KOps/s $\textbf{\color{#35bf28}+5.96\%}$
test_add_one[memmap_tensor0] 0.2254ms 6.1031μs 163.8518 KOps/s 156.8092 KOps/s $\color{#35bf28}+4.49\%$
test_contiguous[memmap_tensor0] 17.3100μs 0.6025μs 1.6596 MOps/s 2.2522 MOps/s $\textbf{\color{#d91a1a}-26.31\%}$
test_stack[memmap_tensor0] 29.8100μs 4.3685μs 228.9096 KOps/s 215.2068 KOps/s $\textbf{\color{#35bf28}+6.37\%}$
test_memmaptd_index 1.0549ms 0.2620ms 3.8165 KOps/s 3.8182 KOps/s $\color{#d91a1a}-0.04\%$
test_memmaptd_index_astensor 0.5168ms 0.3636ms 2.7501 KOps/s 2.7762 KOps/s $\color{#d91a1a}-0.94\%$
test_memmaptd_index_op 0.8357ms 0.5952ms 1.6800 KOps/s 1.6445 KOps/s $\color{#35bf28}+2.16\%$
test_serialize_model 0.1401s 0.1364s 7.3334 Ops/s 5.8654 Ops/s $\textbf{\color{#35bf28}+25.03\%}$
test_serialize_model_pickle 1.3480s 1.2102s 0.8263 Ops/s 0.8384 Ops/s $\color{#d91a1a}-1.44\%$
test_serialize_weights 0.1374s 0.1355s 7.3786 Ops/s 7.3286 Ops/s $\color{#35bf28}+0.68\%$
test_serialize_weights_returnearly 0.4333s 91.4515ms 10.9348 Ops/s 10.5539 Ops/s $\color{#35bf28}+3.61\%$
test_serialize_weights_pickle 1.3507s 1.2109s 0.8258 Ops/s 0.8227 Ops/s $\color{#35bf28}+0.38\%$
test_reshape_pytree 0.2225ms 30.8765μs 32.3871 KOps/s 31.2228 KOps/s $\color{#35bf28}+3.73\%$
test_reshape_td 68.3210μs 43.9953μs 22.7297 KOps/s 22.8735 KOps/s $\color{#d91a1a}-0.63\%$
test_view_pytree 0.2150ms 30.5850μs 32.6958 KOps/s 31.6981 KOps/s $\color{#35bf28}+3.15\%$
test_view_td 76.1610μs 50.3693μs 19.8534 KOps/s 20.0181 KOps/s $\color{#d91a1a}-0.82\%$
test_unbind_pytree 0.2325ms 34.4356μs 29.0397 KOps/s 28.1408 KOps/s $\color{#35bf28}+3.19\%$
test_unbind_td 0.1075ms 47.5965μs 21.0099 KOps/s 21.1483 KOps/s $\color{#d91a1a}-0.65\%$
test_split_pytree 0.2554ms 39.7371μs 25.1654 KOps/s 24.4558 KOps/s $\color{#35bf28}+2.90\%$
test_split_td 0.1485ms 58.7680μs 17.0161 KOps/s 16.3856 KOps/s $\color{#35bf28}+3.85\%$
test_add_pytree 0.2255ms 39.7526μs 25.1556 KOps/s 24.4978 KOps/s $\color{#35bf28}+2.69\%$
test_add_td 0.1045ms 52.0055μs 19.2287 KOps/s 18.8809 KOps/s $\color{#35bf28}+1.84\%$
test_compile_add_one_nested[tensordict-compile] 0.2125ms 0.1397ms 7.1602 KOps/s 7.0295 KOps/s $\color{#35bf28}+1.86\%$
test_compile_add_one_nested[tensordict-eager] 0.4245ms 0.2070ms 4.8309 KOps/s 5.1782 KOps/s $\textbf{\color{#d91a1a}-6.71\%}$
test_compile_add_one_nested[pytree-compile] 0.6752ms 0.1082ms 9.2421 KOps/s 8.7872 KOps/s $\textbf{\color{#35bf28}+5.18\%}$
test_compile_add_one_nested[pytree-eager] 0.6522ms 0.1865ms 5.3630 KOps/s 5.6410 KOps/s $\color{#d91a1a}-4.93\%$
test_compile_copy_nested[tensordict-compile] 0.2503ms 10.3265μs 96.8384 KOps/s 103.5455 KOps/s $\textbf{\color{#d91a1a}-6.48\%}$
test_compile_copy_nested[tensordict-eager] 77.2220μs 50.9487μs 19.6276 KOps/s 19.4398 KOps/s $\color{#35bf28}+0.97\%$
test_compile_copy_nested[pytree-compile] 46.2610μs 9.5292μs 104.9408 KOps/s 104.7371 KOps/s $\color{#35bf28}+0.19\%$
test_compile_copy_nested[pytree-eager] 0.4473ms 64.2487μs 15.5645 KOps/s 15.2384 KOps/s $\color{#35bf28}+2.14\%$
test_compile_add_one_flat[tensordict-compile] 0.2376ms 0.1742ms 5.7404 KOps/s 5.5242 KOps/s $\color{#35bf28}+3.91\%$
test_compile_add_one_flat[tensordict-eager] 0.4061ms 0.2757ms 3.6270 KOps/s 3.6383 KOps/s $\color{#d91a1a}-0.31\%$
test_compile_add_one_flat[tensorclass-compile] 0.1889ms 0.1167ms 8.5696 KOps/s 8.4864 KOps/s $\color{#35bf28}+0.98\%$
test_compile_add_one_flat[tensorclass-eager] 0.1186ms 72.5933μs 13.7754 KOps/s 13.5592 KOps/s $\color{#35bf28}+1.59\%$
test_compile_add_one_flat[pytree-compile] 0.2202ms 0.1565ms 6.3899 KOps/s 6.2737 KOps/s $\color{#35bf28}+1.85\%$
test_compile_add_one_flat[pytree-eager] 0.9155ms 0.5268ms 1.8981 KOps/s 1.8948 KOps/s $\color{#35bf28}+0.18\%$
test_compile_add_self_flat[tensordict-eager] 0.3842ms 0.3282ms 3.0469 KOps/s 3.0455 KOps/s $\color{#35bf28}+0.05\%$
test_compile_add_self_flat[tensordict-compile] 0.2272ms 0.1773ms 5.6417 KOps/s 3.3697 KOps/s $\textbf{\color{#35bf28}+67.43\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1349ms 91.3949μs 10.9415 KOps/s 11.0864 KOps/s $\color{#d91a1a}-1.31\%$
test_compile_add_self_flat[tensorclass-compile] 0.2070ms 0.1201ms 8.3282 KOps/s 7.8547 KOps/s $\textbf{\color{#35bf28}+6.03\%}$
test_compile_add_self_flat[pytree-eager] 0.6685ms 0.4339ms 2.3048 KOps/s 2.2176 KOps/s $\color{#35bf28}+3.93\%$
test_compile_add_self_flat[pytree-compile] 0.1944ms 0.1554ms 6.4341 KOps/s 6.2204 KOps/s $\color{#35bf28}+3.43\%$
test_compile_copy_flat[tensordict-compile] 43.3010μs 13.1670μs 75.9477 KOps/s 74.8361 KOps/s $\color{#35bf28}+1.49\%$
test_compile_copy_flat[tensordict-eager] 74.3010μs 40.6168μs 24.6204 KOps/s 24.9185 KOps/s $\color{#d91a1a}-1.20\%$
test_compile_copy_flat[pytree-compile] 66.2010μs 10.3497μs 96.6216 KOps/s 94.5111 KOps/s $\color{#35bf28}+2.23\%$
test_compile_copy_flat[pytree-eager] 0.4259ms 50.5396μs 19.7865 KOps/s 19.5230 KOps/s $\color{#35bf28}+1.35\%$
test_compile_assign_and_add[tensordict-compile] 1.9381ms 0.1700ms 5.8830 KOps/s 5.4184 KOps/s $\textbf{\color{#35bf28}+8.57\%}$
test_compile_assign_and_add[tensordict-eager] 3.3482ms 3.2374ms 308.8885 Ops/s 302.0445 Ops/s $\color{#35bf28}+2.27\%$
test_compile_assign_and_add[pytree-compile] 1.9173ms 0.1567ms 6.3829 KOps/s 6.2232 KOps/s $\color{#35bf28}+2.57\%$
test_compile_assign_and_add[pytree-eager] 2.8839ms 2.7564ms 362.7860 Ops/s 356.4926 Ops/s $\color{#35bf28}+1.77\%$
test_compile_indexing[tensor-tensordict-compile] 0.1446ms 0.1065ms 9.3867 KOps/s 8.8786 KOps/s $\textbf{\color{#35bf28}+5.72\%}$
test_compile_indexing[tensor-tensordict-eager] 0.3095ms 73.8427μs 13.5423 KOps/s 13.9160 KOps/s $\color{#d91a1a}-2.69\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2027ms 96.2995μs 10.3843 KOps/s 10.4544 KOps/s $\color{#d91a1a}-0.67\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2503ms 43.2035μs 23.1463 KOps/s 21.6090 KOps/s $\textbf{\color{#35bf28}+7.11\%}$
test_compile_indexing[tensor-pytree-compile] 0.1444ms 98.7164μs 10.1300 KOps/s 10.3920 KOps/s $\color{#d91a1a}-2.52\%$
test_compile_indexing[tensor-pytree-eager] 0.2698ms 45.1328μs 22.1568 KOps/s 22.9686 KOps/s $\color{#d91a1a}-3.53\%$
test_compile_indexing[slice-tensordict-compile] 0.1099ms 57.2495μs 17.4674 KOps/s 17.1955 KOps/s $\color{#35bf28}+1.58\%$
test_compile_indexing[slice-tensordict-eager] 0.2254ms 27.0528μs 36.9647 KOps/s 38.0890 KOps/s $\color{#d91a1a}-2.95\%$
test_compile_indexing[slice-tensorclass-compile] 96.9520μs 42.8970μs 23.3116 KOps/s 22.4267 KOps/s $\color{#35bf28}+3.95\%$
test_compile_indexing[slice-tensorclass-eager] 0.2668ms 21.3862μs 46.7591 KOps/s 46.2092 KOps/s $\color{#35bf28}+1.19\%$
test_compile_indexing[slice-pytree-compile] 75.8610μs 44.5869μs 22.4281 KOps/s 22.2481 KOps/s $\color{#35bf28}+0.81\%$
test_compile_indexing[slice-pytree-eager] 0.2604ms 21.5860μs 46.3264 KOps/s 46.5913 KOps/s $\color{#d91a1a}-0.57\%$
test_compile_indexing[int-tensordict-compile] 0.1069ms 57.7712μs 17.3097 KOps/s 17.1301 KOps/s $\color{#35bf28}+1.05\%$
test_compile_indexing[int-tensordict-eager] 0.2233ms 26.7918μs 37.3248 KOps/s 37.8390 KOps/s $\color{#d91a1a}-1.36\%$
test_compile_indexing[int-tensorclass-compile] 82.7920μs 44.1563μs 22.6468 KOps/s 22.3488 KOps/s $\color{#35bf28}+1.33\%$
test_compile_indexing[int-tensorclass-eager] 0.2647ms 21.3704μs 46.7936 KOps/s 46.3613 KOps/s $\color{#35bf28}+0.93\%$
test_compile_indexing[int-pytree-compile] 80.3020μs 44.3870μs 22.5291 KOps/s 22.3233 KOps/s $\color{#35bf28}+0.92\%$
test_compile_indexing[int-pytree-eager] 0.2498ms 21.5562μs 46.3904 KOps/s 46.7378 KOps/s $\color{#d91a1a}-0.74\%$
test_compile_replace[single-eager] 0.1005ms 48.2132μs 20.7412 KOps/s 21.3571 KOps/s $\color{#d91a1a}-2.88\%$
test_compile_replace[single-compile] 0.1780ms 0.1026ms 9.7509 KOps/s 9.4054 KOps/s $\color{#35bf28}+3.67\%$
test_compile_replace[multi-eager] 0.6414ms 0.5654ms 1.7686 KOps/s 1.8008 KOps/s $\color{#d91a1a}-1.79\%$
test_compile_replace[multi-compile] 0.1669ms 0.1126ms 8.8821 KOps/s 8.9205 KOps/s $\color{#d91a1a}-0.43\%$
test_compile_tc_getattr_20[eager] 0.2085ms 0.1683ms 5.9420 KOps/s 5.8394 KOps/s $\color{#35bf28}+1.76\%$
test_compile_tc_getattr_20[compile] 0.1738ms 0.1192ms 8.3917 KOps/s 8.4679 KOps/s $\color{#d91a1a}-0.90\%$
test_compile_clone_shallow[20-eager] 58.8520μs 18.6122μs 53.7282 KOps/s 54.3742 KOps/s $\color{#d91a1a}-1.19\%$
test_compile_clone_shallow[20-compile] 39.9200μs 10.8501μs 92.1649 KOps/s 89.7936 KOps/s $\color{#35bf28}+2.64\%$
test_compile_clone_shallow[40-eager] 69.0320μs 32.8805μs 30.4131 KOps/s 30.6981 KOps/s $\color{#d91a1a}-0.93\%$
test_compile_clone_shallow[40-compile] 40.7010μs 12.1000μs 82.6446 KOps/s 82.2381 KOps/s $\color{#35bf28}+0.49\%$
test_compile_clone_shallow[80-eager] 96.6420μs 61.9429μs 16.1439 KOps/s 16.4497 KOps/s $\color{#d91a1a}-1.86\%$
test_compile_clone_shallow[80-compile] 50.1710μs 14.6803μs 68.1184 KOps/s 67.4496 KOps/s $\color{#35bf28}+0.99\%$
test_compile_update_inplace[eager] 0.1045ms 57.8300μs 17.2921 KOps/s 17.3509 KOps/s $\color{#d91a1a}-0.34\%$
test_compile_update_inplace[compile] 0.1896ms 0.1354ms 7.3872 KOps/s 7.1918 KOps/s $\color{#35bf28}+2.72\%$
test_mod_add[eager] 93.5420μs 47.4959μs 21.0544 KOps/s 20.7287 KOps/s $\color{#35bf28}+1.57\%$
test_mod_add[compile] 0.1569ms 0.1020ms 9.8038 KOps/s 9.1179 KOps/s $\textbf{\color{#35bf28}+7.52\%}$
test_mod_add[compile-overhead] 0.2541ms 0.1460ms 6.8513 KOps/s 6.6825 KOps/s $\color{#35bf28}+2.53\%$
test_mod_wrap[eager] 0.3559ms 0.2838ms 3.5240 KOps/s 3.3969 KOps/s $\color{#35bf28}+3.74\%$
test_mod_wrap[compile] 0.4017ms 0.3399ms 2.9417 KOps/s 2.9073 KOps/s $\color{#35bf28}+1.18\%$
test_mod_wrap[compile-overhead] 7.2408ms 3.9803ms 251.2402 Ops/s 248.7134 Ops/s $\color{#35bf28}+1.02\%$
test_mod_wrap_and_backward[eager] 1.6520ms 1.4635ms 683.2720 Ops/s 659.7152 Ops/s $\color{#35bf28}+3.57\%$
test_mod_wrap_and_backward[compile] 1.5053ms 1.4172ms 705.6383 Ops/s 702.0643 Ops/s $\color{#35bf28}+0.51\%$
test_mod_wrap_and_backward[compile-overhead] 1.2463ms 0.8654ms 1.1555 KOps/s 1.1289 KOps/s $\color{#35bf28}+2.36\%$
test_seq_add[eager] 0.2142ms 0.1560ms 6.4091 KOps/s 6.4102 KOps/s $\color{#d91a1a}-0.02\%$
test_seq_add[compile] 0.5438ms 0.1115ms 8.9684 KOps/s 8.6403 KOps/s $\color{#35bf28}+3.80\%$
test_seq_add[compile-overhead] 0.1953ms 0.1509ms 6.6279 KOps/s 6.4133 KOps/s $\color{#35bf28}+3.35\%$
test_seq_wrap[eager] 0.5923ms 0.5235ms 1.9102 KOps/s 1.9489 KOps/s $\color{#d91a1a}-1.99\%$
test_seq_wrap[compile] 0.4722ms 0.3733ms 2.6785 KOps/s 2.7610 KOps/s $\color{#d91a1a}-2.99\%$
test_seq_wrap[compile-overhead] 0.3763ms 0.2570ms 3.8913 KOps/s 3.8327 KOps/s $\color{#35bf28}+1.53\%$
test_func_call_runtime[False-eager] 0.9040ms 0.8082ms 1.2374 KOps/s 1.2070 KOps/s $\color{#35bf28}+2.52\%$
test_func_call_runtime[False-compile] 0.9300ms 0.8836ms 1.1317 KOps/s 1.1039 KOps/s $\color{#35bf28}+2.52\%$
test_func_call_runtime[False-compile-overhead] 0.4870ms 0.4431ms 2.2567 KOps/s 2.2417 KOps/s $\color{#35bf28}+0.67\%$
test_func_call_runtime[True-eager] 1.1410ms 1.0417ms 959.9346 Ops/s 940.9236 Ops/s $\color{#35bf28}+2.02\%$
test_func_call_runtime[True-compile] 0.9705ms 0.8895ms 1.1243 KOps/s 1.1109 KOps/s $\color{#35bf28}+1.20\%$
test_func_call_runtime[True-compile-overhead] 0.5042ms 0.4565ms 2.1907 KOps/s 2.1678 KOps/s $\color{#35bf28}+1.06\%$
test_func_call_cm_runtime[False-eager] 1.5462ms 0.8113ms 1.2327 KOps/s 1.2001 KOps/s $\color{#35bf28}+2.71\%$
test_func_call_cm_runtime[False-compile] 1.1085ms 0.8823ms 1.1334 KOps/s 1.1226 KOps/s $\color{#35bf28}+0.96\%$
test_func_call_cm_runtime[False-compile-overhead] 0.4966ms 0.4449ms 2.2478 KOps/s 2.2250 KOps/s $\color{#35bf28}+1.03\%$
test_func_call_cm_runtime[True-eager] 1.2856ms 1.1809ms 846.7858 Ops/s 830.1097 Ops/s $\color{#35bf28}+2.01\%$
test_func_call_cm_runtime[True-compile] 0.9874ms 0.9282ms 1.0773 KOps/s 1.0536 KOps/s $\color{#35bf28}+2.26\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5276ms 0.4893ms 2.0436 KOps/s 2.0194 KOps/s $\color{#35bf28}+1.20\%$
test_vmap_func_call_cm_runtime[eager] 2.8286ms 2.3211ms 430.8296 Ops/s 428.5711 Ops/s $\color{#35bf28}+0.53\%$
test_vmap_func_call_cm_runtime[compile] 1.0199ms 0.9534ms 1.0489 KOps/s 1.0426 KOps/s $\color{#35bf28}+0.60\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5469ms 0.4985ms 2.0059 KOps/s 2.0041 KOps/s $\color{#35bf28}+0.09\%$
test_distributed 0.5721ms 0.1511ms 6.6186 KOps/s 5.7106 KOps/s $\textbf{\color{#35bf28}+15.90\%}$
test_tdmodule 0.4250ms 27.2310μs 36.7229 KOps/s 37.5028 KOps/s $\color{#d91a1a}-2.08\%$
test_tdmodule_dispatch 73.5220μs 44.2255μs 22.6114 KOps/s 22.8906 KOps/s $\color{#d91a1a}-1.22\%$
test_tdseq 46.2900μs 26.2316μs 38.1219 KOps/s 38.1126 KOps/s $\color{#35bf28}+0.02\%$
test_tdseq_dispatch 66.2820μs 46.3619μs 21.5695 KOps/s 21.7397 KOps/s $\color{#d91a1a}-0.78\%$
test_instantiation_functorch 2.0758ms 1.9834ms 504.1743 Ops/s 503.7004 Ops/s $\color{#35bf28}+0.09\%$
test_exec_functorch 0.2286ms 0.1733ms 5.7690 KOps/s 5.7484 KOps/s $\color{#35bf28}+0.36\%$
test_exec_functional_call 0.2239ms 0.1550ms 6.4528 KOps/s 6.4715 KOps/s $\color{#d91a1a}-0.29\%$
test_exec_td_decorator 0.4360ms 0.2247ms 4.4495 KOps/s 4.3638 KOps/s $\color{#35bf28}+1.96\%$
test_vmap_mlp_speed_decorator[True-True] 0.9777ms 0.8018ms 1.2471 KOps/s 1.2273 KOps/s $\color{#35bf28}+1.62\%$
test_vmap_mlp_speed_decorator[True-False] 0.9848ms 0.7996ms 1.2507 KOps/s 1.2277 KOps/s $\color{#35bf28}+1.87\%$
test_vmap_mlp_speed_decorator[False-True] 0.8945ms 0.6921ms 1.4449 KOps/s 1.4310 KOps/s $\color{#35bf28}+0.97\%$
test_vmap_mlp_speed_decorator[False-False] 0.8729ms 0.6909ms 1.4475 KOps/s 1.4257 KOps/s $\color{#35bf28}+1.53\%$
test_vmap_transformer_speed_decorator[True-True] 20.2735ms 20.1165ms 49.7105 Ops/s 49.5519 Ops/s $\color{#35bf28}+0.32\%$
test_vmap_transformer_speed_decorator[True-False] 20.7495ms 20.1821ms 49.5489 Ops/s 49.4994 Ops/s $\color{#35bf28}+0.10\%$
test_vmap_transformer_speed_decorator[False-True] 20.6017ms 19.9651ms 50.0874 Ops/s 50.0068 Ops/s $\color{#35bf28}+0.16\%$
test_vmap_transformer_speed_decorator[False-False] 20.6242ms 20.0127ms 49.9682 Ops/s 49.9142 Ops/s $\color{#35bf28}+0.11\%$
test_to_module_speed[True] 1.9627ms 1.3813ms 723.9311 Ops/s 714.4193 Ops/s $\color{#35bf28}+1.33\%$
test_to_module_speed[False] 2.1921ms 1.3663ms 731.9232 Ops/s 722.7401 Ops/s $\color{#35bf28}+1.27\%$
test_tc_init 87.2420μs 43.1205μs 23.1908 KOps/s 22.5554 KOps/s $\color{#35bf28}+2.82\%$
test_tc_init_tensor_only 38.1310μs 9.2560μs 108.0383 KOps/s 107.7774 KOps/s $\color{#35bf28}+0.24\%$
test_tc_init_nested 0.3794ms 85.8077μs 11.6540 KOps/s 11.6720 KOps/s $\color{#d91a1a}-0.15\%$
test_tc_init_many_fields 41.2910μs 15.5260μs 64.4082 KOps/s 64.4895 KOps/s $\color{#d91a1a}-0.13\%$
test_tc_first_layer_tensor 24.3910μs 1.7087μs 585.2442 KOps/s 587.5501 KOps/s $\color{#d91a1a}-0.39\%$
test_tc_first_layer_tensor_only 1.6695μs 0.3842μs 2.6025 MOps/s 2.6049 MOps/s $\color{#d91a1a}-0.09\%$
test_tc_first_layer_tensor_set 28.0500μs 3.6653μs 272.8292 KOps/s 273.0340 KOps/s $\color{#d91a1a}-0.07\%$
test_tc_first_layer_tensor_only_set 23.7100μs 3.1102μs 321.5250 KOps/s 318.2200 KOps/s $\color{#35bf28}+1.04\%$
test_tc_first_layer_nontensor 29.3500μs 5.8894μs 169.7978 KOps/s 166.7028 KOps/s $\color{#35bf28}+1.86\%$
test_tc_second_layer_tensor 27.2110μs 4.1721μs 239.6897 KOps/s 246.1866 KOps/s $\color{#d91a1a}-2.64\%$
test_tc_second_layer_nontensor 34.5010μs 8.2797μs 120.7772 KOps/s 119.9295 KOps/s $\color{#35bf28}+0.71\%$
test_unbind 0.2633s 17.1216ms 58.4057 Ops/s 57.7874 Ops/s $\color{#35bf28}+1.07\%$
test_full_like 5.0753ms 4.3787ms 228.3776 Ops/s 227.0288 Ops/s $\color{#35bf28}+0.59\%$
test_zeros_like 4.9252ms 4.3642ms 229.1378 Ops/s 228.9288 Ops/s $\color{#35bf28}+0.09\%$
test_ones_like 5.0137ms 4.3782ms 228.4049 Ops/s 229.1025 Ops/s $\color{#d91a1a}-0.30\%$
test_clone 6.9092ms 6.5122ms 153.5571 Ops/s 153.4009 Ops/s $\color{#35bf28}+0.10\%$
test_squeeze 0.1803ms 13.5758μs 73.6604 KOps/s 73.6824 KOps/s $\color{#d91a1a}-0.03\%$
test_unsqueeze 0.2799ms 0.1092ms 9.1575 KOps/s 9.1813 KOps/s $\color{#d91a1a}-0.26\%$
test_split 0.2498ms 0.1791ms 5.5830 KOps/s 5.5412 KOps/s $\color{#35bf28}+0.76\%$
test_permute 0.2597ms 0.2067ms 4.8387 KOps/s 4.9896 KOps/s $\color{#d91a1a}-3.02\%$
test_stack 44.3271ms 43.1766ms 23.1607 Ops/s 23.1353 Ops/s $\color{#35bf28}+0.11\%$
test_cat 43.4310ms 43.1665ms 23.1661 Ops/s 23.1944 Ops/s $\color{#d91a1a}-0.12\%$
test_sequential_tensordict 0.3209ms 0.2127ms 4.7010 KOps/s 4.7183 KOps/s $\color{#d91a1a}-0.37\%$
test_sequential_graph_module 0.1664ms 0.1166ms 8.5776 KOps/s 8.7058 KOps/s $\color{#d91a1a}-1.47\%$
test_nested_tensordict 0.3800ms 0.2735ms 3.6569 KOps/s 3.6183 KOps/s $\color{#35bf28}+1.07\%$
test_nested_graph_module 0.1727ms 0.1257ms 7.9527 KOps/s 8.0026 KOps/s $\color{#d91a1a}-0.62\%$
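
For readers checking the table, the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below reproduces a few rows under that assumption. The formula is inferred from the reported numbers, not taken from the benchmark workflow, and the helper names `to_ops_per_second` and `percent_change` are hypothetical.

```python
# Sketch only: the formula is inferred from the table rows above,
# not from the source of the benchmark workflow.

# Throughput units that appear in the table, expressed in operations per second.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def to_ops_per_second(cell: str) -> float:
    """Convert a cell such as '71.7776 KOps/s' to plain operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(ops: str, ops_head: str) -> float:
    """Relative change of this PR's throughput against the repo HEAD, in percent."""
    new = to_ops_per_second(ops)
    base = to_ops_per_second(ops_head)
    return (new - base) / base * 100.0


# GPU row for test_chunk: 864.6781 Ops/s against 968.2886 Ops/s on HEAD.
print(f"{percent_change('864.6781 Ops/s', '968.2886 Ops/s'):+.2f}%")
```

Normalizing the units first keeps the comparison valid if a benchmark ever crosses a unit boundary between the two runs, for example Ops/s on HEAD and KOps/s on the PR branch.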


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Test
