[DTensor] Add Strategy B (local-shard transfer + redistribute on receiver) #1641

Open
vmoens wants to merge 1 commit into gh/vmoens/82/base from gh/vmoens/82/head

vmoens (Collaborator) commented Mar 6, 2026

Stack from ghstack (oldest at bottom):

Implement _dtensor_send_redistribute and _dtensor_recv_redistribute:

  • Sender: extracts local shards via to_local(), sends placement metadata
    (placements, mesh topology, mesh dim names) + local tensor data
  • Receiver: receives local shards and placement metadata, stores tensors
    for the caller to reconstruct as DTensors via from_local() + redistribute()

This avoids materializing full tensors (no memory spike on sender),
and transfers only the data each sender rank actually holds.
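
The sender/receiver split described above can be illustrated with a small standard-library simulation. This is only a sketch under stated assumptions: it does not call `torch.distributed` or the PR's `_dtensor_send_redistribute` / `_dtensor_recv_redistribute`. Plain lists stand in for local tensors, the `ShardMessage` class and all function names are hypothetical, and the final concatenation stands in for the `from_local()` + `redistribute()` step that the caller would perform on the receiver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardMessage:
    """Hypothetical payload: placement metadata plus one rank's local shard."""
    placements: tuple      # e.g. ("Shard(0)",), a stand-in for DTensor placements
    mesh_shape: tuple      # mesh topology
    mesh_dim_names: tuple  # mesh dim names
    rank: int              # sender's position in the 1-D mesh
    local_shard: list      # stand-in for the tensor returned by to_local()


def split_into_shards(full, world_size):
    """Split a 1-D list into contiguous chunks, one per rank."""
    chunk = -(-len(full) // world_size)  # ceiling division
    return [full[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def send_local_shard(rank, local_shard, world_size):
    """Sender side: ship metadata and only the data this rank holds."""
    return ShardMessage(("Shard(0)",), (world_size,), ("dp",), rank, list(local_shard))


def recv_and_reassemble(messages):
    """Receiver side: check metadata agrees, then rebuild from the stored shards."""
    messages = list(messages)
    first = messages[0]
    for m in messages:
        same = (m.placements, m.mesh_shape, m.mesh_dim_names) == (
            first.placements, first.mesh_shape, first.mesh_dim_names)
        if not same:
            raise ValueError("senders disagree on placement metadata")
    rebuilt = []
    for m in sorted(messages, key=lambda m: m.rank):
        rebuilt.extend(m.local_shard)  # stand-in for from_local() + redistribute()
    return rebuilt


full = [float(i) for i in range(10)]
shards = split_into_shards(full, world_size=4)
messages = [send_local_shard(r, s, 4) for r, s in enumerate(shards)]
# Arrival order should not matter, so feed the messages in reverse.
rebuilt = recv_and_reassemble(reversed(messages))
```

In this toy setup no sender ever builds the full list; each one transmits at most its own chunk, which is the property the description attributes to Strategy B.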

Made-with: Cursor

[ghstack-poisoned]

github-actions bot (Contributor) commented Mar 6, 2026

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add Strategy B (local-shard transfer + redistribute on receiver)

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

| Prefix | Label Applied | Example |
| --- | --- | --- |
| [BugFix] or [Fix] | bug | [BugFix] Fix memory leak in TensorDict |
| [Feature] | Feature | [Feature] Add new storage backend |
| [Doc] or [Docs] | documentation | [Doc] Update installation guide |
| [Refactor] | Refactor | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Test | [Test] Add unit tests for nn module |
| [Compile] | Compile | [Compile] Fix torch.compile issue |
| [Performance] or [Perf] | Performance | [Perf] Optimize tensor operations |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Setup] | setup | [Setup] Update build configuration |
| [Distributed] or [Dist] | Distributed | [Distributed] Add scatter collective |
| [Benchmark] or [Bench] | Benchmarks | [Benchmark] Add compile benchmark |
| [Typing] or [Type] | Typing | [Typing] Add type stubs |
| [BC-breaking] or [BC] | BC-breaking | [BC-breaking] Remove deprecated API |
| [Formatting] or [Format] | Formatting | [Format] Fix code style |
| [Quality] | Quality | [Quality] Improve error messages |

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.
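
As a rough illustration of the rule above, the check can be approximated with a case-insensitive lookup on the leading bracketed tag. This is a hypothetical sketch, not the workflow's actual implementation: the prefix-to-label pairs are copied from the bot's table, while the function name `label_for_title` and the regular expression are assumptions, and only the first bracketed tag is inspected.

```python
import re

# Prefix -> label pairs copied from the bot's table, keyed in lowercase.
SUPPORTED_PREFIXES = {
    "bugfix": "bug", "fix": "bug",
    "feature": "Feature",
    "doc": "documentation", "docs": "documentation",
    "refactor": "Refactor",
    "ci": "CI",
    "test": "Test", "tests": "Test",
    "compile": "Compile",
    "performance": "Performance", "perf": "Performance",
    "deprecation": "Deprecation",
    "setup": "setup",
    "distributed": "Distributed", "dist": "Distributed",
    "benchmark": "Benchmarks", "bench": "Benchmarks",
    "typing": "Typing", "type": "Typing",
    "bc-breaking": "BC-breaking", "bc": "BC-breaking",
    "formatting": "Formatting", "format": "Formatting",
    "quality": "Quality",
}


def label_for_title(title):
    """Return the label for a PR title, or None if its prefix is not supported."""
    match = re.match(r"\[([^\]]+)\]", title)  # leading "[Something]" tag only
    if match is None:
        return None
    return SUPPORTED_PREFIXES.get(match.group(1).lower())


current = "[DTensor] Add Strategy B (local-shard transfer + redistribute on receiver)"
rejected = label_for_title(current)
accepted = label_for_title("[distributed] Add scatter collective")
```

Under this sketch the current title maps to no label because `[DTensor]` is not in the table, whereas a title starting with a listed prefix, in any letter case, resolves to its label.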

meta-cla bot added the CLA Signed label Mar 6, 2026

github-actions bot (Contributor) commented Mar 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 261. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 32.1810μs 15.1468μs 66.0206 KOps/s 66.9536 KOps/s $\color{#d91a1a}-1.39\%$
test_plain_set_stack_nested 42.3010μs 15.3918μs 64.9696 KOps/s 65.5792 KOps/s $\color{#d91a1a}-0.93\%$
test_plain_set_nested_inplace 44.7110μs 16.9114μs 59.1316 KOps/s 59.2037 KOps/s $\color{#d91a1a}-0.12\%$
test_plain_set_stack_nested_inplace 58.8020μs 16.8795μs 59.2433 KOps/s 59.6818 KOps/s $\color{#d91a1a}-0.73\%$
test_items 39.0500μs 6.1571μs 162.4146 KOps/s 167.5811 KOps/s $\color{#d91a1a}-3.08\%$
test_items_nested 0.5356ms 0.4720ms 2.1185 KOps/s 2.1215 KOps/s $\color{#d91a1a}-0.14\%$
test_items_nested_locked 0.5183ms 0.4754ms 2.1036 KOps/s 2.1158 KOps/s $\color{#d91a1a}-0.58\%$
test_items_nested_leaf 0.1278ms 99.2144μs 10.0792 KOps/s 10.1834 KOps/s $\color{#d91a1a}-1.02\%$
test_items_stack_nested 0.5347ms 0.4698ms 2.1284 KOps/s 2.1366 KOps/s $\color{#d91a1a}-0.38\%$
test_items_stack_nested_leaf 0.1346ms 98.7403μs 10.1276 KOps/s 10.1144 KOps/s $\color{#35bf28}+0.13\%$
test_items_stack_nested_locked 0.5065ms 0.4765ms 2.0986 KOps/s 2.1011 KOps/s $\color{#d91a1a}-0.12\%$
test_keys 27.2800μs 4.2660μs 234.4097 KOps/s 236.4689 KOps/s $\color{#d91a1a}-0.87\%$
test_keys_nested 0.1980ms 0.1320ms 7.5765 KOps/s 7.5985 KOps/s $\color{#d91a1a}-0.29\%$
test_keys_nested_locked 1.9953ms 0.1416ms 7.0636 KOps/s 7.1051 KOps/s $\color{#d91a1a}-0.58\%$
test_keys_nested_leaf 0.1839ms 0.1221ms 8.1902 KOps/s 8.2239 KOps/s $\color{#d91a1a}-0.41\%$
test_keys_stack_nested 0.1670ms 0.1312ms 7.6229 KOps/s 7.6003 KOps/s $\color{#35bf28}+0.30\%$
test_keys_stack_nested_leaf 0.1499ms 0.1221ms 8.1922 KOps/s 8.1470 KOps/s $\color{#35bf28}+0.55\%$
test_keys_stack_nested_locked 0.1688ms 0.1405ms 7.1153 KOps/s 7.1142 KOps/s $\color{#35bf28}+0.01\%$
test_values 5.8280μs 1.0374μs 963.9323 KOps/s 970.7805 KOps/s $\color{#d91a1a}-0.71\%$
test_values_nested 83.9210μs 54.1491μs 18.4675 KOps/s 18.6893 KOps/s $\color{#d91a1a}-1.19\%$
test_values_nested_locked 80.0620μs 57.2992μs 17.4523 KOps/s 17.6347 KOps/s $\color{#d91a1a}-1.03\%$
test_values_nested_leaf 85.8010μs 61.1878μs 16.3431 KOps/s 16.4695 KOps/s $\color{#d91a1a}-0.77\%$
test_values_stack_nested 93.8110μs 53.8462μs 18.5714 KOps/s 18.7211 KOps/s $\color{#d91a1a}-0.80\%$
test_values_stack_nested_leaf 0.1419ms 61.5948μs 16.2351 KOps/s 16.3494 KOps/s $\color{#d91a1a}-0.70\%$
test_values_stack_nested_locked 96.5720μs 57.3054μs 17.4504 KOps/s 17.5155 KOps/s $\color{#d91a1a}-0.37\%$
test_membership 6.3635μs 0.8662μs 1.1544 MOps/s 1.1847 MOps/s $\color{#d91a1a}-2.56\%$
test_membership_nested 32.4810μs 2.9825μs 335.2856 KOps/s 343.8662 KOps/s $\color{#d91a1a}-2.50\%$
test_membership_nested_leaf 66.5720μs 2.9871μs 334.7706 KOps/s 343.1503 KOps/s $\color{#d91a1a}-2.44\%$
test_membership_stacked_nested 34.6400μs 2.9246μs 341.9248 KOps/s 341.6643 KOps/s $\color{#35bf28}+0.08\%$
test_membership_stacked_nested_leaf 21.3810μs 2.9498μs 339.0114 KOps/s 344.3124 KOps/s $\color{#d91a1a}-1.54\%$
test_membership_nested_last 40.3510μs 4.4311μs 225.6768 KOps/s 227.0283 KOps/s $\color{#d91a1a}-0.60\%$
test_membership_nested_leaf_last 35.4610μs 4.4412μs 225.1644 KOps/s 229.0837 KOps/s $\color{#d91a1a}-1.71\%$
test_membership_stacked_nested_last 43.2110μs 4.4026μs 227.1365 KOps/s 229.2578 KOps/s $\color{#d91a1a}-0.93\%$
test_membership_stacked_nested_leaf_last 26.4600μs 4.3903μs 227.7729 KOps/s 231.3810 KOps/s $\color{#d91a1a}-1.56\%$
test_nested_getleaf 56.5110μs 21.7565μs 45.9633 KOps/s 46.9593 KOps/s $\color{#d91a1a}-2.12\%$
test_nested_get 46.7210μs 20.7477μs 48.1981 KOps/s 48.0656 KOps/s $\color{#35bf28}+0.28\%$
test_stacked_getleaf 54.2710μs 21.5849μs 46.3287 KOps/s 46.7389 KOps/s $\color{#d91a1a}-0.88\%$
test_stacked_get 54.4910μs 20.7693μs 48.1480 KOps/s 48.8391 KOps/s $\color{#d91a1a}-1.42\%$
test_nested_getitemleaf 59.1720μs 22.4359μs 44.5715 KOps/s 45.7240 KOps/s $\color{#d91a1a}-2.52\%$
test_nested_getitem 49.9310μs 20.9883μs 47.6455 KOps/s 48.2107 KOps/s $\color{#d91a1a}-1.17\%$
test_stacked_getitemleaf 48.7710μs 22.2896μs 44.8641 KOps/s 45.4545 KOps/s $\color{#d91a1a}-1.30\%$
test_stacked_getitem 52.4020μs 21.0130μs 47.5895 KOps/s 48.7947 KOps/s $\color{#d91a1a}-2.47\%$
test_lock_nested 0.5463ms 0.4827ms 2.0716 KOps/s 2.0980 KOps/s $\color{#d91a1a}-1.26\%$
test_lock_stack_nested 0.5796ms 0.4838ms 2.0669 KOps/s 2.0649 KOps/s $\color{#35bf28}+0.10\%$
test_unlock_nested 0.4572ms 0.3930ms 2.5443 KOps/s 2.5637 KOps/s $\color{#d91a1a}-0.75\%$
test_unlock_stack_nested 0.4292ms 0.3917ms 2.5528 KOps/s 2.5376 KOps/s $\color{#35bf28}+0.60\%$
test_flatten_speed 0.1531ms 0.1232ms 8.1190 KOps/s 8.1090 KOps/s $\color{#35bf28}+0.12\%$
test_unflatten_speed 0.6326ms 0.5855ms 1.7080 KOps/s 1.7285 KOps/s $\color{#d91a1a}-1.19\%$
test_common_ops 0.8364ms 0.7028ms 1.4228 KOps/s 1.4201 KOps/s $\color{#35bf28}+0.19\%$
test_creation 92.7220μs 3.0396μs 328.9925 KOps/s 319.0697 KOps/s $\color{#35bf28}+3.11\%$
test_creation_empty 27.8100μs 6.9935μs 142.9908 KOps/s 144.0987 KOps/s $\color{#d91a1a}-0.77\%$
test_creation_nested_1 34.5510μs 11.6727μs 85.6699 KOps/s 86.5848 KOps/s $\color{#d91a1a}-1.06\%$
test_creation_nested_2 38.8000μs 13.3228μs 75.0594 KOps/s 74.6021 KOps/s $\color{#35bf28}+0.61\%$
test_creation_many_keys[10] 50.7310μs 21.0395μs 47.5297 KOps/s 48.0968 KOps/s $\color{#d91a1a}-1.18\%$
test_creation_many_keys[50] 0.1209ms 90.2223μs 11.0837 KOps/s 11.1251 KOps/s $\color{#d91a1a}-0.37\%$
test_creation_many_keys[100] 0.2061ms 0.1750ms 5.7149 KOps/s 5.6420 KOps/s $\color{#35bf28}+1.29\%$
test_creation_nested_many_keys[10] 84.0720μs 44.9793μs 22.2325 KOps/s 22.3232 KOps/s $\color{#d91a1a}-0.41\%$
test_creation_nested_many_keys[50] 0.2261ms 0.1834ms 5.4528 KOps/s 5.4002 KOps/s $\color{#35bf28}+0.97\%$
test_clone 43.7610μs 13.5151μs 73.9914 KOps/s 74.1943 KOps/s $\color{#d91a1a}-0.27\%$
test_getitem[int] 1.5612ms 15.0222μs 66.5680 KOps/s 59.3741 KOps/s $\textbf{\color{#35bf28}+12.12\%}$
test_getitem[slice_int] 0.1327ms 24.1744μs 41.3660 KOps/s 41.1236 KOps/s $\color{#35bf28}+0.59\%$
test_getitem[range] 0.1753ms 64.0474μs 15.6134 KOps/s 15.6341 KOps/s $\color{#d91a1a}-0.13\%$
test_getitem[tuple] 0.1383ms 23.7951μs 42.0254 KOps/s 41.4073 KOps/s $\color{#35bf28}+1.49\%$
test_getitem[list] 0.1849ms 59.9424μs 16.6827 KOps/s 17.0034 KOps/s $\color{#d91a1a}-1.89\%$
test_setitem_dim[int] 48.2710μs 26.0197μs 38.4324 KOps/s 38.0077 KOps/s $\color{#35bf28}+1.12\%$
test_setitem_dim[slice_int] 65.5510μs 43.0457μs 23.2311 KOps/s 22.7716 KOps/s $\color{#35bf28}+2.02\%$
test_setitem_dim[range] 0.1222ms 95.8264μs 10.4355 KOps/s 10.5173 KOps/s $\color{#d91a1a}-0.78\%$
test_setitem_dim[tuple] 62.9110μs 39.7554μs 25.1538 KOps/s 24.2820 KOps/s $\color{#35bf28}+3.59\%$
test_setitem 49.3610μs 17.8660μs 55.9723 KOps/s 55.7887 KOps/s $\color{#35bf28}+0.33\%$
test_set 43.8210μs 17.0892μs 58.5164 KOps/s 58.2053 KOps/s $\color{#35bf28}+0.53\%$
test_set_shared 0.5002ms 0.2051ms 4.8745 KOps/s 4.9014 KOps/s $\color{#d91a1a}-0.55\%$
test_update 0.3330ms 21.7580μs 45.9602 KOps/s 45.9180 KOps/s $\color{#35bf28}+0.09\%$
test_update_nested 66.3510μs 33.5549μs 29.8019 KOps/s 29.6310 KOps/s $\color{#35bf28}+0.58\%$
test_update__nested 0.4519ms 34.3032μs 29.1518 KOps/s 28.7383 KOps/s $\color{#35bf28}+1.44\%$
test_set_nested 49.4410μs 18.9229μs 52.8459 KOps/s 52.6048 KOps/s $\color{#35bf28}+0.46\%$
test_set_nested_new 63.8120μs 24.2971μs 41.1572 KOps/s 41.4499 KOps/s $\color{#d91a1a}-0.71\%$
test_select 78.9120μs 41.3308μs 24.1951 KOps/s 24.2355 KOps/s $\color{#d91a1a}-0.17\%$
test_select_nested 0.1113ms 74.4233μs 13.4367 KOps/s 13.2320 KOps/s $\color{#35bf28}+1.55\%$
test_exclude_nested 0.1283ms 92.7921μs 10.7768 KOps/s 10.7313 KOps/s $\color{#35bf28}+0.42\%$
test_empty[True] 0.4570ms 0.4023ms 2.4855 KOps/s 2.4852 KOps/s $\color{#35bf28}+0.02\%$
test_empty[False] 9.5377μs 1.3160μs 759.8712 KOps/s 758.8738 KOps/s $\color{#35bf28}+0.13\%$
test_to 0.1057ms 71.4464μs 13.9965 KOps/s 13.5615 KOps/s $\color{#35bf28}+3.21\%$
test_to_nonblocking 0.1039ms 64.8198μs 15.4274 KOps/s 15.3954 KOps/s $\color{#35bf28}+0.21\%$
test_unbind_speed 0.3708ms 0.3330ms 3.0026 KOps/s 2.9906 KOps/s $\color{#35bf28}+0.40\%$
test_unbind_speed_stack0 0.4152ms 0.3331ms 3.0024 KOps/s 3.0098 KOps/s $\color{#d91a1a}-0.25\%$
test_unbind_speed_stack1 0.1038s 0.8458ms 1.1823 KOps/s 1.1741 KOps/s $\color{#35bf28}+0.70\%$
test_split 0.1033s 1.2698ms 787.5393 Ops/s 782.3303 Ops/s $\color{#35bf28}+0.67\%$
test_chunk 0.1036s 1.2154ms 822.7732 Ops/s 922.4359 Ops/s $\textbf{\color{#d91a1a}-10.80\%}$
test_to_cpu_blocking 19.4731ms 19.3896ms 51.5740 Ops/s 35.1867 Ops/s $\textbf{\color{#35bf28}+46.57\%}$
test_to_cpu_global_sync 11.5389ms 11.4543ms 87.3032 Ops/s 78.9137 Ops/s $\textbf{\color{#35bf28}+10.63\%}$
test_to_cpu_event_sync 12.7311ms 12.4365ms 80.4082 Ops/s 80.6696 Ops/s $\color{#d91a1a}-0.32\%$
test_to_cpu_default 0.1155s 13.7389ms 72.7861 Ops/s 80.6910 Ops/s $\textbf{\color{#d91a1a}-9.80\%}$
test_consolidate[False-None] 4.2299ms 4.1610ms 240.3281 Ops/s 216.8407 Ops/s $\textbf{\color{#35bf28}+10.83\%}$
test_consolidate[default-None] 2.1527ms 2.0072ms 498.2046 Ops/s 487.1938 Ops/s $\color{#35bf28}+2.26\%$
test_consolidate[reduce-overhead-None] 2.0092ms 1.9125ms 522.8664 Ops/s 507.9723 Ops/s $\color{#35bf28}+2.93\%$
test_consolidate_njt[False-None] 8.8629ms 8.6190ms 116.0224 Ops/s 117.6604 Ops/s $\color{#d91a1a}-1.39\%$
test_to[False-False-None] 2.2964ms 2.1104ms 473.8402 Ops/s 470.6205 Ops/s $\color{#35bf28}+0.68\%$
test_to[True-False-None] 2.2513ms 1.9376ms 516.1085 Ops/s 511.7222 Ops/s $\color{#35bf28}+0.86\%$
test_to[within-False-None] 6.2786ms 6.1682ms 162.1229 Ops/s 162.6628 Ops/s $\color{#d91a1a}-0.33\%$
test_to[True-default-None] 9.3031ms 8.9433ms 111.8152 Ops/s 110.5601 Ops/s $\color{#35bf28}+1.14\%$
test_to_njt[False-False-None] 8.6230ms 8.5094ms 117.5173 Ops/s 117.8506 Ops/s $\color{#d91a1a}-0.28\%$
test_to_njt[True-False-None] 7.3353ms 6.9771ms 143.3266 Ops/s 143.6131 Ops/s $\color{#d91a1a}-0.20\%$
test_to_njt[within-False-None] 15.9085ms 15.6804ms 63.7737 Ops/s 64.1588 Ops/s $\color{#d91a1a}-0.60\%$
test_creation[device0] 0.3839ms 0.1139ms 8.7815 KOps/s 8.7002 KOps/s $\color{#35bf28}+0.93\%$
test_creation_from_tensor 0.4129ms 0.1115ms 8.9667 KOps/s 8.5047 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_add_one[memmap_tensor0] 0.2939ms 6.6013μs 151.4864 KOps/s 150.9784 KOps/s $\color{#35bf28}+0.34\%$
test_contiguous[memmap_tensor0] 26.3200μs 0.6761μs 1.4791 MOps/s 1.9012 MOps/s $\textbf{\color{#d91a1a}-22.20\%}$
test_stack[memmap_tensor0] 33.3110μs 4.6485μs 215.1223 KOps/s 220.8156 KOps/s $\color{#d91a1a}-2.58\%$
test_memmaptd_index 1.1468ms 0.2720ms 3.6771 KOps/s 3.7740 KOps/s $\color{#d91a1a}-2.57\%$
test_memmaptd_index_astensor 0.5318ms 0.3775ms 2.6489 KOps/s 2.7067 KOps/s $\color{#d91a1a}-2.14\%$
test_memmaptd_index_op 0.8914ms 0.6246ms 1.6009 KOps/s 1.6053 KOps/s $\color{#d91a1a}-0.27\%$
test_serialize_model 0.3161s 0.1657s 6.0333 Ops/s 7.3102 Ops/s $\textbf{\color{#d91a1a}-17.47\%}$
test_serialize_model_pickle 2.1095s 1.3626s 0.7339 Ops/s 0.8254 Ops/s $\textbf{\color{#d91a1a}-11.08\%}$
test_serialize_weights 0.1391s 0.1368s 7.3085 Ops/s 7.3401 Ops/s $\color{#d91a1a}-0.43\%$
test_serialize_weights_returnearly 0.4557s 93.6088ms 10.6828 Ops/s 10.3950 Ops/s $\color{#35bf28}+2.77\%$
test_serialize_weights_pickle 1.3775s 1.2180s 0.8210 Ops/s 0.8186 Ops/s $\color{#35bf28}+0.29\%$
test_reshape_pytree 0.2076ms 33.0853μs 30.2249 KOps/s 30.7492 KOps/s $\color{#d91a1a}-1.70\%$
test_reshape_td 0.2161ms 45.9692μs 21.7537 KOps/s 22.2743 KOps/s $\color{#d91a1a}-2.34\%$
test_view_pytree 0.2207ms 32.7049μs 30.5764 KOps/s 30.8383 KOps/s $\color{#d91a1a}-0.85\%$
test_view_td 93.1220μs 53.6069μs 18.6543 KOps/s 18.0148 KOps/s $\color{#35bf28}+3.55\%$
test_unbind_pytree 0.2440ms 36.7841μs 27.1856 KOps/s 27.6223 KOps/s $\color{#d91a1a}-1.58\%$
test_unbind_td 0.1026ms 50.3707μs 19.8528 KOps/s 19.8542 KOps/s $-0.01\%$
test_split_pytree 0.2494ms 42.7075μs 23.4151 KOps/s 23.3278 KOps/s $\color{#35bf28}+0.37\%$
test_split_td 0.1764ms 64.2355μs 15.5677 KOps/s 15.3542 KOps/s $\color{#35bf28}+1.39\%$
test_add_pytree 0.2346ms 44.7264μs 22.3582 KOps/s 23.4353 KOps/s $\color{#d91a1a}-4.60\%$
test_add_td 97.5120μs 59.8405μs 16.7111 KOps/s 17.6940 KOps/s $\textbf{\color{#d91a1a}-5.55\%}$
test_compile_add_one_nested[tensordict-compile] 0.2298ms 0.1456ms 6.8683 KOps/s 6.6940 KOps/s $\color{#35bf28}+2.60\%$
test_compile_add_one_nested[tensordict-eager] 0.3070ms 0.2037ms 4.9100 KOps/s 5.0094 KOps/s $\color{#d91a1a}-1.98\%$
test_compile_add_one_nested[pytree-compile] 0.1809ms 0.1085ms 9.2195 KOps/s 9.1408 KOps/s $\color{#35bf28}+0.86\%$
test_compile_add_one_nested[pytree-eager] 0.4350ms 0.1806ms 5.5378 KOps/s 5.5728 KOps/s $\color{#d91a1a}-0.63\%$
test_compile_copy_nested[tensordict-compile] 0.5438ms 10.7770μs 92.7906 KOps/s 97.0721 KOps/s $\color{#d91a1a}-4.41\%$
test_compile_copy_nested[tensordict-eager] 0.1389ms 54.8017μs 18.2476 KOps/s 18.5604 KOps/s $\color{#d91a1a}-1.69\%$
test_compile_copy_nested[pytree-compile] 0.1468ms 9.7741μs 102.3110 KOps/s 102.5887 KOps/s $\color{#d91a1a}-0.27\%$
test_compile_copy_nested[pytree-eager] 0.4648ms 70.2915μs 14.2265 KOps/s 14.6184 KOps/s $\color{#d91a1a}-2.68\%$
test_compile_add_one_flat[tensordict-compile] 0.2396ms 0.1748ms 5.7217 KOps/s 5.4352 KOps/s $\textbf{\color{#35bf28}+5.27\%}$
test_compile_add_one_flat[tensordict-eager] 0.4062ms 0.2825ms 3.5401 KOps/s 3.5559 KOps/s $\color{#d91a1a}-0.45\%$
test_compile_add_one_flat[tensorclass-compile] 0.2572ms 0.1163ms 8.5982 KOps/s 8.3570 KOps/s $\color{#35bf28}+2.89\%$
test_compile_add_one_flat[tensorclass-eager] 0.1186ms 73.0184μs 13.6952 KOps/s 13.3137 KOps/s $\color{#35bf28}+2.87\%$
test_compile_add_one_flat[pytree-compile] 0.2058ms 0.1570ms 6.3705 KOps/s 6.1541 KOps/s $\color{#35bf28}+3.52\%$
test_compile_add_one_flat[pytree-eager] 0.8704ms 0.5222ms 1.9149 KOps/s 1.8938 KOps/s $\color{#35bf28}+1.12\%$
test_compile_add_self_flat[tensordict-eager] 0.5103ms 0.3364ms 2.9724 KOps/s 2.9605 KOps/s $\color{#35bf28}+0.40\%$
test_compile_add_self_flat[tensordict-compile] 0.3392ms 0.1784ms 5.6057 KOps/s 5.2271 KOps/s $\textbf{\color{#35bf28}+7.24\%}$
test_compile_add_self_flat[tensorclass-eager] 0.5512ms 91.3722μs 10.9442 KOps/s 11.0511 KOps/s $\color{#d91a1a}-0.97\%$
test_compile_add_self_flat[tensorclass-compile] 0.6031ms 0.1180ms 8.4780 KOps/s 8.0530 KOps/s $\textbf{\color{#35bf28}+5.28\%}$
test_compile_add_self_flat[pytree-eager] 0.9203ms 0.4292ms 2.3300 KOps/s 2.2828 KOps/s $\color{#35bf28}+2.07\%$
test_compile_add_self_flat[pytree-compile] 0.2153ms 0.1637ms 6.1095 KOps/s 6.1582 KOps/s $\color{#d91a1a}-0.79\%$
test_compile_copy_flat[tensordict-compile] 76.2110μs 13.6038μs 73.5087 KOps/s 75.0175 KOps/s $\color{#d91a1a}-2.01\%$
test_compile_copy_flat[tensordict-eager] 79.3510μs 42.2465μs 23.6706 KOps/s 23.6349 KOps/s $\color{#35bf28}+0.15\%$
test_compile_copy_flat[pytree-compile] 89.9720μs 10.8265μs 92.3661 KOps/s 91.2390 KOps/s $\color{#35bf28}+1.24\%$
test_compile_copy_flat[pytree-eager] 0.4120ms 52.5714μs 19.0217 KOps/s 18.9061 KOps/s $\color{#35bf28}+0.61\%$
test_compile_assign_and_add[tensordict-compile] 2.0444ms 0.1744ms 5.7353 KOps/s 5.4847 KOps/s $\color{#35bf28}+4.57\%$
test_compile_assign_and_add[tensordict-eager] 3.7626ms 3.3156ms 301.6008 Ops/s 298.7962 Ops/s $\color{#35bf28}+0.94\%$
test_compile_assign_and_add[pytree-compile] 2.0568ms 0.1637ms 6.1105 KOps/s 6.0612 KOps/s $\color{#35bf28}+0.81\%$
test_compile_assign_and_add[pytree-eager] 3.0112ms 2.7957ms 357.6930 Ops/s 358.2589 Ops/s $\color{#d91a1a}-0.16\%$
test_compile_indexing[tensor-tensordict-compile] 0.2316ms 0.1083ms 9.2368 KOps/s 8.7653 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_compile_indexing[tensor-tensordict-eager] 0.5292ms 73.8997μs 13.5318 KOps/s 13.4443 KOps/s $\color{#35bf28}+0.65\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1357ms 95.3919μs 10.4831 KOps/s 10.2594 KOps/s $\color{#35bf28}+2.18\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2564ms 44.5166μs 22.4635 KOps/s 22.4871 KOps/s $\color{#d91a1a}-0.10\%$
test_compile_indexing[tensor-pytree-compile] 0.2646ms 96.0234μs 10.4141 KOps/s 10.2069 KOps/s $\color{#35bf28}+2.03\%$
test_compile_indexing[tensor-pytree-eager] 0.2468ms 44.2655μs 22.5909 KOps/s 22.3824 KOps/s $\color{#35bf28}+0.93\%$
test_compile_indexing[slice-tensordict-compile] 0.2152ms 56.6530μs 17.6513 KOps/s 17.3836 KOps/s $\color{#35bf28}+1.54\%$
test_compile_indexing[slice-tensordict-eager] 0.2217ms 27.5752μs 36.2645 KOps/s 35.3967 KOps/s $\color{#35bf28}+2.45\%$
test_compile_indexing[slice-tensorclass-compile] 0.1529ms 44.4304μs 22.5071 KOps/s 22.0552 KOps/s $\color{#35bf28}+2.05\%$
test_compile_indexing[slice-tensorclass-eager] 0.2389ms 22.6116μs 44.2251 KOps/s 44.3341 KOps/s $\color{#d91a1a}-0.25\%$
test_compile_indexing[slice-pytree-compile] 83.3710μs 45.6824μs 21.8903 KOps/s 21.7985 KOps/s $\color{#35bf28}+0.42\%$
test_compile_indexing[slice-pytree-eager] 0.2678ms 22.5748μs 44.2971 KOps/s 44.3017 KOps/s $\color{#d91a1a}-0.01\%$
test_compile_indexing[int-tensordict-compile] 96.6020μs 57.6537μs 17.3450 KOps/s 17.2775 KOps/s $\color{#35bf28}+0.39\%$
test_compile_indexing[int-tensordict-eager] 0.2404ms 27.5627μs 36.2809 KOps/s 36.0228 KOps/s $\color{#35bf28}+0.72\%$
test_compile_indexing[int-tensorclass-compile] 87.0420μs 45.3031μs 22.0736 KOps/s 22.0020 KOps/s $\color{#35bf28}+0.32\%$
test_compile_indexing[int-tensorclass-eager] 0.2612ms 22.7525μs 43.9513 KOps/s 44.4379 KOps/s $\color{#d91a1a}-1.10\%$
test_compile_indexing[int-pytree-compile] 76.8710μs 45.5995μs 21.9301 KOps/s 21.8727 KOps/s $\color{#35bf28}+0.26\%$
test_compile_indexing[int-pytree-eager] 0.1950ms 22.4864μs 44.4713 KOps/s 44.4217 KOps/s $\color{#35bf28}+0.11\%$
test_compile_replace[single-eager] 89.6120μs 47.0910μs 21.2355 KOps/s 20.9408 KOps/s $\color{#35bf28}+1.41\%$
test_compile_replace[single-compile] 0.1845ms 0.1045ms 9.5686 KOps/s 9.3573 KOps/s $\color{#35bf28}+2.26\%$
test_compile_replace[multi-eager] 0.6523ms 0.5659ms 1.7670 KOps/s 1.7610 KOps/s $\color{#35bf28}+0.34\%$
test_compile_replace[multi-compile] 0.2462ms 0.1150ms 8.6924 KOps/s 8.7640 KOps/s $\color{#d91a1a}-0.82\%$
test_compile_tc_getattr_20[eager] 0.2114ms 0.1658ms 6.0321 KOps/s 6.0138 KOps/s $\color{#35bf28}+0.31\%$
test_compile_tc_getattr_20[compile] 0.4606ms 0.1183ms 8.4502 KOps/s 8.3246 KOps/s $\color{#35bf28}+1.51\%$
test_compile_clone_shallow[20-eager] 40.8600μs 19.9150μs 50.2134 KOps/s 51.2542 KOps/s $\color{#d91a1a}-2.03\%$
test_compile_clone_shallow[20-compile] 0.1131ms 11.5656μs 86.4629 KOps/s 88.0743 KOps/s $\color{#d91a1a}-1.83\%$
test_compile_clone_shallow[40-eager] 55.3110μs 34.9478μs 28.6141 KOps/s 29.1445 KOps/s $\color{#d91a1a}-1.82\%$
test_compile_clone_shallow[40-compile] 67.4310μs 12.9507μs 77.2158 KOps/s 65.5660 KOps/s $\textbf{\color{#35bf28}+17.77\%}$
test_compile_clone_shallow[80-eager] 93.1620μs 64.7994μs 15.4322 KOps/s 15.8322 KOps/s $\color{#d91a1a}-2.53\%$
test_compile_clone_shallow[80-compile] 0.1338ms 15.1225μs 66.1264 KOps/s 67.2489 KOps/s $\color{#d91a1a}-1.67\%$
test_compile_update_inplace[eager] 95.8420μs 60.4709μs 16.5369 KOps/s 16.6286 KOps/s $\color{#d91a1a}-0.55\%$
test_compile_update_inplace[compile] 0.2629ms 0.1387ms 7.2076 KOps/s 7.0666 KOps/s $\color{#35bf28}+1.99\%$
test_mod_add[eager] 91.3710μs 48.7276μs 20.5222 KOps/s 20.3061 KOps/s $\color{#35bf28}+1.06\%$
test_mod_add[compile] 0.2022ms 0.1033ms 9.6842 KOps/s 9.2906 KOps/s $\color{#35bf28}+4.24\%$
test_mod_add[compile-overhead] 0.3413ms 0.1493ms 6.6989 KOps/s 6.5525 KOps/s $\color{#35bf28}+2.24\%$
test_mod_wrap[eager] 0.3880ms 0.2872ms 3.4825 KOps/s 3.4268 KOps/s $\color{#35bf28}+1.62\%$
test_mod_wrap[compile] 0.4396ms 0.3537ms 2.8274 KOps/s 2.7086 KOps/s $\color{#35bf28}+4.39\%$
test_mod_wrap[compile-overhead] 7.5076ms 4.0493ms 246.9581 Ops/s 250.2105 Ops/s $\color{#d91a1a}-1.30\%$
test_mod_wrap_and_backward[eager] 1.6183ms 1.4827ms 674.4373 Ops/s 669.6368 Ops/s $\color{#35bf28}+0.72\%$
test_mod_wrap_and_backward[compile] 1.6568ms 1.4443ms 692.3622 Ops/s 688.1573 Ops/s $\color{#35bf28}+0.61\%$
test_mod_wrap_and_backward[compile-overhead] 1.2718ms 0.8830ms 1.1326 KOps/s 1.1043 KOps/s $\color{#35bf28}+2.56\%$
test_seq_add[eager] 0.2408ms 0.1602ms 6.2418 KOps/s 6.4741 KOps/s $\color{#d91a1a}-3.59\%$
test_seq_add[compile] 0.5721ms 0.1135ms 8.8126 KOps/s 8.4787 KOps/s $\color{#35bf28}+3.94\%$
test_seq_add[compile-overhead] 0.2295ms 0.1593ms 6.2758 KOps/s 6.3060 KOps/s $\color{#d91a1a}-0.48\%$
test_seq_wrap[eager] 0.6669ms 0.5364ms 1.8644 KOps/s 1.9247 KOps/s $\color{#d91a1a}-3.13\%$
test_seq_wrap[compile] 0.4691ms 0.3648ms 2.7409 KOps/s 2.7263 KOps/s $\color{#35bf28}+0.53\%$
test_seq_wrap[compile-overhead] 0.3259ms 0.2657ms 3.7630 KOps/s 3.6363 KOps/s $\color{#35bf28}+3.48\%$
test_func_call_runtime[False-eager] 0.9708ms 0.8361ms 1.1961 KOps/s 1.1656 KOps/s $\color{#35bf28}+2.61\%$
test_func_call_runtime[False-compile] 1.1015ms 0.9067ms 1.1029 KOps/s 1.0932 KOps/s $\color{#35bf28}+0.88\%$
test_func_call_runtime[False-compile-overhead] 0.5274ms 0.4650ms 2.1505 KOps/s 2.1470 KOps/s $\color{#35bf28}+0.16\%$
test_func_call_runtime[True-eager] 1.2911ms 1.0692ms 935.2462 Ops/s 932.8066 Ops/s $\color{#35bf28}+0.26\%$
test_func_call_runtime[True-compile] 1.0133ms 0.9228ms 1.0837 KOps/s 1.0442 KOps/s $\color{#35bf28}+3.78\%$
test_func_call_runtime[True-compile-overhead] 0.5513ms 0.4758ms 2.1016 KOps/s 2.0688 KOps/s $\color{#35bf28}+1.58\%$
test_func_call_cm_runtime[False-eager] 1.0179ms 0.8808ms 1.1354 KOps/s 1.1948 KOps/s $\color{#d91a1a}-4.97\%$
test_func_call_cm_runtime[False-compile] 1.0342ms 0.9261ms 1.0798 KOps/s 1.0873 KOps/s $\color{#d91a1a}-0.69\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5199ms 0.4644ms 2.1532 KOps/s 2.1385 KOps/s $\color{#35bf28}+0.68\%$
test_func_call_cm_runtime[True-eager] 1.3037ms 1.2114ms 825.5106 Ops/s 811.2490 Ops/s $\color{#35bf28}+1.76\%$
test_func_call_cm_runtime[True-compile] 1.3048ms 0.9904ms 1.0097 KOps/s 1.0356 KOps/s $\color{#d91a1a}-2.50\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5713ms 0.5107ms 1.9581 KOps/s 1.9458 KOps/s $\color{#35bf28}+0.63\%$
test_vmap_func_call_cm_runtime[eager] 2.8511ms 2.3599ms 423.7469 Ops/s 421.5673 Ops/s $\color{#35bf28}+0.52\%$
test_vmap_func_call_cm_runtime[compile] 1.1068ms 0.9785ms 1.0220 KOps/s 1.0099 KOps/s $\color{#35bf28}+1.20\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.6429ms 0.5161ms 1.9376 KOps/s 1.9212 KOps/s $\color{#35bf28}+0.85\%$
test_distributed 0.6426ms 0.1524ms 6.5618 KOps/s 6.4797 KOps/s $\color{#35bf28}+1.27\%$
test_tdmodule 0.5126ms 29.5166μs 33.8792 KOps/s 35.6232 KOps/s $\color{#d91a1a}-4.90\%$
test_tdmodule_dispatch 83.2020μs 48.2500μs 20.7254 KOps/s 21.7412 KOps/s $\color{#d91a1a}-4.67\%$
test_tdseq 42.0910μs 29.0220μs 34.4567 KOps/s 36.9620 KOps/s $\textbf{\color{#d91a1a}-6.78\%}$
test_tdseq_dispatch 71.2420μs 50.8562μs 19.6633 KOps/s 20.6630 KOps/s $\color{#d91a1a}-4.84\%$
test_instantiation_functorch 2.1724ms 2.0915ms 478.1368 Ops/s 474.1385 Ops/s $\color{#35bf28}+0.84\%$
test_exec_functorch 0.3045ms 0.1793ms 5.5758 KOps/s 5.5400 KOps/s $\color{#35bf28}+0.65\%$
test_exec_functional_call 0.6110ms 0.1617ms 6.1851 KOps/s 6.2255 KOps/s $\color{#d91a1a}-0.65\%$
test_exec_td_decorator 0.4468ms 0.2398ms 4.1704 KOps/s 4.2321 KOps/s $\color{#d91a1a}-1.46\%$
test_vmap_mlp_speed_decorator[True-True] 1.2508ms 0.8288ms 1.2065 KOps/s 1.2046 KOps/s $\color{#35bf28}+0.16\%$
test_vmap_mlp_speed_decorator[True-False] 1.2691ms 0.8228ms 1.2154 KOps/s 1.2000 KOps/s $\color{#35bf28}+1.29\%$
test_vmap_mlp_speed_decorator[False-True] 1.1922ms 0.7082ms 1.4120 KOps/s 1.3905 KOps/s $\color{#35bf28}+1.55\%$
test_vmap_mlp_speed_decorator[False-False] 1.1442ms 0.7079ms 1.4126 KOps/s 1.3877 KOps/s $\color{#35bf28}+1.80\%$
test_vmap_transformer_speed_decorator[True-True] 21.0638ms 20.3833ms 49.0597 Ops/s 48.6559 Ops/s $\color{#35bf28}+0.83\%$
test_vmap_transformer_speed_decorator[True-False] 21.0182ms 20.4122ms 48.9904 Ops/s 48.7575 Ops/s $\color{#35bf28}+0.48\%$
test_vmap_transformer_speed_decorator[False-True] 20.5852ms 20.1750ms 49.5663 Ops/s 49.1742 Ops/s $\color{#35bf28}+0.80\%$
test_vmap_transformer_speed_decorator[False-False] 20.6188ms 20.2062ms 49.4897 Ops/s 49.3036 Ops/s $\color{#35bf28}+0.38\%$
test_to_module_speed[True] 1.9559ms 1.4829ms 674.3381 Ops/s 666.6685 Ops/s $\color{#35bf28}+1.15\%$
test_to_module_speed[False] 1.9000ms 1.4656ms 682.3000 Ops/s 677.0204 Ops/s $\color{#35bf28}+0.78\%$
test_tc_init 78.4720μs 44.1731μs 22.6382 KOps/s 21.6726 KOps/s $\color{#35bf28}+4.46\%$
test_tc_init_tensor_only 37.6610μs 9.8583μs 101.4375 KOps/s 101.1959 KOps/s $\color{#35bf28}+0.24\%$
test_tc_init_nested 0.1322ms 88.9981μs 11.2362 KOps/s 11.0806 KOps/s $\color{#35bf28}+1.40\%$
test_tc_init_many_fields 44.3810μs 16.5169μs 60.5440 KOps/s 60.6681 KOps/s $\color{#d91a1a}-0.20\%$
test_tc_first_layer_tensor 26.2910μs 1.7887μs 559.0527 KOps/s 547.0724 KOps/s $\color{#35bf28}+2.19\%$
test_tc_first_layer_tensor_only 1.6551μs 0.3980μs 2.5123 MOps/s 2.5551 MOps/s $\color{#d91a1a}-1.68\%$
test_tc_first_layer_tensor_set 38.2400μs 3.9215μs 255.0037 KOps/s 256.8133 KOps/s $\color{#d91a1a}-0.70\%$
test_tc_first_layer_tensor_only_set 29.4000μs 3.2961μs 303.3904 KOps/s 308.6595 KOps/s $\color{#d91a1a}-1.71\%$
test_tc_first_layer_nontensor 70.2310μs 6.1509μs 162.5782 KOps/s 162.6540 KOps/s $\color{#d91a1a}-0.05\%$
test_tc_second_layer_tensor 39.1210μs 4.3450μs 230.1505 KOps/s 229.2418 KOps/s $\color{#35bf28}+0.40\%$
test_tc_second_layer_nontensor 36.3710μs 8.6864μs 115.1230 KOps/s 115.0659 KOps/s $\color{#35bf28}+0.05\%$
test_unbind 0.2489s 14.2096ms 70.3751 Ops/s 65.2251 Ops/s $\textbf{\color{#35bf28}+7.90\%}$
test_full_like 7.4904ms 4.4217ms 226.1598 Ops/s 135.7072 Ops/s $\textbf{\color{#35bf28}+66.65\%}$
test_zeros_like 4.9139ms 4.3789ms 228.3656 Ops/s 227.7485 Ops/s $\color{#35bf28}+0.27\%$
test_ones_like 5.0021ms 4.3989ms 227.3313 Ops/s 227.4915 Ops/s $\color{#d91a1a}-0.07\%$
test_clone 6.8466ms 6.5937ms 151.6608 Ops/s 151.8710 Ops/s $\color{#d91a1a}-0.14\%$
test_squeeze 0.1593ms 14.5028μs 68.9521 KOps/s 66.9965 KOps/s $\color{#35bf28}+2.92\%$
test_unsqueeze 0.2695ms 0.1120ms 8.9259 KOps/s 8.5408 KOps/s $\color{#35bf28}+4.51\%$
test_split 0.2919ms 0.1861ms 5.3748 KOps/s 5.0122 KOps/s $\textbf{\color{#35bf28}+7.24\%}$
test_permute 0.6461ms 0.2048ms 4.8820 KOps/s 4.7573 KOps/s $\color{#35bf28}+2.62\%$
test_stack 52.5094ms 51.6754ms 19.3516 Ops/s 19.3453 Ops/s $\color{#35bf28}+0.03\%$
test_cat 52.1261ms 51.5872ms 19.3846 Ops/s 19.3896 Ops/s $\color{#d91a1a}-0.03\%$
test_sequential_tensordict 0.2946ms 0.2239ms 4.4658 KOps/s 4.5153 KOps/s $\color{#d91a1a}-1.10\%$
test_sequential_graph_module 0.2683ms 0.1208ms 8.2798 KOps/s 8.1561 KOps/s $\color{#35bf28}+1.52\%$
test_nested_tensordict 0.4667ms 0.3001ms 3.3321 KOps/s 3.4455 KOps/s $\color{#d91a1a}-3.29\%$
test_nested_graph_module 0.1942ms 0.1361ms 7.3455 KOps/s 7.7342 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$

github-actions bot (Contributor) commented Mar 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 261. Improved: $\large\color{#35bf28}19$. Worsened: $\large\color{#d91a1a}13$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 70.0210μs 14.9025μs 67.1029 KOps/s 67.4927 KOps/s $\color{#d91a1a}-0.58\%$
test_plain_set_stack_nested 35.7600μs 15.3172μs 65.2860 KOps/s 65.5820 KOps/s $\color{#d91a1a}-0.45\%$
test_plain_set_nested_inplace 42.9400μs 16.7802μs 59.5942 KOps/s 59.9319 KOps/s $\color{#d91a1a}-0.56\%$
test_plain_set_stack_nested_inplace 49.2200μs 16.6127μs 60.1948 KOps/s 59.7499 KOps/s $\color{#35bf28}+0.74\%$
test_items 36.6400μs 5.9482μs 168.1174 KOps/s 164.9846 KOps/s $\color{#35bf28}+1.90\%$
test_items_nested 0.5273ms 0.4653ms 2.1492 KOps/s 2.1365 KOps/s $\color{#35bf28}+0.59\%$
test_items_nested_locked 0.5309ms 0.4690ms 2.1324 KOps/s 2.1475 KOps/s $\color{#d91a1a}-0.70\%$
test_items_nested_leaf 0.1198ms 97.6912μs 10.2363 KOps/s 10.1974 KOps/s $\color{#35bf28}+0.38\%$
test_items_stack_nested 0.5011ms 0.4633ms 2.1585 KOps/s 2.1546 KOps/s $\color{#35bf28}+0.18\%$
test_items_stack_nested_leaf 0.1353ms 99.2045μs 10.0802 KOps/s 10.1370 KOps/s $\color{#d91a1a}-0.56\%$
test_items_stack_nested_locked 0.5471ms 0.4699ms 2.1283 KOps/s 2.1427 KOps/s $\color{#d91a1a}-0.67\%$
test_keys 29.4000μs 4.2232μs 236.7860 KOps/s 236.3948 KOps/s $\color{#35bf28}+0.17\%$
test_keys_nested 0.1645ms 0.1299ms 7.6958 KOps/s 7.7237 KOps/s $\color{#d91a1a}-0.36\%$
test_keys_nested_locked 1.9065ms 0.1405ms 7.1157 KOps/s 7.2193 KOps/s $\color{#d91a1a}-1.44\%$
test_keys_nested_leaf 0.1595ms 0.1214ms 8.2347 KOps/s 8.3253 KOps/s $\color{#d91a1a}-1.09\%$
test_keys_stack_nested 0.1953ms 0.1306ms 7.6550 KOps/s 7.6775 KOps/s $\color{#d91a1a}-0.29\%$
test_keys_stack_nested_leaf 0.1549ms 0.1211ms 8.2568 KOps/s 8.3460 KOps/s $\color{#d91a1a}-1.07\%$
test_keys_stack_nested_locked 0.1754ms 0.1384ms 7.2253 KOps/s 7.2502 KOps/s $\color{#d91a1a}-0.34\%$
test_values 3.7371μs 0.9972μs 1.0028 MOps/s 976.3076 KOps/s $\color{#35bf28}+2.72\%$
test_values_nested 88.9110μs 52.8237μs 18.9309 KOps/s 19.1762 KOps/s $\color{#d91a1a}-1.28\%$
test_values_nested_locked 82.3510μs 56.4141μs 17.7261 KOps/s 17.9614 KOps/s $\color{#d91a1a}-1.31\%$
test_values_nested_leaf 87.0210μs 60.5944μs 16.5032 KOps/s 16.7238 KOps/s $\color{#d91a1a}-1.32\%$
test_values_stack_nested 84.9110μs 53.1956μs 18.7985 KOps/s 19.0615 KOps/s $\color{#d91a1a}-1.38\%$
test_values_stack_nested_leaf 92.9910μs 60.4435μs 16.5444 KOps/s 16.7065 KOps/s $\color{#d91a1a}-0.97\%$
test_values_stack_nested_locked 0.1267ms 56.4845μs 17.7040 KOps/s 18.0458 KOps/s $\color{#d91a1a}-1.89\%$
test_membership 5.3500μs 0.8638μs 1.1577 MOps/s 1.1816 MOps/s $\color{#d91a1a}-2.02\%$
test_membership_nested 32.9900μs 2.9360μs 340.6020 KOps/s 347.6101 KOps/s $\color{#d91a1a}-2.02\%$
test_membership_nested_leaf 30.2800μs 2.9432μs 339.7703 KOps/s 350.1143 KOps/s $\color{#d91a1a}-2.95\%$
test_membership_stacked_nested 39.6500μs 2.9388μs 340.2754 KOps/s 346.6530 KOps/s $\color{#d91a1a}-1.84\%$
test_membership_stacked_nested_leaf 31.5010μs 2.9112μs 343.4991 KOps/s 347.2106 KOps/s $\color{#d91a1a}-1.07\%$
test_membership_nested_last 34.7700μs 4.4077μs 226.8756 KOps/s 231.3285 KOps/s $\color{#d91a1a}-1.92\%$
test_membership_nested_leaf_last 39.6810μs 4.3454μs 230.1261 KOps/s 229.0440 KOps/s $\color{#35bf28}+0.47\%$
test_membership_stacked_nested_last 35.4610μs 4.4229μs 226.0983 KOps/s 229.5250 KOps/s $\color{#d91a1a}-1.49\%$
test_membership_stacked_nested_leaf_last 40.9700μs 4.3812μs 228.2475 KOps/s 229.9962 KOps/s $\color{#d91a1a}-0.76\%$
test_nested_getleaf 48.5410μs 21.9044μs 45.6530 KOps/s 46.1097 KOps/s $\color{#d91a1a}-0.99\%$
test_nested_get 71.6010μs 20.6680μs 48.3839 KOps/s 48.9018 KOps/s $\color{#d91a1a}-1.06\%$
test_stacked_getleaf 51.0000μs 21.6846μs 46.1157 KOps/s 46.2680 KOps/s $\color{#d91a1a}-0.33\%$
test_stacked_get 51.6800μs 20.4729μs 48.8451 KOps/s 48.7964 KOps/s $\color{#35bf28}+0.10\%$
test_nested_getitemleaf 53.3500μs 22.1884μs 45.0685 KOps/s 45.5723 KOps/s $\color{#d91a1a}-1.11\%$
test_nested_getitem 48.5310μs 20.9507μs 47.7310 KOps/s 47.9075 KOps/s $\color{#d91a1a}-0.37\%$
test_stacked_getitemleaf 41.2310μs 22.1553μs 45.1359 KOps/s 45.6772 KOps/s $\color{#d91a1a}-1.19\%$
test_stacked_getitem 62.3610μs 20.9663μs 47.6957 KOps/s 47.9379 KOps/s $\color{#d91a1a}-0.51\%$
test_lock_nested 0.5826ms 0.4784ms 2.0904 KOps/s 2.0770 KOps/s $\color{#35bf28}+0.64\%$
test_lock_stack_nested 0.5827ms 0.4833ms 2.0693 KOps/s 2.0499 KOps/s $\color{#35bf28}+0.94\%$
test_unlock_nested 0.5241ms 0.3898ms 2.5654 KOps/s 2.5701 KOps/s $\color{#d91a1a}-0.18\%$
test_unlock_stack_nested 0.4621ms 0.3908ms 2.5590 KOps/s 2.5341 KOps/s $\color{#35bf28}+0.98\%$
test_flatten_speed 0.1616ms 0.1213ms 8.2447 KOps/s 8.1240 KOps/s $\color{#35bf28}+1.49\%$
test_unflatten_speed 0.6181ms 0.5737ms 1.7430 KOps/s 1.7592 KOps/s $\color{#d91a1a}-0.92\%$
test_common_ops 0.8883ms 0.7084ms 1.4116 KOps/s 1.4271 KOps/s $\color{#d91a1a}-1.09\%$
test_creation 0.1058ms 3.1745μs 315.0125 KOps/s 317.0173 KOps/s $\color{#d91a1a}-0.63\%$
test_creation_empty 32.8300μs 6.9840μs 143.1842 KOps/s 141.7945 KOps/s $\color{#35bf28}+0.98\%$
test_creation_nested_1 40.4100μs 11.5269μs 86.7536 KOps/s 86.4803 KOps/s $\color{#35bf28}+0.32\%$
test_creation_nested_2 74.1310μs 13.2103μs 75.6983 KOps/s 74.3462 KOps/s $\color{#35bf28}+1.82\%$
test_creation_many_keys[10] 56.0700μs 21.1531μs 47.2744 KOps/s 47.4640 KOps/s $\color{#d91a1a}-0.40\%$
test_creation_many_keys[50] 0.1241ms 89.8037μs 11.1354 KOps/s 11.0934 KOps/s $\color{#35bf28}+0.38\%$
test_creation_many_keys[100] 0.2303ms 0.1768ms 5.6566 KOps/s 5.6100 KOps/s $\color{#35bf28}+0.83\%$
test_creation_nested_many_keys[10] 89.4110μs 45.0526μs 22.1963 KOps/s 22.2149 KOps/s $\color{#d91a1a}-0.08\%$
test_creation_nested_many_keys[50] 0.2543ms 0.1840ms 5.4334 KOps/s 5.4032 KOps/s $\color{#35bf28}+0.56\%$
test_clone 51.9510μs 13.0777μs 76.4659 KOps/s 76.0663 KOps/s $\color{#35bf28}+0.53\%$
test_getitem[int] 1.4897ms 15.1676μs 65.9300 KOps/s 60.5236 KOps/s $\textbf{\color{#35bf28}+8.93\%}$
test_getitem[slice_int] 0.1473ms 24.0143μs 41.6419 KOps/s 41.5276 KOps/s $\color{#35bf28}+0.28\%$
test_getitem[range] 0.1873ms 64.3114μs 15.5493 KOps/s 15.8168 KOps/s $\color{#d91a1a}-1.69\%$
test_getitem[tuple] 0.1416ms 23.6764μs 42.2362 KOps/s 42.4855 KOps/s $\color{#d91a1a}-0.59\%$
test_getitem[list] 0.2104ms 57.2951μs 17.4535 KOps/s 17.1695 KOps/s $\color{#35bf28}+1.65\%$
test_setitem_dim[int] 39.7810μs 24.8517μs 40.2387 KOps/s 38.0571 KOps/s $\textbf{\color{#35bf28}+5.73\%}$
test_setitem_dim[slice_int] 65.0310μs 42.8803μs 23.3207 KOps/s 22.9625 KOps/s $\color{#35bf28}+1.56\%$
test_setitem_dim[range] 0.1376ms 95.9416μs 10.4230 KOps/s 10.4985 KOps/s $\color{#d91a1a}-0.72\%$
test_setitem_dim[tuple] 79.1010μs 39.8204μs 25.1128 KOps/s 24.4569 KOps/s $\color{#35bf28}+2.68\%$
test_setitem 51.8710μs 17.5950μs 56.8344 KOps/s 56.1191 KOps/s $\color{#35bf28}+1.27\%$
test_set 46.9900μs 16.9739μs 58.9138 KOps/s 59.3895 KOps/s $\color{#d91a1a}-0.80\%$
test_set_shared 0.5571ms 0.2102ms 4.7578 KOps/s 4.9094 KOps/s $\color{#d91a1a}-3.09\%$
test_update 0.4067ms 21.3782μs 46.7767 KOps/s 46.5369 KOps/s $\color{#35bf28}+0.52\%$
test_update_nested 76.3210μs 32.3924μs 30.8714 KOps/s 30.0669 KOps/s $\color{#35bf28}+2.68\%$
test_update__nested 0.5057ms 33.7980μs 29.5876 KOps/s 29.0195 KOps/s $\color{#35bf28}+1.96\%$
test_set_nested 48.5610μs 18.7228μs 53.4107 KOps/s 53.5866 KOps/s $\color{#d91a1a}-0.33\%$
test_set_nested_new 64.4310μs 23.5900μs 42.3908 KOps/s 41.5842 KOps/s $\color{#35bf28}+1.94\%$
test_select 78.1110μs 40.2547μs 24.8418 KOps/s 24.4331 KOps/s $\color{#35bf28}+1.67\%$
test_select_nested 0.1314ms 75.6211μs 13.2238 KOps/s 13.3544 KOps/s $\color{#d91a1a}-0.98\%$
test_exclude_nested 0.1245ms 93.2843μs 10.7199 KOps/s 10.8564 KOps/s $\color{#d91a1a}-1.26\%$
test_empty[True] 0.4726ms 0.4005ms 2.4969 KOps/s 2.4986 KOps/s $\color{#d91a1a}-0.07\%$
test_empty[False] 10.9367μs 1.3441μs 743.9899 KOps/s 758.7407 KOps/s $\color{#d91a1a}-1.94\%$
test_to 0.1045ms 74.7460μs 13.3786 KOps/s 13.4407 KOps/s $\color{#d91a1a}-0.46\%$
test_to_nonblocking 0.1105ms 64.7264μs 15.4497 KOps/s 15.4278 KOps/s $\color{#35bf28}+0.14\%$
test_unbind_speed 0.3883ms 0.3352ms 2.9832 KOps/s 3.0104 KOps/s $\color{#d91a1a}-0.90\%$
test_unbind_speed_stack0 0.3800ms 0.3326ms 3.0063 KOps/s 3.0418 KOps/s $\color{#d91a1a}-1.17\%$
test_unbind_speed_stack1 0.1035s 1.0576ms 945.5808 Ops/s 1.1912 KOps/s $\textbf{\color{#d91a1a}-20.62\%}$
test_split 1.2514ms 1.1486ms 870.6000 Ops/s 786.3578 Ops/s $\textbf{\color{#35bf28}+10.71\%}$
test_chunk 0.1038s 1.2097ms 826.6805 Ops/s 916.7735 Ops/s $\textbf{\color{#d91a1a}-9.83\%}$
test_to_cpu_blocking 19.3758ms 19.2917ms 51.8357 Ops/s 51.3026 Ops/s $\color{#35bf28}+1.04\%$
test_to_cpu_global_sync 11.3359ms 11.2012ms 89.2765 Ops/s 78.9689 Ops/s $\textbf{\color{#35bf28}+13.05\%}$
test_to_cpu_event_sync 0.1152s 13.4113ms 74.5641 Ops/s 80.2244 Ops/s $\textbf{\color{#d91a1a}-7.06\%}$
test_to_cpu_default 12.4567ms 12.1935ms 82.0109 Ops/s 80.6764 Ops/s $\color{#35bf28}+1.65\%$
test_consolidate[False-None] 4.2616ms 4.1606ms 240.3515 Ops/s 216.9879 Ops/s $\textbf{\color{#35bf28}+10.77\%}$
test_consolidate[default-None] 2.1458ms 2.0100ms 497.5119 Ops/s 487.1495 Ops/s $\color{#35bf28}+2.13\%$
test_consolidate[reduce-overhead-None] 2.0355ms 1.9435ms 514.5298 Ops/s 508.6437 Ops/s $\color{#35bf28}+1.16\%$
test_consolidate_njt[False-None] 8.6913ms 8.4440ms 118.4276 Ops/s 117.7129 Ops/s $\color{#35bf28}+0.61\%$
test_to[False-False-None] 2.3227ms 2.0744ms 482.0637 Ops/s 473.9723 Ops/s $\color{#35bf28}+1.71\%$
test_to[True-False-None] 2.1310ms 1.8945ms 527.8518 Ops/s 520.0717 Ops/s $\color{#35bf28}+1.50\%$
test_to[within-False-None] 6.3861ms 6.1350ms 162.9991 Ops/s 163.1155 Ops/s $\color{#d91a1a}-0.07\%$
test_to[True-default-None] 8.9004ms 8.6410ms 115.7279 Ops/s 111.2757 Ops/s $\color{#35bf28}+4.00\%$
test_to_njt[False-False-None] 8.5305ms 8.4515ms 118.3218 Ops/s 117.1108 Ops/s $\color{#35bf28}+1.03\%$
test_to_njt[True-False-None] 7.0806ms 6.8804ms 145.3407 Ops/s 142.1915 Ops/s $\color{#35bf28}+2.21\%$
test_to_njt[within-False-None] 15.6386ms 15.3588ms 65.1092 Ops/s 63.7603 Ops/s $\color{#35bf28}+2.12\%$
test_creation[device0] 0.4141ms 0.1144ms 8.7447 KOps/s 8.6670 KOps/s $\color{#35bf28}+0.90\%$
test_creation_from_tensor 0.3957ms 0.1118ms 8.9442 KOps/s 8.8637 KOps/s $\color{#35bf28}+0.91\%$
test_add_one[memmap_tensor0] 0.3490ms 6.4542μs 154.9385 KOps/s 155.9302 KOps/s $\color{#d91a1a}-0.64\%$
test_contiguous[memmap_tensor0] 14.2600μs 0.6840μs 1.4620 MOps/s 2.1148 MOps/s $\textbf{\color{#d91a1a}-30.87\%}$
test_stack[memmap_tensor0] 46.9600μs 4.7189μs 211.9120 KOps/s 212.4636 KOps/s $\color{#d91a1a}-0.26\%$
test_memmaptd_index 0.9804ms 0.2749ms 3.6372 KOps/s 3.7386 KOps/s $\color{#d91a1a}-2.71\%$
test_memmaptd_index_astensor 0.5560ms 0.3765ms 2.6561 KOps/s 2.7436 KOps/s $\color{#d91a1a}-3.19\%$
test_memmaptd_index_op 0.7807ms 0.6222ms 1.6071 KOps/s 1.6353 KOps/s $\color{#d91a1a}-1.72\%$
test_serialize_model 0.1389s 0.1364s 7.3322 Ops/s 7.2803 Ops/s $\color{#35bf28}+0.71\%$
test_serialize_model_pickle 1.3680s 1.1894s 0.8407 Ops/s 0.8261 Ops/s $\color{#35bf28}+1.78\%$
test_serialize_weights 0.1386s 0.1360s 7.3520 Ops/s 7.3839 Ops/s $\color{#d91a1a}-0.43\%$
test_serialize_weights_returnearly 0.4390s 93.1447ms 10.7360 Ops/s 11.2691 Ops/s $\color{#d91a1a}-4.73\%$
test_serialize_weights_pickle 1.3506s 1.2143s 0.8235 Ops/s 0.8231 Ops/s $\color{#35bf28}+0.05\%$
test_reshape_pytree 0.2052ms 34.0810μs 29.3419 KOps/s 30.7136 KOps/s $\color{#d91a1a}-4.47\%$
test_reshape_td 83.1210μs 47.2263μs 21.1746 KOps/s 22.4973 KOps/s $\textbf{\color{#d91a1a}-5.88\%}$
test_view_pytree 0.2160ms 33.8747μs 29.5206 KOps/s 30.9972 KOps/s $\color{#d91a1a}-4.76\%$
test_view_td 87.9710μs 53.2281μs 18.7871 KOps/s 18.9527 KOps/s $\color{#d91a1a}-0.87\%$
test_unbind_pytree 0.2512ms 36.8672μs 27.1244 KOps/s 27.7626 KOps/s $\color{#d91a1a}-2.30\%$
test_unbind_td 82.3310μs 50.3740μs 19.8515 KOps/s 20.2738 KOps/s $\color{#d91a1a}-2.08\%$
test_split_pytree 0.2184ms 44.6286μs 22.4072 KOps/s 23.8758 KOps/s $\textbf{\color{#d91a1a}-6.15\%}$
test_split_td 0.1164ms 68.2950μs 14.6424 KOps/s 15.5342 KOps/s $\textbf{\color{#d91a1a}-5.74\%}$
test_add_pytree 0.2286ms 44.3322μs 22.5570 KOps/s 23.9562 KOps/s $\textbf{\color{#d91a1a}-5.84\%}$
test_add_td 91.0720μs 57.5619μs 17.3726 KOps/s 18.3801 KOps/s $\textbf{\color{#d91a1a}-5.48\%}$
test_compile_add_one_nested[tensordict-compile] 0.3247ms 0.1445ms 6.9205 KOps/s 6.8813 KOps/s $\color{#35bf28}+0.57\%$
test_compile_add_one_nested[tensordict-eager] 0.2902ms 0.2056ms 4.8646 KOps/s 4.9825 KOps/s $\color{#d91a1a}-2.37\%$
test_compile_add_one_nested[pytree-compile] 0.1966ms 0.1080ms 9.2579 KOps/s 9.0051 KOps/s $\color{#35bf28}+2.81\%$
test_compile_add_one_nested[pytree-eager] 0.4271ms 0.1812ms 5.5190 KOps/s 5.5632 KOps/s $\color{#d91a1a}-0.79\%$
test_compile_copy_nested[tensordict-compile] 0.3082ms 10.5469μs 94.8150 KOps/s 96.9610 KOps/s $\color{#d91a1a}-2.21\%$
test_compile_copy_nested[tensordict-eager] 82.1910μs 53.8457μs 18.5716 KOps/s 18.2623 KOps/s $\color{#35bf28}+1.69\%$
test_compile_copy_nested[pytree-compile] 45.5700μs 9.6465μs 103.6646 KOps/s 99.7691 KOps/s $\color{#35bf28}+3.90\%$
test_compile_copy_nested[pytree-eager] 0.4265ms 69.4631μs 14.3961 KOps/s 14.6397 KOps/s $\color{#d91a1a}-1.66\%$
test_compile_add_one_flat[tensordict-compile] 0.2278ms 0.1776ms 5.6313 KOps/s 5.2680 KOps/s $\textbf{\color{#35bf28}+6.89\%}$
test_compile_add_one_flat[tensordict-eager] 0.3283ms 0.2777ms 3.6005 KOps/s 3.4991 KOps/s $\color{#35bf28}+2.90\%$
test_compile_add_one_flat[tensorclass-compile] 0.1684ms 0.1176ms 8.5047 KOps/s 8.0644 KOps/s $\textbf{\color{#35bf28}+5.46\%}$
test_compile_add_one_flat[tensorclass-eager] 0.1190ms 73.5225μs 13.6013 KOps/s 13.4587 KOps/s $\color{#35bf28}+1.06\%$
test_compile_add_one_flat[pytree-compile] 0.1936ms 0.1587ms 6.2996 KOps/s 6.0994 KOps/s $\color{#35bf28}+3.28\%$
test_compile_add_one_flat[pytree-eager] 0.8014ms 0.5285ms 1.8920 KOps/s 1.7904 KOps/s $\textbf{\color{#35bf28}+5.68\%}$
test_compile_add_self_flat[tensordict-eager] 0.5086ms 0.3324ms 3.0083 KOps/s 2.9570 KOps/s $\color{#35bf28}+1.74\%$
test_compile_add_self_flat[tensordict-compile] 0.2306ms 0.1795ms 5.5720 KOps/s 5.2719 KOps/s $\textbf{\color{#35bf28}+5.69\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1357ms 90.1068μs 11.0979 KOps/s 11.1199 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_add_self_flat[tensorclass-compile] 0.1620ms 0.1198ms 8.3468 KOps/s 7.9888 KOps/s $\color{#35bf28}+4.48\%$
test_compile_add_self_flat[pytree-eager] 0.6583ms 0.4368ms 2.2893 KOps/s 2.2565 KOps/s $\color{#35bf28}+1.45\%$
test_compile_add_self_flat[pytree-compile] 0.2022ms 0.1586ms 6.3037 KOps/s 6.1288 KOps/s $\color{#35bf28}+2.85\%$
test_compile_copy_flat[tensordict-compile] 42.8700μs 13.3910μs 74.6770 KOps/s 70.5391 KOps/s $\textbf{\color{#35bf28}+5.87\%}$
test_compile_copy_flat[tensordict-eager] 76.6210μs 41.9615μs 23.8314 KOps/s 24.2200 KOps/s $\color{#d91a1a}-1.60\%$
test_compile_copy_flat[pytree-compile] 48.4910μs 10.9713μs 91.1468 KOps/s 91.6971 KOps/s $\color{#d91a1a}-0.60\%$
test_compile_copy_flat[pytree-eager] 0.4054ms 52.1982μs 19.1578 KOps/s 19.0813 KOps/s $\color{#35bf28}+0.40\%$
test_compile_assign_and_add[tensordict-compile] 2.0150ms 0.1744ms 5.7334 KOps/s 5.5068 KOps/s $\color{#35bf28}+4.11\%$
test_compile_assign_and_add[tensordict-eager] 3.3722ms 3.2867ms 304.2594 Ops/s 299.9706 Ops/s $\color{#35bf28}+1.43\%$
test_compile_assign_and_add[pytree-compile] 2.0316ms 0.1622ms 6.1651 KOps/s 6.0482 KOps/s $\color{#35bf28}+1.93\%$
test_compile_assign_and_add[pytree-eager] 2.9207ms 2.7932ms 358.0111 Ops/s 356.5978 Ops/s $\color{#35bf28}+0.40\%$
test_compile_indexing[tensor-tensordict-compile] 0.1720ms 0.1103ms 9.0649 KOps/s 8.8330 KOps/s $\color{#35bf28}+2.63\%$
test_compile_indexing[tensor-tensordict-eager] 0.3163ms 73.2233μs 13.6569 KOps/s 13.3764 KOps/s $\color{#35bf28}+2.10\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1519ms 96.5276μs 10.3597 KOps/s 10.0810 KOps/s $\color{#35bf28}+2.76\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2491ms 44.4916μs 22.4761 KOps/s 22.2730 KOps/s $\color{#35bf28}+0.91\%$
test_compile_indexing[tensor-pytree-compile] 0.1393ms 97.6363μs 10.2421 KOps/s 10.1081 KOps/s $\color{#35bf28}+1.33\%$
test_compile_indexing[tensor-pytree-eager] 0.2769ms 44.4980μs 22.4729 KOps/s 22.1346 KOps/s $\color{#35bf28}+1.53\%$
test_compile_indexing[slice-tensordict-compile] 0.1009ms 56.8707μs 17.5837 KOps/s 17.4205 KOps/s $\color{#35bf28}+0.94\%$
test_compile_indexing[slice-tensordict-eager] 0.2186ms 26.7467μs 37.3877 KOps/s 36.2341 KOps/s $\color{#35bf28}+3.18\%$
test_compile_indexing[slice-tensorclass-compile] 81.4210μs 43.9204μs 22.7684 KOps/s 22.2723 KOps/s $\color{#35bf28}+2.23\%$
test_compile_indexing[slice-tensorclass-eager] 0.2514ms 22.3896μs 44.6635 KOps/s 44.4624 KOps/s $\color{#35bf28}+0.45\%$
test_compile_indexing[slice-pytree-compile] 79.3010μs 45.6584μs 21.9018 KOps/s 22.0406 KOps/s $\color{#d91a1a}-0.63\%$
test_compile_indexing[slice-pytree-eager] 0.2863ms 22.5522μs 44.3415 KOps/s 44.9063 KOps/s $\color{#d91a1a}-1.26\%$
test_compile_indexing[int-tensordict-compile] 0.1157ms 57.1069μs 17.5110 KOps/s 17.0801 KOps/s $\color{#35bf28}+2.52\%$
test_compile_indexing[int-tensordict-eager] 0.2773ms 26.9246μs 37.1408 KOps/s 36.3254 KOps/s $\color{#35bf28}+2.24\%$
test_compile_indexing[int-tensorclass-compile] 89.2210μs 44.3599μs 22.5429 KOps/s 22.4137 KOps/s $\color{#35bf28}+0.58\%$
test_compile_indexing[int-tensorclass-eager] 0.2589ms 22.4277μs 44.5878 KOps/s 44.9856 KOps/s $\color{#d91a1a}-0.88\%$
test_compile_indexing[int-pytree-compile] 75.8210μs 44.1036μs 22.6739 KOps/s 22.3387 KOps/s $\color{#35bf28}+1.50\%$
test_compile_indexing[int-pytree-eager] 0.2519ms 22.2362μs 44.9717 KOps/s 45.1486 KOps/s $\color{#d91a1a}-0.39\%$
test_compile_replace[single-eager] 84.0110μs 46.5068μs 21.5022 KOps/s 21.0591 KOps/s $\color{#35bf28}+2.10\%$
test_compile_replace[single-compile] 0.1851ms 0.1048ms 9.5437 KOps/s 9.2366 KOps/s $\color{#35bf28}+3.32\%$
test_compile_replace[multi-eager] 0.6060ms 0.5601ms 1.7855 KOps/s 1.7670 KOps/s $\color{#35bf28}+1.05\%$
test_compile_replace[multi-compile] 0.1772ms 0.1114ms 8.9776 KOps/s 8.4086 KOps/s $\textbf{\color{#35bf28}+6.77\%}$
test_compile_tc_getattr_20[eager] 0.2417ms 0.1783ms 5.6086 KOps/s 5.9433 KOps/s $\textbf{\color{#d91a1a}-5.63\%}$
test_compile_tc_getattr_20[compile] 0.1789ms 0.1190ms 8.4046 KOps/s 8.2019 KOps/s $\color{#35bf28}+2.47\%$
test_compile_clone_shallow[20-eager] 44.8400μs 19.3473μs 51.6869 KOps/s 51.1154 KOps/s $\color{#35bf28}+1.12\%$
test_compile_clone_shallow[20-compile] 81.5410μs 11.2698μs 88.7328 KOps/s 88.0581 KOps/s $\color{#35bf28}+0.77\%$
test_compile_clone_shallow[40-eager] 66.3810μs 34.3775μs 29.0888 KOps/s 29.1795 KOps/s $\color{#d91a1a}-0.31\%$
test_compile_clone_shallow[40-compile] 44.8300μs 12.5829μs 79.4726 KOps/s 78.3976 KOps/s $\color{#35bf28}+1.37\%$
test_compile_clone_shallow[80-eager] 0.1018ms 62.9000μs 15.8982 KOps/s 15.6393 KOps/s $\color{#35bf28}+1.66\%$
test_compile_clone_shallow[80-compile] 50.5600μs 14.5533μs 68.7128 KOps/s 65.9108 KOps/s $\color{#35bf28}+4.25\%$
test_compile_update_inplace[eager] 97.3220μs 59.7122μs 16.7470 KOps/s 16.7158 KOps/s $\color{#35bf28}+0.19\%$
test_compile_update_inplace[compile] 0.2106ms 0.1394ms 7.1734 KOps/s 6.7007 KOps/s $\textbf{\color{#35bf28}+7.05\%}$
test_mod_add[eager] 99.2610μs 50.3944μs 19.8435 KOps/s 19.6742 KOps/s $\color{#35bf28}+0.86\%$
test_mod_add[compile] 0.1551ms 0.1050ms 9.5281 KOps/s 9.3658 KOps/s $\color{#35bf28}+1.73\%$
test_mod_add[compile-overhead] 0.2324ms 0.1470ms 6.8043 KOps/s 6.5235 KOps/s $\color{#35bf28}+4.30\%$
test_mod_wrap[eager] 0.3800ms 0.3026ms 3.3043 KOps/s 3.3755 KOps/s $\color{#d91a1a}-2.11\%$
test_mod_wrap[compile] 0.3991ms 0.3460ms 2.8902 KOps/s 2.8114 KOps/s $\color{#35bf28}+2.80\%$
test_mod_wrap[compile-overhead] 7.3809ms 4.0795ms 245.1299 Ops/s 253.4924 Ops/s $\color{#d91a1a}-3.30\%$
test_mod_wrap_and_backward[eager] 1.6236ms 1.5002ms 666.5711 Ops/s 658.1628 Ops/s $\color{#35bf28}+1.28\%$
test_mod_wrap_and_backward[compile] 1.6361ms 1.5553ms 642.9525 Ops/s 683.0620 Ops/s $\textbf{\color{#d91a1a}-5.87\%}$
test_mod_wrap_and_backward[compile-overhead] 1.4647ms 1.0000ms 1.0000 KOps/s 1.0961 KOps/s $\textbf{\color{#d91a1a}-8.77\%}$
test_seq_add[eager] 0.2233ms 0.1516ms 6.5957 KOps/s 6.2680 KOps/s $\textbf{\color{#35bf28}+5.23\%}$
test_seq_add[compile] 0.1725ms 0.1119ms 8.9332 KOps/s 8.4925 KOps/s $\textbf{\color{#35bf28}+5.19\%}$
test_seq_add[compile-overhead] 0.2260ms 0.1658ms 6.0324 KOps/s 6.1969 KOps/s $\color{#d91a1a}-2.66\%$
test_seq_wrap[eager] 0.6172ms 0.5523ms 1.8108 KOps/s 1.8757 KOps/s $\color{#d91a1a}-3.46\%$
test_seq_wrap[compile] 0.4690ms 0.3869ms 2.5847 KOps/s 2.6826 KOps/s $\color{#d91a1a}-3.65\%$
test_seq_wrap[compile-overhead] 0.3371ms 0.2662ms 3.7564 KOps/s 3.7167 KOps/s $\color{#35bf28}+1.07\%$
test_func_call_runtime[False-eager] 0.9022ms 0.8331ms 1.2003 KOps/s 1.1648 KOps/s $\color{#35bf28}+3.05\%$
test_func_call_runtime[False-compile] 0.9706ms 0.9072ms 1.1023 KOps/s 1.0866 KOps/s $\color{#35bf28}+1.45\%$
test_func_call_runtime[False-compile-overhead] 0.5613ms 0.4642ms 2.1541 KOps/s 2.1384 KOps/s $\color{#35bf28}+0.74\%$
test_func_call_runtime[True-eager] 1.1363ms 1.0781ms 927.5463 Ops/s 920.0990 Ops/s $\color{#35bf28}+0.81\%$
test_func_call_runtime[True-compile] 0.9818ms 0.9222ms 1.0844 KOps/s 1.0648 KOps/s $\color{#35bf28}+1.84\%$
test_func_call_runtime[True-compile-overhead] 0.5368ms 0.4772ms 2.0954 KOps/s 2.0762 KOps/s $\color{#35bf28}+0.92\%$
test_func_call_cm_runtime[False-eager] 0.9468ms 0.8861ms 1.1286 KOps/s 1.1690 KOps/s $\color{#d91a1a}-3.46\%$
test_func_call_cm_runtime[False-compile] 1.0012ms 0.9159ms 1.0918 KOps/s 1.0788 KOps/s $\color{#35bf28}+1.21\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5330ms 0.4672ms 2.1404 KOps/s 2.1303 KOps/s $\color{#35bf28}+0.48\%$
test_func_call_cm_runtime[True-eager] 1.3060ms 1.2272ms 814.8518 Ops/s 804.3518 Ops/s $\color{#35bf28}+1.31\%$
test_func_call_cm_runtime[True-compile] 1.0109ms 0.9572ms 1.0447 KOps/s 1.0276 KOps/s $\color{#35bf28}+1.66\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5798ms 0.5107ms 1.9580 KOps/s 1.9258 KOps/s $\color{#35bf28}+1.67\%$
test_vmap_func_call_cm_runtime[eager] 2.8496ms 2.3629ms 423.2044 Ops/s 416.3101 Ops/s $\color{#35bf28}+1.66\%$
test_vmap_func_call_cm_runtime[compile] 1.0564ms 0.9777ms 1.0228 KOps/s 1.0082 KOps/s $\color{#35bf28}+1.45\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5682ms 0.5167ms 1.9355 KOps/s 1.9150 KOps/s $\color{#35bf28}+1.07\%$
test_distributed 2.7916ms 0.1749ms 5.7161 KOps/s 6.5135 KOps/s $\textbf{\color{#d91a1a}-12.24\%}$
test_tdmodule 0.1779ms 28.3807μs 35.2352 KOps/s 35.8465 KOps/s $\color{#d91a1a}-1.71\%$
test_tdmodule_dispatch 72.7610μs 45.2216μs 22.1133 KOps/s 21.8889 KOps/s $\color{#35bf28}+1.03\%$
test_tdseq 66.3010μs 27.1554μs 36.8251 KOps/s 36.8426 KOps/s $\color{#d91a1a}-0.05\%$
test_tdseq_dispatch 75.4710μs 47.5791μs 21.0176 KOps/s 20.9630 KOps/s $\color{#35bf28}+0.26\%$
test_instantiation_functorch 2.1943ms 2.0794ms 480.9151 Ops/s 480.1015 Ops/s $\color{#35bf28}+0.17\%$
test_exec_functorch 0.2304ms 0.1803ms 5.5463 KOps/s 5.5594 KOps/s $\color{#d91a1a}-0.24\%$
test_exec_functional_call 0.2231ms 0.1637ms 6.1091 KOps/s 6.2130 KOps/s $\color{#d91a1a}-1.67\%$
test_exec_td_decorator 0.4400ms 0.2390ms 4.1849 KOps/s 4.2503 KOps/s $\color{#d91a1a}-1.54\%$
test_vmap_mlp_speed_decorator[True-True] 0.9979ms 0.8215ms 1.2173 KOps/s 1.2050 KOps/s $\color{#35bf28}+1.03\%$
test_vmap_mlp_speed_decorator[True-False] 1.0183ms 0.8413ms 1.1887 KOps/s 1.1695 KOps/s $\color{#35bf28}+1.64\%$
test_vmap_mlp_speed_decorator[False-True] 0.9378ms 0.7199ms 1.3891 KOps/s 1.3472 KOps/s $\color{#35bf28}+3.11\%$
test_vmap_mlp_speed_decorator[False-False] 0.9199ms 0.7166ms 1.3954 KOps/s 1.3562 KOps/s $\color{#35bf28}+2.89\%$
test_vmap_transformer_speed_decorator[True-True] 20.9544ms 20.4908ms 48.8025 Ops/s 47.1333 Ops/s $\color{#35bf28}+3.54\%$
test_vmap_transformer_speed_decorator[True-False] 21.0121ms 20.5128ms 48.7500 Ops/s 48.1325 Ops/s $\color{#35bf28}+1.28\%$
test_vmap_transformer_speed_decorator[False-True] 20.8740ms 20.2702ms 49.3336 Ops/s 48.9000 Ops/s $\color{#35bf28}+0.89\%$
test_vmap_transformer_speed_decorator[False-False] 20.8591ms 20.2718ms 49.3295 Ops/s 48.9145 Ops/s $\color{#35bf28}+0.85\%$
test_to_module_speed[True] 2.0504ms 1.4758ms 677.5965 Ops/s 673.4486 Ops/s $\color{#35bf28}+0.62\%$
test_to_module_speed[False] 1.9318ms 1.4536ms 687.9563 Ops/s 680.8005 Ops/s $\color{#35bf28}+1.05\%$
test_tc_init 81.0210μs 45.1692μs 22.1390 KOps/s 22.1994 KOps/s $\color{#d91a1a}-0.27\%$
test_tc_init_tensor_only 37.4010μs 9.6840μs 103.2635 KOps/s 101.9391 KOps/s $\color{#35bf28}+1.30\%$
test_tc_init_nested 0.1191ms 87.8365μs 11.3848 KOps/s 11.3506 KOps/s $\color{#35bf28}+0.30\%$
test_tc_init_many_fields 48.1200μs 16.3521μs 61.1542 KOps/s 60.8158 KOps/s $\color{#35bf28}+0.56\%$
test_tc_first_layer_tensor 28.5110μs 1.8326μs 545.6834 KOps/s 540.4270 KOps/s $\color{#35bf28}+0.97\%$
test_tc_first_layer_tensor_only 3.5200μs 0.4102μs 2.4380 MOps/s 2.4594 MOps/s $\color{#d91a1a}-0.87\%$
test_tc_first_layer_tensor_set 28.4000μs 3.9700μs 251.8920 KOps/s 252.9086 KOps/s $\color{#d91a1a}-0.40\%$
test_tc_first_layer_tensor_only_set 23.2000μs 3.2942μs 303.5661 KOps/s 305.0497 KOps/s $\color{#d91a1a}-0.49\%$
test_tc_first_layer_nontensor 48.2000μs 6.1494μs 162.6178 KOps/s 160.6697 KOps/s $\color{#35bf28}+1.21\%$
test_tc_second_layer_tensor 33.8600μs 4.4201μs 226.2390 KOps/s 227.4721 KOps/s $\color{#d91a1a}-0.54\%$
test_tc_second_layer_nontensor 49.2210μs 8.6439μs 115.6881 KOps/s 115.0925 KOps/s $\color{#35bf28}+0.52\%$
test_unbind 0.2628s 16.5450ms 60.4412 Ops/s 53.0614 Ops/s $\textbf{\color{#35bf28}+13.91\%}$
test_full_like 5.4464ms 4.3942ms 227.5740 Ops/s 58.3779 Ops/s $\textbf{\color{#35bf28}+289.83\%}$
test_zeros_like 4.9649ms 4.3873ms 227.9285 Ops/s 59.6096 Ops/s $\textbf{\color{#35bf28}+282.37\%}$
test_ones_like 4.6448ms 4.4049ms 227.0183 Ops/s 59.5299 Ops/s $\textbf{\color{#35bf28}+281.35\%}$
test_clone 6.9395ms 6.5257ms 153.2391 Ops/s 56.4190 Ops/s $\textbf{\color{#35bf28}+171.61\%}$
test_squeeze 60.4210μs 14.5311μs 68.8180 KOps/s 70.9669 KOps/s $\color{#d91a1a}-3.03\%$
test_unsqueeze 0.1594ms 0.1107ms 9.0347 KOps/s 8.9854 KOps/s $\color{#35bf28}+0.55\%$
test_split 0.2438ms 0.1847ms 5.4140 KOps/s 5.3905 KOps/s $\color{#35bf28}+0.44\%$
test_permute 0.2746ms 0.2108ms 4.7446 KOps/s 4.8579 KOps/s $\color{#d91a1a}-2.33\%$
test_stack 53.3136ms 51.9025ms 19.2669 Ops/s 19.3997 Ops/s $\color{#d91a1a}-0.68\%$
test_cat 52.4766ms 51.6763ms 19.3512 Ops/s 19.4082 Ops/s $\color{#d91a1a}-0.29\%$
test_sequential_tensordict 0.3276ms 0.2262ms 4.4207 KOps/s 4.5361 KOps/s $\color{#d91a1a}-2.54\%$
test_sequential_graph_module 0.5419ms 0.1218ms 8.2132 KOps/s 8.2197 KOps/s $\color{#d91a1a}-0.08\%$
test_nested_tensordict 0.3323ms 0.2888ms 3.4629 KOps/s 3.4701 KOps/s $\color{#d91a1a}-0.21\%$
test_nested_graph_module 0.2501ms 0.1336ms 7.4847 KOps/s 7.7406 KOps/s $\color{#d91a1a}-3.31\%$
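
The Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below recomputes it for two rows of the GPU table as a sanity check; the helper names (`parse_ops`, `change_pct`) and the unit map are illustrative assumptions, not part of the benchmark workflow.

```python
# Hypothetical helpers for re-deriving the "Change" column from the two Ops columns.
# Assumption: Change = (Ops - Ops on Repo HEAD) / Ops on Repo HEAD, as a percentage.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell: str) -> float:
    """Convert a table cell such as '1.1912 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def change_pct(ops_pr: str, ops_head: str) -> float:
    """Relative change of the PR throughput against repo HEAD, in percent."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


# test_full_like (GPU): 227.5740 Ops/s vs 58.3779 Ops/s on HEAD
print(f"{change_pct('227.5740 Ops/s', '58.3779 Ops/s'):+.2f}%")
# test_unbind_speed_stack1 (GPU): 945.5808 Ops/s vs 1.1912 KOps/s on HEAD
print(f"{change_pct('945.5808 Ops/s', '1.1912 KOps/s'):+.2f}%")
```

Both values agree with the table (+289.83% and -20.62%), which suggests the large swings in the `*_like` and `test_clone` rows come from the HEAD baseline rather than a reporting error.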

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
