[DTensor] Add unified dtensor_send/dtensor_recv API with Strategy A #1640

Open
vmoens wants to merge 1 commit into gh/vmoens/81/base from gh/vmoens/81/head

vmoens (Collaborator) commented Mar 6, 2026

Stack from ghstack (oldest at bottom):

Add dtensor_send() and dtensor_recv() methods to TensorDictBase with:

  • strategy parameter: "materialize" (A), "redistribute" (B), "optimal" (C), "auto"
  • transport parameter: "torch_distributed", "ucxx", "auto"
  • Transport auto-detection based on dst/src type (int -> torch.distributed,
    TensorDictPipe -> UCXX)
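
The transport auto-detection rule described above can be sketched as follows. This is only an illustration of the stated rule (int peer selects torch.distributed, TensorDictPipe peer selects UCXX), not the PR's actual code: `resolve_transport` and the placeholder `TensorDictPipe` class below are hypothetical names introduced for the example.

```python
# Hypothetical stand-in for the real pipe type; defined here only so the
# sketch is self-contained.
class TensorDictPipe:
    pass


VALID_TRANSPORTS = ("torch_distributed", "ucxx", "auto")


def resolve_transport(peer, transport="auto"):
    """Pick a transport from the peer type when transport == "auto".

    Assumption: an explicit transport is returned unchanged; only "auto"
    triggers detection based on the type of dst/src.
    """
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"Unknown transport: {transport!r}")
    if transport != "auto":
        return transport
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(peer, bool):
        raise TypeError("peer must be an int rank or a TensorDictPipe")
    if isinstance(peer, int):
        return "torch_distributed"  # integer rank -> torch.distributed
    if isinstance(peer, TensorDictPipe):
        return "ucxx"  # pipe object -> UCXX
    raise TypeError(f"Cannot infer a transport from {type(peer).__name__}")
```

For instance, `resolve_transport(1)` selects `"torch_distributed"`, while passing a pipe instance selects `"ucxx"`.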

Strategy A (materialize) implementation:

  • Sender: materializes DTensors via full_tensor(), sends metadata + full tensors
  • Receiver: receives full tensors, stores them in the TensorDict

Strategies B and C are stubbed with NotImplementedError (implemented in
subsequent PRs).
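
A toy sketch of the Strategy A flow and the stubbed strategies, under loud assumptions: `ToyShardedTensor` stands in for a DTensor (only its `full_tensor()` method mirrors the real API), a `deque` stands in for the transport, and `toy_send` / `toy_recv` are hypothetical helpers, not the PR's `dtensor_send` / `dtensor_recv`. Treating `"auto"` as `"materialize"` is also an assumption made for this example.

```python
from collections import deque

STRATEGIES = ("materialize", "redistribute", "optimal", "auto")


class ToyShardedTensor:
    """Hypothetical stand-in for a DTensor: a list of per-rank shards."""

    def __init__(self, shards):
        self.shards = shards

    def full_tensor(self):
        # Concatenate the shards into one "full" tensor (a flat list here).
        return [x for shard in self.shards for x in shard]


def toy_send(data, channel, strategy="materialize"):
    if strategy not in STRATEGIES:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    if strategy in ("redistribute", "optimal"):
        # Strategies B and C are stubs, as in the PR description.
        raise NotImplementedError(f"strategy={strategy!r} is not implemented")
    # Assumption: "auto" falls back to "materialize" in this sketch.
    full = {
        key: value.full_tensor() if isinstance(value, ToyShardedTensor) else value
        for key, value in data.items()
    }
    channel.append(list(full))  # metadata first: the keys, in order
    for value in full.values():
        channel.append(value)  # then one full tensor per key


def toy_recv(channel, out=None):
    out = {} if out is None else out
    keys = channel.popleft()  # read the metadata
    for key in keys:
        out[key] = channel.popleft()  # store each full tensor under its key
    return out
```

The receiver never sees sharding information in this flow: it only gets full tensors, which is the simplicity that materialization trades for extra memory and bandwidth on the sender side.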

Made-with: Cursor

[ghstack-poisoned]
github-actions bot (Contributor) commented Mar 6, 2026

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add unified dtensor_send/dtensor_recv API with Strategy A

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

| Prefix | Label Applied | Example |
| --- | --- | --- |
| [BugFix] or [Fix] | bug | [BugFix] Fix memory leak in TensorDict |
| [Feature] | Feature | [Feature] Add new storage backend |
| [Doc] or [Docs] | documentation | [Doc] Update installation guide |
| [Refactor] | Refactor | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Test | [Test] Add unit tests for nn module |
| [Compile] | Compile | [Compile] Fix torch.compile issue |
| [Performance] or [Perf] | Performance | [Perf] Optimize tensor operations |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Setup] | setup | [Setup] Update build configuration |
| [Distributed] or [Dist] | Distributed | [Distributed] Add scatter collective |
| [Benchmark] or [Bench] | Benchmarks | [Benchmark] Add compile benchmark |
| [Typing] or [Type] | Typing | [Typing] Add type stubs |
| [BC-breaking] or [BC] | BC-breaking | [BC-breaking] Remove deprecated API |
| [Formatting] or [Format] | Formatting | [Format] Fix code style |
| [Quality] | Quality | [Quality] Improve error messages |

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.

meta-cla bot added the "CLA Signed" label Mar 6, 2026

github-actions bot (Contributor) commented Mar 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 261. Improved: $\large\color{#35bf28}35$. Worsened: $\large\color{#d91a1a}6$.

Detailed results. Columns, in order: Name, Max, Mean, Ops, Ops on Repo HEAD, Change.
test_plain_set_nested 43.9500μs 14.3876μs 69.5044 KOps/s 70.4946 KOps/s $\color{#d91a1a}-1.40\%$
test_plain_set_stack_nested 37.2910μs 14.7438μs 67.8250 KOps/s 68.7282 KOps/s $\color{#d91a1a}-1.31\%$
test_plain_set_nested_inplace 42.4910μs 16.0103μs 62.4597 KOps/s 62.2523 KOps/s $\color{#35bf28}+0.33\%$
test_plain_set_stack_nested_inplace 62.5210μs 15.8845μs 62.9544 KOps/s 62.4856 KOps/s $\color{#35bf28}+0.75\%$
test_items 27.0910μs 5.4805μs 182.4636 KOps/s 182.0408 KOps/s $\color{#35bf28}+0.23\%$
test_items_nested 0.5698ms 0.4462ms 2.2409 KOps/s 2.2634 KOps/s $\color{#d91a1a}-0.99\%$
test_items_nested_locked 0.5638ms 0.4518ms 2.2134 KOps/s 2.2362 KOps/s $\color{#d91a1a}-1.02\%$
test_items_nested_leaf 0.1665ms 91.3404μs 10.9481 KOps/s 10.8012 KOps/s $\color{#35bf28}+1.36\%$
test_items_stack_nested 0.4805ms 0.4472ms 2.2359 KOps/s 2.2537 KOps/s $\color{#d91a1a}-0.79\%$
test_items_stack_nested_leaf 0.1301ms 93.7867μs 10.6625 KOps/s 10.6662 KOps/s $\color{#d91a1a}-0.03\%$
test_items_stack_nested_locked 0.5176ms 0.4493ms 2.2258 KOps/s 2.2304 KOps/s $\color{#d91a1a}-0.21\%$
test_keys 33.9100μs 4.1361μs 241.7738 KOps/s 246.1273 KOps/s $\color{#d91a1a}-1.77\%$
test_keys_nested 0.1937ms 0.1276ms 7.8350 KOps/s 7.8925 KOps/s $\color{#d91a1a}-0.73\%$
test_keys_nested_locked 2.1438ms 0.1374ms 7.2793 KOps/s 7.4172 KOps/s $\color{#d91a1a}-1.86\%$
test_keys_nested_leaf 0.1678ms 0.1189ms 8.4132 KOps/s 8.5649 KOps/s $\color{#d91a1a}-1.77\%$
test_keys_stack_nested 0.2127ms 0.1292ms 7.7417 KOps/s 7.8971 KOps/s $\color{#d91a1a}-1.97\%$
test_keys_stack_nested_leaf 0.1886ms 0.1189ms 8.4098 KOps/s 8.5845 KOps/s $\color{#d91a1a}-2.03\%$
test_keys_stack_nested_locked 0.1630ms 0.1380ms 7.2474 KOps/s 7.4417 KOps/s $\color{#d91a1a}-2.61\%$
test_values 5.4782μs 1.0093μs 990.8121 KOps/s 999.7890 KOps/s $\color{#d91a1a}-0.90\%$
test_values_nested 92.6810μs 51.7589μs 19.3204 KOps/s 19.5699 KOps/s $\color{#d91a1a}-1.28\%$
test_values_nested_locked 87.6210μs 55.2306μs 18.1059 KOps/s 18.5448 KOps/s $\color{#d91a1a}-2.37\%$
test_values_nested_leaf 89.3110μs 59.3172μs 16.8585 KOps/s 17.2467 KOps/s $\color{#d91a1a}-2.25\%$
test_values_stack_nested 81.6310μs 51.8681μs 19.2797 KOps/s 19.5830 KOps/s $\color{#d91a1a}-1.55\%$
test_values_stack_nested_leaf 86.8820μs 59.1589μs 16.9036 KOps/s 17.2172 KOps/s $\color{#d91a1a}-1.82\%$
test_values_stack_nested_locked 0.1030ms 55.7281μs 17.9443 KOps/s 18.3803 KOps/s $\color{#d91a1a}-2.37\%$
test_membership 9.5518μs 0.7978μs 1.2534 MOps/s 1.2521 MOps/s $\color{#35bf28}+0.11\%$
test_membership_nested 33.3110μs 2.7173μs 368.0186 KOps/s 370.0032 KOps/s $\color{#d91a1a}-0.54\%$
test_membership_nested_leaf 15.4150μs 2.6389μs 378.9392 KOps/s 362.2450 KOps/s $\color{#35bf28}+4.61\%$
test_membership_stacked_nested 33.6400μs 2.7621μs 362.0380 KOps/s 369.0774 KOps/s $\color{#d91a1a}-1.91\%$
test_membership_stacked_nested_leaf 39.8610μs 2.7499μs 363.6506 KOps/s 364.1836 KOps/s $\color{#d91a1a}-0.15\%$
test_membership_nested_last 35.8010μs 4.1316μs 242.0356 KOps/s 244.2148 KOps/s $\color{#d91a1a}-0.89\%$
test_membership_nested_leaf_last 29.6210μs 4.1077μs 243.4443 KOps/s 244.9041 KOps/s $\color{#d91a1a}-0.60\%$
test_membership_stacked_nested_last 73.4710μs 4.0802μs 245.0843 KOps/s 244.1228 KOps/s $\color{#35bf28}+0.39\%$
test_membership_stacked_nested_leaf_last 36.2310μs 4.1006μs 243.8696 KOps/s 246.6123 KOps/s $\color{#d91a1a}-1.11\%$
test_nested_getleaf 51.2110μs 20.6709μs 48.3771 KOps/s 48.9640 KOps/s $\color{#d91a1a}-1.20\%$
test_nested_get 61.8610μs 19.2225μs 52.0224 KOps/s 51.7182 KOps/s $\color{#35bf28}+0.59\%$
test_stacked_getleaf 48.6210μs 20.2948μs 49.2738 KOps/s 48.1768 KOps/s $\color{#35bf28}+2.28\%$
test_stacked_get 50.1410μs 19.4363μs 51.4502 KOps/s 51.0424 KOps/s $\color{#35bf28}+0.80\%$
test_nested_getitemleaf 85.2010μs 21.0590μs 47.4855 KOps/s 46.8230 KOps/s $\color{#35bf28}+1.42\%$
test_nested_getitem 46.9510μs 19.7722μs 50.5760 KOps/s 49.9813 KOps/s $\color{#35bf28}+1.19\%$
test_stacked_getitemleaf 48.7810μs 20.7738μs 48.1374 KOps/s 47.2947 KOps/s $\color{#35bf28}+1.78\%$
test_stacked_getitem 46.9910μs 19.9313μs 50.1724 KOps/s 49.4553 KOps/s $\color{#35bf28}+1.45\%$
test_lock_nested 0.5208ms 0.4586ms 2.1808 KOps/s 2.1894 KOps/s $\color{#d91a1a}-0.39\%$
test_lock_stack_nested 0.5230ms 0.4622ms 2.1635 KOps/s 2.1527 KOps/s $\color{#35bf28}+0.50\%$
test_unlock_nested 0.5241ms 0.3726ms 2.6842 KOps/s 2.6912 KOps/s $\color{#d91a1a}-0.26\%$
test_unlock_stack_nested 0.4330ms 0.3743ms 2.6717 KOps/s 2.6395 KOps/s $\color{#35bf28}+1.22\%$
test_flatten_speed 0.1596ms 0.1152ms 8.6834 KOps/s 8.6092 KOps/s $\color{#35bf28}+0.86\%$
test_unflatten_speed 0.6151ms 0.5436ms 1.8396 KOps/s 1.7923 KOps/s $\color{#35bf28}+2.64\%$
test_common_ops 0.8383ms 0.6969ms 1.4349 KOps/s 1.4781 KOps/s $\color{#d91a1a}-2.92\%$
test_creation 69.0610μs 2.9521μs 338.7461 KOps/s 340.8953 KOps/s $\color{#d91a1a}-0.63\%$
test_creation_empty 32.2200μs 6.5272μs 153.2040 KOps/s 151.5111 KOps/s $\color{#35bf28}+1.12\%$
test_creation_nested_1 57.3110μs 11.0053μs 90.8650 KOps/s 91.0301 KOps/s $\color{#d91a1a}-0.18\%$
test_creation_nested_2 38.6110μs 12.5652μs 79.5851 KOps/s 78.9655 KOps/s $\color{#35bf28}+0.78\%$
test_creation_many_keys[10] 48.3310μs 19.5516μs 51.1466 KOps/s 50.5929 KOps/s $\color{#35bf28}+1.09\%$
test_creation_many_keys[50] 0.1489ms 83.5534μs 11.9684 KOps/s 11.6980 KOps/s $\color{#35bf28}+2.31\%$
test_creation_many_keys[100] 0.2267ms 0.1642ms 6.0917 KOps/s 5.9790 KOps/s $\color{#35bf28}+1.88\%$
test_creation_nested_many_keys[10] 75.4310μs 41.7514μs 23.9513 KOps/s 23.3619 KOps/s $\color{#35bf28}+2.52\%$
test_creation_nested_many_keys[50] 0.2343ms 0.1710ms 5.8469 KOps/s 5.6919 KOps/s $\color{#35bf28}+2.72\%$
test_clone 42.6010μs 12.7681μs 78.3202 KOps/s 75.4074 KOps/s $\color{#35bf28}+3.86\%$
test_getitem[int] 1.7072ms 14.5209μs 68.8664 KOps/s 61.9057 KOps/s $\textbf{\color{#35bf28}+11.24\%}$
test_getitem[slice_int] 0.1372ms 24.5522μs 40.7296 KOps/s 43.3453 KOps/s $\textbf{\color{#d91a1a}-6.03\%}$
test_getitem[range] 0.1762ms 65.0297μs 15.3776 KOps/s 15.2469 KOps/s $\color{#35bf28}+0.86\%$
test_getitem[tuple] 0.1455ms 23.1887μs 43.1245 KOps/s 41.2563 KOps/s $\color{#35bf28}+4.53\%$
test_getitem[list] 0.1775ms 59.6661μs 16.7599 KOps/s 16.4677 KOps/s $\color{#35bf28}+1.77\%$
test_setitem_dim[int] 59.3110μs 27.4442μs 36.4376 KOps/s 37.4142 KOps/s $\color{#d91a1a}-2.61\%$
test_setitem_dim[slice_int] 67.2610μs 43.0431μs 23.2325 KOps/s 23.0178 KOps/s $\color{#35bf28}+0.93\%$
test_setitem_dim[range] 0.1225ms 97.6786μs 10.2377 KOps/s 10.3676 KOps/s $\color{#d91a1a}-1.25\%$
test_setitem_dim[tuple] 70.2410μs 38.7339μs 25.8172 KOps/s 24.1116 KOps/s $\textbf{\color{#35bf28}+7.07\%}$
test_setitem 49.9000μs 17.6363μs 56.7012 KOps/s 53.7185 KOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_set 58.3110μs 17.0492μs 58.6539 KOps/s 56.4200 KOps/s $\color{#35bf28}+3.96\%$
test_set_shared 0.4909ms 0.2083ms 4.8016 KOps/s 4.8322 KOps/s $\color{#d91a1a}-0.63\%$
test_update 0.3316ms 22.2901μs 44.8629 KOps/s 44.2841 KOps/s $\color{#35bf28}+1.31\%$
test_update_nested 72.7310μs 32.0420μs 31.2091 KOps/s 29.3695 KOps/s $\textbf{\color{#35bf28}+6.26\%}$
test_update__nested 0.4756ms 33.7405μs 29.6380 KOps/s 28.3864 KOps/s $\color{#35bf28}+4.41\%$
test_set_nested 58.7100μs 18.8534μs 53.0408 KOps/s 49.6968 KOps/s $\textbf{\color{#35bf28}+6.73\%}$
test_set_nested_new 72.3310μs 23.6633μs 42.2595 KOps/s 39.3543 KOps/s $\textbf{\color{#35bf28}+7.38\%}$
test_select 80.8010μs 39.8606μs 25.0874 KOps/s 23.7536 KOps/s $\textbf{\color{#35bf28}+5.62\%}$
test_select_nested 0.1068ms 70.3085μs 14.2230 KOps/s 14.1926 KOps/s $\color{#35bf28}+0.21\%$
test_exclude_nested 0.1354ms 87.1797μs 11.4706 KOps/s 11.7028 KOps/s $\color{#d91a1a}-1.98\%$
test_empty[True] 0.4469ms 0.3877ms 2.5790 KOps/s 2.6127 KOps/s $\color{#d91a1a}-1.29\%$
test_empty[False] 8.2325μs 1.2523μs 798.5551 KOps/s 808.4200 KOps/s $\color{#d91a1a}-1.22\%$
test_to 0.1023ms 71.0523μs 14.0741 KOps/s 14.0506 KOps/s $\color{#35bf28}+0.17\%$
test_to_nonblocking 0.1141ms 65.4895μs 15.2696 KOps/s 16.1060 KOps/s $\textbf{\color{#d91a1a}-5.19\%}$
test_unbind_speed 0.3637ms 0.3222ms 3.1039 KOps/s 3.1749 KOps/s $\color{#d91a1a}-2.24\%$
test_unbind_speed_stack0 0.3957ms 0.3172ms 3.1522 KOps/s 3.1636 KOps/s $\color{#d91a1a}-0.36\%$
test_unbind_speed_stack1 0.1049s 0.8889ms 1.1250 KOps/s 1.2313 KOps/s $\textbf{\color{#d91a1a}-8.64\%}$
test_split 1.1472ms 1.0897ms 917.6620 Ops/s 810.0656 Ops/s $\textbf{\color{#35bf28}+13.28\%}$
test_chunk 0.1055s 1.1590ms 862.7786 Ops/s 957.4254 Ops/s $\textbf{\color{#d91a1a}-9.89\%}$
test_to_cpu_blocking 28.5776ms 28.1296ms 35.5497 Ops/s 53.3079 Ops/s $\textbf{\color{#d91a1a}-33.31\%}$
test_to_cpu_global_sync 11.2611ms 11.1424ms 89.7471 Ops/s 81.4204 Ops/s $\textbf{\color{#35bf28}+10.23\%}$
test_to_cpu_event_sync 12.3055ms 12.0195ms 83.1985 Ops/s 83.9742 Ops/s $\color{#d91a1a}-0.92\%$
test_to_cpu_default 12.3259ms 12.0569ms 82.9402 Ops/s 83.9019 Ops/s $\color{#d91a1a}-1.15\%$
test_consolidate[False-None] 4.0514ms 3.9607ms 252.4792 Ops/s 224.2915 Ops/s $\textbf{\color{#35bf28}+12.57\%}$
test_consolidate[default-None] 2.2210ms 1.9313ms 517.7986 Ops/s 494.3408 Ops/s $\color{#35bf28}+4.75\%$
test_consolidate[reduce-overhead-None] 1.9435ms 1.8660ms 535.8932 Ops/s 515.3975 Ops/s $\color{#35bf28}+3.98\%$
test_consolidate_njt[False-None] 8.4550ms 8.2501ms 121.2106 Ops/s 120.0504 Ops/s $\color{#35bf28}+0.97\%$
test_to[False-False-None] 2.1710ms 2.0248ms 493.8807 Ops/s 487.9463 Ops/s $\color{#35bf28}+1.22\%$
test_to[True-False-None] 2.0172ms 1.8714ms 534.3733 Ops/s 530.1603 Ops/s $\color{#35bf28}+0.79\%$
test_to[within-False-None] 6.2363ms 5.9472ms 168.1465 Ops/s 165.9848 Ops/s $\color{#35bf28}+1.30\%$
test_to[True-default-None] 9.0054ms 8.7144ms 114.7521 Ops/s 112.5332 Ops/s $\color{#35bf28}+1.97\%$
test_to_njt[False-False-None] 8.5388ms 8.2590ms 121.0807 Ops/s 120.3476 Ops/s $\color{#35bf28}+0.61\%$
test_to_njt[True-False-None] 6.8938ms 6.7835ms 147.4159 Ops/s 146.6121 Ops/s $\color{#35bf28}+0.55\%$
test_to_njt[within-False-None] 15.7899ms 15.2026ms 65.7783 Ops/s 65.4721 Ops/s $\color{#35bf28}+0.47\%$
test_creation[device0] 0.3528ms 0.1101ms 9.0791 KOps/s 8.8788 KOps/s $\color{#35bf28}+2.26\%$
test_creation_from_tensor 0.3579ms 0.1097ms 9.1169 KOps/s 8.9933 KOps/s $\color{#35bf28}+1.37\%$
test_add_one[memmap_tensor0] 0.2790ms 6.3189μs 158.2557 KOps/s 156.5768 KOps/s $\color{#35bf28}+1.07\%$
test_contiguous[memmap_tensor0] 15.2700μs 0.6077μs 1.6456 MOps/s 2.2697 MOps/s $\textbf{\color{#d91a1a}-27.50\%}$
test_stack[memmap_tensor0] 32.8500μs 4.4642μs 224.0060 KOps/s 219.1616 KOps/s $\color{#35bf28}+2.21\%$
test_memmaptd_index 1.0126ms 0.2644ms 3.7818 KOps/s 3.8193 KOps/s $\color{#d91a1a}-0.98\%$
test_memmaptd_index_astensor 0.5162ms 0.3625ms 2.7590 KOps/s 2.7748 KOps/s $\color{#d91a1a}-0.57\%$
test_memmaptd_index_op 0.8939ms 0.5988ms 1.6701 KOps/s 1.6556 KOps/s $\color{#35bf28}+0.88\%$
test_serialize_model 0.1392s 0.1368s 7.3108 Ops/s 7.3416 Ops/s $\color{#d91a1a}-0.42\%$
test_serialize_model_pickle 1.3619s 1.2125s 0.8248 Ops/s 0.8262 Ops/s $\color{#d91a1a}-0.17\%$
test_serialize_weights 0.1387s 0.1367s 7.3155 Ops/s 7.3918 Ops/s $\color{#d91a1a}-1.03\%$
test_serialize_weights_returnearly 0.3978s 85.8888ms 11.6430 Ops/s 6.0512 Ops/s $\textbf{\color{#35bf28}+92.41\%}$
test_serialize_weights_pickle 1.3698s 1.2141s 0.8236 Ops/s 0.8224 Ops/s $\color{#35bf28}+0.15\%$
test_reshape_pytree 0.2161ms 30.9432μs 32.3173 KOps/s 30.2491 KOps/s $\textbf{\color{#35bf28}+6.84\%}$
test_reshape_td 90.2720μs 43.0876μs 23.2086 KOps/s 22.6174 KOps/s $\color{#35bf28}+2.61\%$
test_view_pytree 0.2125ms 30.7738μs 32.4951 KOps/s 30.7558 KOps/s $\textbf{\color{#35bf28}+5.66\%}$
test_view_td 91.7320μs 51.9014μs 19.2673 KOps/s 19.0076 KOps/s $\color{#35bf28}+1.37\%$
test_unbind_pytree 0.2328ms 34.6836μs 28.8321 KOps/s 27.2902 KOps/s $\textbf{\color{#35bf28}+5.65\%}$
test_unbind_td 0.1891ms 47.7887μs 20.9255 KOps/s 20.2093 KOps/s $\color{#35bf28}+3.54\%$
test_split_pytree 0.2397ms 40.2995μs 24.8142 KOps/s 23.6289 KOps/s $\textbf{\color{#35bf28}+5.02\%}$
test_split_td 0.1229ms 61.8757μs 16.1614 KOps/s 15.3843 KOps/s $\textbf{\color{#35bf28}+5.05\%}$
test_add_pytree 0.2274ms 41.9811μs 23.8203 KOps/s 23.6734 KOps/s $\color{#35bf28}+0.62\%$
test_add_td 99.3020μs 57.7142μs 17.3267 KOps/s 17.4002 KOps/s $\color{#d91a1a}-0.42\%$
test_compile_add_one_nested[tensordict-compile] 0.1898ms 0.1360ms 7.3548 KOps/s 6.7497 KOps/s $\textbf{\color{#35bf28}+8.96\%}$
test_compile_add_one_nested[tensordict-eager] 0.4364ms 0.1936ms 5.1656 KOps/s 5.2034 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_add_one_nested[pytree-compile] 0.1500ms 0.1050ms 9.5233 KOps/s 9.3811 KOps/s $\color{#35bf28}+1.52\%$
test_compile_add_one_nested[pytree-eager] 0.4301ms 0.1709ms 5.8525 KOps/s 5.8162 KOps/s $\color{#35bf28}+0.62\%$
test_compile_copy_nested[tensordict-compile] 0.2520ms 9.6702μs 103.4107 KOps/s 97.9012 KOps/s $\textbf{\color{#35bf28}+5.63\%}$
test_compile_copy_nested[tensordict-eager] 0.1107ms 50.5786μs 19.7712 KOps/s 19.5222 KOps/s $\color{#35bf28}+1.28\%$
test_compile_copy_nested[pytree-compile] 0.1472ms 9.5602μs 104.6007 KOps/s 105.1866 KOps/s $\color{#d91a1a}-0.56\%$
test_compile_copy_nested[pytree-eager] 0.4607ms 64.5649μs 15.4883 KOps/s 15.5073 KOps/s $\color{#d91a1a}-0.12\%$
test_compile_add_one_flat[tensordict-compile] 0.2273ms 0.1735ms 5.7640 KOps/s 5.4699 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_compile_add_one_flat[tensordict-eager] 0.3116ms 0.2753ms 3.6321 KOps/s 3.6114 KOps/s $\color{#35bf28}+0.57\%$
test_compile_add_one_flat[tensorclass-compile] 0.3064ms 0.1150ms 8.6985 KOps/s 8.5564 KOps/s $\color{#35bf28}+1.66\%$
test_compile_add_one_flat[tensorclass-eager] 0.1159ms 75.3891μs 13.2645 KOps/s 13.7948 KOps/s $\color{#d91a1a}-3.84\%$
test_compile_add_one_flat[pytree-compile] 0.2042ms 0.1551ms 6.4481 KOps/s 6.3510 KOps/s $\color{#35bf28}+1.53\%$
test_compile_add_one_flat[pytree-eager] 0.7669ms 0.5039ms 1.9845 KOps/s 1.9475 KOps/s $\color{#35bf28}+1.90\%$
test_compile_add_self_flat[tensordict-eager] 0.4848ms 0.3275ms 3.0535 KOps/s 3.0247 KOps/s $\color{#35bf28}+0.95\%$
test_compile_add_self_flat[tensordict-compile] 0.3082ms 0.1755ms 5.6973 KOps/s 5.4049 KOps/s $\textbf{\color{#35bf28}+5.41\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1292ms 87.4687μs 11.4327 KOps/s 11.3693 KOps/s $\color{#35bf28}+0.56\%$
test_compile_add_self_flat[tensorclass-compile] 0.3294ms 0.1169ms 8.5551 KOps/s 7.9955 KOps/s $\textbf{\color{#35bf28}+7.00\%}$
test_compile_add_self_flat[pytree-eager] 0.6337ms 0.4186ms 2.3889 KOps/s 2.3068 KOps/s $\color{#35bf28}+3.56\%$
test_compile_add_self_flat[pytree-compile] 0.1887ms 0.1557ms 6.4209 KOps/s 6.2437 KOps/s $\color{#35bf28}+2.84\%$
test_compile_copy_flat[tensordict-compile] 55.3710μs 13.1493μs 76.0499 KOps/s 73.4540 KOps/s $\color{#35bf28}+3.53\%$
test_compile_copy_flat[tensordict-eager] 65.5310μs 40.0824μs 24.9486 KOps/s 24.9993 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_copy_flat[pytree-compile] 0.1190ms 10.5573μs 94.7208 KOps/s 96.3560 KOps/s $\color{#d91a1a}-1.70\%$
test_compile_copy_flat[pytree-eager] 0.4036ms 51.5072μs 19.4148 KOps/s 19.5658 KOps/s $\color{#d91a1a}-0.77\%$
test_compile_assign_and_add[tensordict-compile] 1.9706ms 0.1714ms 5.8327 KOps/s 5.5493 KOps/s $\textbf{\color{#35bf28}+5.11\%}$
test_compile_assign_and_add[tensordict-eager] 3.5348ms 3.3142ms 301.7274 Ops/s 307.5907 Ops/s $\color{#d91a1a}-1.91\%$
test_compile_assign_and_add[pytree-compile] 1.9179ms 0.1576ms 6.3438 KOps/s 6.0741 KOps/s $\color{#35bf28}+4.44\%$
test_compile_assign_and_add[pytree-eager] 2.9230ms 2.7507ms 363.5463 Ops/s 368.5598 Ops/s $\color{#d91a1a}-1.36\%$
test_compile_indexing[tensor-tensordict-compile] 0.2190ms 0.1118ms 8.9473 KOps/s 8.9589 KOps/s $\color{#d91a1a}-0.13\%$
test_compile_indexing[tensor-tensordict-eager] 0.3156ms 75.6438μs 13.2198 KOps/s 13.6862 KOps/s $\color{#d91a1a}-3.41\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2139ms 97.7599μs 10.2291 KOps/s 10.3475 KOps/s $\color{#d91a1a}-1.14\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2501ms 44.7380μs 22.3524 KOps/s 23.3598 KOps/s $\color{#d91a1a}-4.31\%$
test_compile_indexing[tensor-pytree-compile] 0.1737ms 98.6168μs 10.1403 KOps/s 10.5005 KOps/s $\color{#d91a1a}-3.43\%$
test_compile_indexing[tensor-pytree-eager] 0.2592ms 44.8896μs 22.2769 KOps/s 23.3753 KOps/s $\color{#d91a1a}-4.70\%$
test_compile_indexing[slice-tensordict-compile] 0.1975ms 56.6533μs 17.6512 KOps/s 16.4805 KOps/s $\textbf{\color{#35bf28}+7.10\%}$
test_compile_indexing[slice-tensordict-eager] 0.2221ms 26.9639μs 37.0866 KOps/s 37.9288 KOps/s $\color{#d91a1a}-2.22\%$
test_compile_indexing[slice-tensorclass-compile] 0.1537ms 44.5785μs 22.4324 KOps/s 21.9284 KOps/s $\color{#35bf28}+2.30\%$
test_compile_indexing[slice-tensorclass-eager] 0.2568ms 21.2579μs 47.0412 KOps/s 46.3510 KOps/s $\color{#35bf28}+1.49\%$
test_compile_indexing[slice-pytree-compile] 85.7710μs 45.7084μs 21.8778 KOps/s 22.5172 KOps/s $\color{#d91a1a}-2.84\%$
test_compile_indexing[slice-pytree-eager] 0.2744ms 21.5654μs 46.3705 KOps/s 46.4178 KOps/s $\color{#d91a1a}-0.10\%$
test_compile_indexing[int-tensordict-compile] 0.1362ms 57.2840μs 17.4569 KOps/s 17.3533 KOps/s $\color{#35bf28}+0.60\%$
test_compile_indexing[int-tensordict-eager] 0.2389ms 27.5982μs 36.2342 KOps/s 37.5533 KOps/s $\color{#d91a1a}-3.51\%$
test_compile_indexing[int-tensorclass-compile] 0.1022ms 44.3722μs 22.5366 KOps/s 21.6460 KOps/s $\color{#35bf28}+4.11\%$
test_compile_indexing[int-tensorclass-eager] 0.2604ms 21.2694μs 47.0158 KOps/s 46.6142 KOps/s $\color{#35bf28}+0.86\%$
test_compile_indexing[int-pytree-compile] 94.1510μs 44.7105μs 22.3661 KOps/s 21.9285 KOps/s $\color{#35bf28}+2.00\%$
test_compile_indexing[int-pytree-eager] 0.2608ms 21.3546μs 46.8282 KOps/s 46.4967 KOps/s $\color{#35bf28}+0.71\%$
test_compile_replace[single-eager] 0.1095ms 47.1690μs 21.2004 KOps/s 22.1953 KOps/s $\color{#d91a1a}-4.48\%$
test_compile_replace[single-compile] 0.1714ms 0.1053ms 9.4939 KOps/s 9.4245 KOps/s $\color{#35bf28}+0.74\%$
test_compile_replace[multi-eager] 0.6229ms 0.5627ms 1.7773 KOps/s 1.8377 KOps/s $\color{#d91a1a}-3.29\%$
test_compile_replace[multi-compile] 0.2628ms 0.1102ms 9.0756 KOps/s 8.6270 KOps/s $\textbf{\color{#35bf28}+5.20\%}$
test_compile_tc_getattr_20[eager] 0.2151ms 0.1628ms 6.1411 KOps/s 6.1038 KOps/s $\color{#35bf28}+0.61\%$
test_compile_tc_getattr_20[compile] 0.3152ms 0.1189ms 8.4077 KOps/s 8.4732 KOps/s $\color{#d91a1a}-0.77\%$
test_compile_clone_shallow[20-eager] 54.3110μs 18.6009μs 53.7609 KOps/s 53.8095 KOps/s $\color{#d91a1a}-0.09\%$
test_compile_clone_shallow[20-compile] 67.3010μs 11.0020μs 90.8922 KOps/s 90.5705 KOps/s $\color{#35bf28}+0.36\%$
test_compile_clone_shallow[40-eager] 90.6810μs 32.8757μs 30.4176 KOps/s 30.7664 KOps/s $\color{#d91a1a}-1.13\%$
test_compile_clone_shallow[40-compile] 0.1806ms 12.1626μs 82.2191 KOps/s 80.4860 KOps/s $\color{#35bf28}+2.15\%$
test_compile_clone_shallow[80-eager] 0.1199ms 61.3453μs 16.3012 KOps/s 16.5544 KOps/s $\color{#d91a1a}-1.53\%$
test_compile_clone_shallow[80-compile] 48.3610μs 15.0108μs 66.6188 KOps/s 67.1884 KOps/s $\color{#d91a1a}-0.85\%$
test_compile_update_inplace[eager] 0.1069ms 57.7756μs 17.3083 KOps/s 17.3916 KOps/s $\color{#d91a1a}-0.48\%$
test_compile_update_inplace[compile] 0.2039ms 0.1331ms 7.5106 KOps/s 7.0600 KOps/s $\textbf{\color{#35bf28}+6.38\%}$
test_mod_add[eager] 88.9220μs 47.2403μs 21.1684 KOps/s 21.3474 KOps/s $\color{#d91a1a}-0.84\%$
test_mod_add[compile] 0.5193ms 0.1003ms 9.9738 KOps/s 9.2071 KOps/s $\textbf{\color{#35bf28}+8.33\%}$
test_mod_add[compile-overhead] 0.3762ms 0.1477ms 6.7689 KOps/s 6.7233 KOps/s $\color{#35bf28}+0.68\%$
test_mod_wrap[eager] 0.3520ms 0.2809ms 3.5601 KOps/s 3.3860 KOps/s $\textbf{\color{#35bf28}+5.14\%}$
test_mod_wrap[compile] 0.5081ms 0.3405ms 2.9366 KOps/s 2.9176 KOps/s $\color{#35bf28}+0.65\%$
test_mod_wrap[compile-overhead] 7.3268ms 4.0242ms 248.4955 Ops/s 252.3634 Ops/s $\color{#d91a1a}-1.53\%$
test_mod_wrap_and_backward[eager] 1.6764ms 1.4742ms 678.3110 Ops/s 668.9375 Ops/s $\color{#35bf28}+1.40\%$
test_mod_wrap_and_backward[compile] 1.5428ms 1.4097ms 709.3700 Ops/s 705.8707 Ops/s $\color{#35bf28}+0.50\%$
test_mod_wrap_and_backward[compile-overhead] 1.2367ms 0.8709ms 1.1483 KOps/s 1.1411 KOps/s $\color{#35bf28}+0.63\%$
test_seq_add[eager] 0.2088ms 0.1473ms 6.7879 KOps/s 6.6645 KOps/s $\color{#35bf28}+1.85\%$
test_seq_add[compile] 0.2914ms 0.1099ms 9.1029 KOps/s 8.7859 KOps/s $\color{#35bf28}+3.61\%$
test_seq_add[compile-overhead] 0.2192ms 0.1490ms 6.7126 KOps/s 6.4874 KOps/s $\color{#35bf28}+3.47\%$
test_seq_wrap[eager] 0.5736ms 0.5052ms 1.9793 KOps/s 1.9879 KOps/s $\color{#d91a1a}-0.44\%$
test_seq_wrap[compile] 0.4343ms 0.3550ms 2.8173 KOps/s 2.7879 KOps/s $\color{#35bf28}+1.05\%$
test_seq_wrap[compile-overhead] 0.3265ms 0.2585ms 3.8687 KOps/s 3.8490 KOps/s $\color{#35bf28}+0.51\%$
test_func_call_runtime[False-eager] 0.8814ms 0.8063ms 1.2402 KOps/s 1.2462 KOps/s $\color{#d91a1a}-0.48\%$
test_func_call_runtime[False-compile] 0.9837ms 0.8785ms 1.1383 KOps/s 1.1298 KOps/s $\color{#35bf28}+0.75\%$
test_func_call_runtime[False-compile-overhead] 0.5202ms 0.4437ms 2.2536 KOps/s 2.2269 KOps/s $\color{#35bf28}+1.20\%$
test_func_call_runtime[True-eager] 1.1815ms 1.0375ms 963.8985 Ops/s 958.1684 Ops/s $\color{#35bf28}+0.60\%$
test_func_call_runtime[True-compile] 0.9594ms 0.8865ms 1.1281 KOps/s 1.0836 KOps/s $\color{#35bf28}+4.11\%$
test_func_call_runtime[True-compile-overhead] 0.5144ms 0.4550ms 2.1977 KOps/s 2.1543 KOps/s $\color{#35bf28}+2.02\%$
test_func_call_cm_runtime[False-eager] 1.1505ms 0.8059ms 1.2409 KOps/s 1.2331 KOps/s $\color{#35bf28}+0.63\%$
test_func_call_cm_runtime[False-compile] 0.9734ms 0.8788ms 1.1380 KOps/s 1.1258 KOps/s $\color{#35bf28}+1.08\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5000ms 0.4470ms 2.2373 KOps/s 2.2117 KOps/s $\color{#35bf28}+1.16\%$
test_func_call_cm_runtime[True-eager] 1.2881ms 1.1767ms 849.8311 Ops/s 831.0437 Ops/s $\color{#35bf28}+2.26\%$
test_func_call_cm_runtime[True-compile] 1.0643ms 0.9338ms 1.0709 KOps/s 1.0631 KOps/s $\color{#35bf28}+0.74\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5738ms 0.4873ms 2.0522 KOps/s 2.0066 KOps/s $\color{#35bf28}+2.27\%$
test_vmap_func_call_cm_runtime[eager] 2.7736ms 2.3063ms 433.6018 Ops/s 429.9375 Ops/s $\color{#35bf28}+0.85\%$
test_vmap_func_call_cm_runtime[compile] 1.0087ms 0.9400ms 1.0638 KOps/s 1.0470 KOps/s $\color{#35bf28}+1.61\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5340ms 0.4924ms 2.0307 KOps/s 1.9913 KOps/s $\color{#35bf28}+1.98\%$
test_distributed 2.9237ms 0.1564ms 6.3951 KOps/s 6.5930 KOps/s $\color{#d91a1a}-3.00\%$
test_tdmodule 0.7404ms 27.4274μs 36.4599 KOps/s 37.1106 KOps/s $\color{#d91a1a}-1.75\%$
test_tdmodule_dispatch 73.1310μs 43.8796μs 22.7896 KOps/s 22.5539 KOps/s $\color{#35bf28}+1.05\%$
test_tdseq 45.5310μs 25.9340μs 38.5594 KOps/s 38.1209 KOps/s $\color{#35bf28}+1.15\%$
test_tdseq_dispatch 79.3520μs 45.7623μs 21.8521 KOps/s 21.7433 KOps/s $\color{#35bf28}+0.50\%$
test_instantiation_functorch 2.0467ms 1.9741ms 506.5563 Ops/s 502.2256 Ops/s $\color{#35bf28}+0.86\%$
test_exec_functorch 0.2497ms 0.1781ms 5.6145 KOps/s 5.7718 KOps/s $\color{#d91a1a}-2.72\%$
test_exec_functional_call 0.1894ms 0.1558ms 6.4171 KOps/s 6.4885 KOps/s $\color{#d91a1a}-1.10\%$
test_exec_td_decorator 0.4306ms 0.2272ms 4.4011 KOps/s 4.3895 KOps/s $\color{#35bf28}+0.26\%$
test_vmap_mlp_speed_decorator[True-True] 1.0045ms 0.7997ms 1.2504 KOps/s 1.2338 KOps/s $\color{#35bf28}+1.35\%$
test_vmap_mlp_speed_decorator[True-False] 0.9742ms 0.7981ms 1.2530 KOps/s 1.2371 KOps/s $\color{#35bf28}+1.28\%$
test_vmap_mlp_speed_decorator[False-True] 0.8865ms 0.6912ms 1.4468 KOps/s 1.4186 KOps/s $\color{#35bf28}+1.99\%$
test_vmap_mlp_speed_decorator[False-False] 0.8765ms 0.6930ms 1.4431 KOps/s 1.3940 KOps/s $\color{#35bf28}+3.52\%$
test_vmap_transformer_speed_decorator[True-True] 20.1729ms 20.0469ms 49.8831 Ops/s 49.4307 Ops/s $\color{#35bf28}+0.92\%$
test_vmap_transformer_speed_decorator[True-False] 20.2045ms 20.0664ms 49.8345 Ops/s 49.3949 Ops/s $\color{#35bf28}+0.89\%$
test_vmap_transformer_speed_decorator[False-True] 20.5757ms 19.8974ms 50.2577 Ops/s 49.8765 Ops/s $\color{#35bf28}+0.76\%$
test_vmap_transformer_speed_decorator[False-False] 20.3919ms 19.8943ms 50.2657 Ops/s 49.8832 Ops/s $\color{#35bf28}+0.77\%$
test_to_module_speed[True] 1.9485ms 1.4091ms 709.6781 Ops/s 700.2111 Ops/s $\color{#35bf28}+1.35\%$
test_to_module_speed[False] 1.8542ms 1.3738ms 727.9298 Ops/s 714.0486 Ops/s $\color{#35bf28}+1.94\%$
test_tc_init 70.8710μs 42.7568μs 23.3881 KOps/s 23.0945 KOps/s $\color{#35bf28}+1.27\%$
test_tc_init_tensor_only 33.9910μs 9.2502μs 108.1057 KOps/s 108.8304 KOps/s $\color{#d91a1a}-0.67\%$
test_tc_init_nested 0.3643ms 85.3820μs 11.7121 KOps/s 11.5583 KOps/s $\color{#35bf28}+1.33\%$
test_tc_init_many_fields 78.5810μs 15.7123μs 63.6446 KOps/s 64.2410 KOps/s $\color{#d91a1a}-0.93\%$
test_tc_first_layer_tensor 27.4810μs 1.7198μs 581.4536 KOps/s 584.9094 KOps/s $\color{#d91a1a}-0.59\%$
test_tc_first_layer_tensor_only 2.2700μs 0.3818μs 2.6192 MOps/s 2.5742 MOps/s $\color{#35bf28}+1.75\%$
test_tc_first_layer_tensor_set 39.8210μs 3.6915μs 270.8908 KOps/s 268.6168 KOps/s $\color{#35bf28}+0.85\%$
test_tc_first_layer_tensor_only_set 24.6710μs 3.0889μs 323.7362 KOps/s 319.5773 KOps/s $\color{#35bf28}+1.30\%$
test_tc_first_layer_nontensor 30.4800μs 5.8585μs 170.6914 KOps/s 172.3147 KOps/s $\color{#d91a1a}-0.94\%$
test_tc_second_layer_tensor 45.7010μs 4.1549μs 240.6797 KOps/s 235.7931 KOps/s $\color{#35bf28}+2.07\%$
test_tc_second_layer_nontensor 31.2710μs 8.3184μs 120.2154 KOps/s 120.7440 KOps/s $\color{#d91a1a}-0.44\%$
test_unbind 0.2639s 17.0437ms 58.6728 Ops/s 55.1392 Ops/s $\textbf{\color{#35bf28}+6.41\%}$
test_full_like 5.0348ms 4.4048ms 227.0243 Ops/s 59.3602 Ops/s $\textbf{\color{#35bf28}+282.45\%}$
test_zeros_like 4.9890ms 4.3868ms 227.9555 Ops/s 59.4565 Ops/s $\textbf{\color{#35bf28}+283.40\%}$
test_ones_like 4.8867ms 4.3952ms 227.5231 Ops/s 59.3844 Ops/s $\textbf{\color{#35bf28}+283.14\%}$
test_clone 6.8351ms 6.5965ms 151.5948 Ops/s 55.4774 Ops/s $\textbf{\color{#35bf28}+173.26\%}$
test_squeeze 0.1767ms 13.5540μs 73.7791 KOps/s 72.1011 KOps/s $\color{#35bf28}+2.33\%$
test_unsqueeze 0.1803ms 0.1093ms 9.1486 KOps/s 8.8205 KOps/s $\color{#35bf28}+3.72\%$
test_split 0.3444ms 0.1775ms 5.6348 KOps/s 5.4029 KOps/s $\color{#35bf28}+4.29\%$
test_permute 0.2627ms 0.2003ms 4.9934 KOps/s 4.8037 KOps/s $\color{#35bf28}+3.95\%$
test_stack 35.5903ms 35.2742ms 28.3493 Ops/s 19.1189 Ops/s $\textbf{\color{#35bf28}+48.28\%}$
test_cat 35.5180ms 35.1998ms 28.4093 Ops/s 19.1711 Ops/s $\textbf{\color{#35bf28}+48.19\%}$
test_sequential_tensordict 0.2649ms 0.2079ms 4.8103 KOps/s 4.6210 KOps/s $\color{#35bf28}+4.10\%$
test_sequential_graph_module 0.1610ms 0.1143ms 8.7508 KOps/s 8.2879 KOps/s $\textbf{\color{#35bf28}+5.59\%}$
test_nested_tensordict 0.3727ms 0.2780ms 3.5972 KOps/s 3.4827 KOps/s $\color{#35bf28}+3.29\%$
test_nested_graph_module 0.2242ms 0.1318ms 7.5857 KOps/s 7.6489 KOps/s $\color{#d91a1a}-0.83\%$
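
The Change column is consistent with the relative difference between the Ops and Ops on Repo HEAD columns. The sketch below reproduces that arithmetic. The helper names are hypothetical, and the 5% threshold in `classify` is an assumption inferred from which rows are marked with `\textbf` (it matches the 35 improved and 6 worsened rows reported for this table); it is not taken from the benchmark workflow itself.

```python
# Scale factors for the throughput units that appear in the table.
_UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text):
    """Convert a cell such as '69.5044 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * _UNIT_SCALE[unit]


def relative_change(ops_pr, ops_head):
    """Percentage change of the PR throughput relative to repo HEAD."""
    pr, head = parse_ops(ops_pr), parse_ops(ops_head)
    return (pr - head) / head * 100.0


def classify(change_pct, threshold=5.0):
    """Label a change; the 5% threshold is an inferred assumption."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# First row of the table: test_plain_set_nested.
change = relative_change("69.5044 KOps/s", "70.4946 KOps/s")
print(f"{change:+.2f}%", classify(change))
```

Under this reading, most rows fall inside the threshold and count as neither improved nor worsened, which is why only 41 of the 261 benchmarks are flagged.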

github-actions bot (Contributor) commented Mar 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 261. Improved: $\large\color{#35bf28}24$. Worsened: $\large\color{#d91a1a}7$.

Detailed results. Columns, in order: Name, Max, Mean, Ops, Ops on Repo HEAD, Change.
test_plain_set_nested 29.1210μs 14.9734μs 66.7849 KOps/s 66.7369 KOps/s $\color{#35bf28}+0.07\%$
test_plain_set_stack_nested 31.9500μs 15.2942μs 65.3845 KOps/s 66.2761 KOps/s $\color{#d91a1a}-1.35\%$
test_plain_set_nested_inplace 41.0610μs 16.7826μs 59.5855 KOps/s 59.1133 KOps/s $\color{#35bf28}+0.80\%$
test_plain_set_stack_nested_inplace 45.2710μs 16.8056μs 59.5041 KOps/s 59.7575 KOps/s $\color{#d91a1a}-0.42\%$
test_items 81.1620μs 5.9896μs 166.9553 KOps/s 166.2827 KOps/s $\color{#35bf28}+0.40\%$
test_items_nested 0.5164ms 0.4633ms 2.1585 KOps/s 2.1332 KOps/s $\color{#35bf28}+1.18\%$
test_items_nested_locked 0.5268ms 0.4702ms 2.1268 KOps/s 2.1316 KOps/s $\color{#d91a1a}-0.23\%$
test_items_nested_leaf 0.1298ms 99.6165μs 10.0385 KOps/s 10.0889 KOps/s $\color{#d91a1a}-0.50\%$
test_items_stack_nested 0.4983ms 0.4659ms 2.1466 KOps/s 2.1326 KOps/s $\color{#35bf28}+0.66\%$
test_items_stack_nested_leaf 0.1860ms 98.6724μs 10.1345 KOps/s 10.0941 KOps/s $\color{#35bf28}+0.40\%$
test_items_stack_nested_locked 0.5017ms 0.4738ms 2.1108 KOps/s 2.1326 KOps/s $\color{#d91a1a}-1.02\%$
test_keys 46.5000μs 4.2335μs 236.2131 KOps/s 234.4990 KOps/s $\color{#35bf28}+0.73\%$
test_keys_nested 0.1745ms 0.1292ms 7.7429 KOps/s 7.7069 KOps/s $\color{#35bf28}+0.47\%$
test_keys_nested_locked 2.1171ms 0.1389ms 7.1976 KOps/s 7.1915 KOps/s $\color{#35bf28}+0.09\%$
test_keys_nested_leaf 0.1552ms 0.1207ms 8.2817 KOps/s 8.2495 KOps/s $\color{#35bf28}+0.39\%$
test_keys_stack_nested 0.1647ms 0.1303ms 7.6745 KOps/s 7.7220 KOps/s $\color{#d91a1a}-0.61\%$
test_keys_stack_nested_leaf 0.1521ms 0.1208ms 8.2759 KOps/s 8.3017 KOps/s $\color{#d91a1a}-0.31\%$
test_keys_stack_nested_locked 0.1878ms 0.1380ms 7.2475 KOps/s 7.2418 KOps/s $\color{#35bf28}+0.08\%$
test_values 6.8760μs 1.0325μs 968.5682 KOps/s 976.4249 KOps/s $\color{#d91a1a}-0.80\%$
test_values_nested 78.7710μs 52.5351μs 19.0349 KOps/s 19.1824 KOps/s $\color{#d91a1a}-0.77\%$
test_values_nested_locked 81.5220μs 56.3740μs 17.7387 KOps/s 17.8245 KOps/s $\color{#d91a1a}-0.48\%$
test_values_nested_leaf 98.6220μs 59.9604μs 16.6777 KOps/s 16.5542 KOps/s $\color{#35bf28}+0.75\%$
test_values_stack_nested 91.7020μs 52.2525μs 19.1379 KOps/s 18.9917 KOps/s $\color{#35bf28}+0.77\%$
test_values_stack_nested_leaf 0.1115ms 59.6800μs 16.7560 KOps/s 16.5378 KOps/s $\color{#35bf28}+1.32\%$
test_values_stack_nested_locked 97.6920μs 55.8526μs 17.9043 KOps/s 18.1336 KOps/s $\color{#d91a1a}-1.26\%$
test_membership 6.2583μs 0.8464μs 1.1814 MOps/s 1.1733 MOps/s $\color{#35bf28}+0.69\%$
test_membership_nested 29.9200μs 2.8913μs 345.8608 KOps/s 347.6444 KOps/s $\color{#d91a1a}-0.51\%$
test_membership_nested_leaf 67.0910μs 2.9148μs 343.0713 KOps/s 346.2628 KOps/s $\color{#d91a1a}-0.92\%$
test_membership_stacked_nested 23.6400μs 2.9125μs 343.3479 KOps/s 344.1282 KOps/s $\color{#d91a1a}-0.23\%$
test_membership_stacked_nested_leaf 33.8510μs 2.9153μs 343.0157 KOps/s 347.8675 KOps/s $\color{#d91a1a}-1.39\%$
test_membership_nested_last 27.5000μs 4.2837μs 233.4443 KOps/s 231.1597 KOps/s $\color{#35bf28}+0.99\%$
test_membership_nested_leaf_last 24.4800μs 4.3562μs 229.5593 KOps/s 231.7868 KOps/s $\color{#d91a1a}-0.96\%$
test_membership_stacked_nested_last 28.9400μs 4.3651μs 229.0915 KOps/s 232.2604 KOps/s $\color{#d91a1a}-1.36\%$
test_membership_stacked_nested_leaf_last 46.9510μs 4.3266μs 231.1276 KOps/s 231.2833 KOps/s $\color{#d91a1a}-0.07\%$
test_nested_getleaf 90.0410μs 21.8823μs 45.6990 KOps/s 46.3581 KOps/s $\color{#d91a1a}-1.42\%$
test_nested_get 55.4810μs 20.3703μs 49.0910 KOps/s 49.2792 KOps/s $\color{#d91a1a}-0.38\%$
test_stacked_getleaf 53.5410μs 21.4981μs 46.5157 KOps/s 46.5158 KOps/s $-0.00\%$
test_stacked_get 51.9710μs 20.7002μs 48.3088 KOps/s 49.4772 KOps/s $\color{#d91a1a}-2.36\%$
test_nested_getitemleaf 0.1298ms 21.3850μs 46.7617 KOps/s 45.7059 KOps/s $\color{#35bf28}+2.31\%$
test_nested_getitem 53.4110μs 20.8258μs 48.0173 KOps/s 47.8463 KOps/s $\color{#35bf28}+0.36\%$
test_stacked_getitemleaf 46.3010μs 21.8770μs 45.7101 KOps/s 45.6299 KOps/s $\color{#35bf28}+0.18\%$
test_stacked_getitem 47.0410μs 20.7768μs 48.1306 KOps/s 47.2081 KOps/s $\color{#35bf28}+1.95\%$
test_lock_nested 0.5883ms 0.4777ms 2.0935 KOps/s 2.0905 KOps/s $\color{#35bf28}+0.14\%$
test_lock_stack_nested 0.5299ms 0.4797ms 2.0848 KOps/s 2.0637 KOps/s $\color{#35bf28}+1.02\%$
test_unlock_nested 0.4722ms 0.3896ms 2.5667 KOps/s 2.5609 KOps/s $\color{#35bf28}+0.23\%$
test_unlock_stack_nested 0.4531ms 0.3902ms 2.5627 KOps/s 2.5267 KOps/s $\color{#35bf28}+1.43\%$
test_flatten_speed 0.1770ms 0.1237ms 8.0814 KOps/s 8.1201 KOps/s $\color{#d91a1a}-0.48\%$
test_unflatten_speed 0.6208ms 0.5688ms 1.7580 KOps/s 1.7634 KOps/s $\color{#d91a1a}-0.30\%$
test_common_ops 0.9383ms 0.6938ms 1.4413 KOps/s 1.4177 KOps/s $\color{#35bf28}+1.67\%$
test_creation 0.1145ms 3.1748μs 314.9785 KOps/s 318.0065 KOps/s $\color{#d91a1a}-0.95\%$
test_creation_empty 33.1510μs 7.0011μs 142.8355 KOps/s 143.3830 KOps/s $\color{#d91a1a}-0.38\%$
test_creation_nested_1 41.4510μs 11.5752μs 86.3919 KOps/s 86.7919 KOps/s $\color{#d91a1a}-0.46\%$
test_creation_nested_2 40.6810μs 13.2396μs 75.5309 KOps/s 74.9807 KOps/s $\color{#35bf28}+0.73\%$
test_creation_many_keys[10] 64.2420μs 21.2317μs 47.0993 KOps/s 47.4688 KOps/s $\color{#d91a1a}-0.78\%$
test_creation_many_keys[50] 0.1366ms 92.2125μs 10.8445 KOps/s 10.9727 KOps/s $\color{#d91a1a}-1.17\%$
test_creation_many_keys[100] 0.2243ms 0.1809ms 5.5286 KOps/s 5.6273 KOps/s $\color{#d91a1a}-1.75\%$
test_creation_nested_many_keys[10] 72.9910μs 45.4590μs 21.9979 KOps/s 22.1264 KOps/s $\color{#d91a1a}-0.58\%$
test_creation_nested_many_keys[50] 0.2183ms 0.1852ms 5.3987 KOps/s 5.4211 KOps/s $\color{#d91a1a}-0.41\%$
test_clone 42.1500μs 13.1777μs 75.8859 KOps/s 74.1090 KOps/s $\color{#35bf28}+2.40\%$
test_getitem[int] 1.5547ms 15.0112μs 66.6170 KOps/s 59.8943 KOps/s $\textbf{\color{#35bf28}+11.22\%}$
test_getitem[slice_int] 0.1390ms 24.1160μs 41.4662 KOps/s 41.1270 KOps/s $\color{#35bf28}+0.82\%$
test_getitem[range] 0.1697ms 61.8070μs 16.1794 KOps/s 15.0874 KOps/s $\textbf{\color{#35bf28}+7.24\%}$
test_getitem[tuple] 0.1521ms 23.9193μs 41.8072 KOps/s 41.9820 KOps/s $\color{#d91a1a}-0.42\%$
test_getitem[list] 0.1790ms 56.6272μs 17.6594 KOps/s 17.1190 KOps/s $\color{#35bf28}+3.16\%$
test_setitem_dim[int] 48.8310μs 25.1914μs 39.6961 KOps/s 37.8830 KOps/s $\color{#35bf28}+4.79\%$
test_setitem_dim[slice_int] 62.4910μs 41.7970μs 23.9252 KOps/s 23.2862 KOps/s $\color{#35bf28}+2.74\%$
test_setitem_dim[range] 0.1225ms 93.5775μs 10.6863 KOps/s 10.6868 KOps/s $-0.00\%$
test_setitem_dim[tuple] 66.3710μs 38.7307μs 25.8193 KOps/s 25.3527 KOps/s $\color{#35bf28}+1.84\%$
test_setitem 47.9010μs 17.3828μs 57.5280 KOps/s 55.8101 KOps/s $\color{#35bf28}+3.08\%$
test_set 40.2510μs 16.7394μs 59.7393 KOps/s 58.5777 KOps/s $\color{#35bf28}+1.98\%$
test_set_shared 0.6334ms 0.2067ms 4.8389 KOps/s 4.7085 KOps/s $\color{#35bf28}+2.77\%$
test_update 0.4515ms 21.3619μs 46.8124 KOps/s 45.4338 KOps/s $\color{#35bf28}+3.03\%$
test_update_nested 69.1210μs 32.7373μs 30.5462 KOps/s 29.9491 KOps/s $\color{#35bf28}+1.99\%$
test_update__nested 0.4889ms 33.3281μs 30.0047 KOps/s 28.6655 KOps/s $\color{#35bf28}+4.67\%$
test_set_nested 54.1710μs 18.7751μs 53.2622 KOps/s 52.4980 KOps/s $\color{#35bf28}+1.46\%$
test_set_nested_new 60.9510μs 23.7880μs 42.0381 KOps/s 41.8450 KOps/s $\color{#35bf28}+0.46\%$
test_select 73.2810μs 40.0874μs 24.9455 KOps/s 24.7061 KOps/s $\color{#35bf28}+0.97\%$
test_select_nested 0.1023ms 74.3953μs 13.4417 KOps/s 13.6216 KOps/s $\color{#d91a1a}-1.32\%$
test_exclude_nested 0.1377ms 91.1171μs 10.9749 KOps/s 11.0470 KOps/s $\color{#d91a1a}-0.65\%$
test_empty[True] 0.4259ms 0.3977ms 2.5142 KOps/s 2.5177 KOps/s $\color{#d91a1a}-0.14\%$
test_empty[False] 7.8102μs 1.3197μs 757.7758 KOps/s 773.5999 KOps/s $\color{#d91a1a}-2.05\%$
test_to 0.1056ms 72.7765μs 13.7407 KOps/s 13.7530 KOps/s $\color{#d91a1a}-0.09\%$
test_to_nonblocking 0.1131ms 65.3397μs 15.3046 KOps/s 15.1682 KOps/s $\color{#35bf28}+0.90\%$
test_unbind_speed 0.3717ms 0.3317ms 3.0149 KOps/s 2.9919 KOps/s $\color{#35bf28}+0.77\%$
test_unbind_speed_stack0 0.4030ms 0.3296ms 3.0337 KOps/s 3.0163 KOps/s $\color{#35bf28}+0.58\%$
test_unbind_speed_stack1 0.1068s 0.9201ms 1.0869 KOps/s 1.1714 KOps/s $\textbf{\color{#d91a1a}-7.22\%}$
test_split 1.1984ms 1.1397ms 877.3913 Ops/s 785.5224 Ops/s $\textbf{\color{#35bf28}+11.70\%}$
test_chunk 0.1072s 1.2137ms 823.9068 Ops/s 925.7417 Ops/s $\textbf{\color{#d91a1a}-11.00\%}$
test_to_cpu_blocking 18.9162ms 18.6117ms 53.7297 Ops/s 52.6000 Ops/s $\color{#35bf28}+2.15\%$
test_to_cpu_global_sync 11.3384ms 11.2231ms 89.1017 Ops/s 77.8927 Ops/s $\textbf{\color{#35bf28}+14.39\%}$
test_to_cpu_event_sync 12.4673ms 12.0596ms 82.9213 Ops/s 80.7280 Ops/s $\color{#35bf28}+2.72\%$
test_to_cpu_default 12.3148ms 12.0894ms 82.7173 Ops/s 80.6695 Ops/s $\color{#35bf28}+2.54\%$
test_consolidate[False-None] 4.3301ms 4.1199ms 242.7232 Ops/s 215.0100 Ops/s $\textbf{\color{#35bf28}+12.89\%}$
test_consolidate[default-None] 2.1291ms 2.0440ms 489.2291 Ops/s 479.3427 Ops/s $\color{#35bf28}+2.06\%$
test_consolidate[reduce-overhead-None] 2.0426ms 1.9467ms 513.6862 Ops/s 499.8178 Ops/s $\color{#35bf28}+2.77\%$
test_consolidate_njt[False-None] 8.8481ms 8.4786ms 117.9444 Ops/s 117.4216 Ops/s $\color{#35bf28}+0.45\%$
test_to[False-False-None] 2.2331ms 2.0537ms 486.9179 Ops/s 473.3487 Ops/s $\color{#35bf28}+2.87\%$
test_to[True-False-None] 2.2645ms 1.8980ms 526.8760 Ops/s 527.0959 Ops/s $\color{#d91a1a}-0.04\%$
test_to[within-False-None] 6.3199ms 6.1320ms 163.0782 Ops/s 162.9656 Ops/s $\color{#35bf28}+0.07\%$
test_to[True-default-None] 8.8644ms 8.6824ms 115.1754 Ops/s 112.8102 Ops/s $\color{#35bf28}+2.10\%$
test_to_njt[False-False-None] 8.5953ms 8.4503ms 118.3392 Ops/s 116.7291 Ops/s $\color{#35bf28}+1.38\%$
test_to_njt[True-False-None] 7.0457ms 6.9107ms 144.7037 Ops/s 141.4012 Ops/s $\color{#35bf28}+2.34\%$
test_to_njt[within-False-None] 15.8740ms 15.6648ms 63.8374 Ops/s 63.1720 Ops/s $\color{#35bf28}+1.05\%$
test_creation[device0] 0.3914ms 0.1156ms 8.6517 KOps/s 8.3176 KOps/s $\color{#35bf28}+4.02\%$
test_creation_from_tensor 0.4030ms 0.1125ms 8.8912 KOps/s 8.7836 KOps/s $\color{#35bf28}+1.23\%$
test_add_one[memmap_tensor0] 0.1383ms 6.4558μs 154.9006 KOps/s 149.6523 KOps/s $\color{#35bf28}+3.51\%$
test_contiguous[memmap_tensor0] 19.5710μs 0.6782μs 1.4745 MOps/s 2.1320 MOps/s $\textbf{\color{#d91a1a}-30.84\%}$
test_stack[memmap_tensor0] 31.3800μs 4.6482μs 215.1354 KOps/s 217.6022 KOps/s $\color{#d91a1a}-1.13\%$
test_memmaptd_index 1.0481ms 0.2712ms 3.6868 KOps/s 3.6748 KOps/s $\color{#35bf28}+0.33\%$
test_memmaptd_index_astensor 0.5374ms 0.3739ms 2.6742 KOps/s 2.6754 KOps/s $\color{#d91a1a}-0.05\%$
test_memmaptd_index_op 0.7635ms 0.6200ms 1.6128 KOps/s 1.5992 KOps/s $\color{#35bf28}+0.85\%$
test_serialize_model 0.1401s 0.1373s 7.2846 Ops/s 7.2912 Ops/s $\color{#d91a1a}-0.09\%$
test_serialize_model_pickle 1.3493s 1.2107s 0.8260 Ops/s 0.8261 Ops/s $\color{#d91a1a}-0.02\%$
test_serialize_weights 0.1376s 0.1359s 7.3583 Ops/s 7.3249 Ops/s $\color{#35bf28}+0.46\%$
test_serialize_weights_returnearly 0.4476s 94.9856ms 10.5279 Ops/s 14.8097 Ops/s $\textbf{\color{#d91a1a}-28.91\%}$
test_serialize_weights_pickle 1.3746s 1.1911s 0.8396 Ops/s 0.8181 Ops/s $\color{#35bf28}+2.62\%$
test_reshape_pytree 0.2067ms 32.8905μs 30.4039 KOps/s 30.4817 KOps/s $\color{#d91a1a}-0.26\%$
test_reshape_td 87.3120μs 46.1950μs 21.6474 KOps/s 21.9999 KOps/s $\color{#d91a1a}-1.60\%$
test_view_pytree 0.2258ms 32.8348μs 30.4555 KOps/s 31.0965 KOps/s $\color{#d91a1a}-2.06\%$
test_view_td 89.9420μs 54.5786μs 18.3222 KOps/s 19.0684 KOps/s $\color{#d91a1a}-3.91\%$
test_unbind_pytree 0.2371ms 36.1129μs 27.6910 KOps/s 27.3272 KOps/s $\color{#35bf28}+1.33\%$
test_unbind_td 0.1982ms 49.4892μs 20.2064 KOps/s 19.7120 KOps/s $\color{#35bf28}+2.51\%$
test_split_pytree 0.2550ms 42.2740μs 23.6552 KOps/s 23.8254 KOps/s $\color{#d91a1a}-0.71\%$
test_split_td 0.1689ms 63.4914μs 15.7502 KOps/s 15.5655 KOps/s $\color{#35bf28}+1.19\%$
test_add_pytree 0.2330ms 42.2940μs 23.6440 KOps/s 23.5172 KOps/s $\color{#35bf28}+0.54\%$
test_add_td 0.1110ms 54.9641μs 18.1937 KOps/s 17.9952 KOps/s $\color{#35bf28}+1.10\%$
test_compile_add_one_nested[tensordict-compile] 0.2022ms 0.1401ms 7.1358 KOps/s 6.8324 KOps/s $\color{#35bf28}+4.44\%$
test_compile_add_one_nested[tensordict-eager] 0.3122ms 0.2027ms 4.9334 KOps/s 4.9464 KOps/s $\color{#d91a1a}-0.26\%$
test_compile_add_one_nested[pytree-compile] 0.1603ms 0.1111ms 9.0020 KOps/s 8.9872 KOps/s $\color{#35bf28}+0.16\%$
test_compile_add_one_nested[pytree-eager] 0.4442ms 0.1782ms 5.6130 KOps/s 5.5894 KOps/s $\color{#35bf28}+0.42\%$
test_compile_copy_nested[tensordict-compile] 0.2559ms 11.1881μs 89.3810 KOps/s 98.7399 KOps/s $\textbf{\color{#d91a1a}-9.48\%}$
test_compile_copy_nested[tensordict-eager] 86.5120μs 53.6030μs 18.6557 KOps/s 18.4986 KOps/s $\color{#35bf28}+0.85\%$
test_compile_copy_nested[pytree-compile] 0.1642ms 9.7085μs 103.0030 KOps/s 103.1693 KOps/s $\color{#d91a1a}-0.16\%$
test_compile_copy_nested[pytree-eager] 0.4712ms 68.1504μs 14.6734 KOps/s 14.6958 KOps/s $\color{#d91a1a}-0.15\%$
test_compile_add_one_flat[tensordict-compile] 0.3204ms 0.1766ms 5.6609 KOps/s 5.1586 KOps/s $\textbf{\color{#35bf28}+9.74\%}$
test_compile_add_one_flat[tensordict-eager] 0.3644ms 0.2809ms 3.5604 KOps/s 3.4914 KOps/s $\color{#35bf28}+1.98\%$
test_compile_add_one_flat[tensorclass-compile] 0.1973ms 0.1203ms 8.3134 KOps/s 7.8609 KOps/s $\textbf{\color{#35bf28}+5.76\%}$
test_compile_add_one_flat[tensorclass-eager] 0.1233ms 74.6816μs 13.3902 KOps/s 13.2373 KOps/s $\color{#35bf28}+1.16\%$
test_compile_add_one_flat[pytree-compile] 0.3857ms 0.1594ms 6.2724 KOps/s 5.9968 KOps/s $\color{#35bf28}+4.60\%$
test_compile_add_one_flat[pytree-eager] 0.8122ms 0.5194ms 1.9252 KOps/s 1.8973 KOps/s $\color{#35bf28}+1.47\%$
test_compile_add_self_flat[tensordict-eager] 0.3909ms 0.3328ms 3.0044 KOps/s 2.9569 KOps/s $\color{#35bf28}+1.61\%$
test_compile_add_self_flat[tensordict-compile] 0.2382ms 0.1798ms 5.5631 KOps/s 5.0841 KOps/s $\textbf{\color{#35bf28}+9.42\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1489ms 90.7747μs 11.0163 KOps/s 11.1786 KOps/s $\color{#d91a1a}-1.45\%$
test_compile_add_self_flat[tensorclass-compile] 0.2354ms 0.1223ms 8.1770 KOps/s 7.7289 KOps/s $\textbf{\color{#35bf28}+5.80\%}$
test_compile_add_self_flat[pytree-eager] 0.6689ms 0.4348ms 2.3000 KOps/s 2.2967 KOps/s $\color{#35bf28}+0.14\%$
test_compile_add_self_flat[pytree-compile] 0.2164ms 0.1598ms 6.2586 KOps/s 6.0495 KOps/s $\color{#35bf28}+3.46\%$
test_compile_copy_flat[tensordict-compile] 0.1065ms 13.2543μs 75.4474 KOps/s 75.5765 KOps/s $\color{#d91a1a}-0.17\%$
test_compile_copy_flat[tensordict-eager] 85.7020μs 41.2387μs 24.2491 KOps/s 24.0866 KOps/s $\color{#35bf28}+0.67\%$
test_compile_copy_flat[pytree-compile] 0.1135ms 10.7158μs 93.3199 KOps/s 93.4844 KOps/s $\color{#d91a1a}-0.18\%$
test_compile_copy_flat[pytree-eager] 0.4152ms 52.7723μs 18.9493 KOps/s 19.0813 KOps/s $\color{#d91a1a}-0.69\%$
test_compile_assign_and_add[tensordict-compile] 2.0142ms 0.1739ms 5.7501 KOps/s 5.4517 KOps/s $\textbf{\color{#35bf28}+5.47\%}$
test_compile_assign_and_add[tensordict-eager] 3.4723ms 3.2949ms 303.5016 Ops/s 302.5634 Ops/s $\color{#35bf28}+0.31\%$
test_compile_assign_and_add[pytree-compile] 2.0119ms 0.1608ms 6.2186 KOps/s 5.9648 KOps/s $\color{#35bf28}+4.26\%$
test_compile_assign_and_add[pytree-eager] 2.8630ms 2.7515ms 363.4394 Ops/s 357.3338 Ops/s $\color{#35bf28}+1.71\%$
test_compile_indexing[tensor-tensordict-compile] 0.2253ms 0.1090ms 9.1724 KOps/s 8.3599 KOps/s $\textbf{\color{#35bf28}+9.72\%}$
test_compile_indexing[tensor-tensordict-eager] 0.3164ms 73.0917μs 13.6814 KOps/s 13.4581 KOps/s $\color{#35bf28}+1.66\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1992ms 98.0979μs 10.1939 KOps/s 10.1075 KOps/s $\color{#35bf28}+0.85\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2600ms 45.4529μs 22.0008 KOps/s 22.8179 KOps/s $\color{#d91a1a}-3.58\%$
test_compile_indexing[tensor-pytree-compile] 0.1483ms 97.5423μs 10.2520 KOps/s 9.5309 KOps/s $\textbf{\color{#35bf28}+7.57\%}$
test_compile_indexing[tensor-pytree-eager] 0.2564ms 44.2970μs 22.5749 KOps/s 22.0274 KOps/s $\color{#35bf28}+2.49\%$
test_compile_indexing[slice-tensordict-compile] 0.1103ms 54.9331μs 18.2040 KOps/s 17.2833 KOps/s $\textbf{\color{#35bf28}+5.33\%}$
test_compile_indexing[slice-tensordict-eager] 0.2204ms 27.2677μs 36.6735 KOps/s 36.9183 KOps/s $\color{#d91a1a}-0.66\%$
test_compile_indexing[slice-tensorclass-compile] 0.1463ms 43.2706μs 23.1104 KOps/s 21.7881 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_compile_indexing[slice-tensorclass-eager] 0.2572ms 22.1995μs 45.0460 KOps/s 45.0873 KOps/s $\color{#d91a1a}-0.09\%$
test_compile_indexing[slice-pytree-compile] 96.6320μs 45.3527μs 22.0494 KOps/s 21.1751 KOps/s $\color{#35bf28}+4.13\%$
test_compile_indexing[slice-pytree-eager] 0.2684ms 21.9986μs 45.4574 KOps/s 45.1415 KOps/s $\color{#35bf28}+0.70\%$
test_compile_indexing[int-tensordict-compile] 0.1061ms 55.0894μs 18.1523 KOps/s 16.9716 KOps/s $\textbf{\color{#35bf28}+6.96\%}$
test_compile_indexing[int-tensordict-eager] 0.2075ms 26.8142μs 37.2937 KOps/s 36.7996 KOps/s $\color{#35bf28}+1.34\%$
test_compile_indexing[int-tensorclass-compile] 92.1920μs 43.7046μs 22.8809 KOps/s 21.4838 KOps/s $\textbf{\color{#35bf28}+6.50\%}$
test_compile_indexing[int-tensorclass-eager] 0.2634ms 22.1738μs 45.0982 KOps/s 45.2912 KOps/s $\color{#d91a1a}-0.43\%$
test_compile_indexing[int-pytree-compile] 0.1013ms 44.1522μs 22.6489 KOps/s 21.7056 KOps/s $\color{#35bf28}+4.35\%$
test_compile_indexing[int-pytree-eager] 0.2659ms 21.9895μs 45.4763 KOps/s 45.2488 KOps/s $\color{#35bf28}+0.50\%$
test_compile_replace[single-eager] 99.3420μs 46.6874μs 21.4191 KOps/s 21.5229 KOps/s $\color{#d91a1a}-0.48\%$
test_compile_replace[single-compile] 0.1842ms 0.1052ms 9.5024 KOps/s 9.2608 KOps/s $\color{#35bf28}+2.61\%$
test_compile_replace[multi-eager] 0.7133ms 0.5794ms 1.7258 KOps/s 1.7912 KOps/s $\color{#d91a1a}-3.65\%$
test_compile_replace[multi-compile] 0.2588ms 0.1117ms 8.9500 KOps/s 8.7469 KOps/s $\color{#35bf28}+2.32\%$
test_compile_tc_getattr_20[eager] 0.2239ms 0.1659ms 6.0263 KOps/s 6.0302 KOps/s $\color{#d91a1a}-0.06\%$
test_compile_tc_getattr_20[compile] 0.3271ms 0.1199ms 8.3412 KOps/s 8.2301 KOps/s $\color{#35bf28}+1.35\%$
test_compile_clone_shallow[20-eager] 47.8000μs 19.2590μs 51.9237 KOps/s 53.1324 KOps/s $\color{#d91a1a}-2.27\%$
test_compile_clone_shallow[20-compile] 62.0820μs 11.2650μs 88.7708 KOps/s 87.2189 KOps/s $\color{#35bf28}+1.78\%$
test_compile_clone_shallow[40-eager] 77.7310μs 33.4404μs 29.9040 KOps/s 29.6623 KOps/s $\color{#35bf28}+0.81\%$
test_compile_clone_shallow[40-compile] 63.9010μs 12.4582μs 80.2682 KOps/s 76.1360 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_compile_clone_shallow[80-eager] 99.2820μs 62.3292μs 16.0438 KOps/s 15.7490 KOps/s $\color{#35bf28}+1.87\%$
test_compile_clone_shallow[80-compile] 57.6610μs 14.7163μs 67.9521 KOps/s 69.5440 KOps/s $\color{#d91a1a}-2.29\%$
test_compile_update_inplace[eager] 0.1028ms 58.7403μs 17.0241 KOps/s 16.7911 KOps/s $\color{#35bf28}+1.39\%$
test_compile_update_inplace[compile] 0.2582ms 0.1389ms 7.1972 KOps/s 6.7085 KOps/s $\textbf{\color{#35bf28}+7.28\%}$
test_mod_add[eager] 0.1220ms 49.4539μs 20.2208 KOps/s 20.4172 KOps/s $\color{#d91a1a}-0.96\%$
test_mod_add[compile] 0.1721ms 0.1041ms 9.6031 KOps/s 9.3163 KOps/s $\color{#35bf28}+3.08\%$
test_mod_add[compile-overhead] 0.3135ms 0.1490ms 6.7125 KOps/s 6.4987 KOps/s $\color{#35bf28}+3.29\%$
test_mod_wrap[eager] 0.3646ms 0.2914ms 3.4318 KOps/s 3.4484 KOps/s $\color{#d91a1a}-0.48\%$
test_mod_wrap[compile] 0.6309ms 0.3583ms 2.7906 KOps/s 2.8147 KOps/s $\color{#d91a1a}-0.85\%$
test_mod_wrap[compile-overhead] 7.4590ms 4.1025ms 243.7534 Ops/s 248.2331 Ops/s $\color{#d91a1a}-1.80\%$
test_mod_wrap_and_backward[eager] 1.6271ms 1.5043ms 664.7544 Ops/s 671.4262 Ops/s $\color{#d91a1a}-0.99\%$
test_mod_wrap_and_backward[compile] 1.6054ms 1.4397ms 694.6018 Ops/s 689.4579 Ops/s $\color{#35bf28}+0.75\%$
test_mod_wrap_and_backward[compile-overhead] 1.2448ms 0.8885ms 1.1255 KOps/s 1.0995 KOps/s $\color{#35bf28}+2.37\%$
test_seq_add[eager] 0.2303ms 0.1569ms 6.3735 KOps/s 6.5007 KOps/s $\color{#d91a1a}-1.96\%$
test_seq_add[compile] 0.5575ms 0.1144ms 8.7397 KOps/s 8.1318 KOps/s $\textbf{\color{#35bf28}+7.48\%}$
test_seq_add[compile-overhead] 0.2111ms 0.1576ms 6.3463 KOps/s 5.9324 KOps/s $\textbf{\color{#35bf28}+6.98\%}$
test_seq_wrap[eager] 0.7037ms 0.5350ms 1.8691 KOps/s 1.8848 KOps/s $\color{#d91a1a}-0.84\%$
test_seq_wrap[compile] 0.4748ms 0.3743ms 2.6716 KOps/s 2.6332 KOps/s $\color{#35bf28}+1.46\%$
test_seq_wrap[compile-overhead] 0.3377ms 0.2708ms 3.6926 KOps/s 3.6915 KOps/s $\color{#35bf28}+0.03\%$
test_func_call_runtime[False-eager] 0.9710ms 0.8942ms 1.1184 KOps/s 1.1898 KOps/s $\textbf{\color{#d91a1a}-6.00\%}$
test_func_call_runtime[False-compile] 1.1608ms 0.9267ms 1.0792 KOps/s 1.0881 KOps/s $\color{#d91a1a}-0.82\%$
test_func_call_runtime[False-compile-overhead] 0.5167ms 0.4631ms 2.1596 KOps/s 2.1232 KOps/s $\color{#35bf28}+1.72\%$
test_func_call_runtime[True-eager] 1.1885ms 1.0776ms 928.0098 Ops/s 925.7568 Ops/s $\color{#35bf28}+0.24\%$
test_func_call_runtime[True-compile] 1.0103ms 0.9203ms 1.0866 KOps/s 1.0645 KOps/s $\color{#35bf28}+2.07\%$
test_func_call_runtime[True-compile-overhead] 0.5385ms 0.4762ms 2.1001 KOps/s 2.0600 KOps/s $\color{#35bf28}+1.95\%$
test_func_call_cm_runtime[False-eager] 1.2957ms 0.8865ms 1.1280 KOps/s 1.2007 KOps/s $\textbf{\color{#d91a1a}-6.05\%}$
test_func_call_cm_runtime[False-compile] 0.9970ms 0.9086ms 1.1006 KOps/s 1.0892 KOps/s $\color{#35bf28}+1.05\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5416ms 0.4675ms 2.1390 KOps/s 2.1286 KOps/s $\color{#35bf28}+0.49\%$
test_func_call_cm_runtime[True-eager] 1.3434ms 1.2310ms 812.3533 Ops/s 822.9822 Ops/s $\color{#d91a1a}-1.29\%$
test_func_call_cm_runtime[True-compile] 0.9965ms 0.9473ms 1.0557 KOps/s 1.0296 KOps/s $\color{#35bf28}+2.53\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5554ms 0.5079ms 1.9690 KOps/s 1.9106 KOps/s $\color{#35bf28}+3.06\%$
test_vmap_func_call_cm_runtime[eager] 2.8804ms 2.3742ms 421.1882 Ops/s 419.0351 Ops/s $\color{#35bf28}+0.51\%$
test_vmap_func_call_cm_runtime[compile] 1.0390ms 0.9704ms 1.0305 KOps/s 1.0093 KOps/s $\color{#35bf28}+2.11\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5661ms 0.5179ms 1.9309 KOps/s 1.8976 KOps/s $\color{#35bf28}+1.76\%$
test_distributed 0.8121ms 0.1532ms 6.5273 KOps/s 6.5408 KOps/s $\color{#d91a1a}-0.21\%$
test_tdmodule 0.3072ms 27.6923μs 36.1111 KOps/s 35.7193 KOps/s $\color{#35bf28}+1.10\%$
test_tdmodule_dispatch 72.6810μs 44.4640μs 22.4901 KOps/s 21.9698 KOps/s $\color{#35bf28}+2.37\%$
test_tdseq 48.6710μs 26.9789μs 37.0660 KOps/s 37.0605 KOps/s $\color{#35bf28}+0.01\%$
test_tdseq_dispatch 69.0010μs 47.6851μs 20.9709 KOps/s 21.1648 KOps/s $\color{#d91a1a}-0.92\%$
test_instantiation_functorch 2.1768ms 2.0700ms 483.0873 Ops/s 479.8831 Ops/s $\color{#35bf28}+0.67\%$
test_exec_functorch 0.2189ms 0.1781ms 5.6161 KOps/s 5.4980 KOps/s $\color{#35bf28}+2.15\%$
test_exec_functional_call 0.2071ms 0.1590ms 6.2910 KOps/s 6.2286 KOps/s $\color{#35bf28}+1.00\%$
test_exec_td_decorator 0.4426ms 0.2341ms 4.2715 KOps/s 4.2058 KOps/s $\color{#35bf28}+1.56\%$
test_vmap_mlp_speed_decorator[True-True] 1.0070ms 0.8260ms 1.2107 KOps/s 1.2090 KOps/s $\color{#35bf28}+0.14\%$
test_vmap_mlp_speed_decorator[True-False] 1.0332ms 0.8274ms 1.2086 KOps/s 1.2165 KOps/s $\color{#d91a1a}-0.65\%$
test_vmap_mlp_speed_decorator[False-True] 0.9048ms 0.7131ms 1.4023 KOps/s 1.4119 KOps/s $\color{#d91a1a}-0.68\%$
test_vmap_mlp_speed_decorator[False-False] 0.8990ms 0.7135ms 1.4014 KOps/s 1.4083 KOps/s $\color{#d91a1a}-0.49\%$
test_vmap_transformer_speed_decorator[True-True] 21.1480ms 20.5223ms 48.7275 Ops/s 48.5774 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_transformer_speed_decorator[True-False] 21.3982ms 20.5646ms 48.6272 Ops/s 48.6261 Ops/s $+0.00\%$
test_vmap_transformer_speed_decorator[False-True] 20.9987ms 20.3596ms 49.1168 Ops/s 49.1641 Ops/s $\color{#d91a1a}-0.10\%$
test_vmap_transformer_speed_decorator[False-False] 21.1997ms 20.3957ms 49.0299 Ops/s 49.0477 Ops/s $\color{#d91a1a}-0.04\%$
test_to_module_speed[True] 1.5579ms 1.4819ms 674.8291 Ops/s 674.1735 Ops/s $\color{#35bf28}+0.10\%$
test_to_module_speed[False] 1.5642ms 1.4589ms 685.4508 Ops/s 689.5384 Ops/s $\color{#d91a1a}-0.59\%$
test_tc_init 68.3110μs 44.9895μs 22.2274 KOps/s 22.5047 KOps/s $\color{#d91a1a}-1.23\%$
test_tc_init_tensor_only 40.3110μs 9.7048μs 103.0422 KOps/s 101.3112 KOps/s $\color{#35bf28}+1.71\%$
test_tc_init_nested 0.1408ms 88.7637μs 11.2659 KOps/s 11.4285 KOps/s $\color{#d91a1a}-1.42\%$
test_tc_init_many_fields 41.9810μs 16.3301μs 61.2365 KOps/s 59.8829 KOps/s $\color{#35bf28}+2.26\%$
test_tc_first_layer_tensor 27.0500μs 1.8241μs 548.2130 KOps/s 540.5713 KOps/s $\color{#35bf28}+1.41\%$
test_tc_first_layer_tensor_only 2.7034μs 0.4055μs 2.4661 MOps/s 2.5418 MOps/s $\color{#d91a1a}-2.98\%$
test_tc_first_layer_tensor_set 42.1710μs 3.9508μs 253.1162 KOps/s 253.3258 KOps/s $\color{#d91a1a}-0.08\%$
test_tc_first_layer_tensor_only_set 30.5710μs 3.2812μs 304.7711 KOps/s 304.4631 KOps/s $\color{#35bf28}+0.10\%$
test_tc_first_layer_nontensor 34.0510μs 6.1591μs 162.3609 KOps/s 161.7195 KOps/s $\color{#35bf28}+0.40\%$
test_tc_second_layer_tensor 27.7510μs 4.4256μs 225.9563 KOps/s 224.8043 KOps/s $\color{#35bf28}+0.51\%$
test_tc_second_layer_nontensor 37.8410μs 8.6909μs 115.0624 KOps/s 114.1211 KOps/s $\color{#35bf28}+0.82\%$
test_unbind 0.2666s 18.3167ms 54.5949 Ops/s 55.2336 Ops/s $\color{#d91a1a}-1.16\%$
test_full_like 7.5427ms 4.4471ms 224.8650 Ops/s 225.1095 Ops/s $\color{#d91a1a}-0.11\%$
test_zeros_like 5.1286ms 4.4142ms 226.5406 Ops/s 226.1677 Ops/s $\color{#35bf28}+0.16\%$
test_ones_like 4.6109ms 4.4113ms 226.6917 Ops/s 225.5629 Ops/s $\color{#35bf28}+0.50\%$
test_clone 7.3167ms 6.7254ms 148.6903 Ops/s 148.4699 Ops/s $\color{#35bf28}+0.15\%$
test_squeeze 0.2300ms 14.2391μs 70.2291 KOps/s 69.2488 KOps/s $\color{#35bf28}+1.42\%$
test_unsqueeze 0.1645ms 0.1123ms 8.9061 KOps/s 8.9112 KOps/s $\color{#d91a1a}-0.06\%$
test_split 0.2351ms 0.1821ms 5.4909 KOps/s 5.4149 KOps/s $\color{#35bf28}+1.40\%$
test_permute 0.2703ms 0.2028ms 4.9302 KOps/s 4.5704 KOps/s $\textbf{\color{#35bf28}+7.87\%}$
test_stack 36.7227ms 35.7382ms 27.9813 Ops/s 18.5745 Ops/s $\textbf{\color{#35bf28}+50.64\%}$
test_cat 36.2006ms 35.6175ms 28.0761 Ops/s 19.2982 Ops/s $\textbf{\color{#35bf28}+45.49\%}$
test_sequential_tensordict 0.5246ms 0.2140ms 4.6738 KOps/s 4.4058 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_sequential_graph_module 0.2299ms 0.1217ms 8.2203 KOps/s 8.4604 KOps/s $\color{#d91a1a}-2.84\%$
test_nested_tensordict 0.6846ms 0.2811ms 3.5568 KOps/s 3.5104 KOps/s $\color{#35bf28}+1.32\%$
test_nested_graph_module 0.1849ms 0.1308ms 7.6426 KOps/s 7.6602 KOps/s $\color{#d91a1a}-0.23\%$
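The Change column in the table above appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The benchmark bot does not state this, so it is an inference from the numbers. A minimal sketch to reproduce two of the rows follows; `relative_change` is a hypothetical helper name and is not part of the benchmark tooling:

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput relative to the repo HEAD baseline.

    Assumes both values are expressed in the same unit (e.g. both Ops/s).
    """
    return (ops - ops_head) / ops_head * 100.0


# Values copied from the test_chunk and test_split rows of the GPU table.
chunk = relative_change(823.9068, 925.7417)
split = relative_change(877.3913, 785.5224)

print(f"test_chunk: {chunk:+.2f}%")  # table reports -11.00%
print(f"test_split: {split:+.2f}%")  # table reports +11.70%
```

A positive value means the PR branch has higher throughput than repo HEAD. Rows whose two throughput columns use different unit prefixes would need converting to a common unit first.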

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
