[DTensor] Add Strategy C (optimal P2P transfer using transfer plan)#1642

Open
vmoens wants to merge 1 commit into gh/vmoens/83/base from gh/vmoens/83/head

@vmoens
Collaborator

@vmoens vmoens commented Mar 6, 2026

Stack from ghstack (oldest at bottom):

Implement _dtensor_send_optimal and _dtensor_recv_optimal:

  • Sender: computes transfer plan from src/dst meshes and placements,
    extracts only the needed slices from local shards, and sends via P2P
  • Receiver: computes same plan, receives slices into the right positions
    of the local buffer, wraps as DTensor via from_local()
  • Both torch.distributed and UCXX transports supported
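
The plan-based transfer described above can be illustrated with a minimal, framework-free sketch. It is not the PR's implementation: it assumes a single tensor dimension sharded with a `torch.chunk`-style split on both sides, and the names `Transfer`, `shard_bounds`, and `transfer_plan` are hypothetical. Because sender and receiver both derive the plan from the same inputs, they agree on which slices move without exchanging any metadata.

```python
from typing import List, NamedTuple


class Transfer(NamedTuple):
    src_rank: int     # index of the sender within the source mesh
    dst_rank: int     # index of the receiver within the destination mesh
    src_slice: slice  # part of the sender's local shard to send
    dst_slice: slice  # part of the receiver's local buffer to fill


def shard_bounds(size: int, num_shards: int) -> List[range]:
    """Global index range owned by each shard (assumed chunk-style split)."""
    chunk = -(-size // num_shards)  # ceiling division
    return [
        range(min(i * chunk, size), min((i + 1) * chunk, size))
        for i in range(num_shards)
    ]


def transfer_plan(size: int, num_src: int, num_dst: int) -> List[Transfer]:
    """List every overlap between a source shard and a destination shard."""
    plan = []
    for s, src in enumerate(shard_bounds(size, num_src)):
        for d, dst in enumerate(shard_bounds(size, num_dst)):
            lo, hi = max(src.start, dst.start), min(src.stop, dst.stop)
            if lo < hi:  # the two shards share global indices [lo, hi)
                plan.append(
                    Transfer(
                        s,
                        d,
                        slice(lo - src.start, hi - src.start),
                        slice(lo - dst.start, hi - dst.start),
                    )
                )
    return plan


if __name__ == "__main__":
    for step in transfer_plan(size=8, num_src=2, num_dst=4):
        print(step)
```

In this sketch each sender would iterate over the entries where `src_rank` matches its own position and send only `local_shard[src_slice]`; each receiver would post one receive per entry where `dst_rank` matches and write into `buffer[dst_slice]`.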

Update "auto" strategy resolution to pick "optimal" when dst_mesh/src_mesh
and dst_placements/src_placements are provided, falling back to "materialize"
otherwise.
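
One possible reading of this resolution rule, as a hedged sketch: `resolve_strategy`, `peer_mesh`, and `peer_placements` are hypothetical names, standing in for `dst_mesh`/`dst_placements` on the sending side and `src_mesh`/`src_placements` on the receiving side. The actual function and signature in the PR may differ.

```python
from typing import Any, Optional


def resolve_strategy(
    strategy: str,
    peer_mesh: Optional[Any] = None,
    peer_placements: Optional[Any] = None,
) -> str:
    """Resolve "auto" to a concrete strategy; pass explicit choices through."""
    if strategy != "auto":
        return strategy
    # Assumption: the plan can only be computed when both the peer's mesh
    # and the peer's placements are known.
    if peer_mesh is not None and peer_placements is not None:
        return "optimal"
    return "materialize"
```

Under this reading, a caller that supplies only one of the two arguments still falls back to "materialize".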

Add _mesh_to_rank_map and _mesh_all_ranks helpers to _dtensor.py.

Made-with: Cursor

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Mar 6, 2026
ghstack-source-id: faa4852
Pull-Request: #1642
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

PR Title Label Error

Unknown or invalid prefix [DTensor].

Current title: [DTensor] Add Strategy C (optimal P2P transfer using transfer plan)

Supported Prefixes

Your PR title must start with exactly one of these prefixes (case-insensitive):

| Prefix | Label Applied | Example |
| --- | --- | --- |
| [BugFix] or [Fix] | bug | [BugFix] Fix memory leak in TensorDict |
| [Feature] | Feature | [Feature] Add new storage backend |
| [Doc] or [Docs] | documentation | [Doc] Update installation guide |
| [Refactor] | Refactor | [Refactor] Clean up module imports |
| [CI] | CI | [CI] Fix workflow permissions |
| [Test] or [Tests] | Test | [Test] Add unit tests for nn module |
| [Compile] | Compile | [Compile] Fix torch.compile issue |
| [Performance] or [Perf] | Performance | [Perf] Optimize tensor operations |
| [Deprecation] | Deprecation | [Deprecation] Mark old function |
| [Setup] | setup | [Setup] Update build configuration |
| [Distributed] or [Dist] | Distributed | [Distributed] Add scatter collective |
| [Benchmark] or [Bench] | Benchmarks | [Benchmark] Add compile benchmark |
| [Typing] or [Type] | Typing | [Typing] Add type stubs |
| [BC-breaking] or [BC] | BC-breaking | [BC-breaking] Remove deprecated API |
| [Formatting] or [Format] | Formatting | [Format] Fix code style |
| [Quality] | Quality | [Quality] Improve error messages |

Note: Matching is case-insensitive. Common variations (singular/plural) are supported.
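
For illustration only, the prefix rule above could be checked along these lines. The workflow's real implementation is not shown in this thread; `PREFIX_LABELS` and `label_for_title` are hypothetical names, and the mapping is copied from the table above.

```python
import re
from typing import Optional

# Lowercased prefix -> label, transcribed from the bot's table.
PREFIX_LABELS = {
    "bugfix": "bug", "fix": "bug",
    "feature": "Feature",
    "doc": "documentation", "docs": "documentation",
    "refactor": "Refactor",
    "ci": "CI",
    "test": "Test", "tests": "Test",
    "compile": "Compile",
    "performance": "Performance", "perf": "Performance",
    "deprecation": "Deprecation",
    "setup": "setup",
    "distributed": "Distributed", "dist": "Distributed",
    "benchmark": "Benchmarks", "bench": "Benchmarks",
    "typing": "Typing", "type": "Typing",
    "bc-breaking": "BC-breaking", "bc": "BC-breaking",
    "formatting": "Formatting", "format": "Formatting",
    "quality": "Quality",
}


def label_for_title(title: str) -> Optional[str]:
    """Return the label for a title's leading [Prefix], or None if unsupported."""
    match = re.match(r"\[([^\]]+)\]", title.strip())
    if match is None:
        return None
    return PREFIX_LABELS.get(match.group(1).lower())
```

With this sketch, `label_for_title("[DTensor] Add Strategy C (optimal P2P transfer using transfer plan)")` returns `None`, which corresponds to the error reported here, while a title starting with `[Feature]` or `[Distributed]` would map to a label.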


@meta-cla meta-cla bot added the CLA Signed label Mar 6, 2026
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 261. Improved: $\large\color{#35bf28}28$. Worsened: $\large\color{#d91a1a}12$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 63.5410μs 14.8422μs 67.3756 KOps/s 67.4779 KOps/s $\color{#d91a1a}-0.15\%$
test_plain_set_stack_nested 39.9410μs 15.5317μs 64.3845 KOps/s 66.7030 KOps/s $\color{#d91a1a}-3.48\%$
test_plain_set_nested_inplace 41.0610μs 16.8204μs 59.4516 KOps/s 60.3784 KOps/s $\color{#d91a1a}-1.54\%$
test_plain_set_stack_nested_inplace 55.5010μs 16.7946μs 59.5431 KOps/s 60.1751 KOps/s $\color{#d91a1a}-1.05\%$
test_items 33.3110μs 6.1059μs 163.7771 KOps/s 165.9065 KOps/s $\color{#d91a1a}-1.28\%$
test_items_nested 0.5294ms 0.4744ms 2.1080 KOps/s 2.1435 KOps/s $\color{#d91a1a}-1.66\%$
test_items_nested_locked 0.5328ms 0.4783ms 2.0907 KOps/s 2.1279 KOps/s $\color{#d91a1a}-1.75\%$
test_items_nested_leaf 0.1352ms 98.9065μs 10.1106 KOps/s 10.0339 KOps/s $\color{#35bf28}+0.76\%$
test_items_stack_nested 0.5133ms 0.4714ms 2.1213 KOps/s 2.1330 KOps/s $\color{#d91a1a}-0.55\%$
test_items_stack_nested_leaf 0.1972ms 96.1778μs 10.3974 KOps/s 10.1982 KOps/s $\color{#35bf28}+1.95\%$
test_items_stack_nested_locked 0.5114ms 0.4751ms 2.1050 KOps/s 2.1110 KOps/s $\color{#d91a1a}-0.28\%$
test_keys 35.0210μs 4.2124μs 237.3921 KOps/s 236.6578 KOps/s $\color{#35bf28}+0.31\%$
test_keys_nested 0.1674ms 0.1323ms 7.5579 KOps/s 7.7783 KOps/s $\color{#d91a1a}-2.83\%$
test_keys_nested_locked 1.8076ms 0.1405ms 7.1154 KOps/s 7.2287 KOps/s $\color{#d91a1a}-1.57\%$
test_keys_nested_leaf 0.1710ms 0.1237ms 8.0838 KOps/s 8.3452 KOps/s $\color{#d91a1a}-3.13\%$
test_keys_stack_nested 0.1716ms 0.1333ms 7.5030 KOps/s 7.7013 KOps/s $\color{#d91a1a}-2.57\%$
test_keys_stack_nested_leaf 0.1614ms 0.1234ms 8.1043 KOps/s 8.2831 KOps/s $\color{#d91a1a}-2.16\%$
test_keys_stack_nested_locked 0.1795ms 0.1414ms 7.0705 KOps/s 7.3056 KOps/s $\color{#d91a1a}-3.22\%$
test_values 13.2150μs 1.0860μs 920.8045 KOps/s 983.7982 KOps/s $\textbf{\color{#d91a1a}-6.40\%}$
test_values_nested 78.8810μs 53.8739μs 18.5619 KOps/s 19.1392 KOps/s $\color{#d91a1a}-3.02\%$
test_values_nested_locked 90.6310μs 57.4366μs 17.4105 KOps/s 18.2288 KOps/s $\color{#d91a1a}-4.49\%$
test_values_nested_leaf 85.7920μs 62.4178μs 16.0211 KOps/s 16.5140 KOps/s $\color{#d91a1a}-2.98\%$
test_values_stack_nested 79.2020μs 53.9612μs 18.5318 KOps/s 18.9697 KOps/s $\color{#d91a1a}-2.31\%$
test_values_stack_nested_leaf 91.1420μs 62.1072μs 16.1012 KOps/s 16.5799 KOps/s $\color{#d91a1a}-2.89\%$
test_values_stack_nested_locked 85.7520μs 57.2675μs 17.4619 KOps/s 17.8699 KOps/s $\color{#d91a1a}-2.28\%$
test_membership 12.4355μs 0.9354μs 1.0691 MOps/s 1.1831 MOps/s $\textbf{\color{#d91a1a}-9.64\%}$
test_membership_nested 33.0910μs 2.9652μs 337.2423 KOps/s 352.0521 KOps/s $\color{#d91a1a}-4.21\%$
test_membership_nested_leaf 34.2800μs 2.9364μs 340.5509 KOps/s 361.7451 KOps/s $\textbf{\color{#d91a1a}-5.86\%}$
test_membership_stacked_nested 36.3910μs 2.9660μs 337.1579 KOps/s 347.7078 KOps/s $\color{#d91a1a}-3.03\%$
test_membership_stacked_nested_leaf 32.9300μs 2.9136μs 343.2191 KOps/s 350.3915 KOps/s $\color{#d91a1a}-2.05\%$
test_membership_nested_last 31.9400μs 4.4765μs 223.3887 KOps/s 230.3215 KOps/s $\color{#d91a1a}-3.01\%$
test_membership_nested_leaf_last 24.6700μs 4.4678μs 223.8229 KOps/s 241.3661 KOps/s $\textbf{\color{#d91a1a}-7.27\%}$
test_membership_stacked_nested_last 35.3610μs 4.4425μs 225.0969 KOps/s 233.6988 KOps/s $\color{#d91a1a}-3.68\%$
test_membership_stacked_nested_leaf_last 34.8710μs 4.4149μs 226.5074 KOps/s 232.8786 KOps/s $\color{#d91a1a}-2.74\%$
test_nested_getleaf 46.2210μs 22.1779μs 45.0900 KOps/s 46.3626 KOps/s $\color{#d91a1a}-2.74\%$
test_nested_get 46.7000μs 21.0231μs 47.5667 KOps/s 48.6833 KOps/s $\color{#d91a1a}-2.29\%$
test_stacked_getleaf 60.6510μs 21.9236μs 45.6130 KOps/s 46.8377 KOps/s $\color{#d91a1a}-2.61\%$
test_stacked_get 43.3110μs 20.8411μs 47.9822 KOps/s 48.7104 KOps/s $\color{#d91a1a}-1.49\%$
test_nested_getitemleaf 48.3010μs 22.4891μs 44.4659 KOps/s 45.3645 KOps/s $\color{#d91a1a}-1.98\%$
test_nested_getitem 49.2110μs 21.2888μs 46.9730 KOps/s 47.8138 KOps/s $\color{#d91a1a}-1.76\%$
test_stacked_getitemleaf 46.4400μs 22.3134μs 44.8161 KOps/s 45.4303 KOps/s $\color{#d91a1a}-1.35\%$
test_stacked_getitem 56.3310μs 21.4489μs 46.6224 KOps/s 47.4663 KOps/s $\color{#d91a1a}-1.78\%$
test_lock_nested 0.5827ms 0.4830ms 2.0704 KOps/s 2.0969 KOps/s $\color{#d91a1a}-1.26\%$
test_lock_stack_nested 0.5400ms 0.4842ms 2.0651 KOps/s 2.0589 KOps/s $\color{#35bf28}+0.30\%$
test_unlock_nested 0.4785ms 0.3951ms 2.5313 KOps/s 2.5649 KOps/s $\color{#d91a1a}-1.31\%$
test_unlock_stack_nested 0.4460ms 0.3928ms 2.5459 KOps/s 2.5287 KOps/s $\color{#35bf28}+0.68\%$
test_flatten_speed 0.1901ms 0.1226ms 8.1564 KOps/s 8.1743 KOps/s $\color{#d91a1a}-0.22\%$
test_unflatten_speed 0.6447ms 0.5780ms 1.7301 KOps/s 1.7577 KOps/s $\color{#d91a1a}-1.57\%$
test_common_ops 0.8491ms 0.7025ms 1.4235 KOps/s 1.4297 KOps/s $\color{#d91a1a}-0.44\%$
test_creation 69.3620μs 3.1856μs 313.9112 KOps/s 317.8948 KOps/s $\color{#d91a1a}-1.25\%$
test_creation_empty 34.2710μs 7.0478μs 141.8875 KOps/s 144.2943 KOps/s $\color{#d91a1a}-1.67\%$
test_creation_nested_1 39.6010μs 11.5792μs 86.3617 KOps/s 86.9181 KOps/s $\color{#d91a1a}-0.64\%$
test_creation_nested_2 37.8010μs 13.3786μs 74.7462 KOps/s 75.9618 KOps/s $\color{#d91a1a}-1.60\%$
test_creation_many_keys[10] 47.6200μs 21.0876μs 47.4211 KOps/s 47.8188 KOps/s $\color{#d91a1a}-0.83\%$
test_creation_many_keys[50] 0.1172ms 89.7460μs 11.1426 KOps/s 11.1709 KOps/s $\color{#d91a1a}-0.25\%$
test_creation_many_keys[100] 0.2226ms 0.1769ms 5.6544 KOps/s 5.6927 KOps/s $\color{#d91a1a}-0.67\%$
test_creation_nested_many_keys[10] 77.0310μs 44.8885μs 22.2774 KOps/s 22.5533 KOps/s $\color{#d91a1a}-1.22\%$
test_creation_nested_many_keys[50] 0.2271ms 0.1829ms 5.4680 KOps/s 5.4915 KOps/s $\color{#d91a1a}-0.43\%$
test_clone 52.2510μs 13.7557μs 72.6972 KOps/s 73.8053 KOps/s $\color{#d91a1a}-1.50\%$
test_getitem[int] 1.5721ms 15.1546μs 65.9865 KOps/s 61.2086 KOps/s $\textbf{\color{#35bf28}+7.81\%}$
test_getitem[slice_int] 0.1374ms 24.4245μs 40.9425 KOps/s 38.8856 KOps/s $\textbf{\color{#35bf28}+5.29\%}$
test_getitem[range] 0.1762ms 64.2365μs 15.5675 KOps/s 14.5892 KOps/s $\textbf{\color{#35bf28}+6.71\%}$
test_getitem[tuple] 0.1431ms 24.2605μs 41.2193 KOps/s 40.2311 KOps/s $\color{#35bf28}+2.46\%$
test_getitem[list] 0.1839ms 59.5900μs 16.7813 KOps/s 15.8133 KOps/s $\textbf{\color{#35bf28}+6.12\%}$
test_setitem_dim[int] 45.3100μs 26.7171μs 37.4292 KOps/s 36.2321 KOps/s $\color{#35bf28}+3.30\%$
test_setitem_dim[slice_int] 65.9610μs 43.8871μs 22.7858 KOps/s 22.9661 KOps/s $\color{#d91a1a}-0.79\%$
test_setitem_dim[range] 0.1205ms 95.9528μs 10.4218 KOps/s 10.0420 KOps/s $\color{#35bf28}+3.78\%$
test_setitem_dim[tuple] 63.5010μs 40.8413μs 24.4850 KOps/s 23.2801 KOps/s $\textbf{\color{#35bf28}+5.18\%}$
test_setitem 49.8910μs 18.2063μs 54.9259 KOps/s 55.6095 KOps/s $\color{#d91a1a}-1.23\%$
test_set 48.0910μs 17.4474μs 57.3152 KOps/s 58.3236 KOps/s $\color{#d91a1a}-1.73\%$
test_set_shared 0.4937ms 0.2037ms 4.9101 KOps/s 4.9339 KOps/s $\color{#d91a1a}-0.48\%$
test_update 0.3562ms 22.1055μs 45.2376 KOps/s 45.9553 KOps/s $\color{#d91a1a}-1.56\%$
test_update_nested 68.5520μs 34.1211μs 29.3073 KOps/s 30.1087 KOps/s $\color{#d91a1a}-2.66\%$
test_update__nested 0.4458ms 34.4798μs 29.0025 KOps/s 28.8121 KOps/s $\color{#35bf28}+0.66\%$
test_set_nested 57.4710μs 19.4805μs 51.3334 KOps/s 52.3842 KOps/s $\color{#d91a1a}-2.01\%$
test_set_nested_new 61.7910μs 25.9744μs 38.4994 KOps/s 41.4958 KOps/s $\textbf{\color{#d91a1a}-7.22\%}$
test_select 74.0610μs 41.5471μs 24.0691 KOps/s 24.3320 KOps/s $\color{#d91a1a}-1.08\%$
test_select_nested 0.1073ms 74.1962μs 13.4778 KOps/s 13.3557 KOps/s $\color{#35bf28}+0.91\%$
test_exclude_nested 0.1259ms 91.2545μs 10.9584 KOps/s 10.9641 KOps/s $\color{#d91a1a}-0.05\%$
test_empty[True] 0.4594ms 0.4036ms 2.4777 KOps/s 2.5115 KOps/s $\color{#d91a1a}-1.35\%$
test_empty[False] 9.1552μs 1.3101μs 763.2897 KOps/s 770.1656 KOps/s $\color{#d91a1a}-0.89\%$
test_to 0.1039ms 71.8261μs 13.9225 KOps/s 13.7625 KOps/s $\color{#35bf28}+1.16\%$
test_to_nonblocking 0.1172ms 66.2219μs 15.1007 KOps/s 15.4876 KOps/s $\color{#d91a1a}-2.50\%$
test_unbind_speed 0.3640ms 0.3355ms 2.9803 KOps/s 2.9981 KOps/s $\color{#d91a1a}-0.59\%$
test_unbind_speed_stack0 0.3895ms 0.3352ms 2.9830 KOps/s 3.0068 KOps/s $\color{#d91a1a}-0.79\%$
test_unbind_speed_stack1 0.1040s 0.8402ms 1.1902 KOps/s 1.1853 KOps/s $\color{#35bf28}+0.41\%$
test_split 0.1038s 1.2729ms 785.5784 Ops/s 788.7417 Ops/s $\color{#d91a1a}-0.40\%$
test_chunk 0.1036s 1.2137ms 823.9101 Ops/s 927.7292 Ops/s $\textbf{\color{#d91a1a}-11.19\%}$
test_to_cpu_blocking 28.6590ms 28.5435ms 35.0342 Ops/s 35.0003 Ops/s $\color{#35bf28}+0.10\%$
test_to_cpu_global_sync 11.7856ms 11.6732ms 85.6662 Ops/s 77.3851 Ops/s $\textbf{\color{#35bf28}+10.70\%}$
test_to_cpu_event_sync 12.8666ms 12.6640ms 78.9640 Ops/s 79.3636 Ops/s $\color{#d91a1a}-0.50\%$
test_to_cpu_default 0.1151s 13.8692ms 72.1021 Ops/s 78.9556 Ops/s $\textbf{\color{#d91a1a}-8.68\%}$
test_consolidate[False-None] 4.4088ms 4.1830ms 239.0609 Ops/s 217.1123 Ops/s $\textbf{\color{#35bf28}+10.11\%}$
test_consolidate[default-None] 2.1884ms 2.0631ms 484.7183 Ops/s 480.0511 Ops/s $\color{#35bf28}+0.97\%$
test_consolidate[reduce-overhead-None] 2.0473ms 1.9635ms 509.2894 Ops/s 497.8494 Ops/s $\color{#35bf28}+2.30\%$
test_consolidate_njt[False-None] 8.6958ms 8.4975ms 117.6816 Ops/s 116.2700 Ops/s $\color{#35bf28}+1.21\%$
test_to[False-False-None] 2.2200ms 2.1294ms 469.6216 Ops/s 471.4270 Ops/s $\color{#d91a1a}-0.38\%$
test_to[True-False-None] 2.2191ms 1.9626ms 509.5318 Ops/s 516.5224 Ops/s $\color{#d91a1a}-1.35\%$
test_to[within-False-None] 6.3161ms 6.1877ms 161.6104 Ops/s 161.8663 Ops/s $\color{#d91a1a}-0.16\%$
test_to[True-default-None] 8.9547ms 8.8417ms 113.1002 Ops/s 111.6604 Ops/s $\color{#35bf28}+1.29\%$
test_to_njt[False-False-None] 8.5572ms 8.4922ms 117.7551 Ops/s 115.4276 Ops/s $\color{#35bf28}+2.02\%$
test_to_njt[True-False-None] 7.1220ms 6.9308ms 144.2843 Ops/s 141.1588 Ops/s $\color{#35bf28}+2.21\%$
test_to_njt[within-False-None] 16.0942ms 15.6255ms 63.9981 Ops/s 63.8558 Ops/s $\color{#35bf28}+0.22\%$
test_creation[device0] 0.3450ms 0.1131ms 8.8448 KOps/s 8.4564 KOps/s $\color{#35bf28}+4.59\%$
test_creation_from_tensor 0.3600ms 0.1107ms 9.0365 KOps/s 8.7000 KOps/s $\color{#35bf28}+3.87\%$
test_add_one[memmap_tensor0] 0.1620ms 6.6722μs 149.8745 KOps/s 148.2786 KOps/s $\color{#35bf28}+1.08\%$
test_contiguous[memmap_tensor0] 13.3300μs 0.6811μs 1.4681 MOps/s 2.1327 MOps/s $\textbf{\color{#d91a1a}-31.16\%}$
test_stack[memmap_tensor0] 30.9100μs 4.6728μs 214.0046 KOps/s 212.5163 KOps/s $\color{#35bf28}+0.70\%$
test_memmaptd_index 0.1709s 0.3556ms 2.8122 KOps/s 3.7513 KOps/s $\textbf{\color{#d91a1a}-25.04\%}$
test_memmaptd_index_astensor 0.5231ms 0.3748ms 2.6684 KOps/s 2.6842 KOps/s $\color{#d91a1a}-0.59\%$
test_memmaptd_index_op 0.9374ms 0.6293ms 1.5892 KOps/s 1.5961 KOps/s $\color{#d91a1a}-0.43\%$
test_serialize_model 0.1390s 0.1374s 7.2788 Ops/s 7.3184 Ops/s $\color{#d91a1a}-0.54\%$
test_serialize_model_pickle 1.3476s 1.1921s 0.8389 Ops/s 0.8243 Ops/s $\color{#35bf28}+1.76\%$
test_serialize_weights 0.1370s 0.1351s 7.4039 Ops/s 7.2912 Ops/s $\color{#35bf28}+1.54\%$
test_serialize_weights_returnearly 0.4709s 93.3962ms 10.7071 Ops/s 14.4919 Ops/s $\textbf{\color{#d91a1a}-26.12\%}$
test_serialize_weights_pickle 1.3645s 1.2134s 0.8241 Ops/s 0.8228 Ops/s $\color{#35bf28}+0.16\%$
test_reshape_pytree 0.2027ms 32.6470μs 30.6307 KOps/s 30.1148 KOps/s $\color{#35bf28}+1.71\%$
test_reshape_td 68.6010μs 46.2106μs 21.6400 KOps/s 21.8391 KOps/s $\color{#d91a1a}-0.91\%$
test_view_pytree 0.2216ms 32.9330μs 30.3647 KOps/s 30.4559 KOps/s $\color{#d91a1a}-0.30\%$
test_view_td 93.4820μs 53.7874μs 18.5917 KOps/s 18.3468 KOps/s $\color{#35bf28}+1.33\%$
test_unbind_pytree 0.2415ms 37.0172μs 27.0145 KOps/s 26.9044 KOps/s $\color{#35bf28}+0.41\%$
test_unbind_td 0.1575ms 51.0250μs 19.5982 KOps/s 19.8895 KOps/s $\color{#d91a1a}-1.46\%$
test_split_pytree 0.2001ms 42.9738μs 23.2700 KOps/s 23.3676 KOps/s $\color{#d91a1a}-0.42\%$
test_split_td 88.5720μs 66.0593μs 15.1379 KOps/s 15.2168 KOps/s $\color{#d91a1a}-0.52\%$
test_add_pytree 0.2110ms 43.6594μs 22.9046 KOps/s 22.5916 KOps/s $\color{#35bf28}+1.39\%$
test_add_td 91.9210μs 57.5167μs 17.3862 KOps/s 17.6338 KOps/s $\color{#d91a1a}-1.40\%$
test_compile_add_one_nested[tensordict-compile] 0.2066ms 0.1412ms 7.0817 KOps/s 6.9361 KOps/s $\color{#35bf28}+2.10\%$
test_compile_add_one_nested[tensordict-eager] 0.3096ms 0.2025ms 4.9387 KOps/s 4.9827 KOps/s $\color{#d91a1a}-0.88\%$
test_compile_add_one_nested[pytree-compile] 0.1622ms 0.1085ms 9.2140 KOps/s 9.0786 KOps/s $\color{#35bf28}+1.49\%$
test_compile_add_one_nested[pytree-eager] 0.4331ms 0.1892ms 5.2852 KOps/s 5.3851 KOps/s $\color{#d91a1a}-1.86\%$
test_compile_copy_nested[tensordict-compile] 0.2398ms 11.6822μs 85.6007 KOps/s 97.2557 KOps/s $\textbf{\color{#d91a1a}-11.98\%}$
test_compile_copy_nested[tensordict-eager] 0.1159ms 55.1620μs 18.1284 KOps/s 18.4383 KOps/s $\color{#d91a1a}-1.68\%$
test_compile_copy_nested[pytree-compile] 0.1184ms 9.8696μs 101.3208 KOps/s 102.0708 KOps/s $\color{#d91a1a}-0.73\%$
test_compile_copy_nested[pytree-eager] 0.4674ms 69.2978μs 14.4305 KOps/s 14.3771 KOps/s $\color{#35bf28}+0.37\%$
test_compile_add_one_flat[tensordict-compile] 0.3162ms 0.1771ms 5.6460 KOps/s 5.1893 KOps/s $\textbf{\color{#35bf28}+8.80\%}$
test_compile_add_one_flat[tensordict-eager] 0.3399ms 0.2825ms 3.5393 KOps/s 3.5154 KOps/s $\color{#35bf28}+0.68\%$
test_compile_add_one_flat[tensorclass-compile] 0.2021ms 0.1169ms 8.5576 KOps/s 8.0986 KOps/s $\textbf{\color{#35bf28}+5.67\%}$
test_compile_add_one_flat[tensorclass-eager] 0.1279ms 72.8292μs 13.7308 KOps/s 13.5797 KOps/s $\color{#35bf28}+1.11\%$
test_compile_add_one_flat[pytree-compile] 0.2282ms 0.1585ms 6.3088 KOps/s 6.1213 KOps/s $\color{#35bf28}+3.06\%$
test_compile_add_one_flat[pytree-eager] 0.8152ms 0.5299ms 1.8870 KOps/s 1.8391 KOps/s $\color{#35bf28}+2.61\%$
test_compile_add_self_flat[tensordict-eager] 0.4868ms 0.3377ms 2.9614 KOps/s 2.9804 KOps/s $\color{#d91a1a}-0.64\%$
test_compile_add_self_flat[tensordict-compile] 0.2262ms 0.1785ms 5.6023 KOps/s 5.1117 KOps/s $\textbf{\color{#35bf28}+9.60\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1321ms 88.9259μs 11.2453 KOps/s 11.3486 KOps/s $\color{#d91a1a}-0.91\%$
test_compile_add_self_flat[tensorclass-compile] 0.3623ms 0.1195ms 8.3717 KOps/s 7.8411 KOps/s $\textbf{\color{#35bf28}+6.77\%}$
test_compile_add_self_flat[pytree-eager] 0.6661ms 0.4396ms 2.2746 KOps/s 2.2122 KOps/s $\color{#35bf28}+2.82\%$
test_compile_add_self_flat[pytree-compile] 0.3084ms 0.1590ms 6.2900 KOps/s 6.1737 KOps/s $\color{#35bf28}+1.88\%$
test_compile_copy_flat[tensordict-compile] 0.1243ms 13.4474μs 74.3637 KOps/s 72.1188 KOps/s $\color{#35bf28}+3.11\%$
test_compile_copy_flat[tensordict-eager] 75.4110μs 41.0877μs 24.3382 KOps/s 23.9412 KOps/s $\color{#35bf28}+1.66\%$
test_compile_copy_flat[pytree-compile] 0.1211ms 10.8629μs 92.0563 KOps/s 91.6614 KOps/s $\color{#35bf28}+0.43\%$
test_compile_copy_flat[pytree-eager] 0.4111ms 52.8936μs 18.9059 KOps/s 19.1496 KOps/s $\color{#d91a1a}-1.27\%$
test_compile_assign_and_add[tensordict-compile] 2.0176ms 0.1736ms 5.7588 KOps/s 5.4704 KOps/s $\textbf{\color{#35bf28}+5.27\%}$
test_compile_assign_and_add[tensordict-eager] 3.4142ms 3.2996ms 303.0689 Ops/s 301.7799 Ops/s $\color{#35bf28}+0.43\%$
test_compile_assign_and_add[pytree-compile] 1.9750ms 0.1621ms 6.1701 KOps/s 6.1006 KOps/s $\color{#35bf28}+1.14\%$
test_compile_assign_and_add[pytree-eager] 3.0238ms 2.8237ms 354.1427 Ops/s 353.4710 Ops/s $\color{#35bf28}+0.19\%$
test_compile_indexing[tensor-tensordict-compile] 0.2026ms 0.1088ms 9.1896 KOps/s 8.6482 KOps/s $\textbf{\color{#35bf28}+6.26\%}$
test_compile_indexing[tensor-tensordict-eager] 0.3436ms 74.3737μs 13.4456 KOps/s 12.5038 KOps/s $\textbf{\color{#35bf28}+7.53\%}$
test_compile_indexing[tensor-tensorclass-compile] 0.2254ms 96.4756μs 10.3653 KOps/s 10.0652 KOps/s $\color{#35bf28}+2.98\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2522ms 44.7224μs 22.3601 KOps/s 20.7754 KOps/s $\textbf{\color{#35bf28}+7.63\%}$
test_compile_indexing[tensor-pytree-compile] 0.1360ms 96.7560μs 10.3353 KOps/s 10.0377 KOps/s $\color{#35bf28}+2.96\%$
test_compile_indexing[tensor-pytree-eager] 0.2698ms 44.6036μs 22.4197 KOps/s 20.7390 KOps/s $\textbf{\color{#35bf28}+8.10\%}$
test_compile_indexing[slice-tensordict-compile] 0.1504ms 56.9202μs 17.5685 KOps/s 16.4943 KOps/s $\textbf{\color{#35bf28}+6.51\%}$
test_compile_indexing[slice-tensordict-eager] 0.2149ms 27.6934μs 36.1096 KOps/s 33.5170 KOps/s $\textbf{\color{#35bf28}+7.74\%}$
test_compile_indexing[slice-tensorclass-compile] 0.1515ms 44.6358μs 22.4035 KOps/s 22.2835 KOps/s $\color{#35bf28}+0.54\%$
test_compile_indexing[slice-tensorclass-eager] 0.2687ms 22.6201μs 44.2084 KOps/s 44.2949 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_indexing[slice-pytree-compile] 91.9220μs 45.7958μs 21.8361 KOps/s 21.3554 KOps/s $\color{#35bf28}+2.25\%$
test_compile_indexing[slice-pytree-eager] 0.2798ms 22.6490μs 44.1521 KOps/s 44.5135 KOps/s $\color{#d91a1a}-0.81\%$
test_compile_indexing[int-tensordict-compile] 0.1471ms 58.5126μs 17.0903 KOps/s 16.4705 KOps/s $\color{#35bf28}+3.76\%$
test_compile_indexing[int-tensordict-eager] 0.3319ms 27.3882μs 36.5120 KOps/s 34.3392 KOps/s $\textbf{\color{#35bf28}+6.33\%}$
test_compile_indexing[int-tensorclass-compile] 0.1912ms 45.5433μs 21.9571 KOps/s 21.8155 KOps/s $\color{#35bf28}+0.65\%$
test_compile_indexing[int-tensorclass-eager] 0.2640ms 22.6081μs 44.2319 KOps/s 44.0454 KOps/s $\color{#35bf28}+0.42\%$
test_compile_indexing[int-pytree-compile] 86.3520μs 45.9030μs 21.7851 KOps/s 21.5393 KOps/s $\color{#35bf28}+1.14\%$
test_compile_indexing[int-pytree-eager] 0.2731ms 22.6135μs 44.2213 KOps/s 44.4977 KOps/s $\color{#d91a1a}-0.62\%$
test_compile_replace[single-eager] 91.0510μs 47.8190μs 20.9122 KOps/s 19.9907 KOps/s $\color{#35bf28}+4.61\%$
test_compile_replace[single-compile] 0.2162ms 0.1048ms 9.5380 KOps/s 9.3932 KOps/s $\color{#35bf28}+1.54\%$
test_compile_replace[multi-eager] 0.6176ms 0.5611ms 1.7823 KOps/s 1.7567 KOps/s $\color{#35bf28}+1.46\%$
test_compile_replace[multi-compile] 0.2630ms 0.1119ms 8.9344 KOps/s 8.6976 KOps/s $\color{#35bf28}+2.72\%$
test_compile_tc_getattr_20[eager] 0.3141ms 0.1692ms 5.9092 KOps/s 5.8152 KOps/s $\color{#35bf28}+1.62\%$
test_compile_tc_getattr_20[compile] 0.2657ms 0.1237ms 8.0860 KOps/s 7.9582 KOps/s $\color{#35bf28}+1.60\%$
test_compile_clone_shallow[20-eager] 53.8910μs 19.6056μs 51.0058 KOps/s 52.4525 KOps/s $\color{#d91a1a}-2.76\%$
test_compile_clone_shallow[20-compile] 61.2910μs 11.6813μs 85.6069 KOps/s 79.9701 KOps/s $\textbf{\color{#35bf28}+7.05\%}$
test_compile_clone_shallow[40-eager] 61.3620μs 34.3717μs 29.0937 KOps/s 29.8226 KOps/s $\color{#d91a1a}-2.44\%$
test_compile_clone_shallow[40-compile] 0.1134ms 12.5638μs 79.5936 KOps/s 79.4856 KOps/s $\color{#35bf28}+0.14\%$
test_compile_clone_shallow[80-eager] 0.2315ms 64.3340μs 15.5439 KOps/s 15.9375 KOps/s $\color{#d91a1a}-2.47\%$
test_compile_clone_shallow[80-compile] 48.8110μs 15.1449μs 66.0290 KOps/s 67.0180 KOps/s $\color{#d91a1a}-1.48\%$
test_compile_update_inplace[eager] 0.2551ms 59.8946μs 16.6960 KOps/s 17.0785 KOps/s $\color{#d91a1a}-2.24\%$
test_compile_update_inplace[compile] 0.2709ms 0.1392ms 7.1818 KOps/s 6.7086 KOps/s $\textbf{\color{#35bf28}+7.05\%}$
test_mod_add[eager] 96.7020μs 50.6818μs 19.7310 KOps/s 19.3515 KOps/s $\color{#35bf28}+1.96\%$
test_mod_add[compile] 0.3262ms 0.1044ms 9.5775 KOps/s 9.0296 KOps/s $\textbf{\color{#35bf28}+6.07\%}$
test_mod_add[compile-overhead] 0.2751ms 0.1489ms 6.7167 KOps/s 6.5807 KOps/s $\color{#35bf28}+2.07\%$
test_mod_wrap[eager] 0.3719ms 0.2895ms 3.4545 KOps/s 3.2564 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_mod_wrap[compile] 0.4902ms 0.3487ms 2.8682 KOps/s 2.7444 KOps/s $\color{#35bf28}+4.51\%$
test_mod_wrap[compile-overhead] 7.1621ms 3.9850ms 250.9409 Ops/s 248.8311 Ops/s $\color{#35bf28}+0.85\%$
test_mod_wrap_and_backward[eager] 1.6146ms 1.4963ms 668.3271 Ops/s 654.5776 Ops/s $\color{#35bf28}+2.10\%$
test_mod_wrap_and_backward[compile] 1.5608ms 1.4549ms 687.3145 Ops/s 632.0698 Ops/s $\textbf{\color{#35bf28}+8.74\%}$
test_mod_wrap_and_backward[compile-overhead] 1.2595ms 0.8902ms 1.1234 KOps/s 1.1035 KOps/s $\color{#35bf28}+1.80\%$
test_seq_add[eager] 0.2739ms 0.1533ms 6.5241 KOps/s 6.4744 KOps/s $\color{#35bf28}+0.77\%$
test_seq_add[compile] 0.5028ms 0.1135ms 8.8133 KOps/s 8.4993 KOps/s $\color{#35bf28}+3.69\%$
test_seq_add[compile-overhead] 0.2958ms 0.1527ms 6.5480 KOps/s 6.2288 KOps/s $\textbf{\color{#35bf28}+5.12\%}$
test_seq_wrap[eager] 0.5919ms 0.5180ms 1.9305 KOps/s 1.9074 KOps/s $\color{#35bf28}+1.21\%$
test_seq_wrap[compile] 0.5218ms 0.3686ms 2.7133 KOps/s 2.6368 KOps/s $\color{#35bf28}+2.90\%$
test_seq_wrap[compile-overhead] 0.3440ms 0.2637ms 3.7929 KOps/s 3.7292 KOps/s $\color{#35bf28}+1.71\%$
test_func_call_runtime[False-eager] 0.9195ms 0.8423ms 1.1872 KOps/s 1.1901 KOps/s $\color{#d91a1a}-0.24\%$
test_func_call_runtime[False-compile] 0.9906ms 0.9088ms 1.1004 KOps/s 1.0384 KOps/s $\textbf{\color{#35bf28}+5.97\%}$
test_func_call_runtime[False-compile-overhead] 0.7294ms 0.4595ms 2.1764 KOps/s 2.1363 KOps/s $\color{#35bf28}+1.88\%$
test_func_call_runtime[True-eager] 1.1310ms 1.0665ms 937.6522 Ops/s 917.5741 Ops/s $\color{#35bf28}+2.19\%$
test_func_call_runtime[True-compile] 0.9769ms 0.9233ms 1.0830 KOps/s 1.0691 KOps/s $\color{#35bf28}+1.30\%$
test_func_call_runtime[True-compile-overhead] 0.7372ms 0.4798ms 2.0842 KOps/s 2.0730 KOps/s $\color{#35bf28}+0.54\%$
test_func_call_cm_runtime[False-eager] 0.9177ms 0.8435ms 1.1855 KOps/s 1.1853 KOps/s $\color{#35bf28}+0.02\%$
test_func_call_cm_runtime[False-compile] 1.1350ms 0.9335ms 1.0712 KOps/s 1.0812 KOps/s $\color{#d91a1a}-0.92\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5821ms 0.4618ms 2.1656 KOps/s 2.1238 KOps/s $\color{#35bf28}+1.97\%$
test_func_call_cm_runtime[True-eager] 1.6368ms 1.2276ms 814.5876 Ops/s 814.9435 Ops/s $\color{#d91a1a}-0.04\%$
test_func_call_cm_runtime[True-compile] 1.4737ms 0.9583ms 1.0435 KOps/s 1.0328 KOps/s $\color{#35bf28}+1.03\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5930ms 0.5091ms 1.9641 KOps/s 1.9335 KOps/s $\color{#35bf28}+1.58\%$
test_vmap_func_call_cm_runtime[eager] 2.9345ms 2.3832ms 419.6015 Ops/s 419.4754 Ops/s $\color{#35bf28}+0.03\%$
test_vmap_func_call_cm_runtime[compile] 1.0814ms 0.9724ms 1.0284 KOps/s 963.3089 Ops/s $\textbf{\color{#35bf28}+6.76\%}$
test_vmap_func_call_cm_runtime[compile-overhead] 0.6461ms 0.5162ms 1.9371 KOps/s 1.9054 KOps/s $\color{#35bf28}+1.66\%$
test_distributed 3.2129ms 0.1695ms 5.8994 KOps/s 6.5102 KOps/s $\textbf{\color{#d91a1a}-9.38\%}$
test_tdmodule 0.2703ms 27.4591μs 36.4179 KOps/s 37.1385 KOps/s $\color{#d91a1a}-1.94\%$
test_tdmodule_dispatch 90.0910μs 45.4110μs 22.0211 KOps/s 22.2808 KOps/s $\color{#d91a1a}-1.17\%$
test_tdseq 46.3010μs 26.7053μs 37.4458 KOps/s 37.4146 KOps/s $\color{#35bf28}+0.08\%$
test_tdseq_dispatch 0.1364ms 47.0997μs 21.2315 KOps/s 21.2377 KOps/s $\color{#d91a1a}-0.03\%$
test_instantiation_functorch 2.2369ms 2.0935ms 477.6617 Ops/s 479.5420 Ops/s $\color{#d91a1a}-0.39\%$
test_exec_functorch 0.2308ms 0.1800ms 5.5571 KOps/s 5.5496 KOps/s $\color{#35bf28}+0.13\%$
test_exec_functional_call 0.2176ms 0.1596ms 6.2649 KOps/s 6.2435 KOps/s $\color{#35bf28}+0.34\%$
test_exec_td_decorator 0.4671ms 0.2372ms 4.2162 KOps/s 4.2160 KOps/s $+0.00\%$
test_vmap_mlp_speed_decorator[True-True] 1.0075ms 0.8240ms 1.2136 KOps/s 1.2060 KOps/s $\color{#35bf28}+0.63\%$
test_vmap_mlp_speed_decorator[True-False] 1.0221ms 0.8225ms 1.2158 KOps/s 1.2088 KOps/s $\color{#35bf28}+0.58\%$
test_vmap_mlp_speed_decorator[False-True] 0.9003ms 0.7113ms 1.4059 KOps/s 1.3988 KOps/s $\color{#35bf28}+0.51\%$
test_vmap_mlp_speed_decorator[False-False] 0.8811ms 0.7111ms 1.4062 KOps/s 1.3963 KOps/s $\color{#35bf28}+0.71\%$
test_vmap_transformer_speed_decorator[True-True] 21.3825ms 20.5767ms 48.5987 Ops/s 48.4373 Ops/s $\color{#35bf28}+0.33\%$
test_vmap_transformer_speed_decorator[True-False] 20.7464ms 20.5913ms 48.5641 Ops/s 48.4412 Ops/s $\color{#35bf28}+0.25\%$
test_vmap_transformer_speed_decorator[False-True] 20.5477ms 20.3827ms 49.0613 Ops/s 48.9332 Ops/s $\color{#35bf28}+0.26\%$
test_vmap_transformer_speed_decorator[False-False] 20.4996ms 20.3878ms 49.0489 Ops/s 48.9159 Ops/s $\color{#35bf28}+0.27\%$
test_to_module_speed[True] 1.5864ms 1.4756ms 677.7009 Ops/s 669.4012 Ops/s $\color{#35bf28}+1.24\%$
test_to_module_speed[False] 1.6326ms 1.4411ms 693.8908 Ops/s 686.1291 Ops/s $\color{#35bf28}+1.13\%$
test_tc_init 0.2100ms 45.0815μs 22.1821 KOps/s 22.5393 KOps/s $\color{#d91a1a}-1.58\%$
test_tc_init_tensor_only 36.8500μs 9.7658μs 102.3981 KOps/s 101.7992 KOps/s $\color{#35bf28}+0.59\%$
test_tc_init_nested 0.5260ms 88.8865μs 11.2503 KOps/s 11.3275 KOps/s $\color{#d91a1a}-0.68\%$
test_tc_init_many_fields 48.5200μs 16.3637μs 61.1109 KOps/s 61.0956 KOps/s $\color{#35bf28}+0.03\%$
test_tc_first_layer_tensor 0.4269ms 1.8179μs 550.0940 KOps/s 546.6224 KOps/s $\color{#35bf28}+0.64\%$
test_tc_first_layer_tensor_only 1.7320μs 0.3964μs 2.5227 MOps/s 2.4689 MOps/s $\color{#35bf28}+2.18\%$
test_tc_first_layer_tensor_set 0.4406ms 3.9148μs 255.4414 KOps/s 255.1185 KOps/s $\color{#35bf28}+0.13\%$
test_tc_first_layer_tensor_only_set 25.8300μs 3.2698μs 305.8329 KOps/s 306.2298 KOps/s $\color{#d91a1a}-0.13\%$
test_tc_first_layer_nontensor 2.4663ms 6.2800μs 159.2350 KOps/s 162.3029 KOps/s $\color{#d91a1a}-1.89\%$
test_tc_second_layer_tensor 0.4281ms 4.4691μs 223.7572 KOps/s 227.2128 KOps/s $\color{#d91a1a}-1.52\%$
test_tc_second_layer_nontensor 43.5210μs 8.8416μs 113.1019 KOps/s 115.8964 KOps/s $\color{#d91a1a}-2.41\%$
test_unbind 0.2496s 16.3745ms 61.0706 Ops/s 54.9962 Ops/s $\textbf{\color{#35bf28}+11.05\%}$
test_full_like 7.5608ms 4.3848ms 228.0602 Ops/s 228.6130 Ops/s $\color{#d91a1a}-0.24\%$
test_zeros_like 5.0299ms 4.3715ms 228.7548 Ops/s 229.1119 Ops/s $\color{#d91a1a}-0.16\%$
test_ones_like 4.5415ms 4.3706ms 228.7991 Ops/s 229.0394 Ops/s $\color{#d91a1a}-0.10\%$
test_clone 6.8644ms 6.4402ms 155.2754 Ops/s 155.5356 Ops/s $\color{#d91a1a}-0.17\%$
test_squeeze 66.8110μs 14.0007μs 71.4250 KOps/s 70.5515 KOps/s $\color{#35bf28}+1.24\%$
test_unsqueeze 0.2163ms 0.1109ms 9.0197 KOps/s 8.9653 KOps/s $\color{#35bf28}+0.61\%$
test_split 0.6225ms 0.1860ms 5.3759 KOps/s 5.3609 KOps/s $\color{#35bf28}+0.28\%$
test_permute 0.6474ms 0.2126ms 4.7043 KOps/s 4.8513 KOps/s $\color{#d91a1a}-3.03\%$
test_stack 51.7995ms 51.0945ms 19.5716 Ops/s 19.5461 Ops/s $\color{#35bf28}+0.13\%$
test_cat 51.4962ms 51.1000ms 19.5695 Ops/s 19.6933 Ops/s $\color{#d91a1a}-0.63\%$
test_sequential_tensordict 0.3103ms 0.2256ms 4.4325 KOps/s 4.6270 KOps/s $\color{#d91a1a}-4.20\%$
test_sequential_graph_module 0.5733ms 0.1232ms 8.1200 KOps/s 8.4949 KOps/s $\color{#d91a1a}-4.41\%$
test_nested_tensordict 0.3469ms 0.2867ms 3.4874 KOps/s 3.5459 KOps/s $\color{#d91a1a}-1.65\%$
test_nested_graph_module 0.5769ms 0.1292ms 7.7420 KOps/s 7.8311 KOps/s $\color{#d91a1a}-1.14\%$

@github-actions
Contributor

github-actions bot commented Mar 6, 2026

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 261. Improved: $\large\color{#35bf28}28$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 31.3110μs 14.9589μs 66.8497 KOps/s 66.9739 KOps/s $\color{#d91a1a}-0.19\%$
test_plain_set_stack_nested 36.9200μs 15.2725μs 65.4774 KOps/s 65.7984 KOps/s $\color{#d91a1a}-0.49\%$
test_plain_set_nested_inplace 49.3400μs 16.7306μs 59.7708 KOps/s 59.5671 KOps/s $\color{#35bf28}+0.34\%$
test_plain_set_stack_nested_inplace 45.4710μs 16.5006μs 60.6037 KOps/s 59.7506 KOps/s $\color{#35bf28}+1.43\%$
test_items 59.6610μs 6.1481μs 162.6522 KOps/s 168.5419 KOps/s $\color{#d91a1a}-3.49\%$
test_items_nested 0.7017ms 0.4607ms 2.1704 KOps/s 2.1456 KOps/s $\color{#35bf28}+1.16\%$
test_items_nested_locked 0.5949ms 0.4578ms 2.1845 KOps/s 2.1450 KOps/s $\color{#35bf28}+1.84\%$
test_items_nested_leaf 0.1391ms 97.4976μs 10.2567 KOps/s 10.1355 KOps/s $\color{#35bf28}+1.19\%$
test_items_stack_nested 0.5118ms 0.4657ms 2.1473 KOps/s 2.1443 KOps/s $\color{#35bf28}+0.14\%$
test_items_stack_nested_leaf 0.1428ms 98.7465μs 10.1269 KOps/s 10.1307 KOps/s $\color{#d91a1a}-0.04\%$
test_items_stack_nested_locked 0.5204ms 0.4697ms 2.1290 KOps/s 2.1371 KOps/s $\color{#d91a1a}-0.38\%$
test_keys 31.0810μs 4.2653μs 234.4474 KOps/s 235.6464 KOps/s $\color{#d91a1a}-0.51\%$
test_keys_nested 0.1757ms 0.1297ms 7.7121 KOps/s 7.6793 KOps/s $\color{#35bf28}+0.43\%$
test_keys_nested_locked 2.1215ms 0.1388ms 7.2048 KOps/s 7.2458 KOps/s $\color{#d91a1a}-0.57\%$
test_keys_nested_leaf 0.1558ms 0.1207ms 8.2859 KOps/s 8.3375 KOps/s $\color{#d91a1a}-0.62\%$
test_keys_stack_nested 0.1806ms 0.1308ms 7.6426 KOps/s 7.6784 KOps/s $\color{#d91a1a}-0.47\%$
test_keys_stack_nested_leaf 0.1766ms 0.1206ms 8.2927 KOps/s 8.3237 KOps/s $\color{#d91a1a}-0.37\%$
test_keys_stack_nested_locked 0.4874ms 0.1376ms 7.2671 KOps/s 7.2767 KOps/s $\color{#d91a1a}-0.13\%$
test_values 6.2600μs 1.0228μs 977.6608 KOps/s 970.9442 KOps/s $\color{#35bf28}+0.69\%$
test_values_nested 86.0110μs 52.6117μs 19.0072 KOps/s 18.9158 KOps/s $\color{#35bf28}+0.48\%$
test_values_nested_locked 96.4810μs 55.9524μs 17.8723 KOps/s 17.8080 KOps/s $\color{#35bf28}+0.36\%$
test_values_nested_leaf 88.5410μs 60.2512μs 16.5972 KOps/s 16.6423 KOps/s $\color{#d91a1a}-0.27\%$
test_values_stack_nested 0.1192ms 52.9216μs 18.8959 KOps/s 18.9446 KOps/s $\color{#d91a1a}-0.26\%$
test_values_stack_nested_leaf 96.1210μs 60.3939μs 16.5580 KOps/s 16.5248 KOps/s $\color{#35bf28}+0.20\%$
test_values_stack_nested_locked 86.6010μs 56.1280μs 17.8164 KOps/s 17.7084 KOps/s $\color{#35bf28}+0.61\%$
test_membership 4.9352μs 0.8461μs 1.1819 MOps/s 1.1661 MOps/s $\color{#35bf28}+1.36\%$
test_membership_nested 28.5100μs 2.8817μs 347.0200 KOps/s 342.5253 KOps/s $\color{#35bf28}+1.31\%$
test_membership_nested_leaf 69.4610μs 2.8757μs 347.7432 KOps/s 343.9320 KOps/s $\color{#35bf28}+1.11\%$
test_membership_stacked_nested 28.7500μs 2.9044μs 344.3086 KOps/s 343.0598 KOps/s $\color{#35bf28}+0.36\%$
test_membership_stacked_nested_leaf 34.4110μs 2.9045μs 344.2876 KOps/s 344.3498 KOps/s $\color{#d91a1a}-0.02\%$
test_membership_nested_last 33.1110μs 4.3591μs 229.4052 KOps/s 228.2860 KOps/s $\color{#35bf28}+0.49\%$
test_membership_nested_leaf_last 33.0310μs 4.3392μs 230.4546 KOps/s 227.0248 KOps/s $\color{#35bf28}+1.51\%$
test_membership_stacked_nested_last 23.7500μs 4.3538μs 229.6823 KOps/s 228.2042 KOps/s $\color{#35bf28}+0.65\%$
test_membership_stacked_nested_leaf_last 35.6000μs 4.3500μs 229.8854 KOps/s 229.2143 KOps/s $\color{#35bf28}+0.29\%$
test_nested_getleaf 49.3910μs 21.5490μs 46.4058 KOps/s 46.0766 KOps/s $\color{#35bf28}+0.71\%$
test_nested_get 87.7620μs 20.5254μs 48.7200 KOps/s 49.1354 KOps/s $\color{#d91a1a}-0.85\%$
test_stacked_getleaf 50.2800μs 21.7439μs 45.9899 KOps/s 46.7903 KOps/s $\color{#d91a1a}-1.71\%$
test_stacked_get 53.0010μs 20.4945μs 48.7935 KOps/s 48.7607 KOps/s $\color{#35bf28}+0.07\%$
test_nested_getitemleaf 59.6510μs 22.3346μs 44.7735 KOps/s 45.3222 KOps/s $\color{#d91a1a}-1.21\%$
test_nested_getitem 50.1900μs 21.2031μs 47.1630 KOps/s 47.6541 KOps/s $\color{#d91a1a}-1.03\%$
test_stacked_getitemleaf 81.4210μs 22.1912μs 45.0628 KOps/s 45.3895 KOps/s $\color{#d91a1a}-0.72\%$
test_stacked_getitem 52.2300μs 20.9385μs 47.7590 KOps/s 47.4580 KOps/s $\color{#35bf28}+0.63\%$
test_lock_nested 0.5715ms 0.4782ms 2.0913 KOps/s 2.0939 KOps/s $\color{#d91a1a}-0.12\%$
test_lock_stack_nested 0.5339ms 0.4830ms 2.0705 KOps/s 2.0592 KOps/s $\color{#35bf28}+0.55\%$
test_unlock_nested 0.4740ms 0.3913ms 2.5555 KOps/s 2.5862 KOps/s $\color{#d91a1a}-1.19\%$
test_unlock_stack_nested 0.4358ms 0.3905ms 2.5608 KOps/s 2.5332 KOps/s $\color{#35bf28}+1.09\%$
test_flatten_speed 0.1856ms 0.1227ms 8.1529 KOps/s 8.2302 KOps/s $\color{#d91a1a}-0.94\%$
test_unflatten_speed 0.6675ms 0.5625ms 1.7778 KOps/s 1.7453 KOps/s $\color{#35bf28}+1.86\%$
test_common_ops 0.8381ms 0.6919ms 1.4452 KOps/s 1.4258 KOps/s $\color{#35bf28}+1.36\%$
test_creation 0.1152ms 3.1716μs 315.3020 KOps/s 315.8970 KOps/s $\color{#d91a1a}-0.19\%$
test_creation_empty 35.8000μs 7.0239μs 142.3705 KOps/s 143.2856 KOps/s $\color{#d91a1a}-0.64\%$
test_creation_nested_1 33.5010μs 11.5968μs 86.2310 KOps/s 86.7691 KOps/s $\color{#d91a1a}-0.62\%$
test_creation_nested_2 71.1910μs 12.9987μs 76.9305 KOps/s 74.6518 KOps/s $\color{#35bf28}+3.05\%$
test_creation_many_keys[10] 89.5910μs 20.8677μs 47.9209 KOps/s 47.0632 KOps/s $\color{#35bf28}+1.82\%$
test_creation_many_keys[50] 0.1268ms 90.1610μs 11.0913 KOps/s 11.0571 KOps/s $\color{#35bf28}+0.31\%$
test_creation_many_keys[100] 0.2169ms 0.1787ms 5.5950 KOps/s 5.5534 KOps/s $\color{#35bf28}+0.75\%$
test_creation_nested_many_keys[10] 82.2720μs 44.9926μs 22.2259 KOps/s 22.2481 KOps/s $\color{#d91a1a}-0.10\%$
test_creation_nested_many_keys[50] 0.2371ms 0.1846ms 5.4162 KOps/s 5.3792 KOps/s $\color{#35bf28}+0.69\%$
test_clone 45.7210μs 13.0367μs 76.7065 KOps/s 74.4932 KOps/s $\color{#35bf28}+2.97\%$
test_getitem[int] 1.5701ms 15.0049μs 66.6447 KOps/s 59.2168 KOps/s $\textbf{\color{#35bf28}+12.54\%}$
test_getitem[slice_int] 0.1867ms 24.0843μs 41.5208 KOps/s 41.4808 KOps/s $\color{#35bf28}+0.10\%$
test_getitem[range] 0.1763ms 62.7427μs 15.9381 KOps/s 15.7272 KOps/s $\color{#35bf28}+1.34\%$
test_getitem[tuple] 0.1561ms 23.9904μs 41.6833 KOps/s 41.7991 KOps/s $\color{#d91a1a}-0.28\%$
test_getitem[list] 0.1873ms 57.8321μs 17.2914 KOps/s 16.9304 KOps/s $\color{#35bf28}+2.13\%$
test_setitem_dim[int] 46.4110μs 25.0069μs 39.9890 KOps/s 38.1343 KOps/s $\color{#35bf28}+4.86\%$
test_setitem_dim[slice_int] 65.3400μs 42.5058μs 23.5262 KOps/s 23.1396 KOps/s $\color{#35bf28}+1.67\%$
test_setitem_dim[range] 0.1302ms 94.6489μs 10.5654 KOps/s 10.4794 KOps/s $\color{#35bf28}+0.82\%$
test_setitem_dim[tuple] 61.3710μs 38.7480μs 25.8078 KOps/s 25.4353 KOps/s $\color{#35bf28}+1.46\%$
test_setitem 63.1110μs 17.5591μs 56.9507 KOps/s 56.4609 KOps/s $\color{#35bf28}+0.87\%$
test_set 48.0410μs 16.9195μs 59.1035 KOps/s 59.1777 KOps/s $\color{#d91a1a}-0.13\%$
test_set_shared 0.5176ms 0.2046ms 4.8882 KOps/s 4.9501 KOps/s $\color{#d91a1a}-1.25\%$
test_update 0.2142ms 21.5381μs 46.4294 KOps/s 46.0206 KOps/s $\color{#35bf28}+0.89\%$
test_update_nested 77.2310μs 33.1792μs 30.1394 KOps/s 30.0595 KOps/s $\color{#35bf28}+0.27\%$
test_update__nested 0.4767ms 33.3850μs 29.9536 KOps/s 28.4934 KOps/s $\textbf{\color{#35bf28}+5.12\%}$
test_set_nested 46.5900μs 18.8781μs 52.9714 KOps/s 52.7685 KOps/s $\color{#35bf28}+0.38\%$
test_set_nested_new 76.7010μs 23.6654μs 42.2557 KOps/s 41.9693 KOps/s $\color{#35bf28}+0.68\%$
test_select 87.0210μs 40.2810μs 24.8256 KOps/s 24.0524 KOps/s $\color{#35bf28}+3.21\%$
test_select_nested 0.1109ms 76.1037μs 13.1400 KOps/s 13.4474 KOps/s $\color{#d91a1a}-2.29\%$
test_exclude_nested 0.1312ms 92.3902μs 10.8237 KOps/s 10.9120 KOps/s $\color{#d91a1a}-0.81\%$
test_empty[True] 0.4720ms 0.4005ms 2.4966 KOps/s 2.5196 KOps/s $\color{#d91a1a}-0.91\%$
test_empty[False] 7.8850μs 1.3308μs 751.4357 KOps/s 755.1165 KOps/s $\color{#d91a1a}-0.49\%$
test_to 0.1024ms 70.5132μs 14.1817 KOps/s 13.7785 KOps/s $\color{#35bf28}+2.93\%$
test_to_nonblocking 0.1056ms 64.4452μs 15.5171 KOps/s 15.0035 KOps/s $\color{#35bf28}+3.42\%$
test_unbind_speed 0.3810ms 0.3339ms 2.9951 KOps/s 3.0172 KOps/s $\color{#d91a1a}-0.73\%$
test_unbind_speed_stack0 0.3818ms 0.3328ms 3.0046 KOps/s 3.0619 KOps/s $\color{#d91a1a}-1.87\%$
test_unbind_speed_stack1 0.1083s 1.0435ms 958.3441 Ops/s 1.1873 KOps/s $\textbf{\color{#d91a1a}-19.28\%}$
test_split 1.2174ms 1.1481ms 871.0328 Ops/s 784.3383 Ops/s $\textbf{\color{#35bf28}+11.05\%}$
test_chunk 0.1068s 1.2092ms 826.9837 Ops/s 923.6165 Ops/s $\textbf{\color{#d91a1a}-10.46\%}$
test_to_cpu_blocking 19.5535ms 18.8242ms 53.1230 Ops/s 35.6530 Ops/s $\textbf{\color{#35bf28}+49.00\%}$
test_to_cpu_global_sync 11.6320ms 11.3391ms 88.1905 Ops/s 78.6527 Ops/s $\textbf{\color{#35bf28}+12.13\%}$
test_to_cpu_event_sync 0.1188s 13.5140ms 73.9974 Ops/s 81.2933 Ops/s $\textbf{\color{#d91a1a}-8.97\%}$
test_to_cpu_default 12.5201ms 12.2166ms 81.8555 Ops/s 81.1669 Ops/s $\color{#35bf28}+0.85\%$
test_consolidate[False-None] 4.2309ms 4.1581ms 240.4940 Ops/s 217.2359 Ops/s $\textbf{\color{#35bf28}+10.71\%}$
test_consolidate[default-None] 2.1224ms 2.0222ms 494.4995 Ops/s 486.6645 Ops/s $\color{#35bf28}+1.61\%$
test_consolidate[reduce-overhead-None] 2.0230ms 1.9362ms 516.4633 Ops/s 506.1639 Ops/s $\color{#35bf28}+2.03\%$
test_consolidate_njt[False-None] 8.7913ms 8.5495ms 116.9656 Ops/s 116.9459 Ops/s $\color{#35bf28}+0.02\%$
test_to[False-False-None] 2.1704ms 2.0648ms 484.3051 Ops/s 476.2822 Ops/s $\color{#35bf28}+1.68\%$
test_to[True-False-None] 2.1649ms 1.9182ms 521.3289 Ops/s 514.9413 Ops/s $\color{#35bf28}+1.24\%$
test_to[within-False-None] 6.3173ms 6.2139ms 160.9291 Ops/s 163.2282 Ops/s $\color{#d91a1a}-1.41\%$
test_to[True-default-None] 9.2819ms 9.0577ms 110.4027 Ops/s 106.9845 Ops/s $\color{#35bf28}+3.20\%$
test_to_njt[False-False-None] 8.7890ms 8.5020ms 117.6190 Ops/s 117.0489 Ops/s $\color{#35bf28}+0.49\%$
test_to_njt[True-False-None] 7.1543ms 6.9454ms 143.9793 Ops/s 143.7610 Ops/s $\color{#35bf28}+0.15\%$
test_to_njt[within-False-None] 16.2207ms 15.7313ms 63.5675 Ops/s 62.9926 Ops/s $\color{#35bf28}+0.91\%$
test_creation[device0] 0.4174ms 0.1164ms 8.5909 KOps/s 8.7385 KOps/s $\color{#d91a1a}-1.69\%$
test_creation_from_tensor 0.4110ms 0.1172ms 8.5350 KOps/s 8.9076 KOps/s $\color{#d91a1a}-4.18\%$
test_add_one[memmap_tensor0] 0.4032ms 6.6726μs 149.8655 KOps/s 141.7120 KOps/s $\textbf{\color{#35bf28}+5.75\%}$
test_contiguous[memmap_tensor0] 34.3600μs 0.7287μs 1.3723 MOps/s 1.9614 MOps/s $\textbf{\color{#d91a1a}-30.04\%}$
test_stack[memmap_tensor0] 37.8310μs 4.5860μs 218.0546 KOps/s 218.3655 KOps/s $\color{#d91a1a}-0.14\%$
test_memmaptd_index 1.0560ms 0.2717ms 3.6806 KOps/s 3.5824 KOps/s $\color{#35bf28}+2.74\%$
test_memmaptd_index_astensor 0.5361ms 0.3767ms 2.6548 KOps/s 2.5427 KOps/s $\color{#35bf28}+4.41\%$
test_memmaptd_index_op 0.9632ms 0.6297ms 1.5879 KOps/s 1.4865 KOps/s $\textbf{\color{#35bf28}+6.83\%}$
test_serialize_model 0.3123s 0.1614s 6.1974 Ops/s 7.3495 Ops/s $\textbf{\color{#d91a1a}-15.68\%}$
test_serialize_model_pickle 1.3474s 1.2113s 0.8256 Ops/s 0.8261 Ops/s $\color{#d91a1a}-0.06\%$
test_serialize_weights 0.1391s 0.1372s 7.2876 Ops/s 7.3700 Ops/s $\color{#d91a1a}-1.12\%$
test_serialize_weights_returnearly 0.4707s 92.3803ms 10.8248 Ops/s 10.7005 Ops/s $\color{#35bf28}+1.16\%$
test_serialize_weights_pickle 1.3656s 1.2218s 0.8185 Ops/s 0.8196 Ops/s $\color{#d91a1a}-0.13\%$
test_reshape_pytree 0.2071ms 32.7113μs 30.5705 KOps/s 30.3445 KOps/s $\color{#35bf28}+0.74\%$
test_reshape_td 77.9410μs 46.3069μs 21.5951 KOps/s 21.7670 KOps/s $\color{#d91a1a}-0.79\%$
test_view_pytree 0.2140ms 32.0493μs 31.2019 KOps/s 30.7011 KOps/s $\color{#35bf28}+1.63\%$
test_view_td 88.2810μs 53.9406μs 18.5389 KOps/s 18.8808 KOps/s $\color{#d91a1a}-1.81\%$
test_unbind_pytree 0.2246ms 36.2781μs 27.5648 KOps/s 27.6440 KOps/s $\color{#d91a1a}-0.29\%$
test_unbind_td 0.2168ms 50.1725μs 19.9312 KOps/s 20.2493 KOps/s $\color{#d91a1a}-1.57\%$
test_split_pytree 0.2468ms 42.4255μs 23.5707 KOps/s 23.5327 KOps/s $\color{#35bf28}+0.16\%$
test_split_td 0.1738ms 65.4171μs 15.2865 KOps/s 15.6303 KOps/s $\color{#d91a1a}-2.20\%$
test_add_pytree 0.1931ms 42.4455μs 23.5596 KOps/s 23.4464 KOps/s $\color{#35bf28}+0.48\%$
test_add_td 0.1176ms 55.3943μs 18.0524 KOps/s 18.0783 KOps/s $\color{#d91a1a}-0.14\%$
test_compile_add_one_nested[tensordict-compile] 0.2143ms 0.1490ms 6.7095 KOps/s 6.8361 KOps/s $\color{#d91a1a}-1.85\%$
test_compile_add_one_nested[tensordict-eager] 0.4140ms 0.2002ms 4.9944 KOps/s 4.9959 KOps/s $\color{#d91a1a}-0.03\%$
test_compile_add_one_nested[pytree-compile] 0.2237ms 0.1081ms 9.2518 KOps/s 9.2070 KOps/s $\color{#35bf28}+0.49\%$
test_compile_add_one_nested[pytree-eager] 0.4312ms 0.1790ms 5.5851 KOps/s 5.5564 KOps/s $\color{#35bf28}+0.52\%$
test_compile_copy_nested[tensordict-compile] 0.3259ms 10.2884μs 97.1969 KOps/s 99.1380 KOps/s $\color{#d91a1a}-1.96\%$
test_compile_copy_nested[tensordict-eager] 0.1265ms 54.1213μs 18.4770 KOps/s 18.4361 KOps/s $\color{#35bf28}+0.22\%$
test_compile_copy_nested[pytree-compile] 0.1283ms 9.8629μs 101.3902 KOps/s 103.3057 KOps/s $\color{#d91a1a}-1.85\%$
test_compile_copy_nested[pytree-eager] 0.4368ms 68.1385μs 14.6760 KOps/s 14.3637 KOps/s $\color{#35bf28}+2.17\%$
test_compile_add_one_flat[tensordict-compile] 0.2378ms 0.1803ms 5.5461 KOps/s 5.2925 KOps/s $\color{#35bf28}+4.79\%$
test_compile_add_one_flat[tensordict-eager] 0.3445ms 0.2808ms 3.5617 KOps/s 3.5125 KOps/s $\color{#35bf28}+1.40\%$
test_compile_add_one_flat[tensorclass-compile] 0.3399ms 0.1206ms 8.2912 KOps/s 8.1740 KOps/s $\color{#35bf28}+1.43\%$
test_compile_add_one_flat[tensorclass-eager] 0.1291ms 74.3801μs 13.4445 KOps/s 13.6176 KOps/s $\color{#d91a1a}-1.27\%$
test_compile_add_one_flat[pytree-compile] 0.2047ms 0.1585ms 6.3094 KOps/s 6.2229 KOps/s $\color{#35bf28}+1.39\%$
test_compile_add_one_flat[pytree-eager] 0.8081ms 0.5270ms 1.8977 KOps/s 1.9131 KOps/s $\color{#d91a1a}-0.81\%$
test_compile_add_self_flat[tensordict-eager] 0.4639ms 0.3342ms 2.9919 KOps/s 2.9731 KOps/s $\color{#35bf28}+0.63\%$
test_compile_add_self_flat[tensordict-compile] 0.2702ms 0.1838ms 5.4402 KOps/s 5.3150 KOps/s $\color{#35bf28}+2.35\%$
test_compile_add_self_flat[tensorclass-eager] 0.1546ms 93.8667μs 10.6534 KOps/s 11.2171 KOps/s $\textbf{\color{#d91a1a}-5.03\%}$
test_compile_add_self_flat[tensorclass-compile] 0.3998ms 0.1236ms 8.0910 KOps/s 8.0098 KOps/s $\color{#35bf28}+1.01\%$
test_compile_add_self_flat[pytree-eager] 0.7109ms 0.4422ms 2.2614 KOps/s 2.2754 KOps/s $\color{#d91a1a}-0.61\%$
test_compile_add_self_flat[pytree-compile] 0.3243ms 0.1589ms 6.2938 KOps/s 6.2212 KOps/s $\color{#35bf28}+1.17\%$
test_compile_copy_flat[tensordict-compile] 0.1153ms 14.2303μs 70.2727 KOps/s 75.0994 KOps/s $\textbf{\color{#d91a1a}-6.43\%}$
test_compile_copy_flat[tensordict-eager] 83.5010μs 41.1625μs 24.2940 KOps/s 23.8189 KOps/s $\color{#35bf28}+1.99\%$
test_compile_copy_flat[pytree-compile] 0.1300ms 10.6103μs 94.2480 KOps/s 93.7626 KOps/s $\color{#35bf28}+0.52\%$
test_compile_copy_flat[pytree-eager] 0.4105ms 52.7329μs 18.9635 KOps/s 18.9901 KOps/s $\color{#d91a1a}-0.14\%$
test_compile_assign_and_add[tensordict-compile] 2.0134ms 0.1729ms 5.7849 KOps/s 5.4381 KOps/s $\textbf{\color{#35bf28}+6.38\%}$
test_compile_assign_and_add[tensordict-eager] 3.4170ms 3.2885ms 304.0886 Ops/s 287.8791 Ops/s $\textbf{\color{#35bf28}+5.63\%}$
test_compile_assign_and_add[pytree-compile] 1.9758ms 0.1616ms 6.1892 KOps/s 6.0822 KOps/s $\color{#35bf28}+1.76\%$
test_compile_assign_and_add[pytree-eager] 2.9197ms 2.7737ms 360.5231 Ops/s 343.5056 Ops/s $\color{#35bf28}+4.95\%$
test_compile_indexing[tensor-tensordict-compile] 0.1573ms 0.1096ms 9.1224 KOps/s 8.6739 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_compile_indexing[tensor-tensordict-eager] 0.3177ms 73.4552μs 13.6137 KOps/s 13.3368 KOps/s $\color{#35bf28}+2.08\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1656ms 97.2952μs 10.2780 KOps/s 10.4201 KOps/s $\color{#d91a1a}-1.36\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2528ms 43.9939μs 22.7304 KOps/s 22.4954 KOps/s $\color{#35bf28}+1.04\%$
test_compile_indexing[tensor-pytree-compile] 0.1544ms 98.1532μs 10.1882 KOps/s 10.3182 KOps/s $\color{#d91a1a}-1.26\%$
test_compile_indexing[tensor-pytree-eager] 0.2625ms 43.9583μs 22.7488 KOps/s 21.2364 KOps/s $\textbf{\color{#35bf28}+7.12\%}$
test_compile_indexing[slice-tensordict-compile] 0.2269ms 56.2791μs 17.7686 KOps/s 17.0230 KOps/s $\color{#35bf28}+4.38\%$
test_compile_indexing[slice-tensordict-eager] 0.2139ms 27.1671μs 36.8092 KOps/s 36.2176 KOps/s $\color{#35bf28}+1.63\%$
test_compile_indexing[slice-tensorclass-compile] 82.1310μs 44.4330μs 22.5058 KOps/s 22.3215 KOps/s $\color{#35bf28}+0.83\%$
test_compile_indexing[slice-tensorclass-eager] 0.2792ms 22.3219μs 44.7991 KOps/s 44.1125 KOps/s $\color{#35bf28}+1.56\%$
test_compile_indexing[slice-pytree-compile] 83.0810μs 44.0181μs 22.7179 KOps/s 22.3506 KOps/s $\color{#35bf28}+1.64\%$
test_compile_indexing[slice-pytree-eager] 0.2769ms 22.3985μs 44.6459 KOps/s 43.9830 KOps/s $\color{#35bf28}+1.51\%$
test_compile_indexing[int-tensordict-compile] 0.1040ms 55.9407μs 17.8761 KOps/s 17.4092 KOps/s $\color{#35bf28}+2.68\%$
test_compile_indexing[int-tensordict-eager] 0.2055ms 26.3326μs 37.9758 KOps/s 36.4648 KOps/s $\color{#35bf28}+4.14\%$
test_compile_indexing[int-tensorclass-compile] 88.8910μs 44.8750μs 22.2841 KOps/s 22.0333 KOps/s $\color{#35bf28}+1.14\%$
test_compile_indexing[int-tensorclass-eager] 0.2924ms 22.3266μs 44.7896 KOps/s 44.3433 KOps/s $\color{#35bf28}+1.01\%$
test_compile_indexing[int-pytree-compile] 80.3510μs 44.5609μs 22.4412 KOps/s 21.9488 KOps/s $\color{#35bf28}+2.24\%$
test_compile_indexing[int-pytree-eager] 0.2574ms 22.1480μs 45.1507 KOps/s 44.3996 KOps/s $\color{#35bf28}+1.69\%$
test_compile_replace[single-eager] 0.1199ms 46.9380μs 21.3047 KOps/s 21.1875 KOps/s $\color{#35bf28}+0.55\%$
test_compile_replace[single-compile] 0.2415ms 0.1054ms 9.4888 KOps/s 9.0967 KOps/s $\color{#35bf28}+4.31\%$
test_compile_replace[multi-eager] 0.6090ms 0.5615ms 1.7808 KOps/s 1.7133 KOps/s $\color{#35bf28}+3.94\%$
test_compile_replace[multi-compile] 0.1813ms 0.1120ms 8.9299 KOps/s 8.5644 KOps/s $\color{#35bf28}+4.27\%$
test_compile_tc_getattr_20[eager] 0.2131ms 0.1669ms 5.9924 KOps/s 5.9983 KOps/s $\color{#d91a1a}-0.10\%$
test_compile_tc_getattr_20[compile] 0.4133ms 0.1200ms 8.3349 KOps/s 8.0781 KOps/s $\color{#35bf28}+3.18\%$
test_compile_clone_shallow[20-eager] 84.4510μs 19.2458μs 51.9594 KOps/s 53.0138 KOps/s $\color{#d91a1a}-1.99\%$
test_compile_clone_shallow[20-compile] 62.4510μs 11.5981μs 86.2207 KOps/s 92.0748 KOps/s $\textbf{\color{#d91a1a}-6.36\%}$
test_compile_clone_shallow[40-eager] 65.2510μs 34.1628μs 29.2716 KOps/s 29.5080 KOps/s $\color{#d91a1a}-0.80\%$
test_compile_clone_shallow[40-compile] 61.6010μs 12.7534μs 78.4103 KOps/s 81.1723 KOps/s $\color{#d91a1a}-3.40\%$
test_compile_clone_shallow[80-eager] 0.1371ms 62.4622μs 16.0097 KOps/s 15.6902 KOps/s $\color{#35bf28}+2.04\%$
test_compile_clone_shallow[80-compile] 59.8100μs 15.2559μs 65.5485 KOps/s 67.4377 KOps/s $\color{#d91a1a}-2.80\%$
test_compile_update_inplace[eager] 94.3320μs 59.1258μs 16.9131 KOps/s 16.9138 KOps/s $-0.00\%$
test_compile_update_inplace[compile] 0.2871ms 0.1394ms 7.1736 KOps/s 6.7247 KOps/s $\textbf{\color{#35bf28}+6.68\%}$
test_mod_add[eager] 93.1410μs 51.0641μs 19.5832 KOps/s 20.4551 KOps/s $\color{#d91a1a}-4.26\%$
test_mod_add[compile] 0.1416ms 0.1033ms 9.6768 KOps/s 9.4041 KOps/s $\color{#35bf28}+2.90\%$
test_mod_add[compile-overhead] 0.2328ms 0.1482ms 6.7492 KOps/s 6.6125 KOps/s $\color{#35bf28}+2.07\%$
test_mod_wrap[eager] 0.3649ms 0.2886ms 3.4652 KOps/s 3.3460 KOps/s $\color{#35bf28}+3.56\%$
test_mod_wrap[compile] 0.5026ms 0.3529ms 2.8334 KOps/s 2.7931 KOps/s $\color{#35bf28}+1.44\%$
test_mod_wrap[compile-overhead] 7.3260ms 4.0345ms 247.8591 Ops/s 248.0903 Ops/s $\color{#d91a1a}-0.09\%$
test_mod_wrap_and_backward[eager] 1.6036ms 1.5041ms 664.8375 Ops/s 661.6995 Ops/s $\color{#35bf28}+0.47\%$
test_mod_wrap_and_backward[compile] 1.5641ms 1.4390ms 694.9172 Ops/s 683.5378 Ops/s $\color{#35bf28}+1.66\%$
test_mod_wrap_and_backward[compile-overhead] 1.2508ms 0.8900ms 1.1236 KOps/s 994.3950 Ops/s $\textbf{\color{#35bf28}+12.99\%}$
test_seq_add[eager] 0.2093ms 0.1543ms 6.4801 KOps/s 6.1472 KOps/s $\textbf{\color{#35bf28}+5.42\%}$
test_seq_add[compile] 0.1896ms 0.1125ms 8.8875 KOps/s 8.1445 KOps/s $\textbf{\color{#35bf28}+9.12\%}$
test_seq_add[compile-overhead] 0.1952ms 0.1536ms 6.5113 KOps/s 6.3386 KOps/s $\color{#35bf28}+2.72\%$
test_seq_wrap[eager] 0.5908ms 0.5234ms 1.9105 KOps/s 1.8614 KOps/s $\color{#35bf28}+2.64\%$
test_seq_wrap[compile] 0.4424ms 0.3629ms 2.7558 KOps/s 2.5605 KOps/s $\textbf{\color{#35bf28}+7.63\%}$
test_seq_wrap[compile-overhead] 0.3258ms 0.2672ms 3.7429 KOps/s 3.5936 KOps/s $\color{#35bf28}+4.15\%$
test_func_call_runtime[False-eager] 0.9033ms 0.8358ms 1.1965 KOps/s 1.1122 KOps/s $\textbf{\color{#35bf28}+7.57\%}$
test_func_call_runtime[False-compile] 1.0746ms 0.9016ms 1.1092 KOps/s 1.0415 KOps/s $\textbf{\color{#35bf28}+6.50\%}$
test_func_call_runtime[False-compile-overhead] 0.5206ms 0.4613ms 2.1678 KOps/s 2.1216 KOps/s $\color{#35bf28}+2.18\%$
test_func_call_runtime[True-eager] 1.2476ms 1.0747ms 930.4935 Ops/s 917.6295 Ops/s $\color{#35bf28}+1.40\%$
test_func_call_runtime[True-compile] 0.9805ms 0.9156ms 1.0922 KOps/s 1.0602 KOps/s $\color{#35bf28}+3.02\%$
test_func_call_runtime[True-compile-overhead] 0.5170ms 0.4759ms 2.1012 KOps/s 2.0611 KOps/s $\color{#35bf28}+1.94\%$
test_func_call_cm_runtime[False-eager] 0.9154ms 0.8302ms 1.2045 KOps/s 1.1064 KOps/s $\textbf{\color{#35bf28}+8.86\%}$
test_func_call_cm_runtime[False-compile] 0.9902ms 0.9031ms 1.1073 KOps/s 1.0430 KOps/s $\textbf{\color{#35bf28}+6.16\%}$
test_func_call_cm_runtime[False-compile-overhead] 0.5367ms 0.4640ms 2.1553 KOps/s 2.1265 KOps/s $\color{#35bf28}+1.36\%$
test_func_call_cm_runtime[True-eager] 1.3229ms 1.2215ms 818.6780 Ops/s 815.4817 Ops/s $\color{#35bf28}+0.39\%$
test_func_call_cm_runtime[True-compile] 1.0981ms 0.9499ms 1.0528 KOps/s 987.0900 Ops/s $\textbf{\color{#35bf28}+6.66\%}$
test_func_call_cm_runtime[True-compile-overhead] 0.5595ms 0.5079ms 1.9691 KOps/s 1.9283 KOps/s $\color{#35bf28}+2.12\%$
test_vmap_func_call_cm_runtime[eager] 2.8774ms 2.3801ms 420.1436 Ops/s 417.1371 Ops/s $\color{#35bf28}+0.72\%$
test_vmap_func_call_cm_runtime[compile] 1.0576ms 0.9729ms 1.0278 KOps/s 1.0129 KOps/s $\color{#35bf28}+1.47\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.6079ms 0.5153ms 1.9407 KOps/s 1.9280 KOps/s $\color{#35bf28}+0.66\%$
test_distributed 2.6095ms 0.1675ms 5.9688 KOps/s 6.4998 KOps/s $\textbf{\color{#d91a1a}-8.17\%}$
test_tdmodule 46.6800μs 27.1856μs 36.7841 KOps/s 34.8266 KOps/s $\textbf{\color{#35bf28}+5.62\%}$
test_tdmodule_dispatch 74.7310μs 45.1485μs 22.1491 KOps/s 21.0183 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_tdseq 48.7010μs 26.9747μs 37.0717 KOps/s 36.1905 KOps/s $\color{#35bf28}+2.43\%$
test_tdseq_dispatch 74.3710μs 47.1955μs 21.1885 KOps/s 20.7823 KOps/s $\color{#35bf28}+1.95\%$
test_instantiation_functorch 2.3041ms 2.0650ms 484.2536 Ops/s 477.4496 Ops/s $\color{#35bf28}+1.43\%$
test_exec_functorch 0.2232ms 0.1788ms 5.5939 KOps/s 5.5186 KOps/s $\color{#35bf28}+1.36\%$
test_exec_functional_call 0.2129ms 0.1582ms 6.3204 KOps/s 6.1701 KOps/s $\color{#35bf28}+2.44\%$
test_exec_td_decorator 0.4416ms 0.2365ms 4.2282 KOps/s 4.1899 KOps/s $\color{#35bf28}+0.92\%$
test_vmap_mlp_speed_decorator[True-True] 1.0467ms 0.8268ms 1.2095 KOps/s 1.2079 KOps/s $\color{#35bf28}+0.14\%$
test_vmap_mlp_speed_decorator[True-False] 1.0089ms 0.8253ms 1.2117 KOps/s 1.2080 KOps/s $\color{#35bf28}+0.31\%$
test_vmap_mlp_speed_decorator[False-True] 0.8920ms 0.7125ms 1.4034 KOps/s 1.3985 KOps/s $\color{#35bf28}+0.35\%$
test_vmap_mlp_speed_decorator[False-False] 0.8953ms 0.7107ms 1.4070 KOps/s 1.4015 KOps/s $\color{#35bf28}+0.39\%$
test_vmap_transformer_speed_decorator[True-True] 20.9473ms 20.5104ms 48.7557 Ops/s 48.5871 Ops/s $\color{#35bf28}+0.35\%$
test_vmap_transformer_speed_decorator[True-False] 21.1469ms 20.5193ms 48.7347 Ops/s 48.5294 Ops/s $\color{#35bf28}+0.42\%$
test_vmap_transformer_speed_decorator[False-True] 21.0960ms 20.3112ms 49.2339 Ops/s 49.2114 Ops/s $\color{#35bf28}+0.05\%$
test_vmap_transformer_speed_decorator[False-False] 21.0325ms 20.3180ms 49.2174 Ops/s 49.0513 Ops/s $\color{#35bf28}+0.34\%$
test_to_module_speed[True] 1.6893ms 1.4837ms 674.0051 Ops/s 677.8840 Ops/s $\color{#d91a1a}-0.57\%$
test_to_module_speed[False] 1.9447ms 1.4651ms 682.5291 Ops/s 698.7708 Ops/s $\color{#d91a1a}-2.32\%$
test_tc_init 0.1013ms 44.6288μs 22.4070 KOps/s 22.0344 KOps/s $\color{#35bf28}+1.69\%$
test_tc_init_tensor_only 33.8710μs 9.6712μs 103.3998 KOps/s 102.7695 KOps/s $\color{#35bf28}+0.61\%$
test_tc_init_nested 0.1449ms 87.0039μs 11.4937 KOps/s 11.0915 KOps/s $\color{#35bf28}+3.63\%$
test_tc_init_many_fields 41.7610μs 16.2410μs 61.5725 KOps/s 60.7864 KOps/s $\color{#35bf28}+1.29\%$
test_tc_first_layer_tensor 31.5800μs 1.8166μs 550.4796 KOps/s 555.1157 KOps/s $\color{#d91a1a}-0.84\%$
test_tc_first_layer_tensor_only 1.4185μs 0.3959μs 2.5259 MOps/s 2.5589 MOps/s $\color{#d91a1a}-1.29\%$
test_tc_first_layer_tensor_set 45.7200μs 3.9761μs 251.5050 KOps/s 255.2568 KOps/s $\color{#d91a1a}-1.47\%$
test_tc_first_layer_tensor_only_set 87.1210μs 3.2762μs 305.2276 KOps/s 303.9209 KOps/s $\color{#35bf28}+0.43\%$
test_tc_first_layer_nontensor 49.9410μs 6.1513μs 162.5670 KOps/s 161.6222 KOps/s $\color{#35bf28}+0.58\%$
test_tc_second_layer_tensor 25.7900μs 4.4161μs 226.4434 KOps/s 229.1020 KOps/s $\color{#d91a1a}-1.16\%$
test_tc_second_layer_nontensor 0.1193ms 8.6894μs 115.0831 KOps/s 115.7815 KOps/s $\color{#d91a1a}-0.60\%$
test_unbind 0.2697s 16.7863ms 59.5724 Ops/s 53.7523 Ops/s $\textbf{\color{#35bf28}+10.83\%}$
test_full_like 4.9549ms 4.4719ms 223.6176 Ops/s 214.6760 Ops/s $\color{#35bf28}+4.17\%$
test_zeros_like 5.0079ms 4.4323ms 225.6165 Ops/s 133.7640 Ops/s $\textbf{\color{#35bf28}+68.67\%}$
test_ones_like 4.6806ms 4.4426ms 225.0947 Ops/s 230.3493 Ops/s $\color{#d91a1a}-2.28\%$
test_clone 7.7022ms 6.8505ms 145.9746 Ops/s 147.1398 Ops/s $\color{#d91a1a}-0.79\%$
test_squeeze 0.1757ms 14.5722μs 68.6238 KOps/s 71.8232 KOps/s $\color{#d91a1a}-4.45\%$
test_unsqueeze 0.1655ms 0.1106ms 9.0425 KOps/s 8.8943 KOps/s $\color{#35bf28}+1.67\%$
test_split 0.2709ms 0.1855ms 5.3921 KOps/s 5.4091 KOps/s $\color{#d91a1a}-0.31\%$
test_permute 0.2590ms 0.2052ms 4.8745 KOps/s 4.8998 KOps/s $\color{#d91a1a}-0.52\%$
test_stack 36.7458ms 35.8936ms 27.8601 Ops/s 19.1144 Ops/s $\textbf{\color{#35bf28}+45.75\%}$
test_cat 36.4606ms 35.7396ms 27.9801 Ops/s 19.1662 Ops/s $\textbf{\color{#35bf28}+45.99\%}$
test_sequential_tensordict 0.6191ms 0.2198ms 4.5502 KOps/s 4.4543 KOps/s $\color{#35bf28}+2.15\%$
test_sequential_graph_module 0.1698ms 0.1175ms 8.5097 KOps/s 8.3660 KOps/s $\color{#35bf28}+1.72\%$
test_nested_tensordict 0.5583ms 0.2800ms 3.5711 KOps/s 3.4116 KOps/s $\color{#35bf28}+4.68\%$
test_nested_graph_module 0.1762ms 0.1274ms 7.8486 KOps/s 7.7544 KOps/s $\color{#35bf28}+1.22\%$


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
