[BugFix] Pass type directly during reduction #1225

Merged (1 commit) on Feb 20, 2025

@vmoens (Contributor) commented on Feb 20, 2025

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Feb 20, 2025
ghstack-source-id: 737f03775bc98090172e397ad4a65c8e777302e5
Pull Request resolved: #1225
@facebook-github-bot added the CLA Signed label on Feb 20, 2025
@vmoens merged commit a60eae3 into gh/vmoens/49/base on Feb 20, 2025
29 of 35 checks passed
@vmoens vmoens deleted the gh/vmoens/49/head branch February 20, 2025 10:52

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 217. Improved: $\large\color{#35bf28}14$. Worsened: $\large\color{#d91a1a}13$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 73.3680μs | 21.1358μs | 47.3131 KOps/s | 45.3598 KOps/s | $\color{#35bf28}+4.31\%$ |
| test_plain_set_stack_nested | 65.0730μs | 21.5173μs | 46.4743 KOps/s | 45.3447 KOps/s | $\color{#35bf28}+2.49\%$ |
| test_plain_set_nested_inplace | 65.0220μs | 23.5463μs | 42.4695 KOps/s | 42.1919 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_plain_set_stack_nested_inplace | 53.0900μs | 23.2974μs | 42.9233 KOps/s | 42.3885 KOps/s | $\color{#35bf28}+1.26\%$ |
| test_items | 41.1970μs | 4.1558μs | 240.6296 KOps/s | 240.6172 KOps/s | $+0.01\%$ |
| test_items_nested | 0.6608ms | 0.4022ms | 2.4866 KOps/s | 2.4608 KOps/s | $\color{#35bf28}+1.05\%$ |
| test_items_nested_locked | 0.5789ms | 0.4008ms | 2.4952 KOps/s | 2.4487 KOps/s | $\color{#35bf28}+1.90\%$ |
| test_items_nested_leaf | 0.1560ms | 75.8704μs | 13.1804 KOps/s | 12.9523 KOps/s | $\color{#35bf28}+1.76\%$ |
| test_items_stack_nested | 0.5842ms | 0.4044ms | 2.4726 KOps/s | 2.4417 KOps/s | $\color{#35bf28}+1.27\%$ |
| test_items_stack_nested_leaf | 0.1450ms | 78.7569μs | 12.6973 KOps/s | 12.4639 KOps/s | $\color{#35bf28}+1.87\%$ |
| test_items_stack_nested_locked | 0.7446ms | 0.4029ms | 2.4819 KOps/s | 2.4368 KOps/s | $\color{#35bf28}+1.85\%$ |
| test_keys | 41.1180μs | 3.5950μs | 278.1604 KOps/s | 288.1791 KOps/s | $\color{#d91a1a}-3.48\%$ |
| test_keys_nested | 0.2541ms | 0.1632ms | 6.1259 KOps/s | 6.0375 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_keys_nested_locked | 1.8175ms | 0.1694ms | 5.9030 KOps/s | 5.7900 KOps/s | $\color{#35bf28}+1.95\%$ |
| test_keys_nested_leaf | 0.2237ms | 0.1424ms | 7.0208 KOps/s | 6.9448 KOps/s | $\color{#35bf28}+1.09\%$ |
| test_keys_stack_nested | 0.2960ms | 0.1622ms | 6.1639 KOps/s | 6.0936 KOps/s | $\color{#35bf28}+1.15\%$ |
| test_keys_stack_nested_leaf | 0.2349ms | 0.1401ms | 7.1384 KOps/s | 7.1117 KOps/s | $\color{#35bf28}+0.38\%$ |
| test_keys_stack_nested_locked | 0.2592ms | 0.1680ms | 5.9510 KOps/s | 5.8003 KOps/s | $\color{#35bf28}+2.60\%$ |
| test_values | 8.5080μs | 1.0284μs | 972.3776 KOps/s | 912.5684 KOps/s | $\textbf{\color{#35bf28}+6.55\%}$ |
| test_values_nested | 0.1252ms | 62.3406μs | 16.0409 KOps/s | 15.8741 KOps/s | $\color{#35bf28}+1.05\%$ |
| test_values_nested_locked | 0.1155ms | 62.1109μs | 16.1002 KOps/s | 15.9515 KOps/s | $\color{#35bf28}+0.93\%$ |
| test_values_nested_leaf | 0.1236ms | 70.5429μs | 14.1758 KOps/s | 13.6231 KOps/s | $\color{#35bf28}+4.06\%$ |
| test_values_stack_nested | 0.1128ms | 63.3351μs | 15.7890 KOps/s | 15.4835 KOps/s | $\color{#35bf28}+1.97\%$ |
| test_values_stack_nested_leaf | 0.1199ms | 70.3727μs | 14.2101 KOps/s | 14.0675 KOps/s | $\color{#35bf28}+1.01\%$ |
| test_values_stack_nested_locked | 0.1100ms | 63.2012μs | 15.8225 KOps/s | 15.6124 KOps/s | $\color{#35bf28}+1.35\%$ |
| test_membership | 42.7300μs | 0.8522μs | 1.1734 MOps/s | 1.1647 MOps/s | $\color{#35bf28}+0.75\%$ |
| test_membership_nested | 0.1128ms | 2.9906μs | 334.3798 KOps/s | 353.1233 KOps/s | $\textbf{\color{#d91a1a}-5.31\%}$ |
| test_membership_nested_leaf | 39.5540μs | 2.9144μs | 343.1227 KOps/s | 350.8865 KOps/s | $\color{#d91a1a}-2.21\%$ |
| test_membership_stacked_nested | 16.4710μs | 2.8775μs | 347.5288 KOps/s | 355.3770 KOps/s | $\color{#d91a1a}-2.21\%$ |
| test_membership_stacked_nested_leaf | 41.2780μs | 2.9157μs | 342.9715 KOps/s | 355.6710 KOps/s | $\color{#d91a1a}-3.57\%$ |
| test_membership_nested_last | 31.5990μs | 4.3774μs | 228.4451 KOps/s | 232.9084 KOps/s | $\color{#d91a1a}-1.92\%$ |
| test_membership_nested_leaf_last | 39.6140μs | 4.3769μs | 228.4730 KOps/s | 224.8871 KOps/s | $\color{#35bf28}+1.59\%$ |
| test_membership_stacked_nested_last | 27.2420μs | 4.3527μs | 229.7434 KOps/s | 110.7715 KOps/s | $\textbf{\color{#35bf28}+107.40\%}$ |
| test_membership_stacked_nested_leaf_last | 29.6350μs | 4.3687μs | 228.9014 KOps/s | 110.9579 KOps/s | $\textbf{\color{#35bf28}+106.30\%}$ |
| test_nested_getleaf | 47.6100μs | 10.7119μs | 93.3540 KOps/s | 93.9481 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_nested_get | 0.1862ms | 10.1827μs | 98.2059 KOps/s | 99.1510 KOps/s | $\color{#d91a1a}-0.95\%$ |
| test_stacked_getleaf | 35.7670μs | 10.5035μs | 95.2060 KOps/s | 94.3618 KOps/s | $\color{#35bf28}+0.89\%$ |
| test_stacked_get | 36.0280μs | 10.0725μs | 99.2799 KOps/s | 98.7608 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_nested_getitemleaf | 55.1640μs | 11.3541μs | 88.0736 KOps/s | 88.8758 KOps/s | $\color{#d91a1a}-0.90\%$ |
| test_nested_getitem | 37.5310μs | 10.6763μs | 93.6650 KOps/s | 92.7299 KOps/s | $\color{#35bf28}+1.01\%$ |
| test_stacked_getitemleaf | 41.7380μs | 11.1099μs | 90.0094 KOps/s | 89.3541 KOps/s | $\color{#35bf28}+0.73\%$ |
| test_stacked_getitem | 54.9930μs | 10.6815μs | 93.6194 KOps/s | 93.9636 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_lock_nested | 0.6442ms | 0.4073ms | 2.4549 KOps/s | 2.4462 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_lock_stack_nested | 0.6709ms | 0.4125ms | 2.4240 KOps/s | 2.4174 KOps/s | $\color{#35bf28}+0.28\%$ |
| test_unlock_nested | 0.4276ms | 0.3294ms | 3.0360 KOps/s | 2.9672 KOps/s | $\color{#35bf28}+2.32\%$ |
| test_unlock_stack_nested | 0.5217ms | 0.3347ms | 2.9876 KOps/s | 2.9786 KOps/s | $\color{#35bf28}+0.30\%$ |
| test_flatten_speed | 0.1703ms | 98.7328μs | 10.1283 KOps/s | 9.7420 KOps/s | $\color{#35bf28}+3.97\%$ |
| test_unflatten_speed | 0.9193ms | 0.5181ms | 1.9302 KOps/s | 1.9349 KOps/s | $\color{#d91a1a}-0.24\%$ |
| test_common_ops | 4.7546ms | 0.7912ms | 1.2638 KOps/s | 1.2356 KOps/s | $\color{#35bf28}+2.28\%$ |
| test_creation | 46.9780μs | 2.6902μs | 371.7235 KOps/s | 394.5068 KOps/s | $\textbf{\color{#d91a1a}-5.78\%}$ |
| test_creation_empty | 41.2480μs | 11.7597μs | 85.0365 KOps/s | 77.6438 KOps/s | $\textbf{\color{#35bf28}+9.52\%}$ |
| test_creation_nested_1 | 46.3070μs | 14.5521μs | 68.7185 KOps/s | 62.4425 KOps/s | $\textbf{\color{#35bf28}+10.05\%}$ |
| test_creation_nested_2 | 74.4700μs | 19.1495μs | 52.2206 KOps/s | 48.3439 KOps/s | $\textbf{\color{#35bf28}+8.02\%}$ |
| test_clone | 0.1020ms | 13.5874μs | 73.5978 KOps/s | 73.1852 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_getitem[int] | 0.7758ms | 12.8211μs | 77.9963 KOps/s | 79.9601 KOps/s | $\color{#d91a1a}-2.46\%$ |
| test_getitem[slice_int] | 0.1324ms | 23.9618μs | 41.7331 KOps/s | 42.5067 KOps/s | $\color{#d91a1a}-1.82\%$ |
| test_getitem[range] | 0.1598ms | 49.7175μs | 20.1136 KOps/s | 19.9864 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_getitem[tuple] | 0.1215ms | 20.1111μs | 49.7237 KOps/s | 50.4114 KOps/s | $\color{#d91a1a}-1.36\%$ |
| test_getitem[list] | 0.1591ms | 45.7504μs | 21.8578 KOps/s | 21.6441 KOps/s | $\color{#35bf28}+0.99\%$ |
| test_setitem_dim[int] | 51.2470μs | 25.6388μs | 39.0034 KOps/s | 39.6802 KOps/s | $\color{#d91a1a}-1.71\%$ |
| test_setitem_dim[slice_int] | 97.7240μs | 52.2575μs | 19.1360 KOps/s | 19.5637 KOps/s | $\color{#d91a1a}-2.19\%$ |
| test_setitem_dim[range] | 0.1658ms | 76.7426μs | 13.0306 KOps/s | 12.7591 KOps/s | $\color{#35bf28}+2.13\%$ |
| test_setitem_dim[tuple] | 79.1690μs | 41.6185μs | 24.0278 KOps/s | 24.6314 KOps/s | $\color{#d91a1a}-2.45\%$ |
| test_setitem | 0.1113ms | 20.5314μs | 48.7058 KOps/s | 44.1266 KOps/s | $\textbf{\color{#35bf28}+10.38\%}$ |
| test_set | 0.1261ms | 19.9291μs | 50.1778 KOps/s | 49.1116 KOps/s | $\color{#35bf28}+2.17\%$ |
| test_set_shared | 4.1007ms | 0.1848ms | 5.4127 KOps/s | 5.5576 KOps/s | $\color{#d91a1a}-2.61\%$ |
| test_update | 0.1453ms | 23.1352μs | 43.2242 KOps/s | 42.0405 KOps/s | $\color{#35bf28}+2.82\%$ |
| test_update_nested | 0.1495ms | 34.0490μs | 29.3695 KOps/s | 29.7640 KOps/s | $\color{#d91a1a}-1.33\%$ |
| test_update__nested | 0.4420ms | 34.4351μs | 29.0401 KOps/s | 30.0909 KOps/s | $\color{#d91a1a}-3.49\%$ |
| test_set_nested | 83.7580μs | 22.2912μs | 44.8608 KOps/s | 44.9926 KOps/s | $\color{#d91a1a}-0.29\%$ |
| test_set_nested_new | 0.1335ms | 27.1632μs | 36.8145 KOps/s | 37.6892 KOps/s | $\color{#d91a1a}-2.32\%$ |
| test_select | 0.1204ms | 42.3684μs | 23.6025 KOps/s | 23.1762 KOps/s | $\color{#35bf28}+1.84\%$ |
| test_select_nested | 0.1255ms | 62.8537μs | 15.9100 KOps/s | 16.0196 KOps/s | $\color{#d91a1a}-0.68\%$ |
| test_exclude_nested | 0.1356ms | 82.0479μs | 12.1880 KOps/s | 12.2676 KOps/s | $\color{#d91a1a}-0.65\%$ |
| test_empty[True] | 0.5884ms | 0.4053ms | 2.4672 KOps/s | 2.4744 KOps/s | $\color{#d91a1a}-0.29\%$ |
| test_empty[False] | 9.3820μs | 1.4258μs | 701.3465 KOps/s | 721.2848 KOps/s | $\color{#d91a1a}-2.76\%$ |
| test_unbind_speed | 0.4107ms | 0.2682ms | 3.7291 KOps/s | 3.7231 KOps/s | $\color{#35bf28}+0.16\%$ |
| test_unbind_speed_stack0 | 0.6100ms | 0.2645ms | 3.7804 KOps/s | 3.8210 KOps/s | $\color{#d91a1a}-1.06\%$ |
| test_unbind_speed_stack1 | 0.1033s | 0.7177ms | 1.3933 KOps/s | 1.3979 KOps/s | $\color{#d91a1a}-0.33\%$ |
| test_split | 97.1256ms | 1.7119ms | 584.1535 Ops/s | 518.6044 Ops/s | $\textbf{\color{#35bf28}+12.64\%}$ |
| test_chunk | 0.1010s | 1.7185ms | 581.9057 Ops/s | 633.7025 Ops/s | $\textbf{\color{#d91a1a}-8.17\%}$ |
| test_consolidate_njt[False-None] | 8.2517ms | 8.0410ms | 124.3630 Ops/s | 110.9241 Ops/s | $\textbf{\color{#35bf28}+12.12\%}$ |
| test_creation[device0] | 0.2212ms | 92.1591μs | 10.8508 KOps/s | 10.7421 KOps/s | $\color{#35bf28}+1.01\%$ |
| test_creation_from_tensor | 0.2273ms | 93.9182μs | 10.6476 KOps/s | 10.5887 KOps/s | $\color{#35bf28}+0.56\%$ |
| test_add_one[memmap_tensor0] | 86.3230μs | 4.8856μs | 204.6824 KOps/s | 201.4157 KOps/s | $\color{#35bf28}+1.62\%$ |
| test_contiguous[memmap_tensor0] | 7.8840μs | 0.5048μs | 1.9809 MOps/s | 1.9491 MOps/s | $\color{#35bf28}+1.63\%$ |
| test_stack[memmap_tensor0] | 50.1840μs | 3.3511μs | 298.4089 KOps/s | 298.4609 KOps/s | $\color{#d91a1a}-0.02\%$ |
| test_memmaptd_index | 0.3193ms | 0.2319ms | 4.3116 KOps/s | 4.4980 KOps/s | $\color{#d91a1a}-4.14\%$ |
| test_memmaptd_index_astensor | 0.7355ms | 0.3176ms | 3.1485 KOps/s | 3.2425 KOps/s | $\color{#d91a1a}-2.90\%$ |
| test_memmaptd_index_op | 0.7790ms | 0.5785ms | 1.7287 KOps/s | 1.7067 KOps/s | $\color{#35bf28}+1.29\%$ |
| test_serialize_model | 0.2092s | 0.1293s | 7.7313 Ops/s | 8.5487 Ops/s | $\textbf{\color{#d91a1a}-9.56\%}$ |
| test_serialize_model_pickle | 0.5023s | 0.4074s | 2.4545 Ops/s | 2.5190 Ops/s | $\color{#d91a1a}-2.56\%$ |
| test_serialize_weights | 0.1234s | 0.1149s | 8.7030 Ops/s | 8.2882 Ops/s | $\textbf{\color{#35bf28}+5.00\%}$ |
| test_serialize_weights_returnearly | 0.1817s | 0.1589s | 6.2923 Ops/s | 6.0984 Ops/s | $\color{#35bf28}+3.18\%$ |
| test_serialize_weights_pickle | 1.0098s | 0.7333s | 1.3637 Ops/s | 2.3827 Ops/s | $\textbf{\color{#d91a1a}-42.77\%}$ |
| test_serialize_weights_filesystem | 0.2361s | 0.1550s | 6.4497 Ops/s | 6.2814 Ops/s | $\color{#35bf28}+2.68\%$ |
| test_serialize_model_filesystem | 0.1523s | 0.1419s | 7.0487 Ops/s | 6.5652 Ops/s | $\textbf{\color{#35bf28}+7.36\%}$ |
| test_reshape_pytree | 78.7000μs | 26.2062μs | 38.1589 KOps/s | 37.7948 KOps/s | $\color{#35bf28}+0.96\%$ |
| test_reshape_td | 81.7040μs | 32.3632μs | 30.8993 KOps/s | 30.3990 KOps/s | $\color{#35bf28}+1.65\%$ |
| test_view_pytree | 58.0190μs | 25.9782μs | 38.4938 KOps/s | 38.1041 KOps/s | $\color{#35bf28}+1.02\%$ |
| test_view_td | 97.9140μs | 40.2615μs | 24.8377 KOps/s | 24.9388 KOps/s | $\color{#d91a1a}-0.41\%$ |
| test_unbind_pytree | 72.5560μs | 29.7452μs | 33.6188 KOps/s | 34.2928 KOps/s | $\color{#d91a1a}-1.97\%$ |
| test_unbind_td | 0.3660ms | 39.8114μs | 25.1184 KOps/s | 25.1713 KOps/s | $\color{#d91a1a}-0.21\%$ |
| test_split_pytree | 70.4020μs | 28.8539μs | 34.6574 KOps/s | 34.7229 KOps/s | $\color{#d91a1a}-0.19\%$ |
| test_split_td | 0.5477ms | 44.6268μs | 22.4081 KOps/s | 22.6094 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_add_pytree | 0.1011ms | 35.1322μs | 28.4639 KOps/s | 28.0532 KOps/s | $\color{#35bf28}+1.46\%$ |
| test_add_td | 0.1710ms | 55.9808μs | 17.8633 KOps/s | 16.8212 KOps/s | $\textbf{\color{#35bf28}+6.19\%}$ |
| test_compile_add_one_nested[tensordict-compile] | 0.1306ms | 66.4415μs | 15.0508 KOps/s | 15.1489 KOps/s | $\color{#d91a1a}-0.65\%$ |
| test_compile_add_one_nested[tensordict-eager] | 1.4046ms | 0.1703ms | 5.8725 KOps/s | 5.8281 KOps/s | $\color{#35bf28}+0.76\%$ |
| test_compile_add_one_nested[pytree-compile] | 87.8250μs | 45.6847μs | 21.8892 KOps/s | 21.7597 KOps/s | $\color{#35bf28}+0.59\%$ |
| test_compile_add_one_nested[pytree-eager] | 0.2019ms | 0.1177ms | 8.4984 KOps/s | 8.4135 KOps/s | $\color{#35bf28}+1.01\%$ |
| test_compile_copy_nested[tensordict-compile] | 82.6350μs | 27.9906μs | 35.7263 KOps/s | 36.2986 KOps/s | $\color{#d91a1a}-1.58\%$ |
| test_compile_copy_nested[tensordict-eager] | 0.1159ms | 58.7293μs | 17.0273 KOps/s | 17.0511 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_compile_copy_nested[pytree-compile] | 0.1709ms | 78.5992μs | 12.7228 KOps/s | 12.5507 KOps/s | $\color{#35bf28}+1.37\%$ |
| test_compile_copy_nested[pytree-eager] | 0.1312ms | 66.1031μs | 15.1279 KOps/s | 14.8380 KOps/s | $\color{#35bf28}+1.95\%$ |
| test_compile_add_one_flat[tensordict-compile] | 0.1987ms | 0.1083ms | 9.2339 KOps/s | 9.2857 KOps/s | $\color{#d91a1a}-0.56\%$ |
| test_compile_add_one_flat[tensordict-eager] | 0.4016ms | 0.2139ms | 4.6743 KOps/s | 4.6126 KOps/s | $\color{#35bf28}+1.34\%$ |
| test_compile_add_one_flat[tensorclass-compile] | 0.1830ms | 47.0642μs | 21.2476 KOps/s | 21.1639 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_compile_add_one_flat[tensorclass-eager] | 0.1410ms | 65.1917μs | 15.3394 KOps/s | 14.9966 KOps/s | $\color{#35bf28}+2.29\%$ |
| test_compile_add_one_flat[pytree-compile] | 0.2439ms | 0.1029ms | 9.7227 KOps/s | 9.9226 KOps/s | $\color{#d91a1a}-2.01\%$ |
| test_compile_add_one_flat[pytree-eager] | 0.4268ms | 0.2029ms | 4.9289 KOps/s | 4.9291 KOps/s | $-0.00\%$ |
| test_compile_add_self_flat[tensordict-eager] | 0.4313ms | 0.2320ms | 4.3102 KOps/s | 4.2746 KOps/s | $\color{#35bf28}+0.83\%$ |
| test_compile_add_self_flat[tensordict-compile] | 0.4054ms | 0.1151ms | 8.6851 KOps/s | 9.2672 KOps/s | $\textbf{\color{#d91a1a}-6.28\%}$ |
| test_compile_add_self_flat[tensorclass-eager] | 0.2084ms | 61.5999μs | 16.2338 KOps/s | 16.0743 KOps/s | $\color{#35bf28}+0.99\%$ |
| test_compile_add_self_flat[tensorclass-compile] | 0.1627ms | 47.3931μs | 21.1001 KOps/s | 20.6972 KOps/s | $\color{#35bf28}+1.95\%$ |
| test_compile_add_self_flat[pytree-eager] | 0.3433ms | 0.1572ms | 6.3595 KOps/s | 6.3959 KOps/s | $\color{#d91a1a}-0.57\%$ |
| test_compile_add_self_flat[pytree-compile] | 0.4360ms | 0.1020ms | 9.8027 KOps/s | 9.8645 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_compile_copy_flat[tensordict-compile] | 73.9690μs | 21.1314μs | 47.3229 KOps/s | 47.7182 KOps/s | $\color{#d91a1a}-0.83\%$ |
| test_compile_copy_flat[tensordict-eager] | 0.1462ms | 67.8538μs | 14.7376 KOps/s | 14.4543 KOps/s | $\color{#35bf28}+1.96\%$ |
| test_compile_copy_flat[pytree-compile] | 0.1526ms | 83.9161μs | 11.9167 KOps/s | 12.2173 KOps/s | $\color{#d91a1a}-2.46\%$ |
| test_compile_copy_flat[pytree-eager] | 0.1529ms | 68.3169μs | 14.6377 KOps/s | 15.1121 KOps/s | $\color{#d91a1a}-3.14\%$ |
| test_compile_assign_and_add[tensordict-compile] | 0.3028ms | 0.2196ms | 4.5541 KOps/s | 4.6308 KOps/s | $\color{#d91a1a}-1.66\%$ |
| test_compile_assign_and_add[tensordict-eager] | 2.6519ms | 1.3622ms | 734.1234 Ops/s | 732.7073 Ops/s | $\color{#35bf28}+0.19\%$ |
| test_compile_assign_and_add[pytree-compile] | 0.2839ms | 0.2160ms | 4.6293 KOps/s | 4.6876 KOps/s | $\color{#d91a1a}-1.25\%$ |
| test_compile_assign_and_add[pytree-eager] | 1.2274ms | 0.8200ms | 1.2196 KOps/s | 1.2271 KOps/s | $\color{#d91a1a}-0.62\%$ |
| test_compile_assign_and_add_stack[compile] | 0.6458ms | 0.4662ms | 2.1452 KOps/s | 2.1847 KOps/s | $\color{#d91a1a}-1.81\%$ |
| test_compile_assign_and_add_stack[eager] | 2.9150ms | 2.6282ms | 380.4955 Ops/s | 358.1458 Ops/s | $\textbf{\color{#35bf28}+6.24\%}$ |
| test_compile_indexing[tensor-tensordict-compile] | 93.3660μs | 37.8243μs | 26.4381 KOps/s | 25.8872 KOps/s | $\color{#35bf28}+2.13\%$ |
| test_compile_indexing[tensor-tensordict-eager] | 0.5544ms | 33.0900μs | 30.2206 KOps/s | 30.5639 KOps/s | $\color{#d91a1a}-1.12\%$ |
| test_compile_indexing[tensor-tensorclass-compile] | 0.1174ms | 30.7676μs | 32.5017 KOps/s | 32.8333 KOps/s | $\color{#d91a1a}-1.01\%$ |
| test_compile_indexing[tensor-tensorclass-eager] | 79.2100μs | 23.2979μs | 42.9223 KOps/s | 43.3388 KOps/s | $\color{#d91a1a}-0.96\%$ |
| test_compile_indexing[tensor-pytree-compile] | 0.1189ms | 31.4903μs | 31.7558 KOps/s | 31.4984 KOps/s | $\color{#35bf28}+0.82\%$ |
| test_compile_indexing[tensor-pytree-eager] | 0.1373ms | 24.1744μs | 41.3661 KOps/s | 43.4761 KOps/s | $\color{#d91a1a}-4.85\%$ |
| test_compile_indexing[slice-tensordict-compile] | 0.1261ms | 54.2853μs | 18.4212 KOps/s | 18.5150 KOps/s | $\color{#d91a1a}-0.51\%$ |
| test_compile_indexing[slice-tensordict-eager] | 0.3307ms | 20.3382μs | 49.1687 KOps/s | 50.7323 KOps/s | $\color{#d91a1a}-3.08\%$ |
| test_compile_indexing[slice-tensorclass-compile] | 0.1061ms | 47.1833μs | 21.1939 KOps/s | 21.5121 KOps/s | $\color{#d91a1a}-1.48\%$ |
| test_compile_indexing[slice-tensorclass-eager] | 63.9500μs | 18.4151μs | 54.3033 KOps/s | 54.4553 KOps/s | $\color{#d91a1a}-0.28\%$ |
| test_compile_indexing[slice-pytree-compile] | 0.1267ms | 47.1081μs | 21.2278 KOps/s | 20.7580 KOps/s | $\color{#35bf28}+2.26\%$ |
| test_compile_indexing[slice-pytree-eager] | 83.8880μs | 18.3140μs | 54.6031 KOps/s | 53.6559 KOps/s | $\color{#35bf28}+1.77\%$ |
| test_compile_indexing[int-tensordict-compile] | 0.1055ms | 54.8565μs | 18.2294 KOps/s | 18.2699 KOps/s | $\color{#d91a1a}-0.22\%$ |
| test_compile_indexing[int-tensordict-eager] | 0.8771ms | 19.8710μs | 50.3246 KOps/s | 51.8446 KOps/s | $\color{#d91a1a}-2.93\%$ |
| test_compile_indexing[int-tensorclass-compile] | 0.1075ms | 47.0439μs | 21.2567 KOps/s | 21.7539 KOps/s | $\color{#d91a1a}-2.29\%$ |
| test_compile_indexing[int-tensorclass-eager] | 47.9400μs | 18.2035μs | 54.9345 KOps/s | 55.3269 KOps/s | $\color{#d91a1a}-0.71\%$ |
| test_compile_indexing[int-pytree-compile] | 0.1358ms | 47.3930μs | 21.1002 KOps/s | 21.8220 KOps/s | $\color{#d91a1a}-3.31\%$ |
| test_compile_indexing[int-pytree-eager] | 74.7810μs | 18.3269μs | 54.5646 KOps/s | 55.1995 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_mod_add[eager] | 80.9430μs | 34.6324μs | 28.8747 KOps/s | 27.7389 KOps/s | $\color{#35bf28}+4.09\%$ |
| test_mod_add[compile] | 0.1332ms | 66.4537μs | 15.0481 KOps/s | 15.9242 KOps/s | $\textbf{\color{#d91a1a}-5.50\%}$ |
| test_mod_add[compile-overhead] | 0.1636ms | 66.5532μs | 15.0256 KOps/s | 15.6896 KOps/s | $\color{#d91a1a}-4.23\%$ |
| test_mod_wrap[eager] | 0.3912ms | 0.2255ms | 4.4350 KOps/s | 4.5273 KOps/s | $\color{#d91a1a}-2.04\%$ |
| test_mod_wrap[compile] | 2.3750ms | 0.2343ms | 4.2682 KOps/s | 4.3994 KOps/s | $\color{#d91a1a}-2.98\%$ |
| test_mod_wrap[compile-overhead] | 0.4083ms | 0.2264ms | 4.4172 KOps/s | 4.4274 KOps/s | $\color{#d91a1a}-0.23\%$ |
| test_mod_wrap_and_backward[eager] | 19.4238ms | 13.4911ms | 74.1231 Ops/s | 74.9291 Ops/s | $\color{#d91a1a}-1.08\%$ |
| test_mod_wrap_and_backward[compile] | 13.1499ms | 11.3408ms | 88.1771 Ops/s | 87.7428 Ops/s | $\color{#35bf28}+0.49\%$ |
| test_mod_wrap_and_backward[compile-overhead] | 12.0926ms | 11.1710ms | 89.5179 Ops/s | 89.8320 Ops/s | $\color{#d91a1a}-0.35\%$ |
| test_seq_add[eager] | 0.2210ms | 0.1191ms | 8.3970 KOps/s | 8.4611 KOps/s | $\color{#d91a1a}-0.76\%$ |
| test_seq_add[compile] | 0.1997ms | 80.4299μs | 12.4332 KOps/s | 13.4032 KOps/s | $\textbf{\color{#d91a1a}-7.24\%}$ |
| test_seq_add[compile-overhead] | 0.1428ms | 76.9224μs | 13.0001 KOps/s | 13.4634 KOps/s | $\color{#d91a1a}-3.44\%$ |
| test_seq_wrap[eager] | 0.8222ms | 0.4460ms | 2.2421 KOps/s | 2.2462 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_seq_wrap[compile] | 0.4481ms | 0.2487ms | 4.0217 KOps/s | 4.1618 KOps/s | $\color{#d91a1a}-3.37\%$ |
| test_seq_wrap[compile-overhead] | 0.3313ms | 0.2501ms | 3.9984 KOps/s | 4.2099 KOps/s | $\textbf{\color{#d91a1a}-5.02\%}$ |
| test_func_call_runtime[False-eager] | 0.7561ms | 0.5420ms | 1.8452 KOps/s | 1.8494 KOps/s | $\color{#d91a1a}-0.23\%$ |
| test_func_call_runtime[False-compile] | 0.5698ms | 0.4505ms | 2.2198 KOps/s | 2.2500 KOps/s | $\color{#d91a1a}-1.34\%$ |
| test_func_call_runtime[False-compile-overhead] | 0.8287ms | 0.4512ms | 2.2164 KOps/s | 2.2920 KOps/s | $\color{#d91a1a}-3.30\%$ |
| test_func_call_runtime[True-eager] | 0.9000ms | 0.7619ms | 1.3124 KOps/s | 1.3582 KOps/s | $\color{#d91a1a}-3.37\%$ |
| test_func_call_runtime[True-compile] | 0.6256ms | 0.4699ms | 2.1282 KOps/s | 2.1494 KOps/s | $\color{#d91a1a}-0.99\%$ |
| test_func_call_runtime[True-compile-overhead] | 0.6160ms | 0.4699ms | 2.1283 KOps/s | 2.1657 KOps/s | $\color{#d91a1a}-1.73\%$ |
| test_func_call_cm_runtime[False-eager] | 0.8265ms | 0.5479ms | 1.8253 KOps/s | 1.8730 KOps/s | $\color{#d91a1a}-2.55\%$ |
| test_func_call_cm_runtime[False-compile] | 0.8214ms | 0.4521ms | 2.2119 KOps/s | 2.2626 KOps/s | $\color{#d91a1a}-2.24\%$ |
| test_func_call_cm_runtime[False-compile-overhead] | 0.5781ms | 0.4474ms | 2.2354 KOps/s | 2.2326 KOps/s | $\color{#35bf28}+0.13\%$ |
| test_func_call_cm_runtime[True-eager] | 1.5846ms | 0.8975ms | 1.1142 KOps/s | 1.1183 KOps/s | $\color{#d91a1a}-0.37\%$ |
| test_func_call_cm_runtime[True-compile] | 1.0849ms | 0.7906ms | 1.2649 KOps/s | 1.2566 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_func_call_cm_runtime[True-compile-overhead] | 1.4683ms | 0.8007ms | 1.2489 KOps/s | 1.2425 KOps/s | $\color{#35bf28}+0.51\%$ |
| test_vmap_func_call_cm_runtime[eager] | 3.1397ms | 1.9428ms | 514.7152 Ops/s | 521.3969 Ops/s | $\color{#d91a1a}-1.28\%$ |
| test_vmap_func_call_cm_runtime[compile] | 0.6900ms | 0.5462ms | 1.8310 KOps/s | 1.8434 KOps/s | $\color{#d91a1a}-0.67\%$ |
| test_vmap_func_call_cm_runtime[compile-overhead] | 0.7357ms | 0.5500ms | 1.8181 KOps/s | 1.8362 KOps/s | $\color{#d91a1a}-0.98\%$ |
| test_distributed | 0.2565ms | 0.1264ms | 7.9120 KOps/s | 7.8942 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_tdmodule | 0.1180ms | 27.3747μs | 36.5300 KOps/s | 36.8614 KOps/s | $\color{#d91a1a}-0.90\%$ |
| test_tdmodule_dispatch | 77.4650μs | 50.4470μs | 19.8228 KOps/s | 20.2867 KOps/s | $\color{#d91a1a}-2.29\%$ |
| test_tdseq | 50.5050μs | 29.4125μs | 33.9992 KOps/s | 34.3308 KOps/s | $\color{#d91a1a}-0.97\%$ |
| test_tdseq_dispatch | 94.8680μs | 54.1311μs | 18.4737 KOps/s | 18.0873 KOps/s | $\color{#35bf28}+2.14\%$ |
| test_instantiation_functorch | 2.3338ms | 1.5267ms | 655.0289 Ops/s | 654.4930 Ops/s | $\color{#35bf28}+0.08\%$ |
| test_exec_functorch | 0.3209ms | 0.1754ms | 5.7015 KOps/s | 5.7577 KOps/s | $\color{#d91a1a}-0.98\%$ |
| test_exec_functional_call | 0.3480ms | 0.1714ms | 5.8341 KOps/s | 5.9462 KOps/s | $\color{#d91a1a}-1.88\%$ |
| test_exec_td_decorator | 0.5495ms | 0.2350ms | 4.2547 KOps/s | 4.3809 KOps/s | $\color{#d91a1a}-2.88\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 0.8612ms | 0.6597ms | 1.5159 KOps/s | 1.4930 KOps/s | $\color{#35bf28}+1.53\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8635ms | 0.6591ms | 1.5173 KOps/s | 1.5241 KOps/s | $\color{#d91a1a}-0.45\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.8636ms | 0.5330ms | 1.8762 KOps/s | 1.8856 KOps/s | $\color{#d91a1a}-0.50\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.7554ms | 0.5342ms | 1.8720 KOps/s | 1.8903 KOps/s | $\color{#d91a1a}-0.97\%$ |
| test_to_module_speed[True] | 1.9207ms | 1.3410ms | 745.6889 Ops/s | 756.7163 Ops/s | $\color{#d91a1a}-1.46\%$ |
| test_to_module_speed[False] | 1.8672ms | 1.3070ms | 765.1127 Ops/s | 761.4716 Ops/s | $\color{#35bf28}+0.48\%$ |
| test_tc_init | 0.1032ms | 48.6976μs | 20.5349 KOps/s | 20.4788 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_tc_init_nested | 0.1682ms | 94.2443μs | 10.6107 KOps/s | 10.3317 KOps/s | $\color{#35bf28}+2.70\%$ |
| test_tc_first_layer_tensor | 27.3310μs | 1.5374μs | 650.4461 KOps/s | 667.5568 KOps/s | $\color{#d91a1a}-2.56\%$ |
| test_tc_first_layer_nontensor | 55.2040μs | 4.6578μs | 214.6950 KOps/s | 208.9653 KOps/s | $\color{#35bf28}+2.74\%$ |
| test_tc_second_layer_tensor | 29.8450μs | 2.8476μs | 351.1668 KOps/s | 355.4703 KOps/s | $\color{#d91a1a}-1.21\%$ |
| test_tc_second_layer_nontensor | 53.8090μs | 6.0841μs | 164.3627 KOps/s | 159.6604 KOps/s | $\color{#35bf28}+2.95\%$ |
| test_unbind | 0.2461s | 13.8782ms | 72.0552 Ops/s | 61.5748 Ops/s | $\textbf{\color{#35bf28}+17.02\%}$ |
| test_full_like | 9.8828ms | 8.9548ms | 111.6722 Ops/s | 132.8169 Ops/s | $\textbf{\color{#d91a1a}-15.92\%}$ |
| test_zeros_like | 6.1890ms | 3.2957ms | 303.4220 Ops/s | 366.5040 Ops/s | $\textbf{\color{#d91a1a}-17.21\%}$ |
| test_ones_like | 4.7759ms | 3.5759ms | 279.6519 Ops/s | 301.7850 Ops/s | $\textbf{\color{#d91a1a}-7.33\%}$ |
| test_clone | 10.9172ms | 6.8794ms | 145.3619 Ops/s | 197.5532 Ops/s | $\textbf{\color{#d91a1a}-26.42\%}$ |
| test_squeeze | 83.1470μs | 12.7846μs | 78.2190 KOps/s | 78.7710 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_unsqueeze | 0.3757ms | 96.2666μs | 10.3878 KOps/s | 10.9264 KOps/s | $\color{#d91a1a}-4.93\%$ |
| test_split | 0.3958ms | 0.1971ms | 5.0731 KOps/s | 5.2796 KOps/s | $\color{#d91a1a}-3.91\%$ |
| test_permute | 0.3257ms | 0.2086ms | 4.7948 KOps/s | 4.9780 KOps/s | $\color{#d91a1a}-3.68\%$ |
| test_stack | 29.9063ms | 24.9307ms | 40.1112 Ops/s | 39.0256 Ops/s | $\color{#35bf28}+2.78\%$ |
| test_cat | 29.7383ms | 25.2118ms | 39.6640 Ops/s | 38.3236 Ops/s | $\color{#35bf28}+3.50\%$ |
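
For readers checking the numbers, the Change column appears to be the relative difference between Ops (this PR) and Ops on Repo HEAD. The sketch below recomputes it for a few rows copied from the table. The helper names `relative_change` and `classify` are hypothetical, and the 5% cutoff for a significant change is an assumption inferred from which rows are shown in bold, not something the benchmark report states.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr / ops_head - 1.0) * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption, not a documented value."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the table; units cancel in the ratio.
rows = {
    "test_plain_set_nested": (47.3131, 45.3598),
    "test_keys": (278.1604, 288.1791),
    "test_values": (972.3776, 912.5684),
    "test_serialize_weights_pickle": (1.3637, 2.3827),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Under that assumed cutoff, the bold rows in the table add up to the 14 improved and 13 worsened benchmarks reported in the summary line.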

Labels: CLA Signed