[BugFix] Pass type directly during reduction #1225
Merged
vmoens added a commit that referenced this pull request on Feb 20, 2025:
ghstack-source-id: 737f03775bc98090172e397ad4a65c8e777302e5
Pull Request resolved: #1225
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_plain_set_nested | 73.3680μs | 21.1358μs | 47.3131 KOps/s | 45.3598 KOps/s | |
test_plain_set_stack_nested | 65.0730μs | 21.5173μs | 46.4743 KOps/s | 45.3447 KOps/s | |
test_plain_set_nested_inplace | 65.0220μs | 23.5463μs | 42.4695 KOps/s | 42.1919 KOps/s | |
test_plain_set_stack_nested_inplace | 53.0900μs | 23.2974μs | 42.9233 KOps/s | 42.3885 KOps/s | |
test_items | 41.1970μs | 4.1558μs | 240.6296 KOps/s | 240.6172 KOps/s | |
test_items_nested | 0.6608ms | 0.4022ms | 2.4866 KOps/s | 2.4608 KOps/s | |
test_items_nested_locked | 0.5789ms | 0.4008ms | 2.4952 KOps/s | 2.4487 KOps/s | |
test_items_nested_leaf | 0.1560ms | 75.8704μs | 13.1804 KOps/s | 12.9523 KOps/s | |
test_items_stack_nested | 0.5842ms | 0.4044ms | 2.4726 KOps/s | 2.4417 KOps/s | |
test_items_stack_nested_leaf | 0.1450ms | 78.7569μs | 12.6973 KOps/s | 12.4639 KOps/s | |
test_items_stack_nested_locked | 0.7446ms | 0.4029ms | 2.4819 KOps/s | 2.4368 KOps/s | |
test_keys | 41.1180μs | 3.5950μs | 278.1604 KOps/s | 288.1791 KOps/s | |
test_keys_nested | 0.2541ms | 0.1632ms | 6.1259 KOps/s | 6.0375 KOps/s | |
test_keys_nested_locked | 1.8175ms | 0.1694ms | 5.9030 KOps/s | 5.7900 KOps/s | |
test_keys_nested_leaf | 0.2237ms | 0.1424ms | 7.0208 KOps/s | 6.9448 KOps/s | |
test_keys_stack_nested | 0.2960ms | 0.1622ms | 6.1639 KOps/s | 6.0936 KOps/s | |
test_keys_stack_nested_leaf | 0.2349ms | 0.1401ms | 7.1384 KOps/s | 7.1117 KOps/s | |
test_keys_stack_nested_locked | 0.2592ms | 0.1680ms | 5.9510 KOps/s | 5.8003 KOps/s | |
test_values | 8.5080μs | 1.0284μs | 972.3776 KOps/s | 912.5684 KOps/s | |
test_values_nested | 0.1252ms | 62.3406μs | 16.0409 KOps/s | 15.8741 KOps/s | |
test_values_nested_locked | 0.1155ms | 62.1109μs | 16.1002 KOps/s | 15.9515 KOps/s | |
test_values_nested_leaf | 0.1236ms | 70.5429μs | 14.1758 KOps/s | 13.6231 KOps/s | |
test_values_stack_nested | 0.1128ms | 63.3351μs | 15.7890 KOps/s | 15.4835 KOps/s | |
test_values_stack_nested_leaf | 0.1199ms | 70.3727μs | 14.2101 KOps/s | 14.0675 KOps/s | |
test_values_stack_nested_locked | 0.1100ms | 63.2012μs | 15.8225 KOps/s | 15.6124 KOps/s | |
test_membership | 42.7300μs | 0.8522μs | 1.1734 MOps/s | 1.1647 MOps/s | |
test_membership_nested | 0.1128ms | 2.9906μs | 334.3798 KOps/s | 353.1233 KOps/s | |
test_membership_nested_leaf | 39.5540μs | 2.9144μs | 343.1227 KOps/s | 350.8865 KOps/s | |
test_membership_stacked_nested | 16.4710μs | 2.8775μs | 347.5288 KOps/s | 355.3770 KOps/s | |
test_membership_stacked_nested_leaf | 41.2780μs | 2.9157μs | 342.9715 KOps/s | 355.6710 KOps/s | |
test_membership_nested_last | 31.5990μs | 4.3774μs | 228.4451 KOps/s | 232.9084 KOps/s | |
test_membership_nested_leaf_last | 39.6140μs | 4.3769μs | 228.4730 KOps/s | 224.8871 KOps/s | |
test_membership_stacked_nested_last | 27.2420μs | 4.3527μs | 229.7434 KOps/s | 110.7715 KOps/s | |
test_membership_stacked_nested_leaf_last | 29.6350μs | 4.3687μs | 228.9014 KOps/s | 110.9579 KOps/s | |
test_nested_getleaf | 47.6100μs | 10.7119μs | 93.3540 KOps/s | 93.9481 KOps/s | |
test_nested_get | 0.1862ms | 10.1827μs | 98.2059 KOps/s | 99.1510 KOps/s | |
test_stacked_getleaf | 35.7670μs | 10.5035μs | 95.2060 KOps/s | 94.3618 KOps/s | |
test_stacked_get | 36.0280μs | 10.0725μs | 99.2799 KOps/s | 98.7608 KOps/s | |
test_nested_getitemleaf | 55.1640μs | 11.3541μs | 88.0736 KOps/s | 88.8758 KOps/s | |
test_nested_getitem | 37.5310μs | 10.6763μs | 93.6650 KOps/s | 92.7299 KOps/s | |
test_stacked_getitemleaf | 41.7380μs | 11.1099μs | 90.0094 KOps/s | 89.3541 KOps/s | |
test_stacked_getitem | 54.9930μs | 10.6815μs | 93.6194 KOps/s | 93.9636 KOps/s | |
test_lock_nested | 0.6442ms | 0.4073ms | 2.4549 KOps/s | 2.4462 KOps/s | |
test_lock_stack_nested | 0.6709ms | 0.4125ms | 2.4240 KOps/s | 2.4174 KOps/s | |
test_unlock_nested | 0.4276ms | 0.3294ms | 3.0360 KOps/s | 2.9672 KOps/s | |
test_unlock_stack_nested | 0.5217ms | 0.3347ms | 2.9876 KOps/s | 2.9786 KOps/s | |
test_flatten_speed | 0.1703ms | 98.7328μs | 10.1283 KOps/s | 9.7420 KOps/s | |
test_unflatten_speed | 0.9193ms | 0.5181ms | 1.9302 KOps/s | 1.9349 KOps/s | |
test_common_ops | 4.7546ms | 0.7912ms | 1.2638 KOps/s | 1.2356 KOps/s | |
test_creation | 46.9780μs | 2.6902μs | 371.7235 KOps/s | 394.5068 KOps/s | |
test_creation_empty | 41.2480μs | 11.7597μs | 85.0365 KOps/s | 77.6438 KOps/s | |
test_creation_nested_1 | 46.3070μs | 14.5521μs | 68.7185 KOps/s | 62.4425 KOps/s | |
test_creation_nested_2 | 74.4700μs | 19.1495μs | 52.2206 KOps/s | 48.3439 KOps/s | |
test_clone | 0.1020ms | 13.5874μs | 73.5978 KOps/s | 73.1852 KOps/s | |
test_getitem[int] | 0.7758ms | 12.8211μs | 77.9963 KOps/s | 79.9601 KOps/s | |
test_getitem[slice_int] | 0.1324ms | 23.9618μs | 41.7331 KOps/s | 42.5067 KOps/s | |
test_getitem[range] | 0.1598ms | 49.7175μs | 20.1136 KOps/s | 19.9864 KOps/s | |
test_getitem[tuple] | 0.1215ms | 20.1111μs | 49.7237 KOps/s | 50.4114 KOps/s | |
test_getitem[list] | 0.1591ms | 45.7504μs | 21.8578 KOps/s | 21.6441 KOps/s | |
test_setitem_dim[int] | 51.2470μs | 25.6388μs | 39.0034 KOps/s | 39.6802 KOps/s | |
test_setitem_dim[slice_int] | 97.7240μs | 52.2575μs | 19.1360 KOps/s | 19.5637 KOps/s | |
test_setitem_dim[range] | 0.1658ms | 76.7426μs | 13.0306 KOps/s | 12.7591 KOps/s | |
test_setitem_dim[tuple] | 79.1690μs | 41.6185μs | 24.0278 KOps/s | 24.6314 KOps/s | |
test_setitem | 0.1113ms | 20.5314μs | 48.7058 KOps/s | 44.1266 KOps/s | |
test_set | 0.1261ms | 19.9291μs | 50.1778 KOps/s | 49.1116 KOps/s | |
test_set_shared | 4.1007ms | 0.1848ms | 5.4127 KOps/s | 5.5576 KOps/s | |
test_update | 0.1453ms | 23.1352μs | 43.2242 KOps/s | 42.0405 KOps/s | |
test_update_nested | 0.1495ms | 34.0490μs | 29.3695 KOps/s | 29.7640 KOps/s | |
test_update__nested | 0.4420ms | 34.4351μs | 29.0401 KOps/s | 30.0909 KOps/s | |
test_set_nested | 83.7580μs | 22.2912μs | 44.8608 KOps/s | 44.9926 KOps/s | |
test_set_nested_new | 0.1335ms | 27.1632μs | 36.8145 KOps/s | 37.6892 KOps/s | |
test_select | 0.1204ms | 42.3684μs | 23.6025 KOps/s | 23.1762 KOps/s | |
test_select_nested | 0.1255ms | 62.8537μs | 15.9100 KOps/s | 16.0196 KOps/s | |
test_exclude_nested | 0.1356ms | 82.0479μs | 12.1880 KOps/s | 12.2676 KOps/s | |
test_empty[True] | 0.5884ms | 0.4053ms | 2.4672 KOps/s | 2.4744 KOps/s | |
test_empty[False] | 9.3820μs | 1.4258μs | 701.3465 KOps/s | 721.2848 KOps/s | |
test_unbind_speed | 0.4107ms | 0.2682ms | 3.7291 KOps/s | 3.7231 KOps/s | |
test_unbind_speed_stack0 | 0.6100ms | 0.2645ms | 3.7804 KOps/s | 3.8210 KOps/s | |
test_unbind_speed_stack1 | 0.1033s | 0.7177ms | 1.3933 KOps/s | 1.3979 KOps/s | |
test_split | 97.1256ms | 1.7119ms | 584.1535 Ops/s | 518.6044 Ops/s | |
test_chunk | 0.1010s | 1.7185ms | 581.9057 Ops/s | 633.7025 Ops/s | |
test_consolidate_njt[False-None] | 8.2517ms | 8.0410ms | 124.3630 Ops/s | 110.9241 Ops/s | |
test_creation[device0] | 0.2212ms | 92.1591μs | 10.8508 KOps/s | 10.7421 KOps/s | |
test_creation_from_tensor | 0.2273ms | 93.9182μs | 10.6476 KOps/s | 10.5887 KOps/s | |
test_add_one[memmap_tensor0] | 86.3230μs | 4.8856μs | 204.6824 KOps/s | 201.4157 KOps/s | |
test_contiguous[memmap_tensor0] | 7.8840μs | 0.5048μs | 1.9809 MOps/s | 1.9491 MOps/s | |
test_stack[memmap_tensor0] | 50.1840μs | 3.3511μs | 298.4089 KOps/s | 298.4609 KOps/s | |
test_memmaptd_index | 0.3193ms | 0.2319ms | 4.3116 KOps/s | 4.4980 KOps/s | |
test_memmaptd_index_astensor | 0.7355ms | 0.3176ms | 3.1485 KOps/s | 3.2425 KOps/s | |
test_memmaptd_index_op | 0.7790ms | 0.5785ms | 1.7287 KOps/s | 1.7067 KOps/s | |
test_serialize_model | 0.2092s | 0.1293s | 7.7313 Ops/s | 8.5487 Ops/s | |
test_serialize_model_pickle | 0.5023s | 0.4074s | 2.4545 Ops/s | 2.5190 Ops/s | |
test_serialize_weights | 0.1234s | 0.1149s | 8.7030 Ops/s | 8.2882 Ops/s | |
test_serialize_weights_returnearly | 0.1817s | 0.1589s | 6.2923 Ops/s | 6.0984 Ops/s | |
test_serialize_weights_pickle | 1.0098s | 0.7333s | 1.3637 Ops/s | 2.3827 Ops/s | |
test_serialize_weights_filesystem | 0.2361s | 0.1550s | 6.4497 Ops/s | 6.2814 Ops/s | |
test_serialize_model_filesystem | 0.1523s | 0.1419s | 7.0487 Ops/s | 6.5652 Ops/s | |
test_reshape_pytree | 78.7000μs | 26.2062μs | 38.1589 KOps/s | 37.7948 KOps/s | |
test_reshape_td | 81.7040μs | 32.3632μs | 30.8993 KOps/s | 30.3990 KOps/s | |
test_view_pytree | 58.0190μs | 25.9782μs | 38.4938 KOps/s | 38.1041 KOps/s | |
test_view_td | 97.9140μs | 40.2615μs | 24.8377 KOps/s | 24.9388 KOps/s | |
test_unbind_pytree | 72.5560μs | 29.7452μs | 33.6188 KOps/s | 34.2928 KOps/s | |
test_unbind_td | 0.3660ms | 39.8114μs | 25.1184 KOps/s | 25.1713 KOps/s | |
test_split_pytree | 70.4020μs | 28.8539μs | 34.6574 KOps/s | 34.7229 KOps/s | |
test_split_td | 0.5477ms | 44.6268μs | 22.4081 KOps/s | 22.6094 KOps/s | |
test_add_pytree | 0.1011ms | 35.1322μs | 28.4639 KOps/s | 28.0532 KOps/s | |
test_add_td | 0.1710ms | 55.9808μs | 17.8633 KOps/s | 16.8212 KOps/s | |
test_compile_add_one_nested[tensordict-compile] | 0.1306ms | 66.4415μs | 15.0508 KOps/s | 15.1489 KOps/s | |
test_compile_add_one_nested[tensordict-eager] | 1.4046ms | 0.1703ms | 5.8725 KOps/s | 5.8281 KOps/s | |
test_compile_add_one_nested[pytree-compile] | 87.8250μs | 45.6847μs | 21.8892 KOps/s | 21.7597 KOps/s | |
test_compile_add_one_nested[pytree-eager] | 0.2019ms | 0.1177ms | 8.4984 KOps/s | 8.4135 KOps/s | |
test_compile_copy_nested[tensordict-compile] | 82.6350μs | 27.9906μs | 35.7263 KOps/s | 36.2986 KOps/s | |
test_compile_copy_nested[tensordict-eager] | 0.1159ms | 58.7293μs | 17.0273 KOps/s | 17.0511 KOps/s | |
test_compile_copy_nested[pytree-compile] | 0.1709ms | 78.5992μs | 12.7228 KOps/s | 12.5507 KOps/s | |
test_compile_copy_nested[pytree-eager] | 0.1312ms | 66.1031μs | 15.1279 KOps/s | 14.8380 KOps/s | |
test_compile_add_one_flat[tensordict-compile] | 0.1987ms | 0.1083ms | 9.2339 KOps/s | 9.2857 KOps/s | |
test_compile_add_one_flat[tensordict-eager] | 0.4016ms | 0.2139ms | 4.6743 KOps/s | 4.6126 KOps/s | |
test_compile_add_one_flat[tensorclass-compile] | 0.1830ms | 47.0642μs | 21.2476 KOps/s | 21.1639 KOps/s | |
test_compile_add_one_flat[tensorclass-eager] | 0.1410ms | 65.1917μs | 15.3394 KOps/s | 14.9966 KOps/s | |
test_compile_add_one_flat[pytree-compile] | 0.2439ms | 0.1029ms | 9.7227 KOps/s | 9.9226 KOps/s | |
test_compile_add_one_flat[pytree-eager] | 0.4268ms | 0.2029ms | 4.9289 KOps/s | 4.9291 KOps/s | |
test_compile_add_self_flat[tensordict-eager] | 0.4313ms | 0.2320ms | 4.3102 KOps/s | 4.2746 KOps/s | |
test_compile_add_self_flat[tensordict-compile] | 0.4054ms | 0.1151ms | 8.6851 KOps/s | 9.2672 KOps/s | |
test_compile_add_self_flat[tensorclass-eager] | 0.2084ms | 61.5999μs | 16.2338 KOps/s | 16.0743 KOps/s | |
test_compile_add_self_flat[tensorclass-compile] | 0.1627ms | 47.3931μs | 21.1001 KOps/s | 20.6972 KOps/s | |
test_compile_add_self_flat[pytree-eager] | 0.3433ms | 0.1572ms | 6.3595 KOps/s | 6.3959 KOps/s | |
test_compile_add_self_flat[pytree-compile] | 0.4360ms | 0.1020ms | 9.8027 KOps/s | 9.8645 KOps/s | |
test_compile_copy_flat[tensordict-compile] | 73.9690μs | 21.1314μs | 47.3229 KOps/s | 47.7182 KOps/s | |
test_compile_copy_flat[tensordict-eager] | 0.1462ms | 67.8538μs | 14.7376 KOps/s | 14.4543 KOps/s | |
test_compile_copy_flat[pytree-compile] | 0.1526ms | 83.9161μs | 11.9167 KOps/s | 12.2173 KOps/s | |
test_compile_copy_flat[pytree-eager] | 0.1529ms | 68.3169μs | 14.6377 KOps/s | 15.1121 KOps/s | |
test_compile_assign_and_add[tensordict-compile] | 0.3028ms | 0.2196ms | 4.5541 KOps/s | 4.6308 KOps/s | |
test_compile_assign_and_add[tensordict-eager] | 2.6519ms | 1.3622ms | 734.1234 Ops/s | 732.7073 Ops/s | |
test_compile_assign_and_add[pytree-compile] | 0.2839ms | 0.2160ms | 4.6293 KOps/s | 4.6876 KOps/s | |
test_compile_assign_and_add[pytree-eager] | 1.2274ms | 0.8200ms | 1.2196 KOps/s | 1.2271 KOps/s | |
test_compile_assign_and_add_stack[compile] | 0.6458ms | 0.4662ms | 2.1452 KOps/s | 2.1847 KOps/s | |
test_compile_assign_and_add_stack[eager] | 2.9150ms | 2.6282ms | 380.4955 Ops/s | 358.1458 Ops/s | |
test_compile_indexing[tensor-tensordict-compile] | 93.3660μs | 37.8243μs | 26.4381 KOps/s | 25.8872 KOps/s | |
test_compile_indexing[tensor-tensordict-eager] | 0.5544ms | 33.0900μs | 30.2206 KOps/s | 30.5639 KOps/s | |
test_compile_indexing[tensor-tensorclass-compile] | 0.1174ms | 30.7676μs | 32.5017 KOps/s | 32.8333 KOps/s | |
test_compile_indexing[tensor-tensorclass-eager] | 79.2100μs | 23.2979μs | 42.9223 KOps/s | 43.3388 KOps/s | |
test_compile_indexing[tensor-pytree-compile] | 0.1189ms | 31.4903μs | 31.7558 KOps/s | 31.4984 KOps/s | |
test_compile_indexing[tensor-pytree-eager] | 0.1373ms | 24.1744μs | 41.3661 KOps/s | 43.4761 KOps/s | |
test_compile_indexing[slice-tensordict-compile] | 0.1261ms | 54.2853μs | 18.4212 KOps/s | 18.5150 KOps/s | |
test_compile_indexing[slice-tensordict-eager] | 0.3307ms | 20.3382μs | 49.1687 KOps/s | 50.7323 KOps/s | |
test_compile_indexing[slice-tensorclass-compile] | 0.1061ms | 47.1833μs | 21.1939 KOps/s | 21.5121 KOps/s | |
test_compile_indexing[slice-tensorclass-eager] | 63.9500μs | 18.4151μs | 54.3033 KOps/s | 54.4553 KOps/s | |
test_compile_indexing[slice-pytree-compile] | 0.1267ms | 47.1081μs | 21.2278 KOps/s | 20.7580 KOps/s | |
test_compile_indexing[slice-pytree-eager] | 83.8880μs | 18.3140μs | 54.6031 KOps/s | 53.6559 KOps/s | |
test_compile_indexing[int-tensordict-compile] | 0.1055ms | 54.8565μs | 18.2294 KOps/s | 18.2699 KOps/s | |
test_compile_indexing[int-tensordict-eager] | 0.8771ms | 19.8710μs | 50.3246 KOps/s | 51.8446 KOps/s | |
test_compile_indexing[int-tensorclass-compile] | 0.1075ms | 47.0439μs | 21.2567 KOps/s | 21.7539 KOps/s | |
test_compile_indexing[int-tensorclass-eager] | 47.9400μs | 18.2035μs | 54.9345 KOps/s | 55.3269 KOps/s | |
test_compile_indexing[int-pytree-compile] | 0.1358ms | 47.3930μs | 21.1002 KOps/s | 21.8220 KOps/s | |
test_compile_indexing[int-pytree-eager] | 74.7810μs | 18.3269μs | 54.5646 KOps/s | 55.1995 KOps/s | |
test_mod_add[eager] | 80.9430μs | 34.6324μs | 28.8747 KOps/s | 27.7389 KOps/s | |
test_mod_add[compile] | 0.1332ms | 66.4537μs | 15.0481 KOps/s | 15.9242 KOps/s | |
test_mod_add[compile-overhead] | 0.1636ms | 66.5532μs | 15.0256 KOps/s | 15.6896 KOps/s | |
test_mod_wrap[eager] | 0.3912ms | 0.2255ms | 4.4350 KOps/s | 4.5273 KOps/s | |
test_mod_wrap[compile] | 2.3750ms | 0.2343ms | 4.2682 KOps/s | 4.3994 KOps/s | |
test_mod_wrap[compile-overhead] | 0.4083ms | 0.2264ms | 4.4172 KOps/s | 4.4274 KOps/s | |
test_mod_wrap_and_backward[eager] | 19.4238ms | 13.4911ms | 74.1231 Ops/s | 74.9291 Ops/s | |
test_mod_wrap_and_backward[compile] | 13.1499ms | 11.3408ms | 88.1771 Ops/s | 87.7428 Ops/s | |
test_mod_wrap_and_backward[compile-overhead] | 12.0926ms | 11.1710ms | 89.5179 Ops/s | 89.8320 Ops/s | |
test_seq_add[eager] | 0.2210ms | 0.1191ms | 8.3970 KOps/s | 8.4611 KOps/s | |
test_seq_add[compile] | 0.1997ms | 80.4299μs | 12.4332 KOps/s | 13.4032 KOps/s | |
test_seq_add[compile-overhead] | 0.1428ms | 76.9224μs | 13.0001 KOps/s | 13.4634 KOps/s | |
test_seq_wrap[eager] | 0.8222ms | 0.4460ms | 2.2421 KOps/s | 2.2462 KOps/s | |
test_seq_wrap[compile] | 0.4481ms | 0.2487ms | 4.0217 KOps/s | 4.1618 KOps/s | |
test_seq_wrap[compile-overhead] | 0.3313ms | 0.2501ms | 3.9984 KOps/s | 4.2099 KOps/s | |
test_func_call_runtime[False-eager] | 0.7561ms | 0.5420ms | 1.8452 KOps/s | 1.8494 KOps/s | |
test_func_call_runtime[False-compile] | 0.5698ms | 0.4505ms | 2.2198 KOps/s | 2.2500 KOps/s | |
test_func_call_runtime[False-compile-overhead] | 0.8287ms | 0.4512ms | 2.2164 KOps/s | 2.2920 KOps/s | |
test_func_call_runtime[True-eager] | 0.9000ms | 0.7619ms | 1.3124 KOps/s | 1.3582 KOps/s | |
test_func_call_runtime[True-compile] | 0.6256ms | 0.4699ms | 2.1282 KOps/s | 2.1494 KOps/s | |
test_func_call_runtime[True-compile-overhead] | 0.6160ms | 0.4699ms | 2.1283 KOps/s | 2.1657 KOps/s | |
test_func_call_cm_runtime[False-eager] | 0.8265ms | 0.5479ms | 1.8253 KOps/s | 1.8730 KOps/s | |
test_func_call_cm_runtime[False-compile] | 0.8214ms | 0.4521ms | 2.2119 KOps/s | 2.2626 KOps/s | |
test_func_call_cm_runtime[False-compile-overhead] | 0.5781ms | 0.4474ms | 2.2354 KOps/s | 2.2326 KOps/s | |
test_func_call_cm_runtime[True-eager] | 1.5846ms | 0.8975ms | 1.1142 KOps/s | 1.1183 KOps/s | |
test_func_call_cm_runtime[True-compile] | 1.0849ms | 0.7906ms | 1.2649 KOps/s | 1.2566 KOps/s | |
test_func_call_cm_runtime[True-compile-overhead] | 1.4683ms | 0.8007ms | 1.2489 KOps/s | 1.2425 KOps/s | |
test_vmap_func_call_cm_runtime[eager] | 3.1397ms | 1.9428ms | 514.7152 Ops/s | 521.3969 Ops/s | |
test_vmap_func_call_cm_runtime[compile] | 0.6900ms | 0.5462ms | 1.8310 KOps/s | 1.8434 KOps/s | |
test_vmap_func_call_cm_runtime[compile-overhead] | 0.7357ms | 0.5500ms | 1.8181 KOps/s | 1.8362 KOps/s | |
test_distributed | 0.2565ms | 0.1264ms | 7.9120 KOps/s | 7.8942 KOps/s | |
test_tdmodule | 0.1180ms | 27.3747μs | 36.5300 KOps/s | 36.8614 KOps/s | |
test_tdmodule_dispatch | 77.4650μs | 50.4470μs | 19.8228 KOps/s | 20.2867 KOps/s | |
test_tdseq | 50.5050μs | 29.4125μs | 33.9992 KOps/s | 34.3308 KOps/s | |
test_tdseq_dispatch | 94.8680μs | 54.1311μs | 18.4737 KOps/s | 18.0873 KOps/s | |
test_instantiation_functorch | 2.3338ms | 1.5267ms | 655.0289 Ops/s | 654.4930 Ops/s | |
test_exec_functorch | 0.3209ms | 0.1754ms | 5.7015 KOps/s | 5.7577 KOps/s | |
test_exec_functional_call | 0.3480ms | 0.1714ms | 5.8341 KOps/s | 5.9462 KOps/s | |
test_exec_td_decorator | 0.5495ms | 0.2350ms | 4.2547 KOps/s | 4.3809 KOps/s | |
test_vmap_mlp_speed_decorator[True-True] | 0.8612ms | 0.6597ms | 1.5159 KOps/s | 1.4930 KOps/s | |
test_vmap_mlp_speed_decorator[True-False] | 0.8635ms | 0.6591ms | 1.5173 KOps/s | 1.5241 KOps/s | |
test_vmap_mlp_speed_decorator[False-True] | 0.8636ms | 0.5330ms | 1.8762 KOps/s | 1.8856 KOps/s | |
test_vmap_mlp_speed_decorator[False-False] | 0.7554ms | 0.5342ms | 1.8720 KOps/s | 1.8903 KOps/s | |
test_to_module_speed[True] | 1.9207ms | 1.3410ms | 745.6889 Ops/s | 756.7163 Ops/s | |
test_to_module_speed[False] | 1.8672ms | 1.3070ms | 765.1127 Ops/s | 761.4716 Ops/s | |
test_tc_init | 0.1032ms | 48.6976μs | 20.5349 KOps/s | 20.4788 KOps/s | |
test_tc_init_nested | 0.1682ms | 94.2443μs | 10.6107 KOps/s | 10.3317 KOps/s | |
test_tc_first_layer_tensor | 27.3310μs | 1.5374μs | 650.4461 KOps/s | 667.5568 KOps/s | |
test_tc_first_layer_nontensor | 55.2040μs | 4.6578μs | 214.6950 KOps/s | 208.9653 KOps/s | |
test_tc_second_layer_tensor | 29.8450μs | 2.8476μs | 351.1668 KOps/s | 355.4703 KOps/s | |
test_tc_second_layer_nontensor | 53.8090μs | 6.0841μs | 164.3627 KOps/s | 159.6604 KOps/s | |
test_unbind | 0.2461s | 13.8782ms | 72.0552 Ops/s | 61.5748 Ops/s | |
test_full_like | 9.8828ms | 8.9548ms | 111.6722 Ops/s | 132.8169 Ops/s | |
test_zeros_like | 6.1890ms | 3.2957ms | 303.4220 Ops/s | 366.5040 Ops/s | |
test_ones_like | 4.7759ms | 3.5759ms | 279.6519 Ops/s | 301.7850 Ops/s | |
test_clone | 10.9172ms | 6.8794ms | 145.3619 Ops/s | 197.5532 Ops/s | |
test_squeeze | 83.1470μs | 12.7846μs | 78.2190 KOps/s | 78.7710 KOps/s | |
test_unsqueeze | 0.3757ms | 96.2666μs | 10.3878 KOps/s | 10.9264 KOps/s | |
test_split | 0.3958ms | 0.1971ms | 5.0731 KOps/s | 5.2796 KOps/s | |
test_permute | 0.3257ms | 0.2086ms | 4.7948 KOps/s | 4.9780 KOps/s | |
test_stack | 29.9063ms | 24.9307ms | 40.1112 Ops/s | 39.0256 Ops/s | |
test_cat | 29.7383ms | 25.2118ms | 39.6640 Ops/s | 38.3236 Ops/s |
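The Change column is empty in this capture of the table. One way to recover a rough figure is to compare the Ops and Ops on Repo HEAD cells of a row directly. The sketch below assumes the pipe-separated row layout and the `Ops/s`, `KOps/s`, and `MOps/s` suffixes seen above; `parse_ops` and `relative_change` are hypothetical helper names for illustration, not part of the benchmark tooling.

```python
# Multipliers for the unit suffixes that appear in the Ops columns.
_UNIT = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(cell):
    """Convert a cell such as '47.3131 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * _UNIT[unit]


def relative_change(row):
    """Return (test name, fractional change of Ops relative to Ops on Repo HEAD)."""
    cells = [c.strip() for c in row.split("|")]
    name = cells[0]
    ops = parse_ops(cells[3])   # throughput measured on this pull request
    head = parse_ops(cells[4])  # throughput measured on the repository HEAD
    return name, (ops - head) / head


# Example row copied from the table above.
row = "test_serialize_weights_pickle | 1.0098s | 0.7333s | 1.3637 Ops/s | 2.3827 Ops/s | |"
name, change = relative_change(row)
print(f"{name}: {change:+.1%}")
```

A negative value means the pull request measured slower than HEAD for that test. Each row here reports a single benchmark run, so a large swing on one test may reflect run-to-run noise rather than an effect of the change.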
Labels
CLA Signed
Stack from ghstack (oldest at bottom):