
[Benchmark] Add Redis benchmarks, optimize reads with covering-range strategy #1570

Open
vmoens wants to merge 1 commit into gh/vmoens/56/base from gh/vmoens/56/head

@vmoens (Collaborator) commented Feb 14, 2026

Stack from ghstack (oldest at bottom):

Add benchmarks/storage/bench_redis.py comparing RedisTensorDict against
local TensorDict for get/set, key iteration, indexed read/write (int,
slice, step-slice, fancy, bool mask), and td[idx].to_tensordict().
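
As a rough illustration of the five index kinds listed above, the sketch below maps each one to the explicit row numbers it selects along the first dimension. It uses plain Python lists rather than tensors, and `rows_for_index` is a hypothetical helper written for this example, not a function from the PR.

```python
def rows_for_index(idx, n_rows):
    """Return the explicit row numbers selected by idx along the first dimension."""
    if isinstance(idx, slice):
        # Covers both plain slices and step-slices.
        return list(range(*idx.indices(n_rows)))
    if isinstance(idx, int) and not isinstance(idx, bool):
        # Single integer index; negative values count from the end.
        return [idx + n_rows if idx < 0 else idx]
    idx = list(idx)
    if idx and all(isinstance(v, bool) for v in idx):
        # Boolean mask: one flag per row.
        if len(idx) != n_rows:
            raise IndexError("boolean mask length must match n_rows")
        return [i for i, keep in enumerate(idx) if keep]
    # Fancy (integer-list) indexing keeps the caller's order and duplicates.
    return [v + n_rows if v < 0 else v for v in idx]


N = 8
examples = {
    "int": 3,
    "slice": slice(2, 5),
    "step-slice": slice(0, N, 3),
    "fancy": [6, 1, 4],
    "bool mask": [i % 2 == 0 for i in range(N)],
}
selected = {name: rows_for_index(idx, N) for name, idx in examples.items()}
for name, rows in selected.items():
    print(f"{name:>10}: {rows}")
```

Only the plain slice selects a contiguous run of rows; the step-slice, fancy, and bool-mask cases select scattered rows, which is where the number of storage commands per key matters most.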

Performance improvements:

  • Fix _tensor_to_bytes: replace bytes(untyped_storage()) with
    tensor.numpy().tobytes() (~8000x faster serialization).
  • Override _index_tensordict with _abatch_index: batch all leaf key
    fetches into a single pipeline instead of one round-trip per key.
  • Covering-range strategy (_compute_covering_range): every index type
    (int, slice, step-slice, tensor, bool mask) emits at most ONE
    GETRANGE per key. For non-contiguous indices, the covering byte range
    is fetched and a local post-index extracts the requested rows.
  • Coalesce contiguous byte ranges for step-1 slices.
  • Partial covering-range read-modify-write (RMW) for writes: step-slice,
    fancy, and bool-mask writes fetch only the covering range, patch it
    locally, and write it back (2 commands per key instead of N SETRANGEs).
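
The covering-range idea for reads and writes can be sketched as follows. This is a minimal illustration, not the PR's implementation: `FakeRedis` is a hypothetical in-memory stand-in that only mimics the inclusive-end semantics of `GETRANGE` and the offset semantics of `SETRANGE`, and `covering_range`, `read_rows`, and `write_rows` are names invented for the example. It assumes rows are stored back to back with a fixed number of bytes per row.

```python
import struct


class FakeRedis:
    """Hypothetical in-memory stand-in that counts the commands issued."""

    def __init__(self):
        self.data = {}
        self.commands = 0

    def set(self, key, value):
        self.commands += 1
        self.data[key] = bytearray(value)

    def getrange(self, key, start, end):
        # Like Redis GETRANGE, the end offset is inclusive.
        self.commands += 1
        return bytes(self.data[key][start:end + 1])

    def setrange(self, key, offset, value):
        # Overwrite bytes in place starting at offset.
        self.commands += 1
        self.data[key][offset:offset + len(value)] = value


def covering_range(rows, row_nbytes):
    """Return (start, end_inclusive) byte offsets spanning every requested row."""
    lo, hi = min(rows), max(rows)
    return lo * row_nbytes, (hi + 1) * row_nbytes - 1


def read_rows(client, key, rows, row_nbytes):
    """Issue one GETRANGE for the covering range, then index locally."""
    start, end = covering_range(rows, row_nbytes)
    blob = client.getrange(key, start, end)
    base = min(rows)
    return [blob[(r - base) * row_nbytes:(r - base + 1) * row_nbytes] for r in rows]


def write_rows(client, key, rows, payloads, row_nbytes):
    """Read-modify-write: fetch the covering range, patch it, write it back once."""
    start, end = covering_range(rows, row_nbytes)
    blob = bytearray(client.getrange(key, start, end))
    base = min(rows)
    for r, payload in zip(rows, payloads):
        off = (r - base) * row_nbytes
        blob[off:off + row_nbytes] = payload
    client.setrange(key, start, bytes(blob))


ROW = struct.calcsize("<2f")  # two float32 values per row
client = FakeRedis()
client.set("obs", b"".join(struct.pack("<2f", i, i + 0.5) for i in range(10)))

client.commands = 0
picked = [struct.unpack("<2f", b) for b in read_rows(client, "obs", [1, 4, 7], ROW)]
read_cmds = client.commands

client.commands = 0
write_rows(client, "obs", [2, 6], [struct.pack("<2f", -1.0, -2.0)] * 2, ROW)
write_cmds = client.commands

after = [struct.unpack("<2f", b) for b in read_rows(client, "obs", [2, 3, 6], ROW)]
print(picked, read_cmds, write_cmds, after)
```

The trade-off is extra bytes on the wire (rows inside the covering range that were not requested) in exchange for fewer commands. The write path in this sketch is not atomic; how the PR handles concurrent writers between the read and the write-back is not stated here.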

[ghstack-poisoned]
@github-actions (Contributor)

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 243. Improved: $\large\color{#35bf28}21$. Worsened: $\large\color{#d91a1a}7$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 36.4500μs 14.9323μs 66.9691 KOps/s 66.5021 KOps/s $\color{#35bf28}+0.70\%$
test_plain_set_stack_nested 48.4700μs 15.3129μs 65.3045 KOps/s 65.5666 KOps/s $\color{#d91a1a}-0.40\%$
test_plain_set_nested_inplace 63.7010μs 16.3156μs 61.2910 KOps/s 59.5171 KOps/s $\color{#35bf28}+2.98\%$
test_plain_set_stack_nested_inplace 48.3410μs 16.6166μs 60.1807 KOps/s 60.1078 KOps/s $\color{#35bf28}+0.12\%$
test_items 38.7510μs 5.8163μs 171.9320 KOps/s 166.7211 KOps/s $\color{#35bf28}+3.13\%$
test_items_nested 0.5936ms 0.5388ms 1.8559 KOps/s 1.8530 KOps/s $\color{#35bf28}+0.16\%$
test_items_nested_locked 0.6007ms 0.5443ms 1.8371 KOps/s 1.8143 KOps/s $\color{#35bf28}+1.26\%$
test_items_nested_leaf 0.1319ms 97.1817μs 10.2900 KOps/s 10.3392 KOps/s $\color{#d91a1a}-0.48\%$
test_items_stack_nested 0.6284ms 0.5377ms 1.8598 KOps/s 1.8505 KOps/s $\color{#35bf28}+0.50\%$
test_items_stack_nested_leaf 0.1247ms 95.9722μs 10.4197 KOps/s 10.4725 KOps/s $\color{#d91a1a}-0.50\%$
test_items_stack_nested_locked 0.5938ms 0.5395ms 1.8536 KOps/s 1.8190 KOps/s $\color{#35bf28}+1.90\%$
test_keys 30.5010μs 4.3737μs 228.6419 KOps/s 237.3241 KOps/s $\color{#d91a1a}-3.66\%$
test_keys_nested 0.1716ms 0.1204ms 8.3036 KOps/s 8.3346 KOps/s $\color{#d91a1a}-0.37\%$
test_keys_nested_locked 88.8248ms 0.1415ms 7.0651 KOps/s 7.7579 KOps/s $\textbf{\color{#d91a1a}-8.93\%}$
test_keys_nested_leaf 0.1511ms 0.1101ms 9.0792 KOps/s 9.0088 KOps/s $\color{#35bf28}+0.78\%$
test_keys_stack_nested 0.1608ms 0.1196ms 8.3617 KOps/s 8.3814 KOps/s $\color{#d91a1a}-0.24\%$
test_keys_stack_nested_leaf 0.1504ms 0.1102ms 9.0727 KOps/s 9.0689 KOps/s $\color{#35bf28}+0.04\%$
test_keys_stack_nested_locked 0.1703ms 0.1295ms 7.7216 KOps/s 7.7914 KOps/s $\color{#d91a1a}-0.90\%$
test_values 8.6680μs 1.0208μs 979.6124 KOps/s 797.8923 KOps/s $\textbf{\color{#35bf28}+22.78\%}$
test_values_nested 78.3520μs 47.8889μs 20.8817 KOps/s 20.8172 KOps/s $\color{#35bf28}+0.31\%$
test_values_nested_locked 0.2583ms 51.0801μs 19.5771 KOps/s 19.3353 KOps/s $\color{#35bf28}+1.25\%$
test_values_nested_leaf 75.7610μs 54.6445μs 18.3001 KOps/s 17.9812 KOps/s $\color{#35bf28}+1.77\%$
test_values_stack_nested 80.8120μs 47.9406μs 20.8591 KOps/s 20.8044 KOps/s $\color{#35bf28}+0.26\%$
test_values_stack_nested_leaf 85.4610μs 54.5549μs 18.3302 KOps/s 18.2961 KOps/s $\color{#35bf28}+0.19\%$
test_values_stack_nested_locked 91.8320μs 51.4988μs 19.4179 KOps/s 19.4136 KOps/s $\color{#35bf28}+0.02\%$
test_membership 4.7035μs 0.8612μs 1.1611 MOps/s 1.1428 MOps/s $\color{#35bf28}+1.60\%$
test_membership_nested 31.4100μs 3.2061μs 311.9040 KOps/s 314.5865 KOps/s $\color{#d91a1a}-0.85\%$
test_membership_nested_leaf 37.6310μs 3.1951μs 312.9747 KOps/s 312.8918 KOps/s $\color{#35bf28}+0.03\%$
test_membership_stacked_nested 36.8800μs 3.2193μs 310.6278 KOps/s 312.0186 KOps/s $\color{#d91a1a}-0.45\%$
test_membership_stacked_nested_leaf 31.3310μs 3.2049μs 312.0195 KOps/s 314.4032 KOps/s $\color{#d91a1a}-0.76\%$
test_membership_nested_last 71.1120μs 4.6229μs 216.3132 KOps/s 215.5442 KOps/s $\color{#35bf28}+0.36\%$
test_membership_nested_leaf_last 76.4820μs 4.6239μs 216.2654 KOps/s 215.6237 KOps/s $\color{#35bf28}+0.30\%$
test_membership_stacked_nested_last 41.7510μs 4.6388μs 215.5721 KOps/s 215.3246 KOps/s $\color{#35bf28}+0.11\%$
test_membership_stacked_nested_leaf_last 29.6600μs 4.6625μs 214.4782 KOps/s 214.5184 KOps/s $\color{#d91a1a}-0.02\%$
test_nested_getleaf 50.2910μs 21.3825μs 46.7673 KOps/s 46.2434 KOps/s $\color{#35bf28}+1.13\%$
test_nested_get 49.2610μs 20.1930μs 49.5220 KOps/s 47.9664 KOps/s $\color{#35bf28}+3.24\%$
test_stacked_getleaf 55.1910μs 21.7586μs 45.9589 KOps/s 45.8879 KOps/s $\color{#35bf28}+0.15\%$
test_stacked_get 44.7410μs 20.4950μs 48.7923 KOps/s 48.5729 KOps/s $\color{#35bf28}+0.45\%$
test_nested_getitemleaf 48.1910μs 21.7503μs 45.9764 KOps/s 44.1753 KOps/s $\color{#35bf28}+4.08\%$
test_nested_getitem 0.2606ms 20.4035μs 49.0112 KOps/s 47.6981 KOps/s $\color{#35bf28}+2.75\%$
test_stacked_getitemleaf 50.6710μs 21.6768μs 46.1323 KOps/s 45.0003 KOps/s $\color{#35bf28}+2.52\%$
test_stacked_getitem 47.1800μs 20.7267μs 48.2470 KOps/s 46.2045 KOps/s $\color{#35bf28}+4.42\%$
test_lock_nested 7.7380ms 0.4806ms 2.0805 KOps/s 2.0862 KOps/s $\color{#d91a1a}-0.27\%$
test_lock_stack_nested 0.5318ms 0.4778ms 2.0928 KOps/s 2.0615 KOps/s $\color{#35bf28}+1.52\%$
test_unlock_nested 0.4985ms 0.3832ms 2.6093 KOps/s 2.6035 KOps/s $\color{#35bf28}+0.22\%$
test_unlock_stack_nested 0.4322ms 0.3823ms 2.6161 KOps/s 2.5501 KOps/s $\color{#35bf28}+2.59\%$
test_flatten_speed 0.1735ms 0.1222ms 8.1846 KOps/s 8.1060 KOps/s $\color{#35bf28}+0.97\%$
test_unflatten_speed 0.6469ms 0.5899ms 1.6953 KOps/s 1.6773 KOps/s $\color{#35bf28}+1.08\%$
test_common_ops 0.8179ms 0.6761ms 1.4790 KOps/s 1.4442 KOps/s $\color{#35bf28}+2.41\%$
test_creation 0.1267ms 2.9003μs 344.7875 KOps/s 343.1075 KOps/s $\color{#35bf28}+0.49\%$
test_creation_empty 39.2810μs 6.1510μs 162.5749 KOps/s 162.1115 KOps/s $\color{#35bf28}+0.29\%$
test_creation_nested_1 50.8110μs 10.9102μs 91.6575 KOps/s 91.5878 KOps/s $\color{#35bf28}+0.08\%$
test_creation_nested_2 34.6200μs 11.8656μs 84.2775 KOps/s 84.2981 KOps/s $\color{#d91a1a}-0.02\%$
test_creation_many_keys[10] 45.2310μs 18.2938μs 54.6633 KOps/s 54.4250 KOps/s $\color{#35bf28}+0.44\%$
test_creation_many_keys[50] 0.1141ms 78.4317μs 12.7499 KOps/s 12.7235 KOps/s $\color{#35bf28}+0.21\%$
test_creation_many_keys[100] 0.2061ms 0.1540ms 6.4948 KOps/s 6.5051 KOps/s $\color{#d91a1a}-0.16\%$
test_creation_nested_many_keys[10] 64.7220μs 39.3274μs 25.4276 KOps/s 25.2352 KOps/s $\color{#35bf28}+0.76\%$
test_creation_nested_many_keys[50] 0.1939ms 0.1602ms 6.2406 KOps/s 6.1945 KOps/s $\color{#35bf28}+0.75\%$
test_clone 41.8600μs 13.2048μs 75.7300 KOps/s 75.4836 KOps/s $\color{#35bf28}+0.33\%$
test_getitem[int] 1.6909ms 14.6692μs 68.1700 KOps/s 55.3323 KOps/s $\textbf{\color{#35bf28}+23.20\%}$
test_getitem[slice_int] 0.1458ms 25.2556μs 39.5951 KOps/s 39.7305 KOps/s $\color{#d91a1a}-0.34\%$
test_getitem[range] 0.2031ms 60.4210μs 16.5505 KOps/s 16.0159 KOps/s $\color{#35bf28}+3.34\%$
test_getitem[tuple] 0.1483ms 24.3137μs 41.1291 KOps/s 41.2246 KOps/s $\color{#d91a1a}-0.23\%$
test_getitem[list] 0.1822ms 56.8754μs 17.5823 KOps/s 17.3770 KOps/s $\color{#35bf28}+1.18\%$
test_setitem_dim[int] 47.3910μs 26.0681μs 38.3611 KOps/s 39.0930 KOps/s $\color{#d91a1a}-1.87\%$
test_setitem_dim[slice_int] 79.3220μs 43.8630μs 22.7982 KOps/s 22.2742 KOps/s $\color{#35bf28}+2.35\%$
test_setitem_dim[range] 0.1223ms 92.5019μs 10.8106 KOps/s 10.6523 KOps/s $\color{#35bf28}+1.49\%$
test_setitem_dim[tuple] 63.4920μs 40.6888μs 24.5768 KOps/s 23.8971 KOps/s $\color{#35bf28}+2.84\%$
test_setitem 49.5910μs 17.9933μs 55.5763 KOps/s 55.8306 KOps/s $\color{#d91a1a}-0.46\%$
test_set 61.0210μs 17.0583μs 58.6225 KOps/s 59.0393 KOps/s $\color{#d91a1a}-0.71\%$
test_set_shared 0.6237ms 0.2041ms 4.8995 KOps/s 4.7542 KOps/s $\color{#35bf28}+3.06\%$
test_update 0.4200ms 22.0265μs 45.3998 KOps/s 45.4881 KOps/s $\color{#d91a1a}-0.19\%$
test_update_nested 74.1910μs 34.5186μs 28.9699 KOps/s 29.1625 KOps/s $\color{#d91a1a}-0.66\%$
test_update__nested 0.4462ms 35.7117μs 28.0020 KOps/s 28.7851 KOps/s $\color{#d91a1a}-2.72\%$
test_set_nested 55.7110μs 19.7352μs 50.6708 KOps/s 52.3334 KOps/s $\color{#d91a1a}-3.18\%$
test_set_nested_new 57.1710μs 24.2829μs 41.1812 KOps/s 41.3441 KOps/s $\color{#d91a1a}-0.39\%$
test_select 74.8520μs 41.7139μs 23.9728 KOps/s 23.4676 KOps/s $\color{#35bf28}+2.15\%$
test_select_nested 0.1103ms 74.5841μs 13.4077 KOps/s 13.1392 KOps/s $\color{#35bf28}+2.04\%$
test_exclude_nested 0.1274ms 97.9707μs 10.2071 KOps/s 10.1290 KOps/s $\color{#35bf28}+0.77\%$
test_empty[True] 0.4894ms 0.4397ms 2.2745 KOps/s 2.2425 KOps/s $\color{#35bf28}+1.43\%$
test_empty[False] 8.2027μs 1.3282μs 752.9101 KOps/s 752.6957 KOps/s $\color{#35bf28}+0.03\%$
test_to 0.1039ms 72.6077μs 13.7726 KOps/s 13.7468 KOps/s $\color{#35bf28}+0.19\%$
test_to_nonblocking 0.1094ms 64.1692μs 15.5838 KOps/s 15.4039 KOps/s $\color{#35bf28}+1.17\%$
test_unbind_speed 0.3822ms 0.3306ms 3.0250 KOps/s 3.0425 KOps/s $\color{#d91a1a}-0.57\%$
test_unbind_speed_stack0 0.4003ms 0.3257ms 3.0702 KOps/s 3.0581 KOps/s $\color{#35bf28}+0.40\%$
test_unbind_speed_stack1 0.1035s 0.9129ms 1.0954 KOps/s 1.1915 KOps/s $\textbf{\color{#d91a1a}-8.07\%}$
test_split 1.3371ms 1.1403ms 876.9843 Ops/s 881.8171 Ops/s $\color{#d91a1a}-0.55\%$
test_chunk 0.1030s 1.2101ms 826.3952 Ops/s 914.2983 Ops/s $\textbf{\color{#d91a1a}-9.61\%}$
test_to_cpu_blocking 19.4335ms 19.2501ms 51.9479 Ops/s 35.3424 Ops/s $\textbf{\color{#35bf28}+46.98\%}$
test_to_cpu_global_sync 11.3527ms 11.0979ms 90.1073 Ops/s 89.0897 Ops/s $\color{#35bf28}+1.14\%$
test_to_cpu_event_sync 12.2321ms 12.0505ms 82.9843 Ops/s 81.9194 Ops/s $\color{#35bf28}+1.30\%$
test_to_cpu_default 0.1145s 13.3834ms 74.7192 Ops/s 73.8414 Ops/s $\color{#35bf28}+1.19\%$
test_consolidate[False-None] 4.1522ms 4.0756ms 245.3641 Ops/s 245.2643 Ops/s $\color{#35bf28}+0.04\%$
test_consolidate[default-None] 2.0980ms 2.0188ms 495.3341 Ops/s 492.0250 Ops/s $\color{#35bf28}+0.67\%$
test_consolidate[reduce-overhead-None] 1.9968ms 1.9339ms 517.0985 Ops/s 507.2162 Ops/s $\color{#35bf28}+1.95\%$
test_consolidate_njt[False-None] 8.6688ms 8.4112ms 118.8884 Ops/s 118.2557 Ops/s $\color{#35bf28}+0.54\%$
test_to[False-False-None] 2.1377ms 2.0497ms 487.8747 Ops/s 480.3928 Ops/s $\color{#35bf28}+1.56\%$
test_to[True-False-None] 2.1627ms 1.8910ms 528.8312 Ops/s 524.3152 Ops/s $\color{#35bf28}+0.86\%$
test_to[within-False-None] 6.3644ms 6.0603ms 165.0075 Ops/s 164.4546 Ops/s $\color{#35bf28}+0.34\%$
test_to[True-default-None] 7.6409ms 7.4984ms 133.3610 Ops/s 128.8316 Ops/s $\color{#35bf28}+3.52\%$
test_to_njt[False-False-None] 8.8843ms 8.5521ms 116.9300 Ops/s 116.1738 Ops/s $\color{#35bf28}+0.65\%$
test_to_njt[True-False-None] 7.3268ms 7.0017ms 142.8231 Ops/s 143.1244 Ops/s $\color{#d91a1a}-0.21\%$
test_to_njt[within-False-None] 16.0263ms 15.4576ms 64.6933 Ops/s 63.2631 Ops/s $\color{#35bf28}+2.26\%$
test_creation[device0] 0.4232ms 0.1161ms 8.6147 KOps/s 8.5626 KOps/s $\color{#35bf28}+0.61\%$
test_creation_from_tensor 0.4165ms 0.1139ms 8.7758 KOps/s 8.8228 KOps/s $\color{#d91a1a}-0.53\%$
test_add_one[memmap_tensor0] 0.3560ms 6.2059μs 161.1366 KOps/s 157.0581 KOps/s $\color{#35bf28}+2.60\%$
test_contiguous[memmap_tensor0] 13.9800μs 0.6694μs 1.4939 MOps/s 2.1464 MOps/s $\textbf{\color{#d91a1a}-30.40\%}$
test_stack[memmap_tensor0] 34.0810μs 4.6051μs 217.1524 KOps/s 221.1890 KOps/s $\color{#d91a1a}-1.82\%$
test_memmaptd_index 0.9954ms 0.2585ms 3.8692 KOps/s 3.8959 KOps/s $\color{#d91a1a}-0.69\%$
test_memmaptd_index_astensor 0.5107ms 0.3500ms 2.8574 KOps/s 2.8352 KOps/s $\color{#35bf28}+0.78\%$
test_memmaptd_index_op 0.7602ms 0.5851ms 1.7091 KOps/s 1.7013 KOps/s $\color{#35bf28}+0.46\%$
test_serialize_model 0.1401s 0.1368s 7.3109 Ops/s 7.3523 Ops/s $\color{#d91a1a}-0.56\%$
test_serialize_model_pickle 1.8945s 1.3068s 0.7652 Ops/s 0.8384 Ops/s $\textbf{\color{#d91a1a}-8.74\%}$
test_serialize_weights 0.1397s 0.1365s 7.3252 Ops/s 7.3463 Ops/s $\color{#d91a1a}-0.29\%$
test_serialize_weights_returnearly 0.4431s 93.1370ms 10.7369 Ops/s 10.5289 Ops/s $\color{#35bf28}+1.97\%$
test_serialize_weights_pickle 1.3796s 1.2030s 0.8312 Ops/s 0.8206 Ops/s $\color{#35bf28}+1.30\%$
test_reshape_pytree 0.2133ms 33.2747μs 30.0529 KOps/s 30.1810 KOps/s $\color{#d91a1a}-0.42\%$
test_reshape_td 78.3320μs 44.6103μs 22.4164 KOps/s 22.9459 KOps/s $\color{#d91a1a}-2.31\%$
test_view_pytree 0.2315ms 33.3214μs 30.0108 KOps/s 30.6489 KOps/s $\color{#d91a1a}-2.08\%$
test_view_td 87.4920μs 52.1642μs 19.1702 KOps/s 19.8178 KOps/s $\color{#d91a1a}-3.27\%$
test_unbind_pytree 0.2455ms 36.8992μs 27.1008 KOps/s 27.1588 KOps/s $\color{#d91a1a}-0.21\%$
test_unbind_td 0.1265ms 48.5871μs 20.5816 KOps/s 20.2761 KOps/s $\color{#35bf28}+1.51\%$
test_split_pytree 0.1978ms 42.9633μs 23.2757 KOps/s 23.4172 KOps/s $\color{#d91a1a}-0.60\%$
test_split_td 0.2204ms 64.7998μs 15.4321 KOps/s 15.2513 KOps/s $\color{#35bf28}+1.19\%$
test_add_pytree 0.1948ms 42.5273μs 23.5143 KOps/s 23.7906 KOps/s $\color{#d91a1a}-1.16\%$
test_add_td 96.2720μs 52.8003μs 18.9393 KOps/s 18.8613 KOps/s $\color{#35bf28}+0.41\%$
test_compile_add_one_nested[tensordict-compile] 0.2686ms 0.1428ms 7.0048 KOps/s 6.9019 KOps/s $\color{#35bf28}+1.49\%$
test_compile_add_one_nested[tensordict-eager] 0.4455ms 0.1934ms 5.1708 KOps/s 5.2643 KOps/s $\color{#d91a1a}-1.78\%$
test_compile_add_one_nested[pytree-compile] 0.1590ms 0.1073ms 9.3208 KOps/s 8.7484 KOps/s $\textbf{\color{#35bf28}+6.54\%}$
test_compile_add_one_nested[pytree-eager] 0.4328ms 0.1795ms 5.5722 KOps/s 5.5702 KOps/s $\color{#35bf28}+0.04\%$
test_compile_copy_nested[tensordict-compile] 0.1137ms 27.9265μs 35.8083 KOps/s 31.6740 KOps/s $\textbf{\color{#35bf28}+13.05\%}$
test_compile_copy_nested[tensordict-eager] 95.3020μs 52.7892μs 18.9433 KOps/s 19.3872 KOps/s $\color{#d91a1a}-2.29\%$
test_compile_copy_nested[pytree-compile] 0.1008ms 9.7876μs 102.1697 KOps/s 102.7640 KOps/s $\color{#d91a1a}-0.58\%$
test_compile_copy_nested[pytree-eager] 0.4662ms 70.0779μs 14.2698 KOps/s 14.3808 KOps/s $\color{#d91a1a}-0.77\%$
test_compile_add_one_flat[tensordict-compile] 0.3769ms 0.1743ms 5.7363 KOps/s 5.4380 KOps/s $\textbf{\color{#35bf28}+5.49\%}$
test_compile_add_one_flat[tensordict-eager] 0.2980ms 0.2564ms 3.9003 KOps/s 3.8841 KOps/s $\color{#35bf28}+0.42\%$
test_compile_add_one_flat[tensorclass-compile] 0.2104ms 0.1150ms 8.6965 KOps/s 8.3867 KOps/s $\color{#35bf28}+3.69\%$
test_compile_add_one_flat[tensorclass-eager] 0.1164ms 69.5807μs 14.3718 KOps/s 14.3381 KOps/s $\color{#35bf28}+0.24\%$
test_compile_add_one_flat[pytree-compile] 0.3836ms 0.1573ms 6.3560 KOps/s 6.1880 KOps/s $\color{#35bf28}+2.72\%$
test_compile_add_one_flat[pytree-eager] 0.8606ms 0.5218ms 1.9164 KOps/s 1.7907 KOps/s $\textbf{\color{#35bf28}+7.02\%}$
test_compile_add_self_flat[tensordict-eager] 0.4829ms 0.3070ms 3.2570 KOps/s 3.2046 KOps/s $\color{#35bf28}+1.63\%$
test_compile_add_self_flat[tensordict-compile] 0.2598ms 0.1774ms 5.6371 KOps/s 5.1380 KOps/s $\textbf{\color{#35bf28}+9.71\%}$
test_compile_add_self_flat[tensorclass-eager] 0.1438ms 85.2890μs 11.7248 KOps/s 11.5970 KOps/s $\color{#35bf28}+1.10\%$
test_compile_add_self_flat[tensorclass-compile] 0.2905ms 0.1173ms 8.5240 KOps/s 7.7116 KOps/s $\textbf{\color{#35bf28}+10.53\%}$
test_compile_add_self_flat[pytree-eager] 0.6592ms 0.4279ms 2.3369 KOps/s 2.1403 KOps/s $\textbf{\color{#35bf28}+9.18\%}$
test_compile_add_self_flat[pytree-compile] 0.3981ms 0.1597ms 6.2611 KOps/s 6.0735 KOps/s $\color{#35bf28}+3.09\%$
test_compile_copy_flat[tensordict-compile] 57.4810μs 23.7654μs 42.0779 KOps/s 37.8157 KOps/s $\textbf{\color{#35bf28}+11.27\%}$
test_compile_copy_flat[tensordict-eager] 75.7120μs 42.0428μs 23.7853 KOps/s 24.2950 KOps/s $\color{#d91a1a}-2.10\%$
test_compile_copy_flat[pytree-compile] 43.1610μs 10.8885μs 91.8400 KOps/s 92.4187 KOps/s $\color{#d91a1a}-0.63\%$
test_compile_copy_flat[pytree-eager] 0.3805ms 51.9754μs 19.2399 KOps/s 19.0988 KOps/s $\color{#35bf28}+0.74\%$
test_compile_assign_and_add[tensordict-compile] 2.0099ms 0.1721ms 5.8108 KOps/s 5.3417 KOps/s $\textbf{\color{#35bf28}+8.78\%}$
test_compile_assign_and_add[tensordict-eager] 3.3726ms 3.2654ms 306.2399 Ops/s 303.8777 Ops/s $\color{#35bf28}+0.78\%$
test_compile_assign_and_add[pytree-compile] 1.9793ms 0.1603ms 6.2381 KOps/s 5.8950 KOps/s $\textbf{\color{#35bf28}+5.82\%}$
test_compile_assign_and_add[pytree-eager] 2.9126ms 2.7707ms 360.9199 Ops/s 354.8662 Ops/s $\color{#35bf28}+1.71\%$
test_compile_indexing[tensor-tensordict-compile] 0.2192ms 0.1072ms 9.3293 KOps/s 8.8389 KOps/s $\textbf{\color{#35bf28}+5.55\%}$
test_compile_indexing[tensor-tensordict-eager] 0.3133ms 71.7209μs 13.9429 KOps/s 13.5628 KOps/s $\color{#35bf28}+2.80\%$
test_compile_indexing[tensor-tensorclass-compile] 0.1410ms 95.1742μs 10.5070 KOps/s 10.2563 KOps/s $\color{#35bf28}+2.44\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2532ms 45.1353μs 22.1556 KOps/s 22.1327 KOps/s $\color{#35bf28}+0.10\%$
test_compile_indexing[tensor-pytree-compile] 0.1603ms 96.5130μs 10.3613 KOps/s 10.1741 KOps/s $\color{#35bf28}+1.84\%$
test_compile_indexing[tensor-pytree-eager] 0.2803ms 45.3408μs 22.0552 KOps/s 22.0280 KOps/s $\color{#35bf28}+0.12\%$
test_compile_indexing[slice-tensordict-compile] 0.1590ms 56.3565μs 17.7442 KOps/s 17.1170 KOps/s $\color{#35bf28}+3.66\%$
test_compile_indexing[slice-tensordict-eager] 0.2279ms 28.2254μs 35.4290 KOps/s 35.3540 KOps/s $\color{#35bf28}+0.21\%$
test_compile_indexing[slice-tensorclass-compile] 0.1520ms 44.3779μs 22.5338 KOps/s 21.9453 KOps/s $\color{#35bf28}+2.68\%$
test_compile_indexing[slice-tensorclass-eager] 0.2591ms 22.9411μs 43.5899 KOps/s 44.0569 KOps/s $\color{#d91a1a}-1.06\%$
test_compile_indexing[slice-pytree-compile] 0.1013ms 44.7216μs 22.3606 KOps/s 21.5984 KOps/s $\color{#35bf28}+3.53\%$
test_compile_indexing[slice-pytree-eager] 0.2636ms 22.7773μs 43.9033 KOps/s 44.4200 KOps/s $\color{#d91a1a}-1.16\%$
test_compile_indexing[int-tensordict-compile] 0.1058ms 56.2180μs 17.7879 KOps/s 17.0155 KOps/s $\color{#35bf28}+4.54\%$
test_compile_indexing[int-tensordict-eager] 0.2458ms 27.4760μs 36.3955 KOps/s 35.5450 KOps/s $\color{#35bf28}+2.39\%$
test_compile_indexing[int-tensorclass-compile] 82.0820μs 44.6386μs 22.4022 KOps/s 21.8191 KOps/s $\color{#35bf28}+2.67\%$
test_compile_indexing[int-tensorclass-eager] 0.2657ms 22.6440μs 44.1618 KOps/s 44.2512 KOps/s $\color{#d91a1a}-0.20\%$
test_compile_indexing[int-pytree-compile] 86.5220μs 44.1136μs 22.6687 KOps/s 21.3548 KOps/s $\textbf{\color{#35bf28}+6.15\%}$
test_compile_indexing[int-pytree-eager] 0.2717ms 22.6015μs 44.2448 KOps/s 43.8162 KOps/s $\color{#35bf28}+0.98\%$
test_mod_add[eager] 0.1102ms 50.2669μs 19.8938 KOps/s 19.6414 KOps/s $\color{#35bf28}+1.29\%$
test_mod_add[compile] 0.5504ms 0.1073ms 9.3201 KOps/s 9.3424 KOps/s $\color{#d91a1a}-0.24\%$
test_mod_add[compile-overhead] 0.2473ms 0.1466ms 6.8232 KOps/s 6.3563 KOps/s $\textbf{\color{#35bf28}+7.34\%}$
test_mod_wrap[eager] 0.3810ms 0.3058ms 3.2704 KOps/s 3.2882 KOps/s $\color{#d91a1a}-0.54\%$
test_mod_wrap[compile] 0.4857ms 0.3599ms 2.7787 KOps/s 2.8184 KOps/s $\color{#d91a1a}-1.41\%$
test_mod_wrap[compile-overhead] 7.4146ms 4.0998ms 243.9163 Ops/s 247.7261 Ops/s $\color{#d91a1a}-1.54\%$
test_mod_wrap_and_backward[eager] 1.5993ms 1.4847ms 673.5327 Ops/s 664.6137 Ops/s $\color{#35bf28}+1.34\%$
test_mod_wrap_and_backward[compile] 1.5511ms 1.4404ms 694.2566 Ops/s 636.4033 Ops/s $\textbf{\color{#35bf28}+9.09\%}$
test_mod_wrap_and_backward[compile-overhead] 1.2747ms 0.8853ms 1.1296 KOps/s 986.2120 Ops/s $\textbf{\color{#35bf28}+14.54\%}$
test_seq_add[eager] 0.2102ms 0.1559ms 6.4162 KOps/s 6.4238 KOps/s $\color{#d91a1a}-0.12\%$
test_seq_add[compile] 0.2521ms 0.1139ms 8.7768 KOps/s 8.1254 KOps/s $\textbf{\color{#35bf28}+8.02\%}$
test_seq_add[compile-overhead] 0.2164ms 0.1523ms 6.5650 KOps/s 6.2783 KOps/s $\color{#35bf28}+4.57\%$
test_seq_wrap[eager] 0.6012ms 0.5243ms 1.9074 KOps/s 1.8196 KOps/s $\color{#35bf28}+4.83\%$
test_seq_wrap[compile] 0.4185ms 0.3667ms 2.7269 KOps/s 2.6431 KOps/s $\color{#35bf28}+3.17\%$
test_seq_wrap[compile-overhead] 0.3276ms 0.2626ms 3.8088 KOps/s 3.7508 KOps/s $\color{#35bf28}+1.55\%$
test_func_call_runtime[False-eager] 0.9240ms 0.8268ms 1.2094 KOps/s 1.1840 KOps/s $\color{#35bf28}+2.15\%$
test_func_call_runtime[False-compile] 1.0874ms 0.9100ms 1.0990 KOps/s 1.0735 KOps/s $\color{#35bf28}+2.37\%$
test_func_call_runtime[False-compile-overhead] 0.5387ms 0.4605ms 2.1716 KOps/s 2.1521 KOps/s $\color{#35bf28}+0.91\%$
test_func_call_runtime[True-eager] 1.1221ms 1.0767ms 928.7687 Ops/s 914.8752 Ops/s $\color{#35bf28}+1.52\%$
test_func_call_runtime[True-compile] 1.0222ms 0.9560ms 1.0460 KOps/s 1.0702 KOps/s $\color{#d91a1a}-2.26\%$
test_func_call_runtime[True-compile-overhead] 0.6044ms 0.4755ms 2.1030 KOps/s 2.0748 KOps/s $\color{#35bf28}+1.36\%$
test_func_call_cm_runtime[False-eager] 0.9421ms 0.8871ms 1.1272 KOps/s 1.1780 KOps/s $\color{#d91a1a}-4.31\%$
test_func_call_cm_runtime[False-compile] 1.0072ms 0.9486ms 1.0542 KOps/s 1.0777 KOps/s $\color{#d91a1a}-2.19\%$
test_func_call_cm_runtime[False-compile-overhead] 0.5611ms 0.4687ms 2.1335 KOps/s 2.1408 KOps/s $\color{#d91a1a}-0.34\%$
test_func_call_cm_runtime[True-eager] 1.5330ms 1.2562ms 796.0406 Ops/s 805.9934 Ops/s $\color{#d91a1a}-1.23\%$
test_func_call_cm_runtime[True-compile] 1.2259ms 0.9992ms 1.0008 KOps/s 1.0372 KOps/s $\color{#d91a1a}-3.51\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5852ms 0.5047ms 1.9814 KOps/s 1.9470 KOps/s $\color{#35bf28}+1.77\%$
test_vmap_func_call_cm_runtime[eager] 2.8282ms 2.3471ms 426.0607 Ops/s 422.1228 Ops/s $\color{#35bf28}+0.93\%$
test_vmap_func_call_cm_runtime[compile] 1.0975ms 0.9693ms 1.0316 KOps/s 1.0197 KOps/s $\color{#35bf28}+1.17\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5638ms 0.5084ms 1.9669 KOps/s 1.9302 KOps/s $\color{#35bf28}+1.90\%$
test_distributed 2.9086ms 0.1673ms 5.9765 KOps/s 5.5932 KOps/s $\textbf{\color{#35bf28}+6.85\%}$
test_tdmodule 0.5023ms 28.6249μs 34.9347 KOps/s 34.8051 KOps/s $\color{#35bf28}+0.37\%$
test_tdmodule_dispatch 77.5510μs 46.7294μs 21.3998 KOps/s 21.2931 KOps/s $\color{#35bf28}+0.50\%$
test_tdseq 46.5910μs 27.5888μs 36.2466 KOps/s 35.8201 KOps/s $\color{#35bf28}+1.19\%$
test_tdseq_dispatch 69.3410μs 49.1759μs 20.3352 KOps/s 20.4026 KOps/s $\color{#d91a1a}-0.33\%$
test_instantiation_functorch 2.1514ms 2.0562ms 486.3387 Ops/s 484.0324 Ops/s $\color{#35bf28}+0.48\%$
test_exec_functorch 0.2300ms 0.1787ms 5.5947 KOps/s 5.5388 KOps/s $\color{#35bf28}+1.01\%$
test_exec_functional_call 0.2096ms 0.1594ms 6.2742 KOps/s 6.3748 KOps/s $\color{#d91a1a}-1.58\%$
test_exec_td_decorator 0.4492ms 0.2345ms 4.2649 KOps/s 4.2399 KOps/s $\color{#35bf28}+0.59\%$
test_vmap_mlp_speed_decorator[True-True] 1.0181ms 0.8159ms 1.2257 KOps/s 1.2092 KOps/s $\color{#35bf28}+1.36\%$
test_vmap_mlp_speed_decorator[True-False] 1.0202ms 0.8182ms 1.2221 KOps/s 1.2132 KOps/s $\color{#35bf28}+0.74\%$
test_vmap_mlp_speed_decorator[False-True] 0.8840ms 0.7075ms 1.4135 KOps/s 1.4020 KOps/s $\color{#35bf28}+0.82\%$
test_vmap_mlp_speed_decorator[False-False] 0.9108ms 0.7079ms 1.4127 KOps/s 1.4042 KOps/s $\color{#35bf28}+0.61\%$
test_vmap_transformer_speed_decorator[True-True] 21.1447ms 20.3616ms 49.1122 Ops/s 48.7245 Ops/s $\color{#35bf28}+0.80\%$
test_vmap_transformer_speed_decorator[True-False] 21.1679ms 20.3888ms 49.0466 Ops/s 48.7268 Ops/s $\color{#35bf28}+0.66\%$
test_vmap_transformer_speed_decorator[False-True] 20.7433ms 20.1562ms 49.6126 Ops/s 49.2583 Ops/s $\color{#35bf28}+0.72\%$
test_vmap_transformer_speed_decorator[False-False] 20.3626ms 20.1742ms 49.5682 Ops/s 49.2791 Ops/s $\color{#35bf28}+0.59\%$
test_to_module_speed[True] 1.6131ms 1.4819ms 674.8184 Ops/s 665.5558 Ops/s $\color{#35bf28}+1.39\%$
test_to_module_speed[False] 1.5464ms 1.4441ms 692.4505 Ops/s 681.6535 Ops/s $\color{#35bf28}+1.58\%$
test_tc_init 77.3920μs 46.7766μs 21.3782 KOps/s 21.5013 KOps/s $\color{#d91a1a}-0.57\%$
test_tc_init_tensor_only 38.5800μs 9.9674μs 100.3267 KOps/s 100.8990 KOps/s $\color{#d91a1a}-0.57\%$
test_tc_init_nested 0.1478ms 93.3513μs 10.7122 KOps/s 10.8309 KOps/s $\color{#d91a1a}-1.10\%$
test_tc_init_many_fields 43.5210μs 16.7281μs 59.7797 KOps/s 60.5693 KOps/s $\color{#d91a1a}-1.30\%$
test_tc_first_layer_tensor 22.1110μs 1.8567μs 538.5882 KOps/s 538.7478 KOps/s $\color{#d91a1a}-0.03\%$
test_tc_first_layer_tensor_only 5.0900μs 0.7599μs 1.3160 MOps/s 1.3133 MOps/s $\color{#35bf28}+0.21\%$
test_tc_first_layer_tensor_set 26.6200μs 4.1694μs 239.8401 KOps/s 235.9953 KOps/s $\color{#35bf28}+1.63\%$
test_tc_first_layer_tensor_only_set 16.6000μs 3.1389μs 318.5831 KOps/s 314.0949 KOps/s $\color{#35bf28}+1.43\%$
test_tc_first_layer_nontensor 28.3400μs 6.1696μs 162.0850 KOps/s 161.2530 KOps/s $\color{#35bf28}+0.52\%$
test_tc_second_layer_tensor 40.4200μs 4.4558μs 224.4245 KOps/s 224.8797 KOps/s $\color{#d91a1a}-0.20\%$
test_tc_second_layer_nontensor 40.3800μs 8.7592μs 114.1657 KOps/s 113.1505 KOps/s $\color{#35bf28}+0.90\%$
test_unbind 0.2448s 14.2322ms 70.2631 Ops/s 57.0595 Ops/s $\textbf{\color{#35bf28}+23.14\%}$
test_full_like 4.9841ms 4.3849ms 228.0576 Ops/s 228.6780 Ops/s $\color{#d91a1a}-0.27\%$
test_zeros_like 4.4643ms 4.3580ms 229.4604 Ops/s 228.8660 Ops/s $\color{#35bf28}+0.26\%$
test_ones_like 4.4802ms 4.3655ms 229.0671 Ops/s 228.8934 Ops/s $\color{#35bf28}+0.08\%$
test_clone 6.6580ms 6.4544ms 154.9337 Ops/s 154.5093 Ops/s $\color{#35bf28}+0.27\%$
test_squeeze 0.1879ms 13.9733μs 71.5651 KOps/s 69.2840 KOps/s $\color{#35bf28}+3.29\%$
test_unsqueeze 0.2607ms 0.1103ms 9.0686 KOps/s 9.0254 KOps/s $\color{#35bf28}+0.48\%$
test_split 0.2425ms 0.1835ms 5.4482 KOps/s 5.4460 KOps/s $\color{#35bf28}+0.04\%$
test_permute 0.2486ms 0.2029ms 4.9287 KOps/s 4.9190 KOps/s $\color{#35bf28}+0.20\%$
test_stack 51.4509ms 51.1405ms 19.5540 Ops/s 23.3102 Ops/s $\textbf{\color{#d91a1a}-16.11\%}$
test_cat 51.4945ms 50.9972ms 19.6089 Ops/s 23.2974 Ops/s $\textbf{\color{#d91a1a}-15.83\%}$
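
For reading the table above: the Ops column appears to be the reciprocal of Mean, and Change the relative difference between Ops and Ops on Repo HEAD. The bold entries coincide with changes larger than 5% in magnitude, and counting them reproduces the Improved/Worsened totals (21 and 7). The sketch below checks this on four rows copied from the table; the 5% threshold is inferred from the rows, not taken from the benchmark tooling.

```python
# (name, mean in seconds, ops/s on this PR, ops/s on repo HEAD), copied from the CPU table
ROWS = [
    ("test_plain_set_nested", 14.9323e-6, 66.9691e3, 66.5021e3),
    ("test_values", 1.0208e-6, 979.6124e3, 797.8923e3),
    ("test_contiguous[memmap_tensor0]", 0.6694e-6, 1.4939e6, 2.1464e6),
    ("test_stack", 51.1405e-3, 19.5540, 23.3102),
]
THRESHOLD = 5.0  # percent; assumption inferred from which rows are bold


def change_pct(ops, ops_head):
    """Relative throughput change versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(pct, threshold=THRESHOLD):
    """Label a change as improved, worsened, or unchanged (within the threshold)."""
    if pct > threshold:
        return "improved"
    if pct < -threshold:
        return "worsened"
    return "unchanged"


report = []
for name, mean_s, ops, ops_head in ROWS:
    pct = change_pct(ops, ops_head)
    ops_from_mean = 1.0 / mean_s  # should be close to the Ops column
    report.append((name, pct, classify(pct), ops_from_mean))
    print(f"{name}: {pct:+.2f}% ({classify(pct)}), 1/mean = {ops_from_mean:,.1f} ops/s")
```

Changes within the threshold are best read as noise between runs rather than as effects of the PR.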

@github-actions (Contributor)

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 243. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}16$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 53.0710μs 15.0821μs 66.3037 KOps/s 66.1575 KOps/s $\color{#35bf28}+0.22\%$
test_plain_set_stack_nested 36.7400μs 15.1358μs 66.0684 KOps/s 64.4900 KOps/s $\color{#35bf28}+2.45\%$
test_plain_set_nested_inplace 64.5810μs 16.6186μs 60.1737 KOps/s 58.4135 KOps/s $\color{#35bf28}+3.01\%$
test_plain_set_stack_nested_inplace 45.5600μs 16.6449μs 60.0786 KOps/s 59.5080 KOps/s $\color{#35bf28}+0.96\%$
test_items 36.3900μs 5.9295μs 168.6495 KOps/s 169.7163 KOps/s $\color{#d91a1a}-0.63\%$
test_items_nested 0.7194ms 0.5388ms 1.8561 KOps/s 1.8674 KOps/s $\color{#d91a1a}-0.61\%$
test_items_nested_locked 0.7629ms 0.5391ms 1.8549 KOps/s 1.8492 KOps/s $\color{#35bf28}+0.31\%$
test_items_nested_leaf 0.1756ms 97.8323μs 10.2216 KOps/s 10.4667 KOps/s $\color{#d91a1a}-2.34\%$
test_items_stack_nested 0.7246ms 0.5398ms 1.8524 KOps/s 1.8767 KOps/s $\color{#d91a1a}-1.30\%$
test_items_stack_nested_leaf 0.1528ms 96.4331μs 10.3699 KOps/s 10.1885 KOps/s $\color{#35bf28}+1.78\%$
test_items_stack_nested_locked 0.7529ms 0.5452ms 1.8343 KOps/s 1.8217 KOps/s $\color{#35bf28}+0.69\%$
test_keys 46.2210μs 4.2852μs 233.3614 KOps/s 229.2209 KOps/s $\color{#35bf28}+1.81\%$
test_keys_nested 0.2133ms 0.1228ms 8.1463 KOps/s 8.3246 KOps/s $\color{#d91a1a}-2.14\%$
test_keys_nested_locked 88.6536ms 0.1431ms 6.9886 KOps/s 7.7574 KOps/s $\textbf{\color{#d91a1a}-9.91\%}$
test_keys_nested_leaf 0.1700ms 0.1122ms 8.9096 KOps/s 9.0275 KOps/s $\color{#d91a1a}-1.31\%$
test_keys_stack_nested 0.1932ms 0.1197ms 8.3523 KOps/s 8.1877 KOps/s $\color{#35bf28}+2.01\%$
test_keys_stack_nested_leaf 0.2009ms 0.1127ms 8.8700 KOps/s 8.9122 KOps/s $\color{#d91a1a}-0.47\%$
test_keys_stack_nested_locked 0.1911ms 0.1301ms 7.6835 KOps/s 7.6462 KOps/s $\color{#35bf28}+0.49\%$
test_values 10.3822μs 1.0281μs 972.6747 KOps/s 977.9815 KOps/s $\color{#d91a1a}-0.54\%$
test_values_nested 87.6420μs 47.3071μs 21.1385 KOps/s 20.8226 KOps/s $\color{#35bf28}+1.52\%$
test_values_nested_locked 96.1420μs 50.5078μs 19.7989 KOps/s 19.5504 KOps/s $\color{#35bf28}+1.27\%$
test_values_nested_leaf 88.3020μs 53.6431μs 18.6417 KOps/s 18.4562 KOps/s $\color{#35bf28}+1.01\%$
test_values_stack_nested 0.1177ms 47.4836μs 21.0599 KOps/s 20.9452 KOps/s $\color{#35bf28}+0.55\%$
test_values_stack_nested_leaf 0.1003ms 54.2725μs 18.4255 KOps/s 18.4556 KOps/s $\color{#d91a1a}-0.16\%$
test_values_stack_nested_locked 84.1010μs 50.6943μs 19.7261 KOps/s 19.7761 KOps/s $\color{#d91a1a}-0.25\%$
test_membership 7.3252μs 0.8415μs 1.1884 MOps/s 1.1755 MOps/s $\color{#35bf28}+1.10\%$
test_membership_nested 22.9900μs 3.2104μs 311.4880 KOps/s 315.5347 KOps/s $\color{#d91a1a}-1.28\%$
test_membership_nested_leaf 55.1410μs 3.2106μs 311.4644 KOps/s 316.0496 KOps/s $\color{#d91a1a}-1.45\%$
test_membership_stacked_nested 32.5110μs 3.2668μs 306.1077 KOps/s 314.7572 KOps/s $\color{#d91a1a}-2.75\%$
test_membership_stacked_nested_leaf 53.8310μs 3.1878μs 313.6961 KOps/s 316.3856 KOps/s $\color{#d91a1a}-0.85\%$
test_membership_nested_last 37.6410μs 4.6408μs 215.4810 KOps/s 219.5590 KOps/s $\color{#d91a1a}-1.86\%$
test_membership_nested_leaf_last 67.9310μs 4.6713μs 214.0734 KOps/s 218.2892 KOps/s $\color{#d91a1a}-1.93\%$
test_membership_stacked_nested_last 37.0010μs 4.6626μs 214.4743 KOps/s 217.4423 KOps/s $\color{#d91a1a}-1.36\%$
test_membership_stacked_nested_leaf_last 43.9710μs 4.7036μs 212.6050 KOps/s 213.3685 KOps/s $\color{#d91a1a}-0.36\%$
test_nested_getleaf 60.6610μs 21.3394μs 46.8616 KOps/s 45.1639 KOps/s $\color{#35bf28}+3.76\%$
test_nested_get 64.3010μs 20.4316μs 48.9438 KOps/s 47.5484 KOps/s $\color{#35bf28}+2.93\%$
test_stacked_getleaf 61.5010μs 21.9681μs 45.5205 KOps/s 45.4699 KOps/s $\color{#35bf28}+0.11\%$
test_stacked_get 59.4210μs 20.5338μs 48.7002 KOps/s 47.6541 KOps/s $\color{#35bf28}+2.20\%$
test_nested_getitemleaf 50.3110μs 22.0883μs 45.2729 KOps/s 43.5470 KOps/s $\color{#35bf28}+3.96\%$
test_nested_getitem 63.5610μs 21.0652μs 47.4716 KOps/s 47.2002 KOps/s $\color{#35bf28}+0.58\%$
test_stacked_getitemleaf 72.9610μs 22.0222μs 45.4087 KOps/s 43.9832 KOps/s $\color{#35bf28}+3.24\%$
test_stacked_getitem 62.4810μs 21.1625μs 47.2533 KOps/s 47.0606 KOps/s $\color{#35bf28}+0.41\%$
test_lock_nested 7.6490ms 0.4804ms 2.0816 KOps/s 2.0715 KOps/s $\color{#35bf28}+0.49\%$
test_lock_stack_nested 0.5620ms 0.4827ms 2.0717 KOps/s 2.0573 KOps/s $\color{#35bf28}+0.70\%$
test_unlock_nested 0.5229ms 0.3809ms 2.6253 KOps/s 2.5299 KOps/s $\color{#35bf28}+3.77\%$
test_unlock_stack_nested 0.4517ms 0.3842ms 2.6031 KOps/s 2.5435 KOps/s $\color{#35bf28}+2.34\%$
test_flatten_speed 0.1983ms 0.1215ms 8.2333 KOps/s 8.1208 KOps/s $\color{#35bf28}+1.39\%$
test_unflatten_speed 0.7803ms 0.6066ms 1.6486 KOps/s 1.6593 KOps/s $\color{#d91a1a}-0.64\%$
test_common_ops 0.8836ms 0.6943ms 1.4404 KOps/s 1.4179 KOps/s $\color{#35bf28}+1.59\%$
test_creation 0.4426ms 2.9864μs 334.8525 KOps/s 334.0187 KOps/s $\color{#35bf28}+0.25\%$
test_creation_empty 29.7600μs 6.2811μs 159.2086 KOps/s 157.7846 KOps/s $\color{#35bf28}+0.90\%$
test_creation_nested_1 30.8810μs 10.8843μs 91.8756 KOps/s 90.5681 KOps/s $\color{#35bf28}+1.44\%$
test_creation_nested_2 0.4265ms 11.8778μs 84.1909 KOps/s 82.3429 KOps/s $\color{#35bf28}+2.24\%$
test_creation_many_keys[10] 41.1510μs 18.7131μs 53.4385 KOps/s 54.3758 KOps/s $\color{#d91a1a}-1.72\%$
test_creation_many_keys[50] 0.1189ms 78.6281μs 12.7181 KOps/s 12.6690 KOps/s $\color{#35bf28}+0.39\%$
test_creation_many_keys[100] 0.5739ms 0.1538ms 6.5041 KOps/s 6.5291 KOps/s $\color{#d91a1a}-0.38\%$
test_creation_nested_many_keys[10] 0.4560ms 39.6310μs 25.2327 KOps/s 24.9416 KOps/s $\color{#35bf28}+1.17\%$
test_creation_nested_many_keys[50] 0.5745ms 0.1616ms 6.1898 KOps/s 6.1940 KOps/s $\color{#d91a1a}-0.07\%$
test_clone 40.0210μs 13.4088μs 74.5779 KOps/s 74.5675 KOps/s $\color{#35bf28}+0.01\%$
test_getitem[int] 1.6164ms 14.5949μs 68.5172 KOps/s 55.6337 KOps/s $\textbf{\color{#35bf28}+23.16\%}$
test_getitem[slice_int] 0.4546ms 26.4629μs 37.7887 KOps/s 39.1609 KOps/s $\color{#d91a1a}-3.50\%$
test_getitem[range] 0.1820ms 66.4768μs 15.0428 KOps/s 16.0363 KOps/s $\textbf{\color{#d91a1a}-6.20\%}$
test_getitem[tuple] 0.1490ms 25.0304μs 39.9514 KOps/s 40.8746 KOps/s $\color{#d91a1a}-2.26\%$
test_getitem[list] 0.5042ms 62.0778μs 16.1088 KOps/s 17.5870 KOps/s $\textbf{\color{#d91a1a}-8.40\%}$
test_setitem_dim[int] 49.6810μs 28.8160μs 34.7030 KOps/s 38.8951 KOps/s $\textbf{\color{#d91a1a}-10.78\%}$
test_setitem_dim[slice_int] 72.0010μs 47.4451μs 21.0770 KOps/s 22.5778 KOps/s $\textbf{\color{#d91a1a}-6.65\%}$
test_setitem_dim[range] 0.5472ms 0.1027ms 9.7360 KOps/s 10.6715 KOps/s $\textbf{\color{#d91a1a}-8.77\%}$
test_setitem_dim[tuple] 71.9810μs 44.4919μs 22.4760 KOps/s 24.1922 KOps/s $\textbf{\color{#d91a1a}-7.09\%}$
test_setitem 67.9910μs 18.4187μs 54.2928 KOps/s 55.1216 KOps/s $\color{#d91a1a}-1.50\%$
test_set 60.7710μs 17.5603μs 56.9465 KOps/s 56.6949 KOps/s $\color{#35bf28}+0.44\%$
test_set_shared 0.6301ms 0.2100ms 4.7630 KOps/s 4.7867 KOps/s $\color{#d91a1a}-0.50\%$
test_update 0.1997ms 22.3499μs 44.7430 KOps/s 44.9933 KOps/s $\color{#d91a1a}-0.56\%$
test_update_nested 78.1010μs 34.6024μs 28.8997 KOps/s 28.6023 KOps/s $\color{#35bf28}+1.04\%$
test_update__nested 0.4597ms 34.3179μs 29.1393 KOps/s 28.4697 KOps/s $\color{#35bf28}+2.35\%$
test_set_nested 58.2410μs 18.8768μs 52.9751 KOps/s 51.6232 KOps/s $\color{#35bf28}+2.62\%$
test_set_nested_new 59.0210μs 24.2108μs 41.3039 KOps/s 39.9394 KOps/s $\color{#35bf28}+3.42\%$
test_select 80.7810μs 42.8060μs 23.3612 KOps/s 23.4409 KOps/s $\color{#d91a1a}-0.34\%$
test_select_nested 0.1054ms 75.0275μs 13.3284 KOps/s 13.2317 KOps/s $\color{#35bf28}+0.73\%$
test_exclude_nested 0.1481ms 98.6181μs 10.1401 KOps/s 10.1745 KOps/s $\color{#d91a1a}-0.34\%$
test_empty[True] 0.5673ms 0.4435ms 2.2549 KOps/s 2.2647 KOps/s $\color{#d91a1a}-0.43\%$
test_empty[False] 9.3575μs 1.3300μs 751.8762 KOps/s 750.8220 KOps/s $\color{#35bf28}+0.14\%$
test_to 0.1026ms 71.1736μs 14.0501 KOps/s 13.7655 KOps/s $\color{#35bf28}+2.07\%$
test_to_nonblocking 0.1180ms 64.8959μs 15.4093 KOps/s 15.6550 KOps/s $\color{#d91a1a}-1.57\%$
test_unbind_speed 0.4544ms 0.3270ms 3.0584 KOps/s 3.0610 KOps/s $\color{#d91a1a}-0.08\%$
test_unbind_speed_stack0 0.4375ms 0.3248ms 3.0788 KOps/s 3.0059 KOps/s $\color{#35bf28}+2.42\%$
test_unbind_speed_stack1 0.1020s 0.9105ms 1.0983 KOps/s 1.1799 KOps/s $\textbf{\color{#d91a1a}-6.92\%}$
test_split 1.3307ms 1.1421ms 875.5524 Ops/s 776.5132 Ops/s $\textbf{\color{#35bf28}+12.75\%}$
test_chunk 0.1018s 1.2053ms 829.6720 Ops/s 916.1265 Ops/s $\textbf{\color{#d91a1a}-9.44\%}$
test_to_cpu_blocking 19.6972ms 19.4171ms 51.5011 Ops/s 40.0220 Ops/s $\textbf{\color{#35bf28}+28.68\%}$
test_to_cpu_global_sync 11.4553ms 11.2534ms 88.8621 Ops/s 88.9312 Ops/s $\color{#d91a1a}-0.08\%$
test_to_cpu_event_sync 0.1137s 13.4787ms 74.1913 Ops/s 81.7288 Ops/s $\textbf{\color{#d91a1a}-9.22\%}$
test_to_cpu_default 12.6440ms 12.3314ms 81.0937 Ops/s 81.4164 Ops/s $\color{#d91a1a}-0.40\%$
test_consolidate[False-None] 4.7298ms 4.1539ms 240.7394 Ops/s 217.2961 Ops/s $\textbf{\color{#35bf28}+10.79\%}$
test_consolidate[default-None] 2.1744ms 2.0278ms 493.1467 Ops/s 472.2677 Ops/s $\color{#35bf28}+4.42\%$
test_consolidate[reduce-overhead-None] 2.0166ms 1.9471ms 513.5753 Ops/s 501.8109 Ops/s $\color{#35bf28}+2.34\%$
test_consolidate_njt[False-None] 8.6583ms 8.5203ms 117.3665 Ops/s 116.0160 Ops/s $\color{#35bf28}+1.16\%$
test_to[False-False-None] 2.1788ms 2.0743ms 482.0791 Ops/s 475.4570 Ops/s $\color{#35bf28}+1.39\%$
test_to[True-False-None] 2.2244ms 1.9593ms 510.3743 Ops/s 515.8969 Ops/s $\color{#d91a1a}-1.07\%$
test_to[within-False-None] 6.2253ms 6.1525ms 162.5365 Ops/s 162.2329 Ops/s $\color{#35bf28}+0.19\%$
test_to[True-default-None] 7.7209ms 7.5354ms 132.7076 Ops/s 127.5393 Ops/s $\color{#35bf28}+4.05\%$
test_to_njt[False-False-None] 8.8480ms 8.6691ms 115.3521 Ops/s 115.6911 Ops/s $\color{#d91a1a}-0.29\%$
test_to_njt[True-False-None] 7.1665ms 6.9966ms 142.9261 Ops/s 141.6473 Ops/s $\color{#35bf28}+0.90\%$
test_to_njt[within-False-None] 16.1343ms 15.8437ms 63.1164 Ops/s 63.4315 Ops/s $\color{#d91a1a}-0.50\%$
test_creation[device0] 0.3517ms 0.1170ms 8.5491 KOps/s 8.3763 KOps/s $\color{#35bf28}+2.06\%$
test_creation_from_tensor 0.3679ms 0.1140ms 8.7683 KOps/s 8.6514 KOps/s $\color{#35bf28}+1.35\%$
test_add_one[memmap_tensor0] 0.2681ms 6.3528μs 157.4103 KOps/s 155.4092 KOps/s $\color{#35bf28}+1.29\%$
test_contiguous[memmap_tensor0] 19.7610μs 0.6724μs 1.4872 MOps/s 2.1097 MOps/s $\textbf{\color{#d91a1a}-29.50\%}$
test_stack[memmap_tensor0] 26.8000μs 4.6422μs 215.4173 KOps/s 215.4886 KOps/s $\color{#d91a1a}-0.03\%$
test_memmaptd_index 0.9977ms 0.2599ms 3.8480 KOps/s 3.7552 KOps/s $\color{#35bf28}+2.47\%$
test_memmaptd_index_astensor 0.5143ms 0.3538ms 2.8265 KOps/s 2.8007 KOps/s $\color{#35bf28}+0.92\%$
test_memmaptd_index_op 0.8490ms 0.5900ms 1.6950 KOps/s 1.6673 KOps/s $\color{#35bf28}+1.66\%$
test_serialize_model 0.1398s 0.1371s 7.2915 Ops/s 7.2363 Ops/s $\color{#35bf28}+0.76\%$
test_serialize_model_pickle 1.3485s 1.1930s 0.8382 Ops/s 0.8230 Ops/s $\color{#35bf28}+1.85\%$
test_serialize_weights 0.1374s 0.1360s 7.3506 Ops/s 7.2940 Ops/s $\color{#35bf28}+0.78\%$
test_serialize_weights_returnearly 0.4393s 94.4192ms 10.5911 Ops/s 6.3024 Ops/s $\textbf{\color{#35bf28}+68.05\%}$
test_serialize_weights_pickle 1.3649s 1.2132s 0.8242 Ops/s 0.8215 Ops/s $\color{#35bf28}+0.34\%$
test_reshape_pytree 0.2088ms 33.6015μs 29.7606 KOps/s 29.9943 KOps/s $\color{#d91a1a}-0.78\%$
test_reshape_td 80.0820μs 44.9238μs 22.2599 KOps/s 21.8760 KOps/s $\color{#35bf28}+1.75\%$
test_view_pytree 0.2185ms 34.4273μs 29.0467 KOps/s 30.1859 KOps/s $\color{#d91a1a}-3.77\%$
test_view_td 96.3720μs 53.6073μs 18.6542 KOps/s 19.2987 KOps/s $\color{#d91a1a}-3.34\%$
test_unbind_pytree 0.2373ms 38.1386μs 26.2202 KOps/s 26.7354 KOps/s $\color{#d91a1a}-1.93\%$
test_unbind_td 0.1823ms 49.6080μs 20.1580 KOps/s 19.9604 KOps/s $\color{#35bf28}+0.99\%$
test_split_pytree 0.2632ms 42.7840μs 23.3732 KOps/s 23.2869 KOps/s $\color{#35bf28}+0.37\%$
test_split_td 0.1141ms 64.6353μs 15.4714 KOps/s 15.2330 KOps/s $\color{#35bf28}+1.56\%$
test_add_pytree 0.2341ms 42.6024μs 23.4729 KOps/s 24.0821 KOps/s $\color{#d91a1a}-2.53\%$
test_add_td 0.1123ms 54.0868μs 18.4888 KOps/s 19.0788 KOps/s $\color{#d91a1a}-3.09\%$
test_compile_add_one_nested[tensordict-compile] 0.1941ms 0.1396ms 7.1637 KOps/s 6.6339 KOps/s $\textbf{\color{#35bf28}+7.99\%}$
test_compile_add_one_nested[tensordict-eager] 0.3957ms 0.2097ms 4.7684 KOps/s 5.1859 KOps/s $\textbf{\color{#d91a1a}-8.05\%}$
test_compile_add_one_nested[pytree-compile] 0.1634ms 0.1100ms 9.0908 KOps/s 8.9516 KOps/s $\color{#35bf28}+1.56\%$
test_compile_add_one_nested[pytree-eager] 0.4371ms 0.1814ms 5.5129 KOps/s 5.5160 KOps/s $\color{#d91a1a}-0.06\%$
test_compile_copy_nested[tensordict-compile] 0.2853ms 38.3004μs 26.1094 KOps/s 30.5081 KOps/s $\textbf{\color{#d91a1a}-14.42\%}$
test_compile_copy_nested[tensordict-eager] 92.5320μs 53.5682μs 18.6678 KOps/s 18.8920 KOps/s $\color{#d91a1a}-1.19\%$
test_compile_copy_nested[pytree-compile] 0.1233ms 10.0526μs 99.4769 KOps/s 101.5130 KOps/s $\color{#d91a1a}-2.01\%$
test_compile_copy_nested[pytree-eager] 0.4569ms 70.1654μs 14.2520 KOps/s 14.2108 KOps/s $\color{#35bf28}+0.29\%$
test_compile_add_one_flat[tensordict-compile] 0.2825ms 0.1790ms 5.5859 KOps/s 5.3769 KOps/s $\color{#35bf28}+3.89\%$
test_compile_add_one_flat[tensordict-eager] 0.3377ms 0.2572ms 3.8879 KOps/s 3.8951 KOps/s $\color{#d91a1a}-0.18\%$
test_compile_add_one_flat[tensorclass-compile] 0.1734ms 0.1184ms 8.4473 KOps/s 8.2173 KOps/s $\color{#35bf28}+2.80\%$
test_compile_add_one_flat[tensorclass-eager] 0.1355ms 69.5889μs 14.3701 KOps/s 14.3150 KOps/s $\color{#35bf28}+0.39\%$
test_compile_add_one_flat[pytree-compile] 0.4215ms 0.1588ms 6.2978 KOps/s 6.1289 KOps/s $\color{#35bf28}+2.76\%$
test_compile_add_one_flat[pytree-eager] 0.8531ms 0.5310ms 1.8832 KOps/s 1.8557 KOps/s $\color{#35bf28}+1.48\%$
test_compile_add_self_flat[tensordict-eager] 0.3901ms 0.3167ms 3.1576 KOps/s 3.2020 KOps/s $\color{#d91a1a}-1.39\%$
test_compile_add_self_flat[tensordict-compile] 0.5002ms 0.1833ms 5.4561 KOps/s 5.1329 KOps/s $\textbf{\color{#35bf28}+6.30\%}$
test_compile_add_self_flat[tensorclass-eager] 0.2150ms 88.5229μs 11.2965 KOps/s 11.7413 KOps/s $\color{#d91a1a}-3.79\%$
test_compile_add_self_flat[tensorclass-compile] 0.2827ms 0.1225ms 8.1622 KOps/s 8.0540 KOps/s $\color{#35bf28}+1.34\%$
test_compile_add_self_flat[pytree-eager] 0.6848ms 0.4460ms 2.2422 KOps/s 2.2745 KOps/s $\color{#d91a1a}-1.42\%$
test_compile_add_self_flat[pytree-compile] 0.2817ms 0.1643ms 6.0849 KOps/s 6.0232 KOps/s $\color{#35bf28}+1.02\%$
test_compile_copy_flat[tensordict-compile] 0.1308ms 24.8354μs 40.2652 KOps/s 38.3730 KOps/s $\color{#35bf28}+4.93\%$
test_compile_copy_flat[tensordict-eager] 0.1587ms 41.4162μs 24.1452 KOps/s 23.6685 KOps/s $\color{#35bf28}+2.01\%$
test_compile_copy_flat[pytree-compile] 1.3401ms 10.8812μs 91.9012 KOps/s 91.1391 KOps/s $\color{#35bf28}+0.84\%$
test_compile_copy_flat[pytree-eager] 0.4009ms 52.6727μs 18.9852 KOps/s 18.8340 KOps/s $\color{#35bf28}+0.80\%$
test_compile_assign_and_add[tensordict-compile] 1.9930ms 0.1737ms 5.7584 KOps/s 5.3748 KOps/s $\textbf{\color{#35bf28}+7.14\%}$
test_compile_assign_and_add[tensordict-eager] 3.4359ms 3.3210ms 301.1182 Ops/s 301.3263 Ops/s $\color{#d91a1a}-0.07\%$
test_compile_assign_and_add[pytree-compile] 1.9662ms 0.1626ms 6.1501 KOps/s 6.1088 KOps/s $\color{#35bf28}+0.68\%$
test_compile_assign_and_add[pytree-eager] 2.9228ms 2.7896ms 358.4711 Ops/s 360.3436 Ops/s $\color{#d91a1a}-0.52\%$
test_compile_indexing[tensor-tensordict-compile] 0.2334ms 0.1092ms 9.1595 KOps/s 8.8248 KOps/s $\color{#35bf28}+3.79\%$
test_compile_indexing[tensor-tensordict-eager] 0.3122ms 74.5294μs 13.4175 KOps/s 13.8701 KOps/s $\color{#d91a1a}-3.26\%$
test_compile_indexing[tensor-tensorclass-compile] 0.2212ms 98.0198μs 10.2020 KOps/s 10.2938 KOps/s $\color{#d91a1a}-0.89\%$
test_compile_indexing[tensor-tensorclass-eager] 0.2690ms 45.3790μs 22.0366 KOps/s 22.0753 KOps/s $\color{#d91a1a}-0.18\%$
test_compile_indexing[tensor-pytree-compile] 0.1476ms 99.5113μs 10.0491 KOps/s 10.2247 KOps/s $\color{#d91a1a}-1.72\%$
test_compile_indexing[tensor-pytree-eager] 0.3125ms 46.3775μs 21.5622 KOps/s 22.2019 KOps/s $\color{#d91a1a}-2.88\%$
test_compile_indexing[slice-tensordict-compile] 0.1915ms 56.3591μs 17.7434 KOps/s 17.0846 KOps/s $\color{#35bf28}+3.86\%$
test_compile_indexing[slice-tensordict-eager] 0.2331ms 28.1087μs 35.5762 KOps/s 34.9948 KOps/s $\color{#35bf28}+1.66\%$
test_compile_indexing[slice-tensorclass-compile] 0.1401ms 45.2288μs 22.1098 KOps/s 21.4823 KOps/s $\color{#35bf28}+2.92\%$
test_compile_indexing[slice-tensorclass-eager] 0.2646ms 22.9107μs 43.6477 KOps/s 43.4145 KOps/s $\color{#35bf28}+0.54\%$
test_compile_indexing[slice-pytree-compile] 87.6410μs 45.6474μs 21.9071 KOps/s 21.3913 KOps/s $\color{#35bf28}+2.41\%$
test_compile_indexing[slice-pytree-eager] 0.2553ms 22.9371μs 43.5974 KOps/s 42.9271 KOps/s $\color{#35bf28}+1.56\%$
test_compile_indexing[int-tensordict-compile] 0.1001ms 57.3708μs 17.4305 KOps/s 16.7750 KOps/s $\color{#35bf28}+3.91\%$
test_compile_indexing[int-tensordict-eager] 0.2761ms 28.6169μs 34.9443 KOps/s 35.2447 KOps/s $\color{#d91a1a}-0.85\%$
test_compile_indexing[int-tensorclass-compile] 88.3910μs 46.2193μs 21.6360 KOps/s 20.6802 KOps/s $\color{#35bf28}+4.62\%$
test_compile_indexing[int-tensorclass-eager] 0.2655ms 22.9767μs 43.5223 KOps/s 43.4663 KOps/s $\color{#35bf28}+0.13\%$
test_compile_indexing[int-pytree-compile] 82.1110μs 45.2072μs 22.1204 KOps/s 21.0865 KOps/s $\color{#35bf28}+4.90\%$
test_compile_indexing[int-pytree-eager] 0.2938ms 22.9199μs 43.6303 KOps/s 42.9167 KOps/s $\color{#35bf28}+1.66\%$
test_mod_add[eager] 87.3720μs 50.9337μs 19.6334 KOps/s 19.6733 KOps/s $\color{#d91a1a}-0.20\%$
test_mod_add[compile] 0.1856ms 0.1041ms 9.6021 KOps/s 9.3726 KOps/s $\color{#35bf28}+2.45\%$
test_mod_add[compile-overhead] 0.2436ms 0.1496ms 6.6841 KOps/s 6.6424 KOps/s $\color{#35bf28}+0.63\%$
test_mod_wrap[eager] 0.3961ms 0.3145ms 3.1793 KOps/s 3.4323 KOps/s $\textbf{\color{#d91a1a}-7.37\%}$
test_mod_wrap[compile] 0.4561ms 0.3634ms 2.7516 KOps/s 2.8185 KOps/s $\color{#d91a1a}-2.37\%$
test_mod_wrap[compile-overhead] 7.3591ms 4.0797ms 245.1177 Ops/s 245.7921 Ops/s $\color{#d91a1a}-0.27\%$
test_mod_wrap_and_backward[eager] 2.1742ms 1.5233ms 656.4601 Ops/s 658.4517 Ops/s $\color{#d91a1a}-0.30\%$
test_mod_wrap_and_backward[compile] 1.6233ms 1.4580ms 685.8927 Ops/s 685.1319 Ops/s $\color{#35bf28}+0.11\%$
test_mod_wrap_and_backward[compile-overhead] 1.3419ms 0.9050ms 1.1049 KOps/s 1.1001 KOps/s $\color{#35bf28}+0.44\%$
test_seq_add[eager] 0.2301ms 0.1608ms 6.2195 KOps/s 6.2239 KOps/s $\color{#d91a1a}-0.07\%$
test_seq_add[compile] 0.1732ms 0.1164ms 8.5927 KOps/s 8.3709 KOps/s $\color{#35bf28}+2.65\%$
test_seq_add[compile-overhead] 0.2381ms 0.1553ms 6.4387 KOps/s 6.3019 KOps/s $\color{#35bf28}+2.17\%$
test_seq_wrap[eager] 0.5970ms 0.5341ms 1.8725 KOps/s 1.9071 KOps/s $\color{#d91a1a}-1.82\%$
test_seq_wrap[compile] 0.4865ms 0.3751ms 2.6663 KOps/s 2.6953 KOps/s $\color{#d91a1a}-1.08\%$
test_seq_wrap[compile-overhead] 0.3333ms 0.2667ms 3.7496 KOps/s 3.7371 KOps/s $\color{#35bf28}+0.33\%$
test_func_call_runtime[False-eager] 0.9848ms 0.8934ms 1.1193 KOps/s 1.1299 KOps/s $\color{#d91a1a}-0.94\%$
test_func_call_runtime[False-compile] 1.0669ms 0.9299ms 1.0754 KOps/s 1.0792 KOps/s $\color{#d91a1a}-0.36\%$
test_func_call_runtime[False-compile-overhead] 0.5276ms 0.4635ms 2.1574 KOps/s 2.1547 KOps/s $\color{#35bf28}+0.12\%$
test_func_call_runtime[True-eager] 1.1718ms 1.0832ms 923.2145 Ops/s 912.0631 Ops/s $\color{#35bf28}+1.22\%$
test_func_call_runtime[True-compile] 1.0548ms 0.9247ms 1.0814 KOps/s 1.0665 KOps/s $\color{#35bf28}+1.40\%$
test_func_call_runtime[True-compile-overhead] 0.5521ms 0.4793ms 2.0862 KOps/s 2.0600 KOps/s $\color{#35bf28}+1.27\%$
test_func_call_cm_runtime[False-eager] 0.9313ms 0.8390ms 1.1919 KOps/s 1.1811 KOps/s $\color{#35bf28}+0.92\%$
test_func_call_cm_runtime[False-compile] 1.0198ms 0.9272ms 1.0785 KOps/s 1.0758 KOps/s $\color{#35bf28}+0.25\%$
test_func_call_cm_runtime[False-compile-overhead] 0.6012ms 0.4674ms 2.1394 KOps/s 2.1297 KOps/s $\color{#35bf28}+0.46\%$
test_func_call_cm_runtime[True-eager] 1.3851ms 1.2372ms 808.2510 Ops/s 804.6322 Ops/s $\color{#35bf28}+0.45\%$
test_func_call_cm_runtime[True-compile] 1.0321ms 0.9615ms 1.0400 KOps/s 1.0344 KOps/s $\color{#35bf28}+0.54\%$
test_func_call_cm_runtime[True-compile-overhead] 0.5699ms 0.5056ms 1.9778 KOps/s 1.9547 KOps/s $\color{#35bf28}+1.18\%$
test_vmap_func_call_cm_runtime[eager] 2.8230ms 2.3434ms 426.7330 Ops/s 422.0612 Ops/s $\color{#35bf28}+1.11\%$
test_vmap_func_call_cm_runtime[compile] 1.0659ms 0.9823ms 1.0180 KOps/s 1.0151 KOps/s $\color{#35bf28}+0.28\%$
test_vmap_func_call_cm_runtime[compile-overhead] 0.5667ms 0.5165ms 1.9361 KOps/s 1.9179 KOps/s $\color{#35bf28}+0.95\%$
test_distributed 0.7618ms 0.1544ms 6.4783 KOps/s 6.4874 KOps/s $\color{#d91a1a}-0.14\%$
test_tdmodule 0.2678ms 28.6635μs 34.8875 KOps/s 33.8436 KOps/s $\color{#35bf28}+3.08\%$
test_tdmodule_dispatch 77.8610μs 46.3951μs 21.5540 KOps/s 21.2084 KOps/s $\color{#35bf28}+1.63\%$
test_tdseq 56.9010μs 28.3841μs 35.2310 KOps/s 36.0692 KOps/s $\color{#d91a1a}-2.32\%$
test_tdseq_dispatch 78.1610μs 49.2009μs 20.3248 KOps/s 20.4376 KOps/s $\color{#d91a1a}-0.55\%$
test_instantiation_functorch 2.1964ms 2.1119ms 473.5070 Ops/s 481.7340 Ops/s $\color{#d91a1a}-1.71\%$
test_exec_functorch 0.2236ms 0.1805ms 5.5393 KOps/s 5.4067 KOps/s $\color{#35bf28}+2.45\%$
test_exec_functional_call 0.2245ms 0.1616ms 6.1866 KOps/s 6.1780 KOps/s $\color{#35bf28}+0.14\%$
test_exec_td_decorator 0.4552ms 0.2367ms 4.2240 KOps/s 4.1925 KOps/s $\color{#35bf28}+0.75\%$
test_vmap_mlp_speed_decorator[True-True] 1.0259ms 0.8205ms 1.2188 KOps/s 1.2157 KOps/s $\color{#35bf28}+0.26\%$
test_vmap_mlp_speed_decorator[True-False] 0.9943ms 0.8187ms 1.2214 KOps/s 1.2178 KOps/s $\color{#35bf28}+0.30\%$
test_vmap_mlp_speed_decorator[False-True] 0.9430ms 0.7097ms 1.4091 KOps/s 1.3848 KOps/s $\color{#35bf28}+1.76\%$
test_vmap_mlp_speed_decorator[False-False] 0.9088ms 0.7148ms 1.3990 KOps/s 1.3984 KOps/s $\color{#35bf28}+0.04\%$
test_vmap_transformer_speed_decorator[True-True] 20.9487ms 20.5709ms 48.6124 Ops/s 48.8250 Ops/s $\color{#d91a1a}-0.44\%$
test_vmap_transformer_speed_decorator[True-False] 21.0421ms 20.6276ms 48.4788 Ops/s 48.6567 Ops/s $\color{#d91a1a}-0.37\%$
test_vmap_transformer_speed_decorator[False-True] 21.2422ms 20.5887ms 48.5703 Ops/s 48.6957 Ops/s $\color{#d91a1a}-0.26\%$
test_vmap_transformer_speed_decorator[False-False] 21.5296ms 20.6171ms 48.5035 Ops/s 49.0393 Ops/s $\color{#d91a1a}-1.09\%$
test_to_module_speed[True] 2.0307ms 1.4770ms 677.0336 Ops/s 656.8239 Ops/s $\color{#35bf28}+3.08\%$
test_to_module_speed[False] 1.9573ms 1.4562ms 686.7321 Ops/s 669.2965 Ops/s $\color{#35bf28}+2.61\%$
test_tc_init 88.1010μs 45.9732μs 21.7518 KOps/s 20.7578 KOps/s $\color{#35bf28}+4.79\%$
test_tc_init_tensor_only 33.9900μs 9.9855μs 100.1455 KOps/s 99.8510 KOps/s $\color{#35bf28}+0.29\%$
test_tc_init_nested 0.1384ms 93.9041μs 10.6492 KOps/s 10.4251 KOps/s $\color{#35bf28}+2.15\%$
test_tc_init_many_fields 57.5410μs 17.2164μs 58.0841 KOps/s 59.4010 KOps/s $\color{#d91a1a}-2.22\%$
test_tc_first_layer_tensor 27.9200μs 1.8947μs 527.7760 KOps/s 533.3707 KOps/s $\color{#d91a1a}-1.05\%$
test_tc_first_layer_tensor_only 4.2029μs 0.7665μs 1.3046 MOps/s 1.3105 MOps/s $\color{#d91a1a}-0.45\%$
test_tc_first_layer_tensor_set 37.9210μs 4.1936μs 238.4579 KOps/s 238.9117 KOps/s $\color{#d91a1a}-0.19\%$
test_tc_first_layer_tensor_only_set 32.1600μs 3.1561μs 316.8470 KOps/s 314.0508 KOps/s $\color{#35bf28}+0.89\%$
test_tc_first_layer_nontensor 34.9210μs 6.3186μs 158.2642 KOps/s 158.7240 KOps/s $\color{#d91a1a}-0.29\%$
test_tc_second_layer_tensor 35.2600μs 4.6189μs 216.5035 KOps/s 222.3110 KOps/s $\color{#d91a1a}-2.61\%$
test_tc_second_layer_nontensor 42.2510μs 8.9711μs 111.4697 KOps/s 111.6973 KOps/s $\color{#d91a1a}-0.20\%$
test_unbind 0.2577s 16.0201ms 62.4214 Ops/s 56.7847 Ops/s $\textbf{\color{#35bf28}+9.93\%}$
test_full_like 5.0700ms 4.3374ms 230.5533 Ops/s 228.1880 Ops/s $\color{#35bf28}+1.04\%$
test_zeros_like 4.4818ms 4.3645ms 229.1218 Ops/s 228.8570 Ops/s $\color{#35bf28}+0.12\%$
test_ones_like 4.9553ms 4.3808ms 228.2689 Ops/s 228.6968 Ops/s $\color{#d91a1a}-0.19\%$
test_clone 6.6185ms 6.4721ms 154.5098 Ops/s 154.7065 Ops/s $\color{#d91a1a}-0.13\%$
test_squeeze 0.1815ms 14.4860μs 69.0322 KOps/s 64.9912 KOps/s $\textbf{\color{#35bf28}+6.22\%}$
test_unsqueeze 0.1677ms 0.1148ms 8.7124 KOps/s 8.7355 KOps/s $\color{#d91a1a}-0.27\%$
test_split 0.2596ms 0.1865ms 5.3629 KOps/s 5.3679 KOps/s $\color{#d91a1a}-0.09\%$
test_permute 0.2969ms 0.2063ms 4.8468 KOps/s 4.7086 KOps/s $\color{#35bf28}+2.93\%$
test_stack 51.8021ms 50.8625ms 19.6608 Ops/s 23.1830 Ops/s $\textbf{\color{#d91a1a}-15.19\%}$
test_cat 51.6385ms 51.2774ms 19.5018 Ops/s 23.0894 Ops/s $\textbf{\color{#d91a1a}-15.54\%}$

Xmaster6y pushed a commit to Xmaster6y/tensordict that referenced this pull request Feb 27, 2026
…strategy

Add benchmarks/storage/bench_redis.py comparing RedisTensorDict against
local TensorDict for get/set, key iteration, indexed read/write (int,
slice, step-slice, fancy, bool mask), and td[idx].to_tensordict().

Performance improvements:
- Fix _tensor_to_bytes: replace bytes(untyped_storage()) with
  tensor.numpy().tobytes() (~8000x faster serialization).
- Override _index_tensordict with _abatch_index: batch all leaf key
  fetches into a single pipeline instead of one round-trip per key.
- Covering-range strategy (_compute_covering_range): every index type
  (int, slice, step-slice, tensor, bool mask) emits at most ONE
  GETRANGE per key. For non-contiguous indices, the covering byte range
  is fetched and a local post-index extracts the requested rows.
- Coalesce contiguous byte ranges for step-1 slices.
- Partial covering-range RMW for writes: step/fancy/bool writes fetch
  only the covering range, patch locally, write back (2 cmds/key
  instead of N SETRANGEs).
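
The covering-range read and the read-modify-write (RMW) write described in the list above can be illustrated with a minimal sketch. This is not the PR's implementation: `covering_range`, `read_rows` and `write_rows` are hypothetical names, an in-memory `bytearray` stands in for the Redis value, and plain slicing stands in for `GETRANGE`/`SETRANGE`. It also assumes row indices are non-negative and already normalized (a bool mask would first be converted to the list of its `True` positions), and that every row occupies the same number of bytes.

```python
from typing import List, Sequence, Tuple


def covering_range(rows: Sequence[int], row_nbytes: int) -> Tuple[int, int, List[int]]:
    """Map row indices to one inclusive byte range plus offsets local to it.

    Hypothetical helper; the PR's _compute_covering_range may differ.
    """
    if not rows:
        raise ValueError("need at least one row index")
    lo, hi = min(rows), max(rows)
    start = lo * row_nbytes
    end = (hi + 1) * row_nbytes - 1  # inclusive end, as in Redis GETRANGE
    return start, end, [r - lo for r in rows]


def read_rows(buf: bytes, rows: Sequence[int], row_nbytes: int) -> List[bytes]:
    """One ranged read, then a local post-index to pick the requested rows."""
    start, end, local = covering_range(rows, row_nbytes)
    chunk = buf[start : end + 1]  # stands in for: GETRANGE key start end
    return [chunk[i * row_nbytes : (i + 1) * row_nbytes] for i in local]


def write_rows(
    buf: bytearray, rows: Sequence[int], values: Sequence[bytes], row_nbytes: int
) -> None:
    """RMW: fetch the covering range, patch it locally, write it back once."""
    start, end, local = covering_range(rows, row_nbytes)
    chunk = bytearray(buf[start : end + 1])  # stands in for: GETRANGE
    for i, value in zip(local, values):
        if len(value) != row_nbytes:
            raise ValueError("each value must be exactly one row wide")
        chunk[i * row_nbytes : (i + 1) * row_nbytes] = value
    buf[start : end + 1] = chunk  # stands in for: SETRANGE key start chunk


# Ten rows of four bytes each; a step-slice 1:8:3 selects rows 1, 4 and 7.
store = bytearray(range(40))
step_rows = list(range(10))[1:8:3]
print(covering_range(step_rows, 4))
print(read_rows(bytes(store), step_rows, 4))

# Fancy-index write to rows 7 and 1: one fetch and one write-back.
write_rows(store, [7, 1], [b"\xff" * 4, b"\x00" * 4], 4)
print(store[4:8], store[28:32])
```

The trade-off in this sketch is bandwidth for round-trips: a sparse index spanning a wide range transfers bytes that are then discarded, in exchange for a fixed number of commands per key. The sketch also ignores concurrency; a real RMW against a shared store is not atomic unless the fetch and write-back are guarded in some way.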


ghstack-source-id: b1854cb
Pull-Request: pytorch#1570

Labels

Benchmarks, CLA Signed
