
ledger: Build the grouped slot leaders manually #8451

Merged: vadorovsky merged 4 commits into anza-xyz:master from vadorovsky:compute-epoch-schedule-group-map, Mar 25, 2026
Conversation

@vadorovsky (Member) commented Oct 13, 2025

Problem

Building the grouped slot leaders with `itertools::into_group_map` takes 20.9ms.

(profile screenshot: before)

Summary of Changes

Building them manually with a pre-allocated hash map, using the pubkey hasher, takes 5.2ms.

(profile screenshot: after)

Ref: #8280

@vadorovsky vadorovsky force-pushed the compute-epoch-schedule-group-map branch 2 times, most recently from e42e215 to cb8fedb Compare October 13, 2025 15:14
@codecov-commenter commented Oct 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.1%. Comparing base (da1dcd7) to head (34d5d1a).
⚠️ Report is 114 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #8451   +/-   ##
=======================================
  Coverage    83.1%    83.1%           
=======================================
  Files         849      849           
  Lines      321241   321256   +15     
=======================================
+ Hits       267026   267069   +43     
+ Misses      54215    54187   -28     

@vadorovsky vadorovsky force-pushed the compute-epoch-schedule-group-map branch 2 times, most recently from 37b32a9 to b758fe5 Compare October 14, 2025 07:07
@vadorovsky vadorovsky marked this pull request as ready for review October 14, 2025 07:24
@HaoranYi

Excellent work! I have just one comment!

HaoranYi previously approved these changes Oct 15, 2025

@HaoranYi HaoranYi left a comment

LGTM. Please wait for the other reviewers' approval before merging.

@brooksprumo brooksprumo removed their request for review October 15, 2025 16:14
@brooksprumo

I'm going to bow out and defer to the other reviewers since I'm OOO.

jstarry previously approved these changes Oct 16, 2025

@jstarry jstarry commented Oct 16, 2025

We should probably review our uses of itertools across the codebase because many of the utility methods create hashsets and hashmaps which are not pre-allocated.

@vadorovsky (Member Author)

> We should probably review our uses of itertools across the codebase because many of the utility methods create hashsets and hashmaps which are not pre-allocated.

I have yet to profile all uses of `into_group_map()` in ledger/runtime/accounts-db, but my gut feeling for now is that we should move away from it entirely, especially given that the manual alternative I'm replacing it with is basically this few-liner:

let mut grouped = HashMap::with_capacity_and_hasher(cap, hasher);
for (key, value) in input {
    grouped
        .entry(value)
        .and_modify(|keys| keys.push(key))
        .or_insert(vec![key]);
}

Perhaps I should already move it to a common function in solana-perf.

Given the lack of flexibility in itertools (no way of providing capacities and hashers), I think its usage is a footgun. I tried to make it accept custom hashers, but the PR got closed: rust-itertools/itertools#1057.
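As a rough sketch, a shared helper along the lines discussed could look like this (the name `group_by_value`, its signature, and its placement are assumptions for illustration, not an existing solana-perf API):

```rust
use std::{
    collections::HashMap,
    hash::{BuildHasher, Hash},
};

// Hypothetical helper: group (key, value) pairs by value into a map that is
// pre-allocated for `cap` entries and uses a caller-provided hasher.
fn group_by_value<K, V, S>(
    input: impl IntoIterator<Item = (K, V)>,
    cap: usize,
    hasher: S,
) -> HashMap<V, Vec<K>, S>
where
    V: Eq + Hash,
    S: BuildHasher,
{
    let mut grouped = HashMap::with_capacity_and_hasher(cap, hasher);
    for (key, value) in input {
        // `or_insert_with` avoids allocating a Vec when the entry already exists.
        grouped.entry(value).or_insert_with(Vec::new).push(key);
    }
    grouped
}

fn main() {
    // (slot, leader) pairs grouped by leader.
    let pairs = vec![(0u64, "a"), (1, "b"), (2, "a"), (3, "a")];
    let hasher = std::collections::hash_map::RandomState::new();
    let grouped = group_by_value(pairs, 2, hasher);
    assert_eq!(grouped["a"], vec![0, 2, 3]);
    assert_eq!(grouped["b"], vec![1]);
}
```

A pubkey-aware hasher could then be passed in at each call site, matching the pre-allocated pattern benchmarked in this PR.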

.and_modify(|slots| {
    let slots = Arc::get_mut(slots).expect("should be the only reference");
    slots.push(slot)
})
.or_insert(Arc::new(vec![slot]));
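For context on the snippet under review: `Arc::get_mut` only hands out a mutable reference while the `Arc` has exactly one strong reference, which is what the `expect("should be the only reference")` relies on. A minimal standalone illustration:

```rust
use std::sync::Arc;

fn main() {
    // With a unique Arc, get_mut returns Some and we can mutate in place.
    let mut slots = Arc::new(vec![1u64, 2]);
    Arc::get_mut(&mut slots)
        .expect("should be the only reference")
        .push(3);
    assert_eq!(*slots, vec![1, 2, 3]);

    // With a second strong reference alive, get_mut returns None instead.
    let clone = Arc::clone(&slots);
    assert!(Arc::get_mut(&mut slots).is_none());
    drop(clone);
}
```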

Could also experiment with smallvec; with just a slot value you could keep like 2-3 elements inline without allocating.

Member Author

Good idea. Or maybe even arrayvec.


In the profile, I think you see `grow` taking a lot of time, so if anything you should allocate with a higher capacity?


Oh, it's `Vec<usize>`; yeah, use arrayvec

Member Author

Actually, we can't use arrayvec. The number of non-repeating slots we store for each leader looks like this:

curl https://api.mainnet-beta.solana.com -X POST -H "Content-Type: application/json" -d '{"jsonrpc":"2.0", "id":1, "method":"getLeaderSchedule", "params":[null, {"commitment":"finalized"}]}' | jq -r '.result | to_entries | map({identity: .key, slots: ((.value | length)/4)}) | sort_by(.slots) | reverse[] | "\(.slots)\t\(.identity)"' | save -r leader_slots.txt

leader_slots.txt

I'm dividing by 4 because of the recently merged #9126.

The top 17 validators have more than 1k non-repeating slots:

3630	HEL1USMZKAL2odpNBj2oCjffnFGaYwmbGmyewGv1e2TU
3438	Fd7btgySsrjuo25CJCj7oE7VPMyezDhnx7pZkj2v69Nk
3412	DRpbCBMxVnDK7maPM5tGv6MvB3v1sRMC86PZ8okm21hy
3142	JupmVLmA8RoyTUbTMMuTtoPWHEiNQobxgTeGTrPNkzT
2139	q9XWcZ7T1wP4bW9SB4XgNNwjnFEJ982nE8aVbbNuwot
1906	EvnRmnMrd69kFdbLMxWkTn1icZ7DCceRhvmb2SJXqDo4
1801	DtdSSG8ZJRZVv5Jx7K1MeWp7Zxcu19GD5wQRGRpQ9uMF
1793	E1r4Psq84tHfQ6aPTvvDka4U3u8zPVD7gEUrH25RdxHL
1764	JD549HsbJHeEKKUrKgg4Fj2iyv2RGjsV7NTZjZUrHybB
1708	Awes4Tr6TX8JDzEhCZY2QVNimT6iD1zWHzf1vNyGvpLM
1626	5pPRHniefFjkiaArbGX3Y8NUysJmQ9tMZg3FrFGwHzSm
1594	CAo1dCGYrB6NhHh5xb1cGjUiu86iyCfMTENxgHumSve4
1289	9jxgosAfHgHzwnxsHw4RAZYaLVokMbnYtmiZBreynGFP
1269	5Cchr1XGEg7dbBXByV5NY2ad8jfxAM7HA3x8D56rq9Ux
1173	9rkJMARqK6VBkcxGfKBAwnA44gPAfGxPbPsfsggFNDSQ
1032	FBKFWadXZJahGtFitAsBvbqh5968gLY7dMBBJUoUjeNi
1015	BkoS26vBuaXnSowACdChi4WKid8UwmuPNhEJWa8KsLHd

That's way too much for arrayvec; we would blow up the stack, since we would need to go with something like ArrayVec<Slot, 4096> and have ~1k of them. And there is always the risk of having to increase it if the dominance of the top validators grows.

On the other hand, the last ~500 validators have fewer than 64 non-repeating slots, the last ~300 fewer than 32, and ~70 fewer than 10. I'm not even sure we gain anything from smallvec in that case.

Anyway, can we think about smallvec separately, outside of this PR? First, the open question is how many arrays, and of what size, we can keep on the stack. The choice of smallvec size (`SmallVec<[Slot; 10]>`, `SmallVec<[Slot; 32]>`, `SmallVec<[Slot; 64]>`, etc.) depends on that. The other question is whether it brings any visible perf improvement.
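The back-of-envelope arithmetic behind the stack concern can be checked directly, assuming 8-byte `Slot`s, the hypothetical inline capacity of 4096, ~1000 leaders, and a typical 8 MiB main-thread stack:

```rust
fn main() {
    type Slot = u64;
    const CAP: usize = 4096;

    // One inline array of 4096 8-byte slots is 32 KiB.
    let per_leader = std::mem::size_of::<[Slot; CAP]>();
    assert_eq!(per_leader, 32 * 1024);

    // ~1000 leaders would need roughly 31 MiB inline, far beyond a
    // typical 8 MiB default stack.
    let leaders = 1000;
    let total = per_leader * leaders;
    assert!(total > 8 * 1024 * 1024);
    println!("{} bytes (~{} MiB)", total, total / (1024 * 1024));
}
```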

@HaoranYi

> We should probably review our uses of itertools across the codebase because many of the utility methods create hashsets and hashmaps which are not pre-allocated.

Good point — applying the same pattern to staked_nodes in stake_cache gave us a 3–4× speed-up as well: #8516

@vadorovsky vadorovsky dismissed stale reviews from jstarry and HaoranYi via 80b2e04 October 21, 2025 08:49
@vadorovsky vadorovsky force-pushed the compute-epoch-schedule-group-map branch from cd7b5b6 to 80b2e04 Compare October 21, 2025 08:49
@vadorovsky vadorovsky force-pushed the compute-epoch-schedule-group-map branch from 80b2e04 to 3c4c8de Compare February 9, 2026 14:26

@brooksprumo brooksprumo left a comment


Worth reviving this one?

@mergify

mergify bot commented Mar 17, 2026

If this PR represents a change to the public RPC API:

  1. Make sure it includes a complementary update to rpc-client/ (example)
  2. Open a follow-up PR to update the JavaScript client @solana/kit (example)

Thank you for keeping the RPC clients in sync with the server API @vadorovsky.

@vadorovsky vadorovsky left a comment (Member Author)

Yes, sorry for letting it go stale!

@vadorovsky vadorovsky force-pushed the compute-epoch-schedule-group-map branch 2 times, most recently from 798b7de to d6416d8 Compare March 25, 2026 11:32
Building them with `itertools::into_group_map` takes 20.9ms.

Building them manually with a pre-allocated hash map, using pubkey
hasher, takes 5.2ms.
@vadorovsky vadorovsky force-pushed the compute-epoch-schedule-group-map branch from d6416d8 to bf965a7 Compare March 25, 2026 11:33
@kskalski

LGTM

It works even with a custom hasher.
In the context of enumeration in leader schedule, `index` is a more
appropriate name.
@vadorovsky vadorovsky requested a review from brooksprumo March 25, 2026 14:01

@brooksprumo brooksprumo left a comment


:shipit:

@vadorovsky vadorovsky added this pull request to the merge queue Mar 25, 2026
Merged via the queue into anza-xyz:master with commit 5bc7ba4 Mar 25, 2026
62 checks passed
@vadorovsky vadorovsky deleted the compute-epoch-schedule-group-map branch March 25, 2026 14:39