Simplify and optimize `VecCache`'s `SlotIndex::from_index` #142095
Break out bucket 0 (containing `idx < 4096`) as an early return, which simplifies the remainder of the function, and allows optimizing the `checked_ilog2` since it can no longer return `None`. This reduces the runtime of `vec_cache::tests::slot_index_exhaustive` (which calls `SlotIndex::from_index` for every `u32`, twice) from ~15.5s to ~13.3s.
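A minimal sketch of the early-return shape described above, not the compiler's actual code. The `SlotIndex` field names, the `FIRST_BUCKET_SHIFT` constant, and the `from_index_old` comparison function are assumptions made for illustration. The only facts taken from the description are that bucket 0 holds `idx < 4096` and that `checked_ilog2` previously had to handle the `None` case.

```rust
/// Hypothetical stand-in for the slot index type; field names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SlotIndex {
    pub bucket_idx: usize,
    pub entries: usize,
    pub index_in_bucket: usize,
}

/// Bucket 0 covers `idx < 1 << 12`, i.e. `idx < 4096`.
const FIRST_BUCKET_SHIFT: usize = 12;

/// Assumed single-path shape: `checked_ilog2` must cope with `idx == 0`.
pub fn from_index_old(idx: u32) -> SlotIndex {
    let log = idx.checked_ilog2().map_or(0, |b| b as usize);
    if log < FIRST_BUCKET_SHIFT {
        SlotIndex { bucket_idx: 0, entries: 1 << FIRST_BUCKET_SHIFT, index_in_bucket: idx as usize }
    } else {
        let entries = 1usize << log;
        SlotIndex {
            bucket_idx: log - (FIRST_BUCKET_SHIFT - 1),
            entries,
            index_in_bucket: idx as usize - entries,
        }
    }
}

/// Early-return shape: bucket 0 is handled first.
pub fn from_index(idx: u32) -> SlotIndex {
    if idx < (1 << FIRST_BUCKET_SHIFT) {
        return SlotIndex {
            bucket_idx: 0,
            entries: 1 << FIRST_BUCKET_SHIFT,
            index_in_bucket: idx as usize,
        };
    }
    // Here `idx >= 4096`, so it is nonzero and `ilog2` cannot panic.
    let log = idx.ilog2() as usize;
    let entries = 1usize << log;
    SlotIndex {
        bucket_idx: log - FIRST_BUCKET_SHIFT + 1,
        entries,
        index_in_bucket: idx as usize - entries,
    }
}
```

In this sketch, indices 0 through 4095 land in bucket 0, indices 4096 through 8191 land in bucket 1 (also 4096 entries), and each later bucket doubles in size.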
`slot_index_exhaustive` has additional complexity in its loop that only applies for index 0. Pull that case out of the loop.
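The test restructuring could look roughly like the sketch below. It is an assumed shape, not the real `slot_index_exhaustive`: the `check_sequential` helper, the field names, and the bounded range are all hypothetical, and `from_index` is repeated only so the block stands alone. Index 0 is checked once before the loop, so the loop body only ever compares a slot against its predecessor.

```rust
/// Hypothetical stand-in for the slot index type; field names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SlotIndex {
    pub bucket_idx: usize,
    pub entries: usize,
    pub index_in_bucket: usize,
}

const FIRST_BUCKET_SHIFT: usize = 12;

/// Early-return sketch of the index computation (assumed, see lead-in).
pub fn from_index(idx: u32) -> SlotIndex {
    if idx < (1 << FIRST_BUCKET_SHIFT) {
        return SlotIndex {
            bucket_idx: 0,
            entries: 1 << FIRST_BUCKET_SHIFT,
            index_in_bucket: idx as usize,
        };
    }
    let log = idx.ilog2() as usize;
    let entries = 1usize << log;
    SlotIndex {
        bucket_idx: log - FIRST_BUCKET_SHIFT + 1,
        entries,
        index_in_bucket: idx as usize - entries,
    }
}

/// Walks `0..=max_idx`, panicking on any gap or overlap between slots,
/// and returns the slot of `max_idx`.
pub fn check_sequential(max_idx: u32) -> SlotIndex {
    // Index 0 has no predecessor, so it is handled once, outside the loop.
    let mut prev = from_index(0);
    assert_eq!(prev.bucket_idx, 0);
    assert_eq!(prev.index_in_bucket, 0);

    for idx in 1..=max_idx {
        let cur = from_index(idx);
        if cur.bucket_idx == prev.bucket_idx {
            // Same bucket: the slot advances by exactly one.
            assert_eq!(cur.index_in_bucket, prev.index_in_bucket + 1);
        } else {
            // New bucket: the previous one must be full and the new one starts at 0.
            assert_eq!(cur.bucket_idx, prev.bucket_idx + 1);
            assert_eq!(cur.index_in_bucket, 0);
            assert_eq!(prev.index_in_bucket + 1, prev.entries);
        }
        prev = cur;
    }
    prev
}
```

Passing `u32::MAX` as `max_idx` would make the walk exhaustive; a smaller bound keeps the sketch quick to run.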
r? @SparrowLii
rustbot has assigned @SparrowLii.
@bors try @rust-timer queue
Simplify and optimize `SlotIndex::from_index`
Break out bucket 0 (containing `idx < 4096`) as an early return, which simplifies the remainder of the function, and allows optimizing the `checked_ilog2` since it can no longer return `None`. This reduces the runtime of `vec_cache::tests::slot_index_exhaustive` (which calls `SlotIndex::from_index` for every `u32`, twice) from ~15.5s to ~13.3s.
Separately, simplify the test case as well. (Both the old and new code pass with both the old and new test case.)
☀️ Try build successful - checks-actions |
Finished benchmarking commit (18aa6ca): comparison URL.
Overall result: ❌✅ regressions and improvements - please read the text below
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count: This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage): Results (primary 1.0%, secondary -5.1%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: Results (primary -1.6%, secondary -0.8%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 750.971s -> 749.964s (-0.13%)
The result looks very nice!
@SparrowLii: you gave approval in the GitHub review, do you want to give r+? I'm happy with how the PR looks. |
@bors r+ |
Noticed because `slot_index_exhaustive` stood out as taking unusually long compared to other tests, so I started investigating what it was doing.