Description
While working with the `eval_mle_at_point_blocking` function in `multilinear/src/eval.rs`, I noticed that it performs a large amount of unnecessary memory allocation, especially when handling large tensors. Even though the function already uses Rayon for parallelism, repeatedly creating temporary vectors adds overhead and slows it down.
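For context, here is a minimal sketch of the kind of pattern I mean. It is not the actual code in `multilinear/src/eval.rs`: the field type `F`, the function name, and the assumption that the evaluation table pairs index `i` with `i + half` when a variable is fixed are all placeholders for illustration.

```rust
use rayon::prelude::*;
use std::ops::{Add, Mul, Sub};

/// Hypothetical allocation-heavy folding: every round collects into a
/// brand-new Vec, so an n-variable input causes n temporary allocations
/// on top of the initial copy. `F` stands in for the crate's field type.
fn eval_mle_allocating<F>(evals: &[F], point: &[F]) -> F
where
    F: Copy + Send + Sync + Add<Output = F> + Sub<Output = F> + Mul<Output = F>,
{
    let mut current = evals.to_vec();
    for &r in point {
        let half = current.len() / 2;
        // Each pass materializes a fresh vector of half the size.
        let next: Vec<F> = (0..half)
            .into_par_iter()
            .map(|i| current[i] + r * (current[i + half] - current[i]))
            .collect();
        current = next;
    }
    current[0]
}
```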
What I Changed
I refactored the code to reuse memory wherever possible and to cut down on how often temporary data is allocated. This makes the hot path more efficient while still taking full advantage of Rayon's parallelism.
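As a rough illustration of the buffer-reuse idea (again a hypothetical sketch under the same assumptions as above, not the code in the PR), a single working buffer is allocated up front and each round folds its first half in place:

```rust
use rayon::prelude::*;
use std::ops::{Add, Mul, Sub};

/// Hypothetical in-place folding: one allocation up front, then each round
/// shrinks the live prefix of the same buffer instead of collecting a new Vec.
fn eval_mle_in_place<F>(evals: &[F], point: &[F]) -> F
where
    F: Copy + Send + Sync + Add<Output = F> + Sub<Output = F> + Mul<Output = F>,
{
    let mut buf = evals.to_vec();
    let mut len = buf.len();
    for &r in point {
        len /= 2;
        let (lo, hi) = buf.split_at_mut(len);
        // Fold (lo[i], hi[i]) into lo[i] in parallel; no per-round allocation.
        lo.par_iter_mut()
            .zip(hi[..len].par_iter())
            .for_each(|(l, &h)| *l = *l + r * (h - *l));
    }
    buf[0]
}
```

The parallel structure stays the same; only the per-round temporaries go away, which is where the savings on large inputs come from.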
Why It Matters
With these changes, I measured roughly a 15–25% speedup in benchmarks on large inputs. The function should now be both faster and more memory-friendly, especially for heavy workloads.
Next Steps
I've already implemented the changes and have a pull request ready to go. I'm opening this issue first to track the improvement and to share some context before submitting the PR.