Describe the bug
The SIMD implementation is slower than NoParallelization across the board. I don't want to jump to conclusions and blame LoopVectorization here; I have probably written my loop wrong.
Steps To Reproduce
This happens on all benchmarks (in the README).
Expected behavior
SIMD should be faster than NoParallelization. My "gold standard" (i.e. the fastest histogram implementation I can find so far) is AHTL. I would like to match AHTL's performance as much as possible.
As of writing, FastHistograms (without SIMD) needs 2.57596e8 ns to do what AHTL can do in 1.08249e8 ns.
Additional Information
Relevant code in AHTL: https://github.com/pcjung/AHTL/blob/master/src/fixed.cpp
Also, it is arguably worth removing the SIMD mode from this package for now, since it is currently slower than the non-SIMD path.
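For context on why a vectorized histogram loop can lose to a scalar one: the bin-index arithmetic vectorizes well, but the increment `counts[bin] += 1` is a scatter in which two lanes may hit the same bin, so it usually has to stay scalar. Below is a minimal C++ sketch of splitting the loop into those two passes. It is not taken from FastHistograms or AHTL; the function name `fixed_width_histogram`, the block size, and the choice to clamp out-of-range values into the edge bins are all assumptions made for illustration, and whether this split helps in practice would need to be benchmarked.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from FastHistograms or AHTL): fixed-width 1D
// histogram over [lo, hi) with nbins bins. Assumes finite input values;
// out-of-range values are clamped into the first/last bin (an assumption).
std::vector<std::uint64_t> fixed_width_histogram(const std::vector<float>& data,
                                                 float lo, float hi,
                                                 std::size_t nbins) {
    std::vector<std::uint64_t> counts(nbins, 0);
    if (nbins == 0 || !(hi > lo)) return counts;

    const float scale = static_cast<float>(nbins) / (hi - lo);
    const float top = static_cast<float>(nbins - 1);

    constexpr std::size_t kBlock = 1024;  // arbitrary block size for the index buffer
    std::uint32_t idx[kBlock];

    for (std::size_t start = 0; start < data.size(); start += kBlock) {
        const std::size_t n = std::min(kBlock, data.size() - start);

        // Pass 1: pure arithmetic with no dependence between iterations,
        // so the compiler is free to auto-vectorize it.
        for (std::size_t i = 0; i < n; ++i) {
            float t = (data[start + i] - lo) * scale;
            t = t < 0.0f ? 0.0f : t;  // clamp below
            t = t > top ? top : t;    // clamp above
            idx[i] = static_cast<std::uint32_t>(t);
        }

        // Pass 2: the scatter. Two indices may be equal, so this stays scalar.
        for (std::size_t i = 0; i < n; ++i) {
            ++counts[idx[i]];
        }
    }
    return counts;
}
```

If the scalar scatter in pass 2 dominates the runtime, vectorizing pass 1 alone will not yield much speedup, which would be consistent with the SIMD mode not beating NoParallelization.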