AVX-256/AVX-512 support? #33

alexandergunnarson opened this issue Mar 5, 2025 · 4 comments

@alexandergunnarson

I'm looking to use gtl::parallel_flat_hash_map. Forgive my ignorance, as most of my experience has been in the JVM world and I'm still learning about vector intrinsics.

I was curious why there's SSE2/SSE3 support (up to 128-bit vectorization, IIRC), but no AVX-256/AVX-512 support. I know there's the AVX-512 downclocking phenomenon, so perhaps potential performance gains are offset by that in this context. However, AFAIK AVX-256 doesn't suffer from this. I'm sure you have a rationale for not using 256-bit/512-bit vector instructions, but I'm curious to know 1) whether they would actually speed up the implementation and 2) if so, why they're not being used.

Thanks! Looks like an amazing project.

@greg7mdp
Owner

greg7mdp commented Mar 7, 2025

Hi @alexandergunnarson, thanks for using gtl and for the interesting question.
To be honest, I have not looked in great detail at AVX-256/AVX-512 support, so I can't really say whether it would help. Still, I doubt that in real-world use you would find that this parallel probing ends up being the bottleneck of your application, or even of the hash map lookups (these are usually constrained by the speed of memory reads).

@alexandergunnarson
Author

Thanks for your thoughtful reply! Curious if it’s simple to experiment with adding the instructions by using some AVX-specific C header library and ifdef-ing where the SSE3 instructions are currently being used.

Also, shamelessly, I’m wondering whether there are any hash map implementations even more performant than gtl’s. I haven’t found any yet — incredible work! 🎉 — but I’m always on the hunt ;)

@greg7mdp
Owner

greg7mdp commented Mar 7, 2025

Sure, here is the implementation for both the SSE2 and portable C++ versions. You could definitely experiment with adding an AVX version.
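For anyone who wants to try this, here is a rough sketch of what such an experiment might look like. It is not gtl's actual code: it assumes a layout with one control byte per slot, where a probe compares a group of control bytes against a 7-bit hash fragment and returns a bitmask of candidate slots. All names (`match_portable`, `match16`, `match32`, the `SKETCH_HAVE_*` macros) are hypothetical and chosen only for illustration. The AVX2 path is compiled only when the compiler defines `__AVX2__` (e.g. with `-mavx2`), and falls back to the portable loop otherwise.

```cpp
#include <cstddef>
#include <cstdint>

// Feature detection via compiler-defined macros (hypothetical SKETCH_* names).
#if defined(__SSE2__) || defined(_M_X64)
  #include <emmintrin.h>   // SSE2 intrinsics
  #define SKETCH_HAVE_SSE2 1
#endif
#if defined(__AVX2__)
  #include <immintrin.h>   // AVX2 intrinsics
  #define SKETCH_HAVE_AVX2 1
#endif

// Portable reference: bit i of the result is set when ctrl[i] == h2.
template <std::size_t N>
std::uint32_t match_portable(const std::int8_t* ctrl, std::int8_t h2) {
    static_assert(N <= 32, "mask must fit in 32 bits");
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (ctrl[i] == h2) mask |= (std::uint32_t{1} << i);
    }
    return mask;
}

// Probe a group of 16 control bytes: one 128-bit compare when SSE2 is available.
inline std::uint32_t match16(const std::int8_t* ctrl, std::int8_t h2) {
#ifdef SKETCH_HAVE_SSE2
    const __m128i group = _mm_loadu_si128(reinterpret_cast<const __m128i*>(ctrl));
    const __m128i eq    = _mm_cmpeq_epi8(group, _mm_set1_epi8(h2));
    return static_cast<std::uint32_t>(_mm_movemask_epi8(eq));
#else
    return match_portable<16>(ctrl, h2);
#endif
}

// Experimental: probe a group of 32 control bytes with one 256-bit compare.
inline std::uint32_t match32(const std::int8_t* ctrl, std::int8_t h2) {
#ifdef SKETCH_HAVE_AVX2
    const __m256i group = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(ctrl));
    const __m256i eq    = _mm256_cmpeq_epi8(group, _mm256_set1_epi8(h2));
    return static_cast<std::uint32_t>(_mm256_movemask_epi8(eq));
#else
    return match_portable<32>(ctrl, h2);
#endif
}
```

Note that widening the group from 16 to 32 bytes also changes how the table is laid out and probed, so a real experiment would touch more than this one function, and any gain would have to be measured against the memory-latency cost mentioned above.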

Boost since version 1.82 also has a good unordered_flat_map and concurrent_flat_map implementation. If you compare it with gtl, I'd be interested to hear your thoughts.

@alexandergunnarson
Author

Thanks @greg7mdp 🙌 Will take a look!
