-
-
Notifications
You must be signed in to change notification settings - Fork 111
simd: support AVX2 detection without std #106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the PR and the very thorough analysis! I'm definitely a bit hesitant to commit to something like this, in part because it's hard to know whether the code written for the CPUID check is correct or not without spending the time to go and understand everything myself. (Which I don't have time for right now.)
Could you say what this is a stop gap for? In particular, you later say:
So who would actually benefit from this? Is there a concrete use case that motivated this PR in the first place? In my view, if this is just "nice to have," then I'm not sure it's worth doing. Users will get the SSE2 implementation at least, which is also quite fast. |
Yeah, this is about the kind of reply I had expected… The question of "Is it worth it?" and "Can you provide an actual use case?" are definitely the two things I was most unsure about when I posted this PR. The thing about performance is, well, for most cases it doesn't really matter that much. Even without considering In my case, I was writing a In the end, whether something is considered "worth doing" is quite subjective, and it's up to the maintainer to decide what sort of Edit: Oh by the way,
Can we do this to keep the AVX option available at least? Most people probably still won't benefit from this, but it should be a minimal three-lines change. |
Out of curiosity, what is the use case for a
Yeah I think I can get on board with this. Otherwise, I'll probably hold off on this PR. |
I may misunderstand (edit: yes), but I can think of a likely case and an unlikely (or rather uncommon) case:
|
@8573 The former case isn't really applicable here. This library already gives you that. The context that is important here is actually running code without std on x86, as this PR is about making a faster version of memchr available in such scenarios. |
I am bringing in #120 so hopefully that's good enough. Otherwise I don't think the use case is compelling enough here to justify maintaining my own CPUID checks. I'm not 100% opposed to it, but I would need a really compelling use case for it because it looks like a maintenance hazard to me. |
Summary
This PR implements AVX2 runtime feature detection that works in
no_std
using thecpuid
instruction.Motivation
memchr
provides low-level primitives, and it's unfortunate that we cannot make use of the most performant implementation without bringing inlibstd
. And there's no good reason for it, either — unlike other architectures, detecting CPU features on x86 isn't OS dependent (see this comment).The proper solution would be to wait until
is_x86_feature_detected
is available inlibcore
, but that probably won't happen any time soon. In the meantime, I'd like to propose doing it ourselves as a stopgap measure.Drawbacks
This adds an amount of low-level complexity to the crate. Getting information from
cpuid
isn't the simplest thing and involves a lot of details to get right. However, compared to writing complex SIMD routines by hand, I believe it should be relatively manageable.The maintenance burden should be neglectable, since we only care about one feature and the CPUID "protocol" can never change, no reason to touch it in the future.
Rationale and alternatives
Not doing this. The real life use cases for the
no_std
+x86_64
+performance critical
combo aren't the most plentiful. I'm not sure how many people would actually have a need for this, besides "it's nice to have". Boils down to cost-benefit analysis.Use another crate to do the detection for us.
raw-cpuid
supports everything related tocpuid
, and it's quite large as a result. Is the extra compile time worth it?cpufeatures
is much more lightweight, but it's detection logic is actually faulty — same issue as gcc#85100, glibc#13007, etc.Fall back to
#[cfg(target_feature)]
compile-time detection forno_std
.Prior art
getrandom
usescpuid
to detectrdrand
.rdrand
also do this when thestd
feature is disabled.no_std
, which use the previously mentionedcpufeatures
crate for feature detection.target_feature
compile-time detection inno_std
.