-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simd improvement #1190
Simd improvement #1190
Conversation
laurentcau
commented
Dec 22, 2023
- Add simd aligned_vec3 (and sse aligned_dvec3 - 2 x xmm)
- Fast packed_vec3 <=> aligned_vec3 and packed_vec4 <=> aligned_vec4 conversion
- Fast aligned_vec3 <=> aligned_vec4 conversion
- Optimized aligned_mat x aligned_mat and aligned_mat x aligned_vec
- Inverse aligned_mat3 simd version (actually slower than ssid on my computer even it has 30% less instruction ?)
8882aaf
to
9143125
Compare
Hi Laurent, Could you rebase with master branch? Not sure why C.I. wasn't triggered but hopefully that will trigger it. I created #1191 to trigger C.I. and realized the conflicts. |
1702dc2
to
b03ab62
Compare
I don't remember exactly the deal with FMA and AVX2, only that FMA3 and FMA4 was a mess... My recollection is that this was unified in AVX2 so at least it's easy to use from here. If we want to add a dedicated FMA flag, then clarification on whether it's FMA4 or FMA3 should be made and on a dedicated PR. |
I created #1212 to run C.I. ... I still didn't manage to figure out how to get external PR to trigger C.I. :/ |
I have builds running on my account every time I push to fix compilation but nothing appears in the PR itself. |
Issue is that how FMA is handled in the current PR is a bit rough. Consider, for example, how the issue noted above may impact people updating this library. Also, few issues I have hit so far: #include <stdio.h>
#include <glm/glm.hpp>
// g++ (Ubuntu 13.2.0-4ubuntu3) 13.2.0
//
// g++ -DGLM_FORCE_INTRINSICS ...
// error: ‘call’ is not a member of ‘glm::detail::compute_vec_shift_right<3, long long unsigned int, glm::aligned_highp, 0, 64, true>’
void issue_shift(void) {
using uvec3 = glm::vec<3, unsigned long long, glm::aligned_highp>;
uvec3 b = glm::bitfieldExtract(uvec3{1, 2, 3}, 0, 1);
printf("%lld %lld %lld\n", b.x, b.y, b.z);
}
// g++ -O3 -Wall -DGLM_FORCE_INTRINSICS -DGLM_FORCE_INLINE ...
// warning: array subscript 3 is outside array bounds of ‘glm::mat<3, 3, float, glm::aligned_highp> [1]’ [-Warray-bounds=]
void issue_transpose(void) {
glm::mat<3, 3, float, glm::aligned_highp> m{0};
glm::mat<3, 3, float, glm::aligned_highp> t = glm::transpose(m);
printf("%f %f %f\n", t[0][0], t[1][1], t[2][2]);
}
int main (void) {
issue_shift();
issue_transpose();
return 0;
} In the shifting case |
Oh so it's not an AVX2 instruction but something additional. Ok this is breaking the AVX2 code path then and we need an additional flag which clarify what FMA we are talking about FMA3 or FMA4? |
if(NOT GLM_DISABLE_AUTO_DETECTION) | ||
add_compile_options(-Werror -Weverything) | ||
endif() | ||
if (NOT APPLE) | ||
add_compile_options(-Wno-unsafe-buffer-usage) #this option makes MacOS compilation fail but prevent error on windows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some doubt about this too. Were this warning was trigger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many places this warning is triggered when you compile on windows / visual studio 2022 / clang / ARM.
It's not generate by modifications of this PR, they are also triggered on the master branch, so I decided to ignore these warning to be able to compile ARM version.
I'm not sure there is an easy fix for this. For example, this code generate one on this->value[i]:
template<typename T, qualifier Q>
GLM_FUNC_QUALIFIER GLM_CONSTEXPR typename mat<4, 4, T, Q>::col_type const& mat<4, 4, T, Q>::operator[](typename mat<4, 4, T, Q>::length_type i) const GLM_NOEXCEPT
{
assert(i < this->length());
return this->value[i];
}
If that possible, this configuration should be also added in continuous integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: -Wno-unsafe-buffer-usage this option is not recognized on all clang version (at least on MacOS) that's why i used a if (NOT APPLE). I don't know if there is a better way to achieve this.
cb3e837
to
2c60db4
Compare
All CI tests are green on my side: |
Few more issues: #include <stdio.h>
#include <glm/glm.hpp>
//└> g++ --version
// g++ (Ubuntu 13.2.0-4ubuntu3) 13.2.0
// g++ -DGLM_FORCE_INTRINSICS ...
// error: ‘call’ is not a member of ‘glm::detail::compute_vec_xor<4, unsigned int, glm::aligned_highp, -1, 32, true>’
void issue_xor(void) {
using uvec4 = glm::vec<4, unsigned int, glm::aligned_highp>;
uvec4 b = uvec4{1, 2, 3, 4} ^ 4u;
printf("%d %d %d\n", b.x, b.y, b.z);
}
// g++ -O3 -Wall -DGLM_FORCE_INTRINSICS -DGLM_FORCE_INLINE ...
// matrix.h:173:37: warning: array subscript 3 is outside array bounds of ‘glm::mat<3, 3, float, glm::aligned_highp> [1]’ [-Warray-bounds=]
void issue_transpose(void) {
glm::mat<3, 3, float, glm::aligned_highp> m{0};
glm::mat<3, 3, float, glm::aligned_highp> t = glm::transpose(m);
printf("%f %f %f\n", t[0][0], t[1][1], t[2][2]);
} |
- Add simd aligned_vec3 (and sse aligned_dvec3 - 2 x xmm) - Fast packed_vec3 <=> aligned_vec3 and packed_vec4 <=> aligned_vec4 conversion - Fast aligned_vec3 <=> aligned_vec4 conversion - Optimized aligned_mat x aligned_mat and aligned_mat x aligned_vec - Inverse aligned_mat3 simd version (actually slower than ssid on my computer even it has 30% less instruction ?)
I pulled the branched and fixed some warnings : The tests looks like they are going to be green to the changes are going to be merge soon if you want to have a last look! :D |