Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simd improvement #1190

Closed
wants to merge 1 commit into from
Closed

Simd improvement #1190

wants to merge 1 commit into from

Conversation

laurentcau
Copy link

  • Add simd aligned_vec3 (and sse aligned_dvec3 - 2 x xmm)
  • Fast packed_vec3 <=> aligned_vec3 and packed_vec4 <=> aligned_vec4 conversion
  • Fast aligned_vec3 <=> aligned_vec4 conversion
  • Optimized aligned_mat x aligned_mat and aligned_mat x aligned_vec
  • Inverse aligned_mat3 simd version (actually slower than ssid on my computer even it has 30% less instruction ?)

@laurentcau laurentcau force-pushed the b5 branch 7 times, most recently from 8882aaf to 9143125 Compare December 22, 2023 18:34
@christophe-lunarg
Copy link

Hi Laurent,

Could you rebase with master branch? Not sure why C.I. wasn't triggered but hopefully that will trigger it. I created #1191 to trigger C.I. and realized the conflicts.

@laurentcau laurentcau force-pushed the b5 branch 6 times, most recently from 1702dc2 to b03ab62 Compare January 3, 2024 14:14
@gottfriedleibniz
Copy link

gottfriedleibniz commented Jan 3, 2024

Are we okay forcing FMA when compiling w/ AVX2?

Should, or can, this be moved to a CMake or setup.hpp flag (e.g., GLM_FORCE_FMA)? Unfortunately not all compilers define __FMA__ and issues emerge elsewhere (in addition to arguments about FMA itself).

@christophe-lunarg
Copy link

Are we okay forcing FMA when compiling w/ AVX2?

Should, or can, this be moved to a CMake or setup.hpp flag (e.g., GLM_FORCE_FMA)? Unfortunately not all compilers define __FMA__ and issues emerge elsewhere (in addition to arguments about FMA itself).

I don't remember exactly the deal with FMA and AVX2, only that FMA3 and FMA4 was a mess... My recollection is that this was unified in AVX2 so at least it's easy to use from here.

If we want to add a dedicated FMA flag, then clarification on whether it's FMA4 or FMA3 should be made and on a dedicated PR.

@christophe-lunarg
Copy link

I created #1212 to run C.I. ... I still didn't manage to figure out how to get external PR to trigger C.I. :/

@laurentcau
Copy link
Author

I have builds running on my account every time I push to fix compilation but nothing appears in the PR itself.
example: https://github.com/laurentcau/glm/actions/runs/7398785977

@gottfriedleibniz
Copy link

gottfriedleibniz commented Jan 3, 2024

If we want to add a dedicated FMA flag, then clarification on whether it's FMA4 or FMA3 should be made and on a dedicated PR.

Issue is that how FMA is handled in the current PR is a bit rough. Consider, for example, how the issue noted above may impact people updating this library.

Also, few issues I have hit so far:

#include <stdio.h>
#include <glm/glm.hpp>

// g++ (Ubuntu 13.2.0-4ubuntu3) 13.2.0
//
// g++ -DGLM_FORCE_INTRINSICS ...
// error: ‘call’ is not a member of ‘glm::detail::compute_vec_shift_right<3, long long unsigned int, glm::aligned_highp, 0, 64, true>’
void issue_shift(void) {
    using uvec3 = glm::vec<3, unsigned long long, glm::aligned_highp>;
    uvec3 b = glm::bitfieldExtract(uvec3{1, 2, 3}, 0, 1);
    printf("%lld %lld %lld\n", b.x, b.y, b.z);
}

// g++ -O3 -Wall -DGLM_FORCE_INTRINSICS -DGLM_FORCE_INLINE ...
// warning: array subscript 3 is outside array bounds of ‘glm::mat<3, 3, float, glm::aligned_highp> [1]’ [-Warray-bounds=]
void issue_transpose(void) {
    glm::mat<3, 3, float, glm::aligned_highp> m{0};
    glm::mat<3, 3, float, glm::aligned_highp> t = glm::transpose(m);
    printf("%f %f %f\n", t[0][0], t[1][1], t[2][2]);
}

int main (void) {
    issue_shift();
    issue_transpose();
    return 0;
}

In the shifting case compute_vec_or will cause similar issues.

@christophe-lunarg
Copy link

Oh so it's not an AVX2 instruction but something additional. Ok this is breaking the AVX2 code path then and we need an additional flag which clarify what FMA we are talking about FMA3 or FMA4?

if(NOT GLM_DISABLE_AUTO_DETECTION)
add_compile_options(-Werror -Weverything)
endif()
if (NOT APPLE)
add_compile_options(-Wno-unsafe-buffer-usage) #this option makes MacOS compilation fail but prevent error on windows

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubt about this too. Were this warning was trigger?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many places this warning is triggered when you compile on windows / visual studio 2022 / clang / ARM.
It's not generate by modifications of this PR, they are also triggered on the master branch, so I decided to ignore these warning to be able to compile ARM version.
I'm not sure there is an easy fix for this. For example, this code generate one on this->value[i]:

	template<typename T, qualifier Q>
	GLM_FUNC_QUALIFIER GLM_CONSTEXPR typename mat<4, 4, T, Q>::col_type const& mat<4, 4, T, Q>::operator[](typename mat<4, 4, T, Q>::length_type i) const GLM_NOEXCEPT
	{
		assert(i < this->length());
		return this->value[i];
	}

If that possible, this configuration should be also added in continuous integration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: -Wno-unsafe-buffer-usage this option is not recognized on all clang version (at least on MacOS) that's why i used a if (NOT APPLE). I don't know if there is a better way to achieve this.

@laurentcau laurentcau force-pushed the b5 branch 2 times, most recently from cb3e837 to 2c60db4 Compare January 4, 2024 11:52
@laurentcau
Copy link
Author

All CI tests are green on my side:
https://github.com/laurentcau/glm/actions/runs/7409137272

@gottfriedleibniz
Copy link

Few more issues:

#include <stdio.h>
#include <glm/glm.hpp>

//└> g++ --version
// g++ (Ubuntu 13.2.0-4ubuntu3) 13.2.0

// g++ -DGLM_FORCE_INTRINSICS ...
// error: ‘call’ is not a member of ‘glm::detail::compute_vec_xor<4, unsigned int, glm::aligned_highp, -1, 32, true>’
void issue_xor(void) {
    using uvec4 = glm::vec<4, unsigned int, glm::aligned_highp>;
    uvec4 b = uvec4{1, 2, 3, 4} ^ 4u;
    printf("%d %d %d\n", b.x, b.y, b.z);
}

// g++ -O3 -Wall -DGLM_FORCE_INTRINSICS -DGLM_FORCE_INLINE ...
// matrix.h:173:37: warning: array subscript 3 is outside array bounds of ‘glm::mat<3, 3, float, glm::aligned_highp> [1]’ [-Warray-bounds=]
void issue_transpose(void) {
    glm::mat<3, 3, float, glm::aligned_highp> m{0};
    glm::mat<3, 3, float, glm::aligned_highp> t = glm::transpose(m);
    printf("%f %f %f\n", t[0][0], t[1][1], t[2][2]);
}

- Add simd aligned_vec3 (and sse aligned_dvec3 - 2 x xmm)
- Fast packed_vec3 <=> aligned_vec3 and packed_vec4 <=> aligned_vec4 conversion
- Fast aligned_vec3 <=> aligned_vec4 conversion
- Optimized aligned_mat x aligned_mat and aligned_mat x aligned_vec
- Inverse aligned_mat3 simd version (actually slower than ssid on my computer even it has 30% less instruction ?)
@christophe-lunarg
Copy link

I pulled the branched and fixed some warnings :
#1245

The tests looks like they are going to be green to the changes are going to be merge soon if you want to have a last look! :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants