Simd improvement #1190

laurentcau · 2023-12-22T16:40:02Z

Add simd aligned_vec3 (and sse aligned_dvec3 - 2 x xmm)
Fast packed_vec3 <=> aligned_vec3 and packed_vec4 <=> aligned_vec4 conversion
Fast aligned_vec3 <=> aligned_vec4 conversion
Optimized aligned_mat x aligned_mat and aligned_mat x aligned_vec
Inverse aligned_mat3 simd version (actually slower than ssid on my computer even it has 30% less instruction ?)

christophe-lunarg · 2023-12-23T09:03:27Z

Hi Laurent,

Could you rebase with master branch? Not sure why C.I. wasn't triggered but hopefully that will trigger it. I created #1191 to trigger C.I. and realized the conflicts.

gottfriedleibniz · 2024-01-03T14:24:20Z

Are we okay forcing FMA when compiling w/ AVX2?

Should, or can, this be moved to a CMake or setup.hpp flag (e.g., GLM_FORCE_FMA)? Unfortunately not all compilers define __FMA__ and issues emerge elsewhere (in addition to arguments about FMA itself).

christophe-lunarg · 2024-01-03T14:39:17Z

Are we okay forcing FMA when compiling w/ AVX2?

Should, or can, this be moved to a CMake or setup.hpp flag (e.g., GLM_FORCE_FMA)? Unfortunately not all compilers define __FMA__ and issues emerge elsewhere (in addition to arguments about FMA itself).

I don't remember exactly the deal with FMA and AVX2, only that FMA3 and FMA4 was a mess... My recollection is that this was unified in AVX2 so at least it's easy to use from here.

If we want to add a dedicated FMA flag, then clarification on whether it's FMA4 or FMA3 should be made and on a dedicated PR.

christophe-lunarg · 2024-01-03T14:57:21Z

I created #1212 to run C.I. ... I still didn't manage to figure out how to get external PR to trigger C.I. :/

laurentcau · 2024-01-03T15:01:26Z

I have builds running on my account every time I push to fix compilation but nothing appears in the PR itself.
example: https://github.com/laurentcau/glm/actions/runs/7398785977

gottfriedleibniz · 2024-01-03T15:46:42Z

If we want to add a dedicated FMA flag, then clarification on whether it's FMA4 or FMA3 should be made and on a dedicated PR.

Issue is that how FMA is handled in the current PR is a bit rough. Consider, for example, how the issue noted above may impact people updating this library.

Also, few issues I have hit so far:

#include <stdio.h>
#include <glm/glm.hpp>

// g++ (Ubuntu 13.2.0-4ubuntu3) 13.2.0
//
// g++ -DGLM_FORCE_INTRINSICS ...
// error: ‘call’ is not a member of ‘glm::detail::compute_vec_shift_right<3, long long unsigned int, glm::aligned_highp, 0, 64, true>’
void issue_shift(void) {
    using uvec3 = glm::vec<3, unsigned long long, glm::aligned_highp>;
    uvec3 b = glm::bitfieldExtract(uvec3{1, 2, 3}, 0, 1);
    printf("%lld %lld %lld\n", b.x, b.y, b.z);
}

// g++ -O3 -Wall -DGLM_FORCE_INTRINSICS -DGLM_FORCE_INLINE ...
// warning: array subscript 3 is outside array bounds of ‘glm::mat<3, 3, float, glm::aligned_highp> [1]’ [-Warray-bounds=]
void issue_transpose(void) {
    glm::mat<3, 3, float, glm::aligned_highp> m{0};
    glm::mat<3, 3, float, glm::aligned_highp> t = glm::transpose(m);
    printf("%f %f %f\n", t[0][0], t[1][1], t[2][2]);
}

int main (void) {
    issue_shift();
    issue_transpose();
    return 0;
}

In the shifting case compute_vec_or will cause similar issues.

christophe-lunarg · 2024-01-03T16:40:30Z

Oh so it's not an AVX2 instruction but something additional. Ok this is breaking the AVX2 code path then and we need an additional flag which clarify what FMA we are talking about FMA3 or FMA4?

test/CMakeLists.txt

christophe-lunarg · 2024-01-03T16:44:34Z

test/CMakeLists.txt

 	if(NOT GLM_DISABLE_AUTO_DETECTION)
 		add_compile_options(-Werror -Weverything)
 	endif()
+	if (NOT APPLE)
+		add_compile_options(-Wno-unsafe-buffer-usage) #this option makes MacOS compilation fail but prevent error on windows


I have some doubt about this too. Were this warning was trigger?

There are many places this warning is triggered when you compile on windows / visual studio 2022 / clang / ARM.
It's not generate by modifications of this PR, they are also triggered on the master branch, so I decided to ignore these warning to be able to compile ARM version.
I'm not sure there is an easy fix for this. For example, this code generate one on this->value[i]:

template<typename T, qualifier Q> GLM_FUNC_QUALIFIER GLM_CONSTEXPR typename mat<4, 4, T, Q>::col_type const& mat<4, 4, T, Q>::operator[](typename mat<4, 4, T, Q>::length_type i) const GLM_NOEXCEPT { assert(i < this->length()); return this->value[i]; }

If that possible, this configuration should be also added in continuous integration.

note: -Wno-unsafe-buffer-usage this option is not recognized on all clang version (at least on MacOS) that's why i used a if (NOT APPLE). I don't know if there is a better way to achieve this.

test/perf/perf_vector_mul_matrix.cpp

glm/gtx/vec_swizzle.hpp

laurentcau · 2024-01-04T13:39:06Z

All CI tests are green on my side:
https://github.com/laurentcau/glm/actions/runs/7409137272

gottfriedleibniz · 2024-01-04T14:43:52Z

Few more issues:

#include <stdio.h>
#include <glm/glm.hpp>

//└> g++ --version
// g++ (Ubuntu 13.2.0-4ubuntu3) 13.2.0

// g++ -DGLM_FORCE_INTRINSICS ...
// error: ‘call’ is not a member of ‘glm::detail::compute_vec_xor<4, unsigned int, glm::aligned_highp, -1, 32, true>’
void issue_xor(void) {
    using uvec4 = glm::vec<4, unsigned int, glm::aligned_highp>;
    uvec4 b = uvec4{1, 2, 3, 4} ^ 4u;
    printf("%d %d %d\n", b.x, b.y, b.z);
}

// g++ -O3 -Wall -DGLM_FORCE_INTRINSICS -DGLM_FORCE_INLINE ...
// matrix.h:173:37: warning: array subscript 3 is outside array bounds of ‘glm::mat<3, 3, float, glm::aligned_highp> [1]’ [-Warray-bounds=]
void issue_transpose(void) {
    glm::mat<3, 3, float, glm::aligned_highp> m{0};
    glm::mat<3, 3, float, glm::aligned_highp> t = glm::transpose(m);
    printf("%f %f %f\n", t[0][0], t[1][1], t[2][2]);
}

- Add simd aligned_vec3 (and sse aligned_dvec3 - 2 x xmm) - Fast packed_vec3 <=> aligned_vec3 and packed_vec4 <=> aligned_vec4 conversion - Fast aligned_vec3 <=> aligned_vec4 conversion - Optimized aligned_mat x aligned_mat and aligned_mat x aligned_vec - Inverse aligned_mat3 simd version (actually slower than ssid on my computer even it has 30% less instruction ?)

christophe-lunarg · 2024-02-08T01:23:10Z

I pulled the branched and fixed some warnings :
#1245

The tests looks like they are going to be green to the changes are going to be merge soon if you want to have a last look! :D

laurentcau force-pushed the b5 branch 7 times, most recently from 8882aaf to 9143125 Compare December 22, 2023 18:34

laurentcau force-pushed the b5 branch 6 times, most recently from 1702dc2 to b03ab62 Compare January 3, 2024 14:14

laurentcau force-pushed the b5 branch from b03ab62 to 0f9b9c8 Compare January 3, 2024 14:54

christophe-lunarg requested changes Jan 3, 2024

View reviewed changes

test/CMakeLists.txt Outdated Show resolved Hide resolved

christophe-lunarg requested changes Jan 3, 2024

View reviewed changes

laurentcau force-pushed the b5 branch 2 times, most recently from cb3e837 to 2c60db4 Compare January 4, 2024 11:52

laurentcau force-pushed the b5 branch from 2c60db4 to 3b37bf5 Compare January 4, 2024 15:13

laurentcau force-pushed the b5 branch from 3b37bf5 to 5c1da05 Compare January 4, 2024 15:37

laurentcau requested a review from christophe-lunarg January 5, 2024 10:43

christophe-lunarg closed this Feb 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simd improvement #1190

Simd improvement #1190

laurentcau commented Dec 22, 2023

christophe-lunarg commented Dec 23, 2023

gottfriedleibniz commented Jan 3, 2024 •

edited

Loading

christophe-lunarg commented Jan 3, 2024

christophe-lunarg commented Jan 3, 2024

laurentcau commented Jan 3, 2024

gottfriedleibniz commented Jan 3, 2024 •

edited

Loading

christophe-lunarg commented Jan 3, 2024

christophe-lunarg Jan 3, 2024

laurentcau Jan 4, 2024

laurentcau Jan 4, 2024

laurentcau commented Jan 4, 2024

gottfriedleibniz commented Jan 4, 2024

christophe-lunarg commented Feb 8, 2024

Simd improvement #1190

Simd improvement #1190

Conversation

laurentcau commented Dec 22, 2023

christophe-lunarg commented Dec 23, 2023

gottfriedleibniz commented Jan 3, 2024 • edited Loading

christophe-lunarg commented Jan 3, 2024

christophe-lunarg commented Jan 3, 2024

laurentcau commented Jan 3, 2024

gottfriedleibniz commented Jan 3, 2024 • edited Loading

christophe-lunarg commented Jan 3, 2024

christophe-lunarg Jan 3, 2024

Choose a reason for hiding this comment

laurentcau Jan 4, 2024

Choose a reason for hiding this comment

laurentcau Jan 4, 2024

Choose a reason for hiding this comment

laurentcau commented Jan 4, 2024

gottfriedleibniz commented Jan 4, 2024

christophe-lunarg commented Feb 8, 2024

gottfriedleibniz commented Jan 3, 2024 •

edited

Loading

gottfriedleibniz commented Jan 3, 2024 •

edited

Loading