Fedora 40: Compile failure after GCC14 patch #12
Comments
Did you run the install_deps.sh script before starting the build? I need to set up a Fedora 40 environment for testing; so far I have only tested with a slightly older Fedora version (39).
Yes, I ran ./install_deps.sh before starting the build.
I can confirm that both build breaks occur on Fedora 40. I will now submit a patch for the first one, based on the Gentoo bug you found. I then need to test it on other distros before applying it. After that I will look into a fix for the second build break.
fix the modification of a read-only object detected by gcc 14

Original patch and bug reports:
1) #12
2) https://bugs.gentoo.org/918709
3) ROCm/rocm_smi_lib#170

[ 24%] Building CXX object oam/CMakeFiles/oam.dir/__/src/rocm_smi_utils.cc.o
In file included from /home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/rocm_smi_lib/src/rocm_smi_power_mon.cc:52:
/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/rocm_smi_lib/include/rocm_smi/rocm_smi_utils.h: In member function ‘amd::smi::ScopeGuard<lambda>& amd::smi::ScopeGuard<lambda>::operator=(const amd::smi::ScopeGuard<lambda>&)’:
/home/lamikr/own/rocm/src/sdk/rocm_sdk_builder_611/src_projects/rocm_smi_lib/include/rocm_smi/rocm_smi_utils.h:237:18: error: assignment of member ‘dismiss_’ in read-only object
  237 | rhs.dismiss_ = true;

Thanks to Crizle for the bug report and the link to the fix patch.

Signed-off-by: Mika Laitio <[email protected]>
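For readers unfamiliar with this class of error: the log shows an assignment operator that takes its argument as `const ScopeGuard&` and then writes to `rhs.dismiss_`, which a const reference does not allow. The sketch below is not the upstream patch. It is a simplified, hypothetical stand-in (`ScopeGuardSketch` and `CountReleasesAfterTransfer` are names invented here) that shows one common way to express the same ownership transfer legally, by taking the source as a non-const rvalue reference.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical, simplified stand-in for amd::smi::ScopeGuard (not the upstream code).
// Transferring ownership has to modify the source guard, so the source
// must not be const. Here that is expressed with move operations.
class ScopeGuardSketch {
 public:
  explicit ScopeGuardSketch(std::function<void()> release)
      : release_(std::move(release)), dismiss_(false) {}

  ScopeGuardSketch(ScopeGuardSketch&& rhs)
      : release_(std::move(rhs.release_)), dismiss_(rhs.dismiss_) {
    rhs.dismiss_ = true;  // legal: rhs is not const
  }

  ScopeGuardSketch& operator=(ScopeGuardSketch&& rhs) {
    if (this != &rhs) {
      if (!dismiss_) release_();  // run the cleanup this guard currently owns
      release_ = std::move(rhs.release_);
      dismiss_ = rhs.dismiss_;
      rhs.dismiss_ = true;  // the source no longer owns the cleanup
    }
    return *this;
  }

  // Copying would make two guards run the same cleanup, so it is disabled.
  ScopeGuardSketch(const ScopeGuardSketch&) = delete;
  ScopeGuardSketch& operator=(const ScopeGuardSketch&) = delete;

  ~ScopeGuardSketch() {
    if (!dismiss_) release_();
  }

  void Dismiss() { dismiss_ = true; }

 private:
  std::function<void()> release_;
  bool dismiss_;
};

// Counts how often the cleanup runs when a guard is moved into another one.
int CountReleasesAfterTransfer() {
  int releases = 0;
  {
    ScopeGuardSketch a([&releases] { ++releases; });
    ScopeGuardSketch b(std::move(a));  // a is dismissed, b owns the cleanup
  }
  return releases;
}
```

With this shape the cleanup runs exactly once even though two guard objects go out of scope, which is the behavior the original `rhs.dismiss_ = true;` line was trying to achieve.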
For the second build problem, there seems to be a bug report and a fix here:
Thanks for looking, glad to help as best I can.
- add missing construct_event_strings.py
- add fix from upstream bug open-mpi/ompi#12169, which then triggers the usage of construct_event_strings.py

fixes: #12
Signed-off-by: Mika Laitio <[email protected]>
The rocm_smi_lib and openmpi builds are now OK on Fedora 40, but the next failing package is amd_fftw, which will also need some parameter type fixes for gcc 14.
fixes detected on fedora40/gcc 14 build fixes: #12 Signed-off-by: Mika Laitio <[email protected]>
fix patch apply dir in case where same source directory is used by multiple projects. (to build same sources multiple times) fixes: #12 Signed-off-by: Mika Laitio <[email protected]>
@Crizle Thanks for the help. There are now quite a few fixes in place for Fedora 40, but my build is still ongoing, so it is more than likely that more places will show up where something breaks on gcc 14.
Been noticing the fixes! All good, I just started the build again with the latest patches and will report back.
Enable code for exception_ptr only if it is used.

Not sure whether this error was caused by optimization, as the patch was not needed on Fedora 39, Ubuntu 23.10 or Mageia 9.

Error message was:
AMDMIGraphX/src/include/migraphx/par.hpp:58:22: error: no member named 'exception_ptr' in namespace 'std'

Fixes: #12
Signed-off-by: Mika Laitio <[email protected]>
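As background on the error message: `std::exception_ptr` is declared in the standard `<exception>` header, and a frequent cause of "no member named 'exception_ptr' in namespace 'std'" is code that relied on some other header pulling `<exception>` in transitively. Whether that was the cause here is an assumption; the commit above took a different route and guarded the code. The sketch below only illustrates the explicit include together with typical `exception_ptr` usage, and the helper names `CaptureException` and `MessageOf` are hypothetical.

```cpp
#include <cassert>
#include <exception>  // declares std::exception_ptr, std::current_exception, std::rethrow_exception
#include <stdexcept>
#include <string>

// Hypothetical helper: runs f and captures any exception instead of letting it escape.
// Returns a null exception_ptr when f completes normally.
template <class F>
std::exception_ptr CaptureException(F f) {
  try {
    f();
  } catch (...) {
    return std::current_exception();
  }
  return nullptr;
}

// Hypothetical helper: rethrows a captured exception to read its message.
std::string MessageOf(std::exception_ptr ep) {
  if (!ep) return "";
  try {
    std::rethrow_exception(ep);
  } catch (const std::exception& e) {
    return e.what();
  } catch (...) {
    return "unknown";
  }
}
```

This capture-then-rethrow pattern is commonly used to carry an exception out of a worker thread and raise it again on the calling thread.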
Try deleting the src_projects folder of pytorch and executing the init command again. After that, apply the patch and the build should work fine. I had a similar issue on Arch.
Thank you, that got pytorch built and the build is now continuing.
Thanks for confirming! Did the build finish, and are the examples in /opt/rocm_sdk_611/docs/examples working? For example the ones in the opencl and pytorch directories?
No problem! Anyway, as you mentioned DeepSpeed above: I think everything up to that has been built, so it all looks like great progress! I'll get around to trying the tests a bit later or tomorrow and let you know how it went. Here is the last build status:
Just quickly tried the PyTorch example:
fixes issue on: #12 Signed-off-by: Mika Laitio <[email protected]>
The test script that did not work for you was missing; I have added it now. You can get it installed by doing
Btw, Eitch was also having a build problem with DeepSpeed and was able to solve it somehow. I wonder whether you also have the same /dev/kfd permission problem. Can you send me the output of:
Also, I have not written an executable .sh script for most of the tests in the docs/examples/pytorch dir, but you can run them using python. For example:
fedora 40/gcc 14 build failure. Replace clamp with min and max.

pytorch/pytorch#127666
#12

Signed-off-by: Mika Laitio <[email protected]>
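The commit message above describes replacing a clamp call with min and max. The actual PyTorch change is in pytorch/pytorch#127666; the snippet below is only a generic sketch of that rewrite, with a hypothetical function name `ClampWithMinMax`, and it assumes `lo <= hi`.

```cpp
#include <algorithm>
#include <cassert>

// Generic illustration (not the PyTorch patch): a clamp written with
// std::max and std::min. For lo <= hi this returns the same value as
// std::clamp(value, lo, hi).
template <class T>
T ClampWithMinMax(T value, T lo, T hi) {
  // First raise the value to at least lo, then cap it at hi.
  return std::min(std::max(value, lo), hi);
}
```

For example, `ClampWithMinMax(5, 0, 3)` yields 3 and `ClampWithMinMax(-1, 0, 3)` yields 0, while a value already inside the range is returned unchanged.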
Hi Lamikr, I don't think I have that same issue; the permissions seem to be correct?
I'll post the results of most of the tests (which seem to work!):
test_torch_migraphx_resnet50.py
./run_pytorch_gpu_simple_test.sh
jupyter-notebook pytorch_amd_gpu_intro.ipynb
jupyter-notebook pytorch_simple_cpu_vs_gpu_benchmark.ipynb
./test_migraphx_install.sh
test_onnxruntime_providers.py
Test HIPCC compiler
OpenCL Integration (I assume this isn't working because it hasn't been built, due to the DeepSpeed build failing, which I haven't looked into further yet):
./run_torchvision_gpu_benchmarks.sh
This should be fixed by pull request #59.
Hi, all the tests should now work. I have now also had time to run the gpu_benchmark and checked that the upstream version has also been fixed to run on pytorch 2.0. All the tests executed on it without throwing any errors; I only needed to change test.sh. I will check how to get the GPU benchmark results visualized, so let's close this bug and open a new discussion for benchmark result visualization and possible problems in the example code (which should all work now with the latest version).
The initial compile attempt failed with this issue: ROCm/rocm_smi_lib#170
Applying the following patch fixed that initial issue:
Now the rocm_sdk_builder compilation fails here:
Fixing this is beyond my skill level. Any ideas?