Port to alpaka #173
Conversation
bernhardmgruber commented on May 14, 2020 (edited)
- integrated some changes from hipifycation in alpaka
- replaced mallocMC CUDA macros by alpaka macros, removed mallocMC_prefixes.hpp
- replaced all CUDA kernel invocations by alpaka kernel enqueues
- removed all code that targeted CUDA < 9
- merged example02 into example01 since they are almost the same
- inlined content of mallocMC_example01_config.hpp
- ported kernel invocations to alpaka
- replaced CUDA allocation routines by alpaka ones
- renamed .cu source files to .cpp
- reworked CMakeLists.txt (removed all CUDA stuff, removed big block comments, ...)
- added new ReservePoolPolicy SimpleMalloc, intended for running the allocator in host memory
- passing Alpaka Accelerator through almost all device functions
- replaced all atomic operations by alpaka atomics
- replaced all CUDA intrinsics by custom implementations in mallocMC_utils.hpp, which default to the intrinsics of the corresponding platform or a default CPU implementation
- tried to #ifdef some CUDA thread sync primitives
- replaced CUDA thread IDs with alpaka indices and workdivs
- replaced shared memory by alpaka shared allocVar
- SimpleCudaMalloc and XMallocSIMD are not available when CUDA is not available, because they are too hard to port for now
- refactored thread indexing
- incorporating changes from psychocoderHPC from: dev...psychocoderHPC:topic-hip-port
- added a target mallocMCIde to CMakeLists.txt, so developers can browse the code in IDEs
- setting compiler warnings via a warnings target, instead of global CMAKE_CXX_FLAGS
- setting include directories on targets instead of globally
- removed check for CUDA compute capability, since capability 3 is required since CUDA 9
- removed cudaSetDeviceFlags, as it's not needed
- remove workaround commit cd97fe8 (return type for alpaka min/max)
Just curious because there is no issue open describing this: why do we port mallocMC to Alpaka?
The main motivation is to integrate it into PIConGPU. I have a prototype for HIP but do not want to port mallocMC to every possible platform.
For the documentation: there is an open issue to port mallocMC to HIP #166
@psychocoderHPC the CI build now fails because the CMake version on the CI slave is too old. alpaka requires CMake 3.15 or newer. Can you have a look at this for me please? Thank you!
NP, I will update the Travis script tomorrow.
@bernhardmgruber Could you restructure your commits so that the alpaka subtree creation is the first commit and all your mallocMC changes are in a second? Currently it is not possible to review your changes.
@psychocoderHPC I just tried, but it seems I can no longer rebase with the git subtree :/ Whenever git rebase processes the addition of the alpaka subtree, it tries to integrate all changes into the root working copy instead of the alpaka subfolder, and that messes up everything. I will try to come up with something different.
Yes I know that is always the case with subtrees.
@bernhardmgruber Could you please squash all your commits into one commit? Currently it is still hard to review the changes. Since the subtree is the first commit, it should be easy.
Thx! I created a new branch alpaka2 from dev and added the alpaka subtree there. Then I rebased the subtree out of the alpaka branch and put alpaka on top of alpaka2. But your solution is probably easier.
I can do that. But can you give me a bit of a rationale? This loses all the intermediate changes I did. Or are they of no interest? The resulting diff will also be huge.
Currently, if I try to review the PR, I need to go commit by commit through 22 commits. I will see changes you have already reverted in newer commits. If it ends with 3 commits to review, that is also OK, if you group them by some logic. Since we introduced alpaka here and we cannot deselect this commit and review the full diff, it is the only way.
@psychocoderHPC: I can see that the diff is unwieldy with the addition of the alpaka subtree, so I guess we should split this out anyway. Here is a separate PR: #176. And for reviewing the diff, don't you use the GitHub "Files changed" tab? It provides a nice unified diff.
(force-pushed 58c1c8c to a736380)
@psychocoderHPC I squashed all changes.
src/include/mallocMC/reservePoolPolicies/SimpleCudaMalloc_impl.hpp (outdated, resolved)
#ifdef _MSC_VER
-> Tx // FIXME(bgruber): return type is deduced as void by MSVC as host compiler (nvcc deduces correct return type)
#endif
I have no clue how to solve this. Maybe it is an issue with MSVC. Maybe also in the CUDA SDK for windows. Any ideas?
Funnily enough, I just tried the whole thing on Gentoo with CUDA 10.2 and have the same issue with g++ 8.4 as host compiler:
error: void value not ignored as it ought to be
I do not have an idea, but I can try on my local machine with MSVS + CUDA. It may be that something goes wrong with ::max(x, y), so that MSVS and g++ 8.4 for some reason think it returns void?
I eventually just explicitly specified the return type as decltype(::max(x, y)). It seems this fixes the miscompilation.
How do you want to handle this in alpaka upstream? Should I open an issue?
Is this change only a preview?
Please do not change any line in a subtree; this change will get lost with the next subtree update.
I changed this in the alpaka subtree to fix compilation for MSVC and g++. I think it needs to be addressed within alpaka. Here is the issue: alpaka-group/alpaka#1013
I added the tests to the CI as well, and now the Catch main fails to compile, likely because it is preprocessed by nvcc. Any ideas?
@psychocoderHPC please merge #179 first! It adds Catch2.
(force-pushed 8212175 to 2791975)
I ported the interface change to a prototype branch of PIConGPU: psychocoderHPC/picongpu@9edd9e0. I will update this prototype again when the …
* merged SimpleMalloc and CudaMalloc policies into AlpakaBuf policy
* since the AlpakaBuf policy is stateful now, Allocator now contains an instance of the reserve pool policy
* disabled calling cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) more than once
src/include/mallocMC/reservePoolPolicies/CudaSetLimits_impl.hpp (outdated, resolved)
cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8192U);
// see:
// https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g05956f16eaa47ef3a4efee84563ccb7d
// "Setting cudaLimitMallocHeapSize must not be performed after
// launching any kernel that uses the malloc() or free() device
// system calls"
// cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8192U);
CUDA does not allow us to call this a second time :/ What should we do?
Let us add this behavior to the documentation of the policy CudaSetLimits and disable the failing test for this policy. If possible, write out a message that the test is disabled for policy XY.
I reverted to the old behavior of the policy and documented the problem. As for the tests, I could not solve them using a cudaDeviceReset() alone. It seems I also need to clear the error of the call to cudaDeviceSetLimit() in resetMemPool().
… on free
* documented problem of CudaSetLimits policy
* clearing error code after the call to cudaDeviceSetLimit() in resetMemPool()
@bernhardmgruber Could you please integrate the latest alpaka to be able to remove the changes in the alpaka subtree. IMO this is the last part we need to change before we can merge it.
(git subtree pull: updated alpaka to git-subtree-split dcf09d48548c0deeb7b58021be21df257620dc34)
big thanks for this PR!!
Uff. Nevertheless, thank you for this, great work!