Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port to alpaka #173

Merged
merged 35 commits into from
Jun 23, 2020
Merged

Port to alpaka #173

merged 35 commits into from
Jun 23, 2020

Conversation

bernhardmgruber
Copy link
Member

@bernhardmgruber bernhardmgruber commented May 14, 2020

  • integrated some changes from hipifycation in alpaka
  • replaced mallocMC CUDA macros by alpaka macros, removed mallocMC_prefixes.hpp
  • replaced all CUDA kernel invocations by alpaka kernel enqueues
  • removed all code that targetted CUDA < 9
  • merged example02 into example01 since they are almost the same
  • inlined content of mallocMC_example01_config.hpp
  • ported kernel invocations to alpaka
  • replaced cuda allocation routines by alpaka
  • renamed .cu source files to .cpp
  • reworked CMakeLists.txt (removed all CUDA stuff, removed big block comments, ...)
  • added new ReservePoolPolicies SimpleMalloc, intended for running allocator in host memory
  • passing Alpaka Accelerator through almost all device functions
  • replaced all atomit operations by alpaka atomics
  • replaced all CUDA intrinsics by custom implementations in mallocMC_utils.hpp, which default to the intrinsics of the corresponding platform or a default CPU implementation
  • tried to #ifdef some CUDA thread sync primitives
  • replaced CUDA thread IDs with alpaka indices and workdivs
  • replaced shared memory by alpaka shared allocVar
  • SimpleCudaMalloc and XMallocSIMD are not available, when CUDA is not available, because they are too hard to port for now
  • refactored thread indexing
  • incorporating changes from psychocoderHPC from: dev...psychocoderHPC:topic-hip-port
  • added a target mallocMCIde to CMakeLists.txt, so developers can browse the code in IDEs
  • setting compiler warnings via a warnings target, instead of global CMAKE_CXX_FLAGS
  • setting include directories on targets instead of globally
  • removed check for CUDA compute capability, since capability 3 is required since CUDA 9
  • removed cudaSetDeviceFlags, as it's not needed
  • remove workaround commit cd97fe8 (return type for alpaka min/max)

CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
@ax3l
Copy link
Member

ax3l commented May 18, 2020

Just curious because there is no issue open describing this: why do we port mallocMC to Alpaka?

@psychocoderHPC
Copy link
Member

The main motivation is to integrate it into PIConGPU. I have a prototype for HIP but do not like to port mallocMC to any possible platform.

@psychocoderHPC
Copy link
Member

For the documentation: there is an open issue to port mallocMC to HIP #166

@bernhardmgruber
Copy link
Member Author

@psychocoderHPC the CI build now fails because the cmake version on the CI slave is too old. alpaka requires cmake 3.15 or newer. can you have a look at this for me please? thank you!

@psychocoderHPC
Copy link
Member

@psychocoderHPC the CI build now fails because the cmake version on the CI slave is too old. alpaka requires cmake 3.15 or newer. can you have a look at this for me please? thank you!

NP, I will update the travis script tomorrow.

src/include/mallocMC/allocator.hpp Outdated Show resolved Hide resolved
src/include/mallocMC/allocator.hpp Outdated Show resolved Hide resolved
src/include/mallocMC/allocator.hpp Outdated Show resolved Hide resolved
src/include/mallocMC/allocator.hpp Outdated Show resolved Hide resolved
src/include/mallocMC/reservePoolPolicies/SimpleMalloc.hpp Outdated Show resolved Hide resolved
tests/verify_heap.cpp Outdated Show resolved Hide resolved
tests/verify_heap.cpp Show resolved Hide resolved
tests/verify_heap.cpp Show resolved Hide resolved
tests/verify_heap.cpp Outdated Show resolved Hide resolved
@psychocoderHPC
Copy link
Member

@bernhardmgruber Could you restructure your commits. So that the alpaka subtree creation is the first commit and all your mallocMC changes are in a second. Currently it is not possible to review review your changes.

@bernhardmgruber
Copy link
Member Author

@psychocoderHPC I just tried, but it seems I can no longer rebase with the git subtree :/ whenever git rebase processes the addition of the alpaka subtree, it tries to integrate all changes into the root working copy instead of the alpaka subfolder. and that messes up everything. I will try to come up with something different.

@psychocoderHPC
Copy link
Member

psychocoderHPC commented May 20, 2020

@psychocoderHPC I just tried, but it seems I can no longer rebase with the git subtree :/ whenever git rebase processes the addition of the alpaka subtree, it tries to integrate all changes into the root working copy instead of the alpaka subfolder. and that messes up everything. I will try to come up with something different.

Yes I know that is always the case with subtrees.
The workflow is

  • remove the commit where you introduced alpaka
  • stash all other commits
  • init/update the alpaka subtree
  • stash pop your changes

@bernhardmgruber Could you please squash all your commits into one commit. Currently it is still hard to review the changes. Since the subtree is the first commit it should be easy (git rebase -i HEAD~21).

@bernhardmgruber
Copy link
Member Author

The workflow is

  • remove the commit where you introduced alpaka
  • stash all other commits
  • init/update the alpaka subtree
  • stash pop your changes

Thx! I created a new branch alpaka2 from dev and added the alpaka subtree there. Then I rebased the subtree out of the alpaka branch. And then I put alpaka on top of the alpaka2. But your solution is probably easier.

@bernhardmgruber Could you please squash all your commits into one commit. Currently it is still hard to review the changes.

I can do that. But can you give me a bit of a rational? This looses all the intermediate changes I did. Or are they of no interest? The resulting diff will also be huge.

@psychocoderHPC
Copy link
Member

I can do that. But can you give me a bit of a rational? This looses all the intermediate changes I did. Or are they of no interest? The resulting diff will also be huge.

Currently if I try to review the PR I need to got commit by commit to 22 commits. I will see changes you have already reverted in newer commits. If it end with 3 commits to review it's also ok if you group it by any logic. Since we introduced alpaka here and we can not deselect this commit and review the full diff it is the only way.
An other option would be to push the alpaka subtree in a separate PR and switch in the same moment to C++14.

@bernhardmgruber
Copy link
Member Author

@psychocoderHPC: I can see that the diff is unwieldy with the addition of the alpaka subtree. So this I guess we should split this out anyway. Here is a separate PR: #176

And for reviewing the diff, don't you use the Github Files changed tag? It provides a nice unified diff.

@bernhardmgruber bernhardmgruber force-pushed the alpaka branch 3 times, most recently from 58c1c8c to a736380 Compare May 20, 2020 14:34
@bernhardmgruber
Copy link
Member Author

bernhardmgruber commented May 20, 2020

@psychocoderHPC I squashed all changes.

tests/dimensions.cpp Outdated Show resolved Hide resolved
src/include/mallocMC/reservePoolPolicies/Malloc.hpp Outdated Show resolved Hide resolved
Comment on lines 74 to 76
#ifdef _MSC_VER
-> Tx // FIXME(bgruber): return type is deduced as void by MSVC as host compiler (nvcc deduces correct return type)
#endif
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no clue how to solve this. Maybe it is an issue with MSVC. Maybe also in the CUDA SDK for windows. Any ideas?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funnily enough, I just tried the whole thing on Gentoo with CUDA 10.2 and have the same issue with g++8.4 as host compiler:
error: void value not ignored as it ought to be

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have an idea, but can try on my local machine with MSVS + CUDA. It may be that something does wrong with ::max(x, y) that MSVS and g++ 8.4 for some reason think it returns void?

Copy link
Member Author

@bernhardmgruber bernhardmgruber May 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I eventually just explicitely specified the return type as decltype(::max(x, y)). It seems this fixes the misscompilation.

How do you want to handle this in alpaka upstream? Should I open an issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change only a preview?
Please do not change any line in a subtree, this change will get lost after with the next subtree update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this in the alpaka subtree to fix compilation for MSVC and g++. I think it needs to be addressed within alpaka. Here is the issue: alpaka-group/alpaka#1013

@bernhardmgruber
Copy link
Member Author

I added the tests to the CI as well and now the Catch main fails to compile. Likely because it is preprocessed by nvcc. Any ideas?
Alpaka seems to have a separate library for the test main. Can I disable alpaka_add_executable from treating all cpp files as CUDA files?

examples/mallocMC_example03.cpp Show resolved Hide resolved
src/include/mallocMC/mallocMC_utils.hpp Outdated Show resolved Hide resolved
@bernhardmgruber
Copy link
Member Author

@psychocoderHPC please merge #179 first! It adds Catch2.

* replaced mallocMC CUDA macros by alpaka macros, removed mallocMC_prefixes.hpp
* replaced all CUDA kernel invocations by alpaka kernel enqueues
* removed all code that targetted CUDA < 9
* merged example02 into example01 since they are almost the same
* inlined content of mallocMC_example01_config.hpp
* ported kernel invocations to alpaka
* replaced cuda allocation routines by alpaka
* renamed .cu source files to .cpp
* reworked CMakeLists.txt (removed all CUDA stuff, removed big block comments, ...)
* added new ReservePoolPolicies SimpleMalloc, intended for running allocator in host memory
* passing Alpaka Accelerator through almost all device functions
* replaced all atomit operations by alpaka atomics
* replaced all CUDA intrinsics by custom implementations in mallocMC_utils.hpp, which default to the intrinsics of the corresponding platform or a default CPU implementation
* tried to #ifdef some CUDA thread sync primitives
* replaced CUDA thread IDs with alpaka indices and workdivs
* replaced __shared__ memory by alpaka shared allocVar
* SimpleCudaMalloc and XMallocSIMD are not available, when CUDA is not available, because they are too hard to port for now
* refactored thread indexing
* incorporating changes from psychocoderHPC from: alpaka-group/mallocMC@dev...psychocoderHPC:topic-hip-port
* added a target mallocMCIde to CMakeLists.txt, so developers can browse the code in IDEs
* setting compiler warnings via a warnings target, instead of global CMAKE_CXX_FLAGS
* setting include directories on targets instead of globally
* removed check for CUDA compute capability, since capability 3 is required since CUDA 9
* removed cudaSetDeviceFlags, as it's not needed
@bernhardmgruber bernhardmgruber force-pushed the alpaka branch 2 times, most recently from 8212175 to 2791975 Compare May 28, 2020 15:36
@bernhardmgruber bernhardmgruber changed the title [WIP] Port to alpaka Port to alpaka May 29, 2020
@psychocoderHPC
Copy link
Member

I ported the interface change to a prototype branch of PIConGPU psychocoderHPC/picongpu@9edd9e0

I will update this prototype again when the ReservePoolPolicies are introduced back.

* merged SimpleMalloc and CudaMalloc policies into AlpakaBuf policy
* since the AlpakaBuf policy is stateful now, Allocator now contains an instance of the reserve pool policy
* disabled calling cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) more than once
Comment on lines 49 to 63
cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8192U);
// see:
// https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g05956f16eaa47ef3a4efee84563ccb7d
// "Setting cudaLimitMallocHeapSize must not be performed after
// launching any kernel that uses the malloc() or free() device
// system calls"
// cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8192U);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CUDA does not allow us to call this a second time :/ What should we do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us add this behavior to the documentation of the policy CudaSetLimits and disable the failing test for this policy. If possible write out a message that the test is disabled for policy XY.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted to the old behavior of the policy and documented the problem. As for the tests, I could not solve them using a cudaDeviceReset() alone. It seems I also need to clear the error of the call to cudaDeviceSetLimit() in resetMemPool()

… on free

* documented problem of CudaSetLimits policy
* clearing error code after the call to cudaDeviceSetLimit() in resetMemPool()
@psychocoderHPC
Copy link
Member

@bernhardmgruber Could you please integrate the latest alpaka to be able to remove the changes in the alpaka subtree. IMO this is the last part we need to change before we can merge it.

dcf09d485 Enable subdirectories and use alpaka_ROOT (#1022)
3fb905b59 Fix doxygen html generation
f271f06c8 Use std::invoke_result_t instead of std::result_of_t when available
043df1a20 Fix GitHub workflow for building the doxygen documentation
05d237189 Reduce test buffer sizes to fix tests with small Idx types
00ea668ae Apply some clang-tidy fixes
25e2ee27c Sphinx Doc: Fix Doxygen integration on readthedocs
97dac827a ExampleDefaultAcc: style fixes
e555b6e2e ExampleDefaultAcc: Fix AccCpuTbbBlocks
4b7e56099 examples: Add use ExampleDefaultAcc
0a001dfc5 Add alpaka::example::ExampleDefaultAcc
a8eb12e1d example/vectorAdd: Fix Acc choice
f77784d42 BlockSharedMem*Member: Add check type alignment requirements
3301dc2aa alpaka.hpp: Add include BlockSharedMem{Dyn,St}Member.hpp
f11a44610 BlockSharedMemDynMember: suppress msvc warning C4324:
9b8a572c8 BlockSharedMemStMember::alignPitch: Fix size_t->unisigned int
1da7f9d27 BlockSharedMem*Member: Fix core::vectorization::defaultAlignment/8
5ad68a911 BlockSharedMemDynMember: KB -> KiB
fc93619b1 BlockSharedMem*Member: Fix style, comments
a100cc741 BlockSharedMem*Member: Add suppress gcc -Wcast-align diagnostics
17f553b64 Add cmake option ALPAKA_DEBUG_OFFLOAD_ASSUME_HOST
275078c98 Add cmake option ALPAKA_BLOCK_SHARED_DYN_MEMBER_ALLOC_KB
76626f4d6 Add BlockSharedMem*Member to avoid malloc in CPU Accs
6b9f24c6c Fix Doxygen CI build
e8b70cc2a Use structed offset for SubView tests instead of uniform 1
762b2dab5 Randomly initialize see for math tests
ba950fcc3 Use TEMPLATE_LIST_TEST_CASE for math operation tests
64d923aae test gcc 10
1e06fbc83 Remove saving created docker images
0af7a5f3e Finally fix doc
079161245 Adapt doc
b8c2eb121 fix /usr/local doc
445020db5 Incorporate review comments
c4845e010 Add cmake example to the documentation
322e639dc Add unit tests for ffs intrinsic
f31a27115 Add implementation of ffs() intrinsic
dc50d6bfb Add missing doxygen for TIdx parameter of mem::buf::alloc
6b9435df9 Update the install documentation
a49f4d325 simplify alpaka usage (#1017)
1e1a1d9e5 emulate hip/cuda-Memcpy3D with a kernel
0f157cd0f Disable automatic build of examples and test cases (#1016)
ccce2c8d9 Add popcount intrinsic (#1004)
20d4e4ecf Vec: rm unused static constexpr to enable GCC OMP4
83ea8b84c Move CI builds completely from Travis CI to Github Actions
7c6eefb7f Fix clang-CUDA warning in 2D memory allocation
0243eb44a Test compatibility with Ubuntu 20.04
28dc151c0 fix HIP and update to 3.3.0
04dffc8c6 Converts documentation to sphinx/rst for readthedocs
494a08429 GCC: Suppress old style casts
e0b84f80a TinyMT: Upstream Update

git-subtree-dir: alpaka
git-subtree-split: dcf09d48548c0deeb7b58021be21df257620dc34
@psychocoderHPC psychocoderHPC merged commit cc54375 into alpaka-group:dev Jun 23, 2020
@psychocoderHPC
Copy link
Member

big thanks for this PR!!

@ax3l
Copy link
Member

ax3l commented Jul 13, 2020

Uff, you just git added all Alpaka third party sources under your user name ;-)

nevertheless, thank you for this, great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants