Port to alpaka #173
Conversation
bernhardmgruber commented on May 14, 2020 (edited)
- integrated some changes from hipifycation in alpaka
- replaced mallocMC CUDA macros by alpaka macros, removed mallocMC_prefixes.hpp
- replaced all CUDA kernel invocations by alpaka kernel enqueues
- removed all code that targeted CUDA < 9
- merged example02 into example01 since they are almost the same
- inlined content of mallocMC_example01_config.hpp
- ported kernel invocations to alpaka
- replaced CUDA allocation routines by alpaka ones
- renamed .cu source files to .cpp
- reworked CMakeLists.txt (removed all CUDA stuff, removed big block comments, ...)
- added new ReservePoolPolicy SimpleMalloc, intended for running the allocator in host memory
- passing Alpaka Accelerator through almost all device functions
- replaced all atomic operations by alpaka atomics
- replaced all CUDA intrinsics by custom implementations in mallocMC_utils.hpp, which default to the intrinsics of the corresponding platform or a default CPU implementation
- tried to #ifdef some CUDA thread sync primitives
- replaced CUDA thread IDs with alpaka indices and workdivs
- replaced shared memory by alpaka shared allocVar
- SimpleCudaMalloc and XMallocSIMD are not available when CUDA is not available, because they are too hard to port for now
- refactored thread indexing
- incorporating changes from psychocoderHPC from: dev...psychocoderHPC:topic-hip-port
- added a target mallocMCIde to CMakeLists.txt, so developers can browse the code in IDEs
- setting compiler warnings via a warnings target, instead of global CMAKE_CXX_FLAGS
- setting include directories on targets instead of globally
- removed check for CUDA compute capability, since capability 3 is required since CUDA 9
- removed cudaSetDeviceFlags, as it's not needed
- remove workaround commit cd97fe8 (return type for alpaka min/max)
Just curious because there is no issue open describing this: why do we port mallocMC to Alpaka?
The main motivation is to integrate it into PIConGPU. I have a prototype for HIP but do not want to port mallocMC to every possible platform.
For the documentation: there is an open issue to port mallocMC to HIP #166
@psychocoderHPC the CI build now fails because the CMake version on the CI slave is too old. alpaka requires CMake 3.15 or newer. Can you have a look at this for me please? Thank you!
NP, I will update the Travis script tomorrow.
@bernhardmgruber Could you restructure your commits so that the alpaka subtree creation is the first commit and all your mallocMC changes are in a second? Currently it is not possible to review your changes.
@psychocoderHPC I just tried, but it seems I can no longer rebase with the git subtree :/ Whenever git rebase processes the addition of the alpaka subtree, it tries to integrate all changes into the root working copy instead of the alpaka subfolder, and that messes up everything. I will try to come up with something different.
Yes I know that is always the case with subtrees.
@bernhardmgruber Could you please squash all your commits into one commit? Currently it is still hard to review the changes. Since the subtree is the first commit, it should be easy.
Thx! I created a new branch alpaka2 from dev and added the alpaka subtree there. Then I rebased the subtree out of the alpaka branch and put alpaka on top of alpaka2. But your solution is probably easier.
I can do that. But can you give me a bit of a rationale? This loses all the intermediate changes I did. Or are they of no interest? The resulting diff will also be huge.
Currently, if I try to review the PR, I need to go commit by commit through 22 commits. I will see changes you have already reverted in newer commits. If it ends with 3 commits to review, that is also OK, if you group them by some logic. Since we introduced alpaka here and we cannot deselect this commit and review the full diff, it is the only way.
@psychocoderHPC: I can see that the diff is unwieldy with the addition of the alpaka subtree, so I guess we should split this out anyway. Here is a separate PR: #176. And for reviewing the diff, don't you use the GitHub "Files changed" tab? It provides a nice unified diff.
(force-pushed 58c1c8c to a736380)
@psychocoderHPC I squashed all changes.
src/include/mallocMC/reservePoolPolicies/SimpleCudaMalloc_impl.hpp (outdated, resolved)
#ifdef _MSC_VER
-> Tx // FIXME(bgruber): return type is deduced as void by MSVC as host compiler (nvcc deduces correct return type)
#endif
I have no clue how to solve this. Maybe it is an issue with MSVC. Maybe also in the CUDA SDK for windows. Any ideas?
Funnily enough, I just tried the whole thing on Gentoo with CUDA 10.2 and have the same issue with g++ 8.4 as host compiler:
error: void value not ignored as it ought to be
I do not have an idea, but I can try on my local machine with MSVS + CUDA. It may be that something goes wrong with ::max(x, y), so that MSVS and g++ 8.4 for some reason think it returns void?
I eventually just explicitly specified the return type as decltype(::max(x, y)). It seems this fixes the miscompilation.
How do you want to handle this in alpaka upstream? Should I open an issue?
Is this change only a preview?
Please do not change any line in a subtree; this change will get lost with the next subtree update.
I changed this in the alpaka subtree to fix compilation for MSVC and g++. I think it needs to be addressed within alpaka. Here is the issue: alpaka-group/alpaka#1013
I added the tests to the CI as well, and now the Catch main fails to compile, likely because it is preprocessed by nvcc. Any ideas?
@psychocoderHPC please merge #179 first! It adds Catch2.
(force-pushed 8212175 to 2791975)
I ported the interface change to a prototype branch of PIConGPU: psychocoderHPC/picongpu@9edd9e0. I will update this prototype again when the …
* merged SimpleMalloc and CudaMalloc policies into AlpakaBuf policy
* since the AlpakaBuf policy is stateful now, Allocator now contains an instance of the reserve pool policy
* disabled calling cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) more than once
src/include/mallocMC/reservePoolPolicies/CudaSetLimits_impl.hpp (outdated, resolved)
cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8192U);
// see:
// https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DEVICE.html#group__CUDART__DEVICE_1g05956f16eaa47ef3a4efee84563ccb7d
// "Setting cudaLimitMallocHeapSize must not be performed after
// launching any kernel that uses the malloc() or free() device
// system calls"
// cudaDeviceSetLimit(cudaLimitMallocHeapSize, 8192U);
CUDA does not allow us to call this a second time :/ What should we do?
Let us add this behavior to the documentation of the policy CudaSetLimits and disable the failing test for this policy. If possible, write out a message that the test is disabled for policy XY.
I reverted to the old behavior of the policy and documented the problem. As for the tests, I could not solve them using a cudaDeviceReset() alone. It seems I also need to clear the error of the call to cudaDeviceSetLimit() in resetMemPool().
… on free
* documented problem of CudaSetLimits policy
* clearing error code after the call to cudaDeviceSetLimit() in resetMemPool()
@bernhardmgruber Could you please integrate the latest alpaka to be able to remove the changes in the alpaka subtree. IMO this is the last part we need to change before we can merge it.
(git subtree pull: updated alpaka to git-subtree-split dcf09d48548c0deeb7b58021be21df257620dc34)
big thanks for this PR!!
Uff. Nevertheless, thank you for this, great work!