20 changes: 11 additions & 9 deletions Intro_Tutorial/README.md
@@ -1,15 +1,17 @@
# RAJA Portability Suite Intro Tutorial

Welcome to the RAJA Portability Suite Intro tutorial. In this tutorial, you
will learn how to use RAJA and Umpire to write simple, platform-portable code
that can be compiled to target different hardware architectures.

## Lessons

Lessons are in the `lessons` subdirectory. Each lesson has a README file
that introduces new concepts and provides instructions to complete the lesson.
Each lesson builds on the previous ones to allow you to practice using RAJA
and Umpire capabilities and to reinforce the content.

Lessons contain source files with missing code and instructions for you to fill
in the missing parts along with solution files that contain the completed
lesson code. If you get stuck, you can diff the lesson and solution files to see
the code that the lesson is asking you to fill in.
@@ -3,8 +3,12 @@
#include "RAJA/RAJA.hpp"
#include "umpire/Umpire.hpp"

// TODO: Uncomment this in order to build!
//#define COMPILE

int main()
{
#if defined(COMPILE)
double* data{nullptr};

// TODO: allocate an array of 100 doubles using the HOST allocator
@@ -18,5 +22,6 @@ int main()

// TODO: deallocate the array

#endif
return 0;
}
39 changes: 19 additions & 20 deletions Intro_Tutorial/lessons/03_umpire_allocator/README.md
@@ -1,47 +1,46 @@
# Lesson 3: Umpire Allocators

In this lesson, you will learn how to use Umpire to allocate memory. The file
`03_umpire_allocator.cpp` contains `TODO:` comments where you will add code to
allocate and deallocate memory.

The fundamental concept for accessing memory through Umpire is the
`umpire::Allocator`. An `umpire::Allocator` is a C++ object that can be used to
allocate and deallocate memory, as well as query a pointer to get
information about it. In this lesson, we will see how to query the name of an Allocator.

All `umpire::Allocator` objects are created and managed by the
`umpire::ResourceManager` *Singleton* object. To create an allocator,
first obtain a handle to the ResourceManager, and then request the Allocator
corresponding to the desired memory resource using the `getAllocator` function:

```
auto& rm = umpire::ResourceManager::getInstance();
auto allocator = rm.getAllocator("HOST");
```

The Allocator class provides methods for allocating and deallocating memory. You
can view these methods in the [Umpire AllocatorInterface](https://umpire.readthedocs.io/en/develop/doxygen/html/classumpire_1_1Allocator.html).

To use an Umpire allocator, use the following code, replacing "size in bytes"
with the desired size for your allocation:

```
void* memory = allocator.allocate(size in bytes);
```

Moving and modifying data in a heterogeneous memory system can be subtle
because you have to keep track of the source and destination memory spaces,
and often use vendor-specific APIs to perform the modifications. In Umpire,
all data modification and movement, regardless of memory resource or platform,
is done using **Umpire Operations**.

Next, we will use the `memset` Operator provided by Umpire's Resource Manager
to set the memory we just allocated to zero.

Don't forget to deallocate your memory afterwards!

For more details, you can check out the [Umpire Allocator Documentation](https://umpire.readthedocs.io/en/develop/sphinx/tutorial/allocators.html).

Once you have made your changes, you can compile and run the lesson:

@@ -3,8 +3,12 @@
#include "RAJA/RAJA.hpp"
#include "umpire/Umpire.hpp"

// TODO: Uncomment this in order to build!
#define COMPILE

int main()
{
#if defined(COMPILE)
double* data{nullptr};

// TODO: allocate an array of 100 doubles using the HOST allocator
@@ -23,5 +27,6 @@ int main()
// TODO: deallocate the array
allocator.deallocate(data);

#endif
return 0;
}
2 changes: 1 addition & 1 deletion Intro_Tutorial/lessons/04_raja_forall/README.md
@@ -1,4 +1,4 @@
# Lesson 4: RAJA Simple Loops

Data parallel kernels are common in many parallel HPC applications. In a data
parallel loop kernel, the processing of data that occurs at each iterate **is
2 changes: 1 addition & 1 deletion Intro_Tutorial/lessons/05_raja_reduce/README.md
@@ -1,4 +1,4 @@
# Lesson 5: RAJA Reductions

In lesson 4, we looked at a data parallel loop kernel in which each loop
iterate was independent of the others. In this lesson, we consider a kernel
@@ -1,4 +1,4 @@
# Lesson 6: Host-Device Memory and Device Kernels

Now, let's learn about Umpire's different memory resources and, in
particular, those used to allocate memory on a GPU.
8 changes: 7 additions & 1 deletion Intro_Tutorial/lessons/07_raja_algs/README.md
@@ -1,4 +1,4 @@
# Lesson 07: RAJA Algorithms

So far, we've looked at RAJA kernel launch methods, where a user passes a kernel
body that defines what an algorithm does at each iterate. RAJA provides
@@ -87,6 +87,9 @@

```
$ make 07_raja_atomic
$ ./bin/07_raja_atomic
```

Additional information about RAJA atomic operation support can be found in
[RAJA Atomic Operations](https://raja.readthedocs.io/en/develop/sphinx/user_guide/tutorial/atomic_histogram.html).

## Parallel Scan

A **scan operation** is an important building block for parallel algorithms. It
@@ -204,3 +207,6 @@
```
$ ./bin/07_raja_scan
```

Is the result what you expected it to be? Can you explain why the first value
in the output is what it is?

Additional information about RAJA scan operations can be found in
[RAJA Parallel Scan Operations](https://raja.readthedocs.io/en/develop/sphinx/user_guide/tutorial/scan.html).
@@ -2,6 +2,8 @@

#include "RAJA/RAJA.hpp"
#include "umpire/Umpire.hpp"
// TODO: include the header file for the Umpire QuickPool strategy so you can
// use it in the code below

//Uncomment to compile
//#define COMPILE
45 changes: 29 additions & 16 deletions Intro_Tutorial/lessons/08_raja_umpire_quick_pool/README.md
@@ -1,33 +1,46 @@
# Lesson 8: Umpire Memory Pools

In this lesson, you will learn to create and use an Umpire memory pool.

Frequently allocating and deallocating memory can be quite costly, especially
when you are making large allocations or allocating on different memory
resources. Memory pools are a more efficient way to allocate large amounts of
memory, especially in HPC environments.

Umpire provides **allocation strategies** that can be used to customize how
data is obtained from the system. In this lesson, we will learn about one such
strategy called `QuickPool`.

The `QuickPool` strategy describes a certain type of pooling algorithm provided
by Umpire. As its name suggests, `QuickPool` is performant for many use cases.

Umpire also provides other types of pooling strategies such as `DynamicPoolList`
and `FixedPool`. More information about Umpire memory pools and other features
is available in the [Umpire User Guide](https://umpire.readthedocs.io/en/develop/index.html).

To create a new memory pool allocator using the `QuickPool` strategy, we use
the `ResourceManager`:
```
umpire::Allocator pool = rm.makeAllocator<umpire::strategy::QuickPool>("pool_name", my_allocator);
```

This newly created `pool` is an `umpire::Allocator` that uses the `QuickPool`
allocation strategy. In the code example above, we call the
`ResourceManager::makeAllocator` function to create the pool allocator. We
pass in: (1) the name we choose for the pool, and (2) an allocator we
previously created with the `ResourceManager`. Note that you will need to
include the Umpire header file for the pool type you wish to use, in this case
```
#include "umpire/strategy/QuickPool.hpp"
```

When you have created your QuickPool allocator, uncomment the COMPILE define on line 7;
then compile and run the code:
```
$ make 08_raja_umpire_quick_pool
$ ./bin/08_raja_umpire_quick_pool
```

Other arguments can be passed to the pool constructor if needed. However, they
are beyond the scope of this tutorial. Please visit the [Umpire User Guide](https://umpire.readthedocs.io/en/develop/index.html) to learn more.

@@ -2,6 +2,8 @@

#include "RAJA/RAJA.hpp"
#include "umpire/Umpire.hpp"
// TODO: include the header file for the Umpire QuickPool strategy so you can
// use it in the code below
#include "umpire/strategy/QuickPool.hpp"

int main()
2 changes: 1 addition & 1 deletion Intro_Tutorial/lessons/09_raja_view/09_raja_view.cpp
@@ -7,7 +7,7 @@
// TODO: Uncomment this in order to build!
//#define COMPILE

// Method to print arrays associated with the Views in the lesson
void printArrayAsMatrix( double * array, int row, int col )
{
for ( int ii = 0; ii < row * col; ++ii )
@@ -6,7 +6,7 @@

#define COMPILE

// Method to print arrays associated with the Views in the lesson
void printArrayAsMatrix( double * array, int row, int col )
{
for ( int ii = 0; ii < row * col; ++ii )
66 changes: 39 additions & 27 deletions Intro_Tutorial/lessons/09_raja_view/README.md
@@ -1,4 +1,4 @@
# Lesson 9: RAJA Views and Layouts

In this lesson, you will learn how to use `RAJA::View` to simplify
multidimensional indexing in a matrix-matrix multiplication kernel.
@@ -7,48 +7,60 @@
As is commonly done for efficiency in C and C++, we have allocated the data for
the matrices as one-dimensional arrays. Thus, we need to manually compute the
data pointer offsets for the row and column indices in the kernel.

A `RAJA::View<TYPE, LAYOUT>` type takes two template parameters. The `TYPE`
parameter is the data type of the underlying data. The `LAYOUT` parameter
is a `RAJA::Layout` type that describes how the View indices are ordered
with respect to data access. A two-dimensional RAJA View constructor takes
three arguments; for example,
```
RAJA::View<TYPE, LAYOUT> my_view(data_ptr, extent0, extent1);
```
The `data_ptr` is a pointer to the data array that the View will be used to
index into. The extent arguments specify the ranges of the indices in each
dimension. A three-dimensional RAJA View constructor takes four arguments,
a data pointer and three extents, one for each View dimension. And so on for
higher dimensions.

The `RAJA::Layout<DIM, TYPE>` takes two template parameters. The `DIM` parameter
is the number of indexing dimensions, and the `TYPE` parameter is the data type
of the indices used to index into the underlying data. For example, a
two-dimensional layout for a view that takes `int` values to index into the
data is defined as:
```
RAJA::Layout<2, int>
```

It is essential to note that the default data layout ordering in RAJA is
row-major, which is the convention for multi-dimensional array indexing in C
and C++. This means that the rightmost index will be stride-1, the index to
the left of the rightmost index will have stride equal to the extent of the
rightmost dimension, and so on.

Tying everything together, we construct a two-dimensional MxN View that uses
integer indices to access entries in an array of doubles as follows:

```
double* data = ...;
RAJA::View<double, RAJA::Layout<2, int>> view(data, M, N);
```
Note that the size of the data array must be at least MxN to avoid out-of-bounds access.


In the file `09_raja_view.cpp`, there are `TODO` comments asking you to create
two views, `A` and `R`. `A` will be a standard RAJA view and `R` will be
created using the `RAJA::make_permuted_view` helper method with a
right-oriented layout, the same layout as `A`. Knowledge of
`RAJA::make_permuted_view` is not required to complete this task. If you
wish to learn more details, please see [RAJA Make Permuted View](https://raja.readthedocs.io/en/develop/sphinx/user_guide/feature/view.html#make-permuted-view).

There are additional `TODO` comments asking you to insert bounds of nested
for-loops, and fill in `A` and `R` with their respective index values.
When you are ready, uncomment the COMPILE define on line 8; then compile and run the code:
```
$ make 09_raja_view
$ ./bin/09_raja_view
```

For more information on RAJA Views and Layouts, please see [RAJA Views and Layouts](https://raja.readthedocs.io/en/develop/sphinx/user_guide/tutorial/view_layout.html).



2 changes: 1 addition & 1 deletion Intro_Tutorial/lessons/10_raja_launch/README.md
@@ -1,4 +1,4 @@
# Lesson 10: RAJA Launch

In this lesson, we begin exploring the RAJA abstraction for exposing parallelism within nested loops and the GPU thread/block programming model.
