From e33537cf80dcb5c0daa264d7b8e22c5f48ae573d Mon Sep 17 00:00:00 2001
From: Carl Pearson
Date: Tue, 29 Jan 2019 11:49:07 -0600
Subject: [PATCH] Wei Ren's changes to readme and template

---
 labs/stencil/README.md   | 22 +++++++++++++++++-----
 labs/stencil/template.cu |  2 +-
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/labs/stencil/README.md b/labs/stencil/README.md
index 22f413b..220712f 100644
--- a/labs/stencil/README.md
+++ b/labs/stencil/README.md
@@ -1,18 +1,30 @@
 # 7-point Stencil with Thread-coarsening and Register Tiling
 
-## Objective 
+## Objective
 The purpose of this lab is to practice the thread coarsening and register tiling optimization techniques using 7-point stencil as an example.
 
 ## Procedure
-1. Edit the `kernel` function in `template.cu` to implement a 7-point stencil with combined register tiling and x-y shared memory tiling, and thread coarsening along the z-dimension.
+1. Edit the `kernel` function in `template.cu` to implement a 7-point stencil (refer to the [lecture slides](https://bw-course.ncsa.illinois.edu/mod/resource/view.php?id=574)) with combined register tiling and x-y shared memory tiling, and thread coarsening along the z-dimension.
+
+    ```
+    out(i, j, k) = C0 *in(i, j, k) +
+                   C1 * ( in(i-1, j, k) +
+                          in(i, j-1, k) +
+                          in(i, j, k-1) +
+                          in(i+1, j, k) +
+                          in(i, j+1, k) +
+                          in(i, j, k+1) )
+    ```
 
 2. Edit the `launchStencil` function in `template.cu` to launch the kernel you implemented. The function should launch 2D CUDA grid and blocks, where each thread is responsible for computing an entire column in the z-deminsion.
 
+    `A0` and `Anext` in the code template correspond to `in` and `out`, respectively. The output dimension of the 7-point stencil computation is one smaller than the input dimension on both sides for all boundaries (e.g., output dimension is 6x6x6 for an input of 8x8x8). Only those "internal" elements needs to be calculated.
+
 3. Test your code using rai
 
-`rai -p `
+    `rai -p `
 
-Be sure to add any additional flags that are required by your course (`--queue` or others).
+    Be sure to add any additional flags that are required by your course (`--queue` or others).
 
 4. Submit your code on rai
 
@@ -28,4 +40,4 @@ __global__ void kernel(...) {}
 // your kernel code
 #undef A0
 }
-```
\ No newline at end of file
+```
diff --git a/labs/stencil/template.cu b/labs/stencil/template.cu
index b96f540..56d4a21 100644
--- a/labs/stencil/template.cu
+++ b/labs/stencil/template.cu
@@ -70,7 +70,7 @@ static int eval(const int nx, const int ny, const int nz) {
 
-TEST_CASE("Convlayer", "[convlayer]") {
+TEST_CASE("Stencil", "[stencil]") {
   SECTION("[dims:32,32,32]") {
     eval(32,32,32);