From 4c6e39433b5c3342cddc243821fd3f2f76a73085 Mon Sep 17 00:00:00 2001
From: Wisdom Ogwu <40731160+iammadab@users.noreply.github.com>
Date: Fri, 24 Oct 2025 07:53:09 +0100
Subject: [PATCH] correct values for y_s and b_s

updated the number of versions for variables y_s and b_s in the shared memory section.
---
 chapter-05/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/chapter-05/README.md b/chapter-05/README.md
index 7fcd05b..e8e039f 100644
--- a/chapter-05/README.md
+++ b/chapter-05/README.md
@@ -194,11 +194,11 @@ As above, there is one copy of the array `x[]` for each thread in the grid, so `
 
 **c. How many versions of the variable y_s are there?**
 
-`y_s` is the variable stored in the shared memory. There is one copy of a variable per block in the grid. Since we have 128 blocks in the grid (see a), therefore we have `128` versions of the variable `y_s`.
+`y_s` is the variable stored in the shared memory. There is one copy of a variable per block in the grid. Since we have 8 blocks in the grid (see a), therefore we have `8` versions of the variable `y_s`.
 
 **d. How many versions of the array b_s[] are there?**
 
-Same as in c, 128 blocks, so `128` versions of `b_s` stored in the shared memory.
+Same as in c, 8 blocks, so `8` versions of `b_s` stored in the shared memory.
 
 **e. What is the amount of shared memory used per block (in bytes)?**
 
@@ -230,4 +230,4 @@ The SM supports up to 32 blocks per SM, each block running `64` threads. This br
 
 **b. The kernel uses 256 threads/block, 31 registers/thread, and 8 KB of shared memory/SM.**
 
-The kernel is using the 256 threads per block, meaning we can have up to `2048/256=8` blocks max. With this configuration, we run `8x256=2048` threads in total. Each thread will use 64 registers, bringing us to the total of `2048x31=63488` registers in total, slightly below our register upper bound. 
The kernel is using 8 KB per block, and since we have 8 blocks, we will be using `8 x 8 KB = 64 KB` of memory total, considerably below our memory limit. This means that we can run 2048 threads and that we will achieve a 100% occupancy rate.
\ No newline at end of file
+The kernel is using the 256 threads per block, meaning we can have up to `2048/256=8` blocks max. With this configuration, we run `8x256=2048` threads in total. Each thread will use 64 registers, bringing us to the total of `2048x31=63488` registers in total, slightly below our register upper bound. The kernel is using 8 KB per block, and since we have 8 blocks, we will be using `8 x 8 KB = 64 KB` of memory total, considerably below our memory limit. This means that we can run 2048 threads and that we will achieve a 100% occupancy rate.