Add gif and begin README.
terrynsun committed Sep 7, 2015
1 parent a57cb27 commit 81f92c4
Showing 2 changed files with 10 additions and 179 deletions.
189 changes: 10 additions & 179 deletions README.md
@@ -3,161 +3,21 @@ CUDA Introduction

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1**

* (TODO) YOUR NAME HERE
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
Terry Sun; Arch Linux, Intel i5-4670, GTX 750

### (TODO: Your README)
Part 1 contains a basic N-body simulation:

Include screenshots, analysis, etc. (Remember, this is public, so don't put
anything here that you don't want to share with the world.)
![](images/nbody.gif)

Instructions (delete me)
========================
(2500 planets, 0.5s per step)

This is due Monday, September 7.

**Summary:** In this project, you will get some real experience writing simple
CUDA kernels, using them, and analyzing their performance. You'll implement the
simulation step of an N-body simulation, and you'll write some GPU-accelerated
matrix math operations.

## Part 0: Nothing New

This project (and all other CUDA projects in this course) requires an NVIDIA
graphics card with CUDA capability. Any card with Compute Capability 2.0
(`sm_20`) or greater will work. Check your GPU on this
[compatibility table](https://developer.nvidia.com/cuda-gpus).
If you do not have a personal machine with these specs, you may use
computers in the SIG Lab and Moore 100B/C.

**HOWEVER**: If you need to use the lab computer for your development, you will
not presently be able to do GPU performance profiling. This will be very
important for debugging performance bottlenecks in your program. If you do not
have administrative access to any CUDA-capable machine, please email the TA.

## Part 1: N-body Simulation

### 1.0. The Usual

See Project 0, Parts 1-3 for reference.

If you are using the Nsight IDE (not Visual Studio) and started Project 0
early, note that things have
changed slightly. Instead of creating a new project, use
*File->Import->General->Existing Projects Into Workspace*, and select the
`Project1-Part1` folder as the root directory. Under *Project->Build
Configurations->Set Active...*, you can now select various Release and Debug
builds.

* `src/` contains the source code.
* `external/` contains the binaries and headers for GLEW, GLFW, and GLM.

**CMake note:** Do not change any build settings or add any files to your
project directly (in Visual Studio, Nsight, etc.) Instead, edit the
`src/CMakeLists.txt` file. Any files you create must be added here. If you edit
it, just rebuild your VS/Nsight project to sync the changes into the IDE.


### 1.1. CUDA Done That With My Eyes Closed

To get used to using CUDA kernels, you'll write simple CUDA kernels and
kernel invocations for performing an N-body gravitational simulation.
The following source files are included in the project:

* `src/main.cpp`: Performs all of the CUDA/OpenGL setup and OpenGL
visualization.
* `src/kernel.cu`: CUDA device functions, state, kernels, and CPU functions for
kernel invocations.

1. Search the code for `TODO`:
* `src/kernel.cu`: Use what you learned in the first lectures to
figure out how to resolve these 4 TODOs.

Take a screenshot. Commit and push your code changes.
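
For orientation only, the core of a direct N-body step is a pairwise
gravitational sum over all the other bodies. The sketch below illustrates that
idea in isolation; the function names, the softening constant, and the `float3`
layout are hypothetical stand-ins, not the actual contents of `kernel.cu`.

```
#include <cuda_runtime.h>

#define G_CONST 6.67e-11f   // hypothetical gravitational constant
#define SOFTENING 1e-4f     // avoids division by zero for coincident bodies

// Acceleration on body iSelf from every other body: a direct O(N^2) sum.
__device__ float3 accelerate(int N, int iSelf, const float3 *pos, const float *mass) {
    float3 acc = {0.0f, 0.0f, 0.0f};
    for (int j = 0; j < N; j++) {
        if (j == iSelf) continue;
        float dx = pos[j].x - pos[iSelf].x;
        float dy = pos[j].y - pos[iSelf].y;
        float dz = pos[j].z - pos[iSelf].z;
        float r2 = dx * dx + dy * dy + dz * dz + SOFTENING;
        float invR3 = 1.0f / (r2 * sqrtf(r2));
        acc.x += G_CONST * mass[j] * dx * invR3;
        acc.y += G_CONST * mass[j] * dy * invR3;
        acc.z += G_CONST * mass[j] * dz * invR3;
    }
    return acc;
}

// One thread per body: apply the computed acceleration to its velocity.
__global__ void kernUpdateVelocity(int N, float dt, const float3 *pos,
                                   float3 *vel, const float *mass) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    float3 a = accelerate(N, i, pos, mass);
    vel[i].x += a.x * dt;
    vel[i].y += a.y * dt;
    vel[i].z += a.z * dt;
}
```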


## Part 2: Matrix Math

In this part, you'll set up a CUDA project with some simple matrix math
functionality. Put this in the `Project1-Part2` directory in your repository.

### 2.1. Create Your Project

You'll need to copy over all of the boilerplate project-related files from
Part 1:

* `cmake/`
* `external/`
* `.cproject`
* `.project`
* `GNUmakefile`
* `CMakeLists.txt`
* `src/CMakeLists.txt`

Next, create empty text files for your main function and CUDA kernels:

* `src/main.cpp`
* `src/matrix_math.h`
* `src/matrix_math.cu`

As you work through the next steps, find and use relevant code from Part 1 to
get the new project set up: includes, error checking, initialization, etc.

### 2.2. Setting Up CUDA Memory

As discussed in class, there are two separate memory spaces: host memory and
device memory. Host memory is accessible by the CPU, while device memory is
accessible by the GPU.

In order to allocate memory on the GPU, we need to use the CUDA library
function `cudaMalloc`. This reserves a portion of the GPU memory and returns a
pointer, like standard `malloc` - but the pointer returned by `cudaMalloc` is
in the GPU memory space and is only accessible from GPU code. You can use
`cudaFree` to free GPU memory allocated using `cudaMalloc`.
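
A minimal sketch of that allocation pattern, with an illustrative variable name
and a bare-bones error check (the project's own error-checking helpers from
Part 1 are the better choice in practice):

```
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    float *dev_A = NULL;

    // Reserve 5*5 floats in device (GPU) memory. cudaMalloc takes the
    // *address* of the pointer to fill in; the resulting pointer is only
    // meaningful to GPU code and CUDA API calls - never dereference it on the CPU.
    cudaError_t err = cudaMalloc((void**)&dev_A, 5 * 5 * sizeof(float));
    if (err != cudaSuccess) {
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... launch kernels that read and write dev_A ...

    // Release device memory that was allocated with cudaMalloc.
    cudaFree(dev_A);
    return 0;
}
```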

We can copy memory to and from the GPU using `cudaMemcpy`. Like C `memcpy`,
you will need to specify the size of memory that you are copying. But
`cudaMemcpy` has an additional argument - the last argument specifies whether
the copy is from host to device, device to host, device to device, or host to
host.

* Look up documentation on `cudaMalloc`, `cudaFree`, and `cudaMemcpy` to find
out how to use them - they're not quite obvious.
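
A sketch of the copy in both directions, assuming `hst_A` and `dev_A` are
hypothetical 25-float buffers with `dev_A` allocated by `cudaMalloc`:

```
#include <cuda_runtime.h>

void roundTrip(float *hst_A, float *dev_A) {
    size_t bytes = 5 * 5 * sizeof(float);

    // Host -> device: the last argument names the direction of the copy.
    cudaMemcpy(dev_A, hst_A, bytes, cudaMemcpyHostToDevice);

    // ... run kernels that operate on dev_A ...

    // Device -> host: bring results back where CPU code can read them.
    cudaMemcpy(hst_A, dev_A, bytes, cudaMemcpyDeviceToHost);
}
```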

In an initialization function in `matrix_math.cu`, initialize three 5x5 matrices
on the host and three on the device. Prefix your variables with `hst_` and
`dev_`, respectively, so you know what kind of pointers they are!
These arrays can each be represented as a 1D array of floats:

`{ A_00, A_01, A_02, A_03, A_04, A_10, A_11, A_12, ... }`

You should also create cleanup method(s) to free the CPU and GPU memory you
allocated. Don't forget to initialize and cleanup in main!
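
One possible shape for those functions - a sketch only, with hypothetical names
and no error checking:

```
#include <cuda_runtime.h>
#include <cstdlib>

#define DIM 5  // matrices are DIM x DIM

// The prefixes make the memory space of each pointer obvious at a glance.
float *hst_A, *hst_B, *hst_C;
float *dev_A, *dev_B, *dev_C;

void initialize() {
    size_t bytes = DIM * DIM * sizeof(float);

    // CPU-side buffers, released with free().
    hst_A = (float*)malloc(bytes);
    hst_B = (float*)malloc(bytes);
    hst_C = (float*)malloc(bytes);

    // GPU-side buffers, released with cudaFree().
    cudaMalloc((void**)&dev_A, bytes);
    cudaMalloc((void**)&dev_B, bytes);
    cudaMalloc((void**)&dev_C, bytes);
}

void cleanup() {
    free(hst_A); free(hst_B); free(hst_C);
    cudaFree(dev_A); cudaFree(dev_B); cudaFree(dev_C);
}
```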

### 2.3. Creating CUDA Kernels

Given 5x5 matrices A, B, and C (each represented as above), implement the
following functions as CUDA kernels (`__global__`):

* `mat_add(A, B, C)`: `C` is overwritten with the result of `A + B`
* `mat_sub(A, B, C)`: `C` is overwritten with the result of `A - B`
* `mat_mul(A, B, C)`: `C` is overwritten with the result of `A * B`

You should write some tests to make sure that the results of these operations
are as you expect.
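
For illustration, here is one way such kernels can be indexed: an element-wise
kernel with one thread per entry, and a multiplication kernel using 2D thread
indexing. Treat it as a sketch, not the required structure.

```
// A, B, and C are 5x5 matrices flattened into 25-float device arrays.

// Element-wise addition: one thread per matrix entry.
__global__ void mat_add(const float *A, const float *B, float *C) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < 5 * 5) {
        C[i] = A[i] + B[i];
    }
}

// Multiplication: each thread computes one entry of C as the dot product
// of a row of A and a column of B (row-major 1D layout, as above).
__global__ void mat_mul(const float *A, const float *B, float *C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < 5 && col < 5) {
        float sum = 0.0f;
        for (int k = 0; k < 5; k++) {
            sum += A[row * 5 + k] * B[k * 5 + col];
        }
        C[row * 5 + col] = sum;
    }
}
```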

Tips:

* `__global__` and `__device__` functions only have access to memory that is
stored on the device. Any data that you want to use on the CPU or GPU must
exist in the right memory space. If you need to move data, you can use
`cudaMemcpy`.
* The triple angle brackets `<<< >>>` provide parameters to the CUDA kernel
invocation: `<<<blocks_per_grid, threads_per_block, ...>>>` (see the sketch
after these tips).
* Don't worry if your IDE doesn't understand some CUDA syntax (e.g.
`__device__` or `<<< >>>`). By default, it may not understand CUDA
extensions.
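
As a sketch of such an invocation (reusing the hypothetical `mat_add` and
`dev_` pointers from the sketches above): 25 elements is small enough that a
single block covers the whole matrix.

```
#include <cuda_runtime.h>

// Hypothetical declarations from the earlier sketches.
__global__ void mat_add(const float *A, const float *B, float *C);
extern float *dev_A, *dev_B, *dev_C;

void invoke_mat_add() {
    dim3 blocksPerGrid(1);
    dim3 threadsPerBlock(25);  // one thread per matrix entry
    mat_add<<<blocksPerGrid, threadsPerBlock>>>(dev_A, dev_B, dev_C);

    // Kernel launches are asynchronous; wait before the host reads results.
    cudaDeviceSynchronize();
}
```
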
Part 2 contains an even more basic matrix math library that provides addition,
subtraction, and multiplication.

## TODO
- [ ] write tests for matrix operations
- [ ] performance analysis
- [ ] respond to questions

## Part 3: Performance Analysis

@@ -197,32 +57,3 @@ For Part 1, there are two ways to measure performance:
* Part 1: How does changing the number of planets affect performance? Why?
* Part 2: Without running comparisons of CPU code vs. GPU code, how would you
expect the performance to compare? Why? What might be the trade-offs?

**NOTE: Nsight performance analysis tools *cannot* presently be used on the lab
computers, as they require administrative access.** If you do not have access
to a CUDA-capable computer, the lab computers still allow you to do timing
measurements! However, the tools are very useful for performance debugging.
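
One simple host-side way to time a kernel without the profiling tools is CUDA
events. A sketch, again using the hypothetical `mat_add` launch from above:

```
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical declarations from the earlier sketches.
__global__ void mat_add(const float *A, const float *B, float *C);
extern float *dev_A, *dev_B, *dev_C;

void time_one_launch() {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    mat_add<<<1, 25>>>(dev_A, dev_B, dev_C);
    cudaEventRecord(stop);

    cudaEventSynchronize(stop);  // block until the stop event has occurred
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("mat_add took %f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}
```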


## Part 4: Write-up

1. Update all of the TODOs at the top of this README.
2. Add your performance analysis.


## Submit

If you have modified any of the `CMakeLists.txt` files at all (aside from the
list of `SOURCE_FILES`), you must test that your project can build in Moore
100B/C. Beware of any build issues discussed on the Google Group.

1. Open a GitHub pull request so that we can see that you have finished.
The title should be "Submission: YOUR NAME".
2. Send an email to the TA (gmail: kainino1+cis565@) with:
* **Subject**: in the form of `[CIS565] Project 0: PENNKEY`
* Direct link to your pull request on GitHub
* In the form of a grade (0-100+), evaluate your own performance on the
project.
* Feedback on the project itself, if any.

And you're done!
Binary file added images/nbody.gif
