Finish All!

CIS565-Fall-2016 · xueyinw · Sep 7, 2016 · Sep 9, 2016
commit d9881b5f6b0497f19310d14a3223db8833af4930
diff --git a/README.md b/README.md
@@ -1,10 +1,63 @@
-**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
-Project 1 - Flocking**
+
+#### University of Pennsylvania
+#### CIS 565: GPU Programming and Architecture
+
+## Project 1 - Flocking
 
 * Xueyin Wan
 * Tested on: Windows 10, i7-4870 @ 2.50GHz 16GB, NVIDIA GeForce GT 750M 2GB (Personal Laptop)
 
-### (TODO: Your README)
+---
+### Final Result Screenshot
+![Flocking simulation with the coherent uniform grid](https://github.com/xueyinw/Project1-CUDA-Flocking/blob/master/images/Xueyin_Performance.gif "Xueyin's Performance Analysis")
+
+#### Parameters
+* Number of boids = 15000
+* Time step `DT` = 0.2
+* Algorithm used in the screenshot: Coherent Uniform Grid
+* Block size = 128
+* `rule1Distance` = 5.0f, `rule1Scale` = 0.01f
+* `rule2Distance` = 3.0f, `rule2Scale` = 0.1f
+* `rule3Distance` = 5.0f, `rule3Scale` = 0.1f
+* `maxSpeed` = 1.0f
+* `scene_scale` = 100.0f
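+
+The `rule*Distance` and `rule*Scale` parameters above drive the three flocking rules. Below is a minimal CPU-side C++ sketch of one brute-force velocity update for a single boid, assuming the usual rule definitions: rule 1 steers toward the perceived center of mass of nearby boids, rule 2 pushes away from boids that are too close, and rule 3 adds the scaled average velocity of nearby boids. `Vec3` and `newVelocity` are hypothetical names, not the project's CUDA kernels.
+
+```cpp
+#include <cassert>
+#include <cmath>
+#include <vector>
+
+struct Vec3 { float x, y, z; };
+
+Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
+Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
+Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
+float length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }
+
+// Values from the parameter list above.
+const float rule1Distance = 5.0f, rule1Scale = 0.01f;
+const float rule2Distance = 3.0f, rule2Scale = 0.1f;
+const float rule3Distance = 5.0f, rule3Scale = 0.1f;
+const float maxSpeed = 1.0f;
+
+// Brute force: boid `self` is compared with every other boid.
+Vec3 newVelocity(const std::vector<Vec3>& pos, const std::vector<Vec3>& vel, int self) {
+    assert(pos.size() == vel.size());
+    Vec3 center{0.f, 0.f, 0.f}, separation{0.f, 0.f, 0.f}, velSum{0.f, 0.f, 0.f};
+    int n1 = 0, n3 = 0;
+    for (int i = 0; i < static_cast<int>(pos.size()); ++i) {
+        if (i == self) continue;
+        float d = length(pos[i] - pos[self]);
+        if (d < rule1Distance) { center = center + pos[i]; ++n1; }                 // rule 1: cohesion
+        if (d < rule2Distance) { separation = separation - (pos[i] - pos[self]); } // rule 2: separation
+        if (d < rule3Distance) { velSum = velSum + vel[i]; ++n3; }                 // rule 3: alignment
+    }
+    Vec3 v = vel[self];
+    if (n1 > 0) v = v + (center * (1.0f / n1) - pos[self]) * rule1Scale;
+    v = v + separation * rule2Scale;
+    if (n3 > 0) v = v + velSum * (1.0f / n3) * rule3Scale;
+    // Clamp the speed to maxSpeed while keeping the direction.
+    float speed = length(v);
+    if (speed > maxSpeed) v = v * (maxSpeed / speed);
+    return v;
+}
+```
+
+Because each call loops over every boid, updating all N boids this way costs on the order of N² distance checks, which is what the grid-based searches avoid.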
+
+---
+### Performance Analysis
+
+
+To measure performance, I used the first suggested method: disabling visualization (`#define VISUALIZE 0`). For comparison, results with visualization enabled are also listed below.
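+
+Switching visualization off still leaves the question of how the frame rate is read. A minimal sketch of a windowed FPS counter in C++ follows; `FpsCounter` is a hypothetical helper for illustration, not the project's own timing code, and it assumes the caller passes a timestamp in seconds once per frame.
+
+```cpp
+#include <cassert>
+
+// Hypothetical helper: reports the average frame rate over windows of at
+// least `windowSeconds`. The caller supplies timestamps in seconds from any
+// monotonic clock (for example std::chrono::steady_clock).
+class FpsCounter {
+public:
+    explicit FpsCounter(double windowSeconds = 1.0) : window_(windowSeconds) {
+        assert(windowSeconds > 0.0);
+    }
+
+    // Call once per frame. Returns true when a new FPS value is available.
+    bool tick(double nowSeconds) {
+        if (!started_) {  // the first call only marks the start of the window
+            started_ = true;
+            windowStart_ = nowSeconds;
+            return false;
+        }
+        ++frames_;
+        const double elapsed = nowSeconds - windowStart_;
+        if (elapsed < window_) return false;
+        fps_ = frames_ / elapsed;
+        frames_ = 0;
+        windowStart_ = nowSeconds;
+        return true;
+    }
+
+    double fps() const { return fps_; }
+
+private:
+    double window_;
+    double windowStart_ = 0.0;
+    bool started_ = false;
+    int frames_ = 0;
+    double fps_ = 0.0;
+};
+```
+
+Averaging over a window of about one second, instead of timing single frames, smooths out frame-to-frame jitter.
+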
+#### Without Visualization (`#define VISUALIZE 0`)
+| Algorithm | Number of boids | Framerate (FPS) |
+| ------------- |:-------------:| -----:|
+| Brute Force neighbor search | 5000 | 57 |
+| Uniform Grid neighbor search | 5000 | 580 |
+| Coherent Uniform Grid neighbor search | 5000 | 680 |
+
+#### With Visualization (`#define VISUALIZE 1`)
+| Algorithm | Framerate (FPS) | Max boid count at 60 FPS |
+| ------------- |:-------------:| -----:|
+| Brute Force neighbor search | 60 | 5000 |
+| Uniform Grid neighbor search | 60 | 80000 |
+| Coherent Uniform Grid neighbor search | 60 | 100000 |
+
+### Questions & Answers
+#### 1. For each implementation, how does changing the number of boids affect performance? Why do you think this is?
+Answer:
+
+* Brute Force neighbor search: as the number of boids increases, the frame rate drops very quickly, because every boid checks every other boid, so the work per step grows with the square of the boid count.
+* Uniform Grid neighbor search: the frame rate stays at 60 FPS with up to roughly 80,000 boids, far better than brute force, because each boid only checks the boids in nearby grid cells instead of every boid.
+* Coherent Uniform Grid neighbor search: the frame rate stays at 60 FPS with up to roughly 100,000 boids, far better than brute force and slightly better than the plain uniform grid.
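+
+The difference in scaling comes from the number of distance checks. This hypothetical C++ sketch counts them for both approaches: brute force performs N(N - 1) checks, while a uniform grid only tests boids in the same or an adjacent cell. It assumes a 3×3×3 cell neighborhood whose cell width is at least the largest rule distance; the project's grid may use a different cell width and neighborhood size.
+
+```cpp
+#include <algorithm>
+#include <cassert>
+#include <cmath>
+#include <vector>
+
+struct Vec3 { float x, y, z; };
+
+// Flatten a 3D cell coordinate into a 1D cell index (x varies fastest).
+int gridIndex3Dto1D(int x, int y, int z, int res) {
+    return x + y * res + z * res * res;
+}
+
+// Cell coordinate along one axis, clamped to [0, res - 1].
+int cellCoord(float v, float gridMin, float cellWidth, int res) {
+    int c = static_cast<int>(std::floor((v - gridMin) / cellWidth));
+    return std::min(std::max(c, 0), res - 1);
+}
+
+// Brute force: each of the n boids is tested against the other n - 1.
+long long bruteForceChecks(long long n) { return n * (n - 1); }
+
+// Uniform grid: bin boids by cell, then test each boid only against boids
+// in its own cell and the (up to) 26 cells around it.
+long long uniformGridChecks(const std::vector<Vec3>& boids, float gridMin,
+                            float cellWidth, int res) {
+    assert(res > 0 && cellWidth > 0.f);
+    std::vector<int> cellCount(res * res * res, 0);
+    for (const Vec3& b : boids) {
+        ++cellCount[gridIndex3Dto1D(cellCoord(b.x, gridMin, cellWidth, res),
+                                    cellCoord(b.y, gridMin, cellWidth, res),
+                                    cellCoord(b.z, gridMin, cellWidth, res), res)];
+    }
+    long long checks = 0;
+    for (const Vec3& b : boids) {
+        int cx = cellCoord(b.x, gridMin, cellWidth, res);
+        int cy = cellCoord(b.y, gridMin, cellWidth, res);
+        int cz = cellCoord(b.z, gridMin, cellWidth, res);
+        for (int z = std::max(cz - 1, 0); z <= std::min(cz + 1, res - 1); ++z)
+            for (int y = std::max(cy - 1, 0); y <= std::min(cy + 1, res - 1); ++y)
+                for (int x = std::max(cx - 1, 0); x <= std::min(cx + 1, res - 1); ++x)
+                    checks += cellCount[gridIndex3Dto1D(x, y, z, res)];
+        checks -= 1;  // a boid is never tested against itself
+    }
+    return checks;
+}
+```
+
+The grid count is roughly N times the number of boids in one 27-cell neighborhood, which is far smaller than N as long as the boids are spread over many cells.
+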
+#### 2. For each implementation, how does changing the block count and block size affect performance? Why do you think this is?
+
+Answer:
+
+* Generally speaking, performance improves as the block size increases and the block count decreases accordingly.
+* However, to get good performance, block count and block size have to be balanced and chosen with memory performance in mind. Because the GPU schedules threads in warps of 32, block sizes that are multiples of 32 avoid partially filled warps.
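+
+Block count and block size are tied together through the boid count. The C++ sketch below, which assumes a 1D launch with one thread per boid, shows the usual round-up formula for the block count and how a block size that is not a multiple of 32 leaves lanes idle. `LaunchConfig` and `makeLaunchConfig` are hypothetical names.
+
+```cpp
+#include <cassert>
+
+constexpr int kWarpSize = 32;  // threads per warp on NVIDIA GPUs
+
+// Hypothetical summary of a 1D launch with one thread per boid.
+struct LaunchConfig {
+    int blockCount;         // blocks in the grid
+    int warpsPerBlock;      // warps the scheduler needs per block
+    int idleLanesPerBlock;  // unused lanes in the last, partially filled warp
+    int idleThreads;        // threads past the last boid (kernel must bounds-check)
+};
+
+LaunchConfig makeLaunchConfig(int numBoids, int blockSize) {
+    assert(numBoids > 0 && blockSize > 0);
+    LaunchConfig c;
+    // Round up so every boid gets a thread.
+    c.blockCount = (numBoids + blockSize - 1) / blockSize;
+    c.warpsPerBlock = (blockSize + kWarpSize - 1) / kWarpSize;
+    c.idleLanesPerBlock = c.warpsPerBlock * kWarpSize - blockSize;
+    c.idleThreads = c.blockCount * blockSize - numBoids;
+    return c;
+}
+```
+
+For N = 5000, a block size of 128 gives 40 blocks of four full warps, while a block size of 100 gives 50 blocks whose last warp has 28 of its 32 lanes idle.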
+
+#### 3. For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?
+Answer:
 
-Include screenshots, analysis, etc. (Remember, this is public, so don't put
-anything here that you don't want to share with the world.)
+* Yes. As the two tables in the Performance Analysis section show, the Coherent Uniform Grid neighbor search outperforms the plain Uniform Grid neighbor search (680 vs. 580 FPS with 5000 boids, and about 100,000 vs. 80,000 boids at 60 FPS). When implementing the coherent uniform grid in part 2.3, I rearranged the boid data itself so that the positions and velocities of all boids in the same cell are also contiguous in memory. This data can then be read directly, instead of indirectly through the sorted index array used by the plain uniform grid in part 2.1. The result matched my expectations: GPUs perform better when neighboring threads access contiguous memory, because those reads can be coalesced.
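+
+The rearrangement described above can be sketched on the CPU in C++. The sketch assumes the uniform grid step has already produced `sortedIndices`, the boid indices sorted by grid cell (a hypothetical name, as are the functions); the coherent variant gathers positions into that order once, after which each cell's boids are read as one contiguous range rather than through the index array. On the GPU the gather would typically run as one thread per boid, and velocities would be reshuffled the same way.
+
+```cpp
+#include <cassert>
+#include <cstddef>
+#include <vector>
+
+struct Vec3 { float x, y, z; };
+
+// Reshuffle step: coherent[i] = original[sortedIndices[i]].
+// sortedIndices lists boid indices ordered by grid cell, so afterwards the
+// boids of each cell occupy one contiguous range of the coherent array.
+std::vector<Vec3> gatherCoherent(const std::vector<Vec3>& original,
+                                 const std::vector<int>& sortedIndices) {
+    assert(original.size() == sortedIndices.size());
+    std::vector<Vec3> coherent(original.size());
+    for (std::size_t i = 0; i < sortedIndices.size(); ++i) {
+        coherent[i] = original[sortedIndices[i]];
+    }
+    return coherent;
+}
+
+// Plain uniform grid: a cell's slots [start, end) are reached through the
+// index array, so consecutive slots may read far-apart memory.
+float sumXIndirect(const std::vector<Vec3>& pos, const std::vector<int>& sortedIndices,
+                   int start, int end) {
+    float sum = 0.f;
+    for (int i = start; i < end; ++i) sum += pos[sortedIndices[i]].x;
+    return sum;
+}
+
+// Coherent grid: the same cell is simply the contiguous range [start, end).
+float sumXCoherent(const std::vector<Vec3>& coherentPos, int start, int end) {
+    float sum = 0.f;
+    for (int i = start; i < end; ++i) sum += coherentPos[i].x;
+    return sum;
+}
+```
+
+The extra gather costs one pass over the boids per step, which is typically cheap compared with the scattered reads it removes from every neighbor loop.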
diff --git a/src/main.cpp b/src/main.cpp
@@ -18,7 +18,7 @@
 #define COHERENT_GRID 1
 
 // LOOK-1.2 - change this to adjust particle count in the simulation
-const int N_FOR_VIS = 15000;
+const int N_FOR_VIS = 5000;
 const float DT = 0.2f;
 
 /**