Project 1: Xueyin Wan #24

Open
wants to merge 16 commits into base: master
Finish All!
Xueyin committed Sep 14, 2016
commit d9881b5f6b0497f19310d14a3223db8833af4930
63 changes: 58 additions & 5 deletions README.md
@@ -1,10 +1,63 @@
#### University of Pennsylvania
#### CIS 565: GPU Programming and Architecture

## Project 1 - Flocking

* Xueyin Wan
* Tested on: Windows 10, i7-4870 @ 2.50GHz 16GB, NVIDIA GeForce GT 750M 2GB (Personal Laptop)

---
### Final Result Screenshot
![Flocking simulation with 15000 boids using the coherent uniform grid](https://github.com/xueyinw/Project1-CUDA-Flocking/blob/master/images/Xueyin_Performance.gif "Xueyin's Performance Analysis")

#### Parameters
* Number of boids = 15000
* dT = 0.2
* Algorithm used in the screenshot: Coherent Uniform Grid
* BlockSize = 128
* rule1Distance = 5.0f, rule1Scale = 0.01f
* rule2Distance = 3.0f, rule2Scale = 0.1f
* rule3Distance = 5.0f, rule3Scale = 0.1f
* maxSpeed = 1.0f
* scene_scale = 100.0f
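The three rule distances and scales above are the tuning knobs of the velocity update. As a hedged illustration of how they could be applied, here is a CPU-only brute-force sketch in C++, not the project's CUDA kernels. It assumes rule 1 is cohesion (steer toward the neighbors' center of mass), rule 2 is separation (move away from boids that are too close), and rule 3 is alignment (add a fraction of the neighbors' average velocity), following the usual Reynolds boids formulation; `Vec3` and `newVelocity` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static float length(Vec3 a) { return std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z); }

// Values from the parameter list; the rule meanings are assumptions.
constexpr float rule1Distance = 5.0f, rule1Scale = 0.01f;  // cohesion
constexpr float rule2Distance = 3.0f, rule2Scale = 0.1f;   // separation
constexpr float rule3Distance = 5.0f, rule3Scale = 0.1f;   // alignment
constexpr float maxSpeed = 1.0f;

// New velocity of boid `self`, found by scanning every other boid (brute force).
Vec3 newVelocity(int self, const std::vector<Vec3>& pos, const std::vector<Vec3>& vel) {
  assert(pos.size() == vel.size() && self >= 0 && self < (int)pos.size());
  Vec3 center{0, 0, 0}, separation{0, 0, 0}, avgVel{0, 0, 0};
  int n1 = 0, n3 = 0;
  for (int j = 0; j < (int)pos.size(); ++j) {
    if (j == self) continue;
    float d = length(pos[j] - pos[self]);
    if (d < rule1Distance) { center = center + pos[j]; ++n1; }
    if (d < rule2Distance) { separation = separation - (pos[j] - pos[self]); }
    if (d < rule3Distance) { avgVel = avgVel + vel[j]; ++n3; }
  }
  Vec3 v = vel[self];
  // Rule 1: steer toward the neighbors' center of mass.
  if (n1 > 0) v = v + (center * (1.0f / n1) - pos[self]) * rule1Scale;
  // Rule 2: push away from boids closer than rule2Distance.
  v = v + separation * rule2Scale;
  // Rule 3: add a fraction of the neighbors' average velocity.
  if (n3 > 0) v = v + avgVel * (1.0f / n3) * rule3Scale;
  // Clamp the speed to maxSpeed.
  float speed = length(v);
  if (speed > maxSpeed) v = v * (maxSpeed / speed);
  return v;
}
```

Each call loops over all N boids, so updating every boid costs on the order of N² distance checks; the grid-based searches below exist to cut that down.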

---
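### Performance Analysis

I used the first measurement method: disabling visualization (`#define VISUALIZE 0`) so that rendering does not affect the measured framerate. The second table instead reports, with visualization enabled, the largest boid count each algorithm handles while holding 60 FPS.

### Without Visualization
#### (`#define VISUALIZE 0`)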
| Algorithm | Number of boids | Framerate (FPS) |
| ------------- |:-------------:| -----:|
| Brute Force neighbor search | 5000 |57 |
| Uniform Grid neighbor search | 5000 | 580 |
| Coherent Uniform Grid neighbor search | 5000 | 680 |

### With Visualization
#### (`#define VISUALIZE 1`)
| Algorithm | Framerate (FPS) | Max boid count at this framerate |
| ------------- |:-------------:| -----:|
| Brute Force neighbor search | 60 | 5000 |
| Uniform Grid neighbor search | 60 | 80000 |
| Coherent Uniform Grid neighbor search | 60 | 100000 |

### Questions & Answers
#### 1. For each implementation, how does changing the number of boids affect performance? Why do you think this is?
Answer:

* Brute force neighbor search: every boid checks every other boid, so the work per frame grows with the square of the boid count. As the number of boids increases, the framerate drops very quickly (only 57 FPS at 5,000 boids without visualization).
* Uniform grid neighbor search: each boid only checks the boids in nearby grid cells instead of all N boids, so far fewer distance checks are needed. It stays at 60 FPS with up to almost 80,000 boids, much better than brute force.
* Coherent uniform grid neighbor search: it stays at 60 FPS with up to almost 100,000 boids, much better than brute force and somewhat better than the plain uniform grid, because the boid data for each cell is contiguous in memory (see question 3).
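The gap between brute force and the grid comes from how many candidate neighbors each boid examines. The sketch below is a CPU-only illustration in C++, not the project's CUDA code: it counts the neighbors within a radius for every boid, once by checking all boids and once with a uniform grid (label each boid with its cell, sort boid indices by cell, record each cell's start and end, then scan only the 3×3×3 block of surrounding cells). The cube bounds `[-sceneScale, sceneScale)`, the cell width equal to the search radius, and all function names are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct Vec3f { float x, y, z; };

static float dist2(const Vec3f& a, const Vec3f& b) {
  float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Brute force: every boid tests every other boid, N * (N - 1) checks in total.
std::vector<int> countNeighborsBrute(const std::vector<Vec3f>& pos, float radius) {
  std::vector<int> counts(pos.size(), 0);
  for (size_t i = 0; i < pos.size(); ++i)
    for (size_t j = 0; j < pos.size(); ++j)
      if (i != j && dist2(pos[i], pos[j]) < radius * radius) ++counts[i];
  return counts;
}

// Uniform grid over the cube [-sceneScale, sceneScale)^3.
std::vector<int> countNeighborsGrid(const std::vector<Vec3f>& pos, float radius,
                                    float sceneScale) {
  assert(radius > 0.0f && sceneScale > 0.0f);
  const float cellWidth = radius;  // >= radius, so neighbors are at most 1 cell away
  const int res = (int)std::ceil(2.0f * sceneScale / cellWidth);
  auto coord = [&](float v) {
    int c = (int)std::floor((v + sceneScale) / cellWidth);
    return std::min(std::max(c, 0), res - 1);
  };
  auto flat = [&](int x, int y, int z) { return x + y * res + z * res * res; };

  // 1. Label every boid with the index of the cell it lies in.
  const int n = (int)pos.size();
  std::vector<int> cellOf(n);
  for (int i = 0; i < n; ++i)
    cellOf[i] = flat(coord(pos[i].x), coord(pos[i].y), coord(pos[i].z));

  // 2. Sort boid indices by cell (a GPU version would use a parallel key/value sort).
  std::vector<int> order(n);
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(), [&](int a, int b) { return cellOf[a] < cellOf[b]; });

  // 3. Record where each cell's run starts and ends in the sorted order (-1 = empty).
  std::vector<int> cellStart(res * res * res, -1), cellEnd(res * res * res, -1);
  for (int k = 0; k < n; ++k) {
    int c = cellOf[order[k]];
    if (k == 0 || cellOf[order[k - 1]] != c) cellStart[c] = k;
    cellEnd[c] = k + 1;  // exclusive end
  }

  // 4. Each boid only visits boids in the 3x3x3 block of cells around it.
  std::vector<int> counts(n, 0);
  for (int i = 0; i < n; ++i) {
    int cx = coord(pos[i].x), cy = coord(pos[i].y), cz = coord(pos[i].z);
    for (int z = std::max(cz - 1, 0); z <= std::min(cz + 1, res - 1); ++z)
      for (int y = std::max(cy - 1, 0); y <= std::min(cy + 1, res - 1); ++y)
        for (int x = std::max(cx - 1, 0); x <= std::min(cx + 1, res - 1); ++x) {
          int c = flat(x, y, z);
          if (cellStart[c] < 0) continue;
          for (int k = cellStart[c]; k < cellEnd[c]; ++k) {
            int j = order[k];  // indirection: sorted slot -> original boid index
            if (j != i && dist2(pos[i], pos[j]) < radius * radius) ++counts[i];
          }
        }
  }
  return counts;
}
```

Both functions return the same counts; the grid version simply never looks at boids in far-away cells. The `order[k]` lookup in the innermost loop is the extra indirection that a coherent layout avoids.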
#### 2. For each implementation, how does changing the block count and block size affect performance? Why do you think this is?

Answer:

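* Generally speaking, performance got better as the block size increased (and, for a fixed number of boids, the block count correspondingly decreased).
* However, the two values have to be balanced rather than pushed to an extreme: the threads of a block share the multiprocessor's registers and shared memory, so overly large blocks can reduce how many blocks run at once and make memory latency harder to hide. Block count and block size should therefore be chosen with the GPU's resources in mind.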
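Block count and block size are linked: assuming the usual one-thread-per-boid launch, the block count is the boid count divided by the block size, rounded up. The small C++ helpers below (hypothetical names, not taken from the project) make that arithmetic concrete, including how many threads in the last block are launched without a boid to process.

```cpp
#include <cassert>

// Number of blocks needed so that every boid gets one thread (ceiling division).
int blockCount(int numBoids, int blockSize) {
  assert(numBoids >= 0 && blockSize > 0);
  return (numBoids + blockSize - 1) / blockSize;
}

// Threads launched in the last, partially filled block that have no boid to update.
int idleThreads(int numBoids, int blockSize) {
  return blockCount(numBoids, blockSize) * blockSize - numBoids;
}
```

For the 15000-boid configuration with `BlockSize = 128`, this gives 118 blocks and 104 idle threads in the last block. Picking a block size that is a multiple of the warp size (32 threads on NVIDIA GPUs) keeps the block shape from producing partially empty warps.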

#### 3. For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?
Answer:

* Yes. As the two tables in the Performance Analysis section show, the coherent uniform grid outperforms the plain uniform grid: 680 vs. 580 FPS at 5,000 boids without visualization, and almost 100,000 vs. almost 80,000 boids at 60 FPS with visualization. In my coherent uniform grid implementation (part 2.3), I rearranged the boid data itself so that the positions and velocities of all boids in one cell are also contiguous in memory. The neighbor search can then read this data directly, instead of first looking up each boid's index as the plain uniform grid (part 2.1) does. This is the result I expected, since the GPU handles contiguous memory accesses much more efficiently.
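The data rearrangement described above can be sketched on the CPU in C++ as a gather step: once boid indices are sorted by cell, positions and velocities are copied into that same order, so boids in one cell occupy consecutive slots. This is a hedged illustration with hypothetical names, not the code from part 2.3; on the GPU the copy loop would be a kernel with one thread per boid.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct Vec3f { float x, y, z; };

// Copies pos/vel into cell-sorted order so the k-th boid of the sorted order
// lives at coherentPos[k] / coherentVel[k]; sortedCell[k] is its cell index.
void reshuffleByCell(const std::vector<int>& cellOf, const std::vector<Vec3f>& pos,
                     const std::vector<Vec3f>& vel, std::vector<int>& sortedCell,
                     std::vector<Vec3f>& coherentPos, std::vector<Vec3f>& coherentVel) {
  assert(cellOf.size() == pos.size() && pos.size() == vel.size());
  const size_t n = pos.size();
  std::vector<size_t> order(n);
  std::iota(order.begin(), order.end(), size_t{0});
  // stable_sort keeps boids of the same cell in their original order (deterministic output).
  std::stable_sort(order.begin(), order.end(),
                   [&](size_t a, size_t b) { return cellOf[a] < cellOf[b]; });
  sortedCell.resize(n);
  coherentPos.resize(n);
  coherentVel.resize(n);
  for (size_t k = 0; k < n; ++k) {
    sortedCell[k] = cellOf[order[k]];
    coherentPos[k] = pos[order[k]];  // one scattered read per boid here...
    coherentVel[k] = vel[order[k]];  // ...instead of one per neighbor check later
  }
}
```

After the reshuffle, the neighbor loop can read `coherentPos[k]` directly for each slot `k` in a cell's range, instead of first looking up an index and then jumping to a scattered position in the original arrays.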
2 changes: 1 addition & 1 deletion src/main.cpp
@@ -18,7 +18,7 @@
#define COHERENT_GRID 1

// LOOK-1.2 - change this to adjust particle count in the simulation
const int N_FOR_VIS = 5000;  // changed from 15000 in this commit
const float DT = 0.2f;

/**