Commit
Minor README updates.
terrynsun committed Sep 8, 2015
1 parent 40dd226 commit ea79aaa
Showing 1 changed file (README.md) with 16 additions and 6 deletions.
@@ -14,18 +14,28 @@ Terry Sun; Arch Linux, Intel i5-4670, GTX 750

![](images/nbody_perf_plot.png)

The graph shows time taken (in ms) to update one frame at block sizes from 16
to 1024 in steps of 8, for various values of N (planets in the system).

I measured performance by disabling visualization and using `CudaEvent`s to time
the kernel invocations (measuring the time elapsed for both `kernUpdateVelPos`
and `kernUpdateAcc`). The recorded value is an average over 100 frames.

Code for performance measuring can be found on the `performance` branch.
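The shape of that measurement loop can be sketched in plain Python (a sketch only: the actual code on the `performance` branch uses CUDA events, and `update_frame` here is a hypothetical stand-in for the two kernels):

```python
import time

def update_frame(n):
    # Hypothetical stand-in for one simulation step
    # (kernUpdateAcc followed by kernUpdateVelPos in the real code).
    s = 0.0
    for i in range(n):
        s += i * 0.5
    return s

def time_per_frame(n, frames=100):
    """Average wall-clock time (ms) per frame over `frames` updates,
    mirroring the average-over-100-frames measurement above."""
    start = time.perf_counter()
    for _ in range(frames):
        update_frame(n)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms / frames
```

Averaging over many frames smooths out per-launch jitter, which matters when a single kernel invocation takes well under a millisecond.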

Changing the number of planets, as expected, increases the time elapsed for the
kernels, due to a for-loop in the acceleration calculation (whose cost grows
linearly with the total number of planets in the system). More interestingly, it
also changes the way that performance reacts to block size (see N=4096 in the
above plot). The difference in performance as block size changes is much greater
at larger N, and also exhibits different behaviors.
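The per-body loop described above is the standard all-pairs gravity update, which makes each frame O(N^2) overall. A brute-force sketch (not the repository's kernel code; `G` and the softening term `eps` are illustrative choices):

```python
import math

def accelerations(positions, masses, G=6.674e-11, eps=1e-9):
    """Brute-force all-pairs gravity in 2D: the inner loop over every
    other body is the O(N) for-loop per body, O(N^2) per frame total."""
    n = len(positions)
    acc = [(0.0, 0.0)] * n
    for i in range(n):
        ax = ay = 0.0
        xi, yi = positions[i]
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            r2 = dx * dx + dy * dy + eps  # softening avoids divide-by-zero
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += G * masses[j] * dx * inv_r3
            ay += G * masses[j] * dy * inv_r3
        acc[i] = (ax, ay)
    return acc
```

On the GPU each body gets its own thread, so the O(N) inner loop runs in parallel across bodies, but total work per frame still grows with N.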

At certain block sizes, the time per frame sharply decreases, such as at N=4096
with block sizes of 1024, 512, 256, and 128. These are points where each block is
fully saturated (i.e., no unneeded threads are launched).
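The saturation condition can be checked arithmetically: with ceil(N / blockSize) blocks launched, the number of idle threads is blocks * blockSize - N, which is zero exactly when the block size divides N (a quick check, not the repository's launch code):

```python
def idle_threads(n, block_size):
    """Threads launched but unused; zero exactly when block_size divides n."""
    blocks = (n + block_size - 1) // block_size  # ceil(n / block_size)
    return blocks * block_size - n

# For N=4096, the block sizes named above leave every block fully used:
full = [bs for bs in (128, 256, 512, 1024) if idle_threads(4096, bs) == 0]
# full == [128, 256, 512, 1024]
```

At a block size just above one of these divisors, an extra mostly-empty block is launched, which is consistent with the sharp jumps in the plot.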

I have no explanation for the spikes peaking around N=4096, block size ≈ 800,
or N=3072, block size ≈ 600.

# Part2: An Even More Basic Matrix Library

@@ -41,4 +51,4 @@ host), which are also linear time operations.
However, matrix multiplication is an O(n^{1.5}) operation on a CPU (for n total
matrix elements) and becomes an O(n) operation on a GPU (O(3n) after taking the
2x memory copy into account). So I would expect multiplication to exhibit much
better performance on the GPU (except on very small matrices).
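Reading n as the total number of matrix elements makes the CPU exponent concrete: an s-by-s matrix has n = s^2 entries, and naive multiplication performs s^3 = n^{1.5} multiply-adds (a quick arithmetic check, not code from the repository):

```python
def naive_matmul_ops(s):
    """Multiply-add count for naive s-by-s matrix multiplication."""
    return s ** 3

s = 64
n = s * s                                     # total matrix elements
assert naive_matmul_ops(s) == int(n ** 1.5)   # s^3 == n^{1.5}
```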
