diff --git a/README.md b/README.md index f1e4738..24e40ff 100644 --- a/README.md +++ b/README.md @@ -3,377 +3,10 @@ CUDA Rasterizer **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** -* (TODO) YOUR NAME HERE -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Sally Kong +* Tested on: Windows 8, i7-5500U CPU @ 2.40GHz 2.40 GHz, GeForce 920M (Personal) -### (TODO: Your README) +A simplified rasterized graphics pipeline, similar to the OpenGL pipeline. I implemented a vertex shader, primitive assembly, rasterization, and a fragment shader (Lambert). -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. - - -Instructions (delete me) -======================== - -This is due Sunday, October 11, evening at midnight. - -**Summary:** -In this project, you will use CUDA to implement a simplified -rasterized graphics pipeline, similar to the OpenGL pipeline. You will -implement vertex shading, primitive assembly, rasterization, fragment shading, -and a framebuffer. More information about the rasterized graphics pipeline can -be found in the class slides and in the CIS 560 lecture notes. - -The base code provided includes an OBJ loader and much of the I/O and -bookkeeping code. It also includes some functions that you may find useful, -described below. The core rasterization pipeline is left for you to implement. - -You are not required to use this base code if you don't want -to. You may also change any part of the base code as you please. -**This is YOUR project.** - -**Recommendation:** -Every image you save should automatically get a different -filename. Don't delete all of them! For the benefit of your README, keep a -bunch of them around so you can pick a few to document your progress. - - -### Contents - -* `src/` C++/CUDA source files. -* `util/` C++ utility files.
-* `objs/` Example OBJ test files (# verts, # tris in buffers after loading) - * `tri.obj` (3v, 1t): The simplest possible geometric object. - * `cube.obj` (36v, 12t): A small model with low depth-complexity. - * `suzanne.obj` (2904 verts, 968 tris): A medium model with low depth-complexity. - * `suzanne_smooth.obj` (2904 verts, 968 tris): A medium model with low depth-complexity. - This model has normals which must be interpolated. - * `cow.obj` (17412 verts, 5804 tris): A large model with low depth-complexity. - * `cow_smooth.obj` (17412 verts, 5804 tris): A large model with low depth-complexity. - This model has normals which must be interpolated. - * `flower.obj` (1920 verts, 640 tris): A medium model with very high depth-complexity. - * `sponza.obj` (837,489 verts, 279,163 tris): A huge model with very high depth-complexity. -* `renders/` Debug render of an example OBJ. -* `external/` Includes and static libraries for 3rd party libraries. - -### Running the code - -The main function requires a scene description file. Call the program with -one as an argument: `cis565_rasterizer objs/cow.obj`. -(In Visual Studio, `../objs/cow.obj`.) - -If you are using Visual Studio, you can set this in the Debugging > Command -Arguments section in the Project properties. Note that this value is different -for every different configuration type. Make sure you get the path right; read -the console for errors. - -## Requirements - -**Ask on the mailing list for any clarifications.** - -In this project, you are given the following code: - -* A library for loading standard Alias/Wavefront `.obj` format mesh - files and converting them to OpenGL-style buffers of index and vertex data. - * This library does NOT read materials, and provides all colors as white by - default. You can use another library if you wish. -* Simple structs for some parts of the pipeline. -* Depth buffer to framebuffer copy. -* CUDA-GL interop. 
- -You will need to implement the following features/pipeline stages: - -* Vertex shading. -* (Vertex shader) perspective transformation. -* Primitive assembly with support for triangles read from buffers of index and - vertex data. -* Rasterization. -* Fragment shading. -* A depth buffer for storing and depth testing fragments. -* Fragment to depth buffer writing (**with** atomics for race avoidance). -* (Fragment shader) simple lighting scheme, such as Lambert or Blinn-Phong. - -See below for more guidance. - -You are also required to implement at least 3.0 "points" worth in extra features. -(point values are given in parentheses): - -* (1.0) Tile-based pipeline. -* Additional pipeline stages. - * (1.0) Tessellation shader. - * (1.0) Geometry shader, able to output a variable number of primitives per - input primitive, optimized using stream compaction (thrust allowed). - * (0.5 **if not doing geometry shader**) Backface culling, optimized using - stream compaction (thrust allowed). - * (1.0) Transform feedback. - * (0.5) Scissor test. - * (0.5) Blending (when writing into framebuffer). -* (1.0) Instancing: draw one set of vertex data multiple times, each run - through the vertex shader with a different ID. -* (0.5) Correct color interpolation between points on a primitive. -* (1.0) UV texture mapping with bilinear texture filtering and perspective - correct texture coordinates. -* Support for rasterizing additional primitives: - * (0.5) Lines or line strips. - * (0.5) Points. -* (1.0) Anti-aliasing. -* (1.0) Occlusion queries. -* (1.0) Order-independent translucency using a k-buffer. -* (0.5) **Mouse**-based interactive camera support. - -This extra feature list is not comprehensive. If you have a particular idea -you would like to implement, please **contact us first**. - -**IMPORTANT:** -For each extra feature, please provide the following brief analysis: - -* Concise overview write-up of the feature. 
-* Performance impact of adding the feature (slower or faster). -* If you did something to accelerate the feature, what did you do and why? -* How might this feature be optimized beyond your current implementation? - - -## Base Code Tour - -You will be working primarily in two files: `rasterize.cu`, and -`rasterizeTools.h`. Within these files, areas that you need to complete are -marked with a `TODO` comment. Areas that are useful to and serve as hints for -optional features are marked with `TODO (Optional)`. Functions that are useful -for reference are marked with the comment `CHECKITOUT`. **You should look at -all TODOs and CHECKITOUTs before starting!** There are not many. - -* `src/rasterize.cu` contains the core rasterization pipeline. - * A few pre-made structs are included for you to use, but those marked with - TODO will also be needed for a simple rasterizer. As with any part of the - base code, you may modify or replace these as you see fit. - -* `src/rasterizeTools.h` contains various useful tools - * Includes a number of barycentric coordinate related functions that you may - find useful in implementing scanline based rasterization. - -* `util/utilityCore.hpp` serves as a kitchen-sink of useful functions. - - -## Rasterization Pipeline - -Possible pipelines are described below. Pseudo-type-signatures are given. -Not all of the pseudocode arrays will necessarily actually exist in practice. - -### First-Try Pipeline - -This describes a minimal version of *one possible* graphics pipeline, similar -to modern hardware (DX/OpenGL). Yours need not match precisely. To begin, try -to write a minimal amount of code as described here. Verify some output after -implementing each pipeline step. This will reduce the necessary time spent -debugging. - -Start out by testing a single triangle (`tri.obj`). - -* Clear the depth buffer with some default value. 
-* Vertex shading: - * `VertexIn[n] vs_input -> VertexOut[n] vs_output` - * A minimal vertex shader will apply no transformations at all - it draws - directly in normalized device coordinates (-1 to 1 in each dimension). -* Primitive assembly. - * `VertexOut[n] vs_output -> Triangle[n/3] primitives` - * Start by supporting ONLY triangles. For a triangle defined by indices - `(a, b, c)` into `VertexOut` array `vo`, simply copy the appropriate values - into a `Triangle` object `(vo[a], vo[b], vo[c])`. -* Rasterization. - * `Triangle[n/3] primitives -> FragmentIn[m] fs_input` - * A scanline implementation is simpler to start with. -* Fragment shading. - * `FragmentIn[m] fs_input -> FragmentOut[m] fs_output` - * A super-simple test fragment shader: output same color for every fragment. - * Also try displaying various debug views (normals, etc.) -* Fragments to depth buffer. - * `FragmentOut[m] -> FragmentOut[width][height]` - * Results in race conditions - don't bother to fix these until it works! - * Can really be done inside the fragment shader, if you call the fragment - shader from the rasterization kernel for every fragment (including those - which get occluded). **OR,** this can be done before fragment shading, which - may be faster but means the fragment shader cannot change the depth. -* A depth buffer for storing and depth testing fragments. - * `FragmentOut[width][height] depthbuffer` - * An array of `fragment` objects. - * At the end of a frame, it should contain the fragments drawn to the screen. -* Fragment to framebuffer writing. - * `FragmentOut[width][height] depthbuffer -> vec3[width][height] framebuffer` - * Simply copies the colors out of the depth buffer into the framebuffer - (to be displayed on the screen). - -### A Useful Pipeline - -* Clear the depth buffer with some default value. -* Vertex shading: - * `VertexIn[n] vs_input -> VertexOut[n] vs_output` - * Apply some vertex transformation (e.g. 
model-view-projection matrix using - `glm::lookAt ` and `glm::perspective `). -* Primitive assembly. - * `VertexOut[n] vs_output -> Triangle[n/3] primitives` - * As above. - * Other primitive types are optional. -* Rasterization. - * `Triangle[n/3] primitives -> FragmentIn[m] fs_input` - * You may choose to do a tiled rasterization method, which should have lower - global memory bandwidth. - * A scanline optimization: when rasterizing a triangle, only scan over the - box around the triangle (`getAABBForTriangle`). -* Fragment shading. - * `FragmentIn[m] fs_input -> FragmentOut[m] fs_output` - * Add a shading method, such as Lambert or Blinn-Phong. Lights can be defined - by kernel parameters (like GLSL uniforms). -* Fragments to depth buffer. - * `FragmentOut[m] -> FragmentOut[width][height]` - * Can really be done inside the fragment shader, if you call the fragment - shader from the rasterization kernel for every fragment (including those - which get occluded). **OR,** this can be done before fragment shading, which - may be faster but means the fragment shader cannot change the depth. - * This result in an optimization: it allows you to do depth tests before - spending execution time in complex fragment shader code! - * Handle race conditions! Since multiple primitives write fragments to the - same fragment in the depth buffer, races must be avoided by using CUDA - atomics. - * *Approach 1:* Lock the location in the depth buffer during the time that - a thread is comparing old and new fragment depths (and possibly writing - a new fragment). This should work in all cases, but be slower. - See the section below on implementing this. - * *Approach 2:* Convert your depth value to a fixed-point `int`, and use - `atomicMin` to store it into an `int`-typed depth buffer `intdepth`. After - that, the value which is stored at `intdepth[i]` is (usually) that of the - fragment which should be stored into the `fragment` depth buffer. 
- * This may result in some rare race conditions (e.g. across blocks). - * The `flower.obj` test file is good for testing race conditions. -* A depth buffer for storing and depth testing fragments. - * `FragmentOut[width][height] depthbuffer` - * An array of `fragment` objects. - * At the end of a frame, it should contain the fragments drawn to the screen. -* Fragment to framebuffer writing. - * `FragmentOut[width][height] depthbuffer -> vec3[width][height] framebuffer` - * Simply copies the colors out of the depth buffer into the framebuffer - (to be displayed on the screen). - -This is a suggested sequence of pipeline steps, but you may choose to alter the -order of this sequence or merge entire kernels as you see fit. For example, if -you decide that doing has benefits, you can choose to merge the vertex shader -and primitive assembly kernels, or merge the perspective transform into another -kernel. There is not necessarily a right sequence of kernels, and you may -choose any sequence that works. Please document in your README what sequence -you choose and why. - - -## Resources - -### CUDA Mutexes - -Adapted from -[this StackOverflow question](http://stackoverflow.com/questions/21341495/cuda-mutex-and-atomiccas). - -``` -__global__ void kernelFunction(...) { - // Get a pointer to the mutex, which should be 0 right now. - unsigned int *mutex = ...; - - // Loop-wait until this thread is able to execute its critical section. - bool isSet; - do { - isSet = (atomicCAS(mutex, 0, 1) == 0); - if (isSet) { - // Critical section goes here. - // The critical section MUST be inside the wait loop; - // if it is afterward, a deadlock will occur. - } - if (isSet) { - mutex = 0; - } - } while (!isSet); -} -``` - -### Links - -The following resources may be useful for this project. 
- -* Line Rasterization slides, MIT EECS 6.837, Teller and Durand - * [Slides](http://groups.csail.mit.edu/graphics/classes/6.837/F02/lectures/6.837-7_Line.pdf) -* High-Performance Software Rasterization on GPUs - * [Paper (HPG 2011)](http://www.tml.tkk.fi/~samuli/publications/laine2011hpg_paper.pdf) - * [Code](http://code.google.com/p/cudaraster/) - * Note that looking over this code for reference with regard to the paper is - fine, but we most likely will not grant any requests to actually - incorporate any of this code into your project. - * [Slides](http://bps11.idav.ucdavis.edu/talks/08-gpuSoftwareRasterLaineAndPantaleoni-BPS2011.pdf) -* The Direct3D 10 System (SIGGRAPH 2006) - for those interested in doing - geometry shaders and transform feedback - * [Paper](http://dl.acm.org/citation.cfm?id=1141947) - * [Paper, through Penn Libraries proxy](http://proxy.library.upenn.edu:2247/citation.cfm?id=1141947) -* Multi-Fragment Effects on the GPU using the k-Buffer - for those who want to do - order-independent transparency using a k-buffer - * [Paper](http://www.inf.ufrgs.br/~comba/papers/2007/kbuffer_preprint.pdf) -* FreePipe: A Programmable, Parallel Rendering Architecture for Efficient - Multi-Fragment Effects (I3D 2010) - * [Paper](https://sites.google.com/site/hmcen0921/cudarasterizer) -* Writing A Software Rasterizer In Javascript - * [Part 1](http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-1.html) - * [Part 2](http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-2.html) - - -## Third-Party Code Policy - -* Use of any third-party code must be approved by asking on our Google Group. -* If it is approved, all students are welcome to use it. Generally, we approve - use of third-party code that is not a core part of the project. For example, - for the path tracer, we would approve using a third-party library for loading - models, but would not approve copying and pasting a CUDA function for doing - refraction. 
-* Third-party code **MUST** be credited in README.md. -* Using third-party code without its approval, including using another - student's code, is an academic integrity violation, and will, at minimum, - result in you receiving an F for the semester. - - -## README - -Replace the contents of this README.md in a clear manner with the following: - -* A brief description of the project and the specific features you implemented. -* At least one screenshot of your project running. -* A 30 second or longer video of your project running. -* A performance analysis (described below). - -### Performance Analysis - -The performance analysis is where you will investigate how to make your CUDA -programs more efficient using the skills you've learned in class. You must have -performed at least one experiment on your code to investigate the positive or -negative effects on performance. - -We encourage you to get creative with your tweaks. Consider places in your code -that could be considered bottlenecks and try to improve them. - -Provide summary of your optimizations (no more than one page), along with -tables and or graphs to visually explain any performance differences. - -* Include a breakdown of time spent in each pipeline stage for a few different - models. It is suggested that you use pie charts or 100% stacked bar charts. -* For optimization steps (like backface culling), include a performance - comparison to show the effectiveness. - - -## Submit - -If you have modified any of the `CMakeLists.txt` files at all (aside from the -list of `SOURCE_FILES`), you must test that your project can build in Moore -100B/C. Beware of any build issues discussed on the Google Group. - -1. Open a GitHub pull request so that we can see that you have finished. - The title should be "Submission: YOUR NAME". - * **ADDITIONALLY:** - In the body of the pull request, include a link to your repository. -2. 
Send an email to the TA (gmail: kainino1+cis565@) with: - * **Subject**: in the form of `[CIS565] Project N: PENNKEY`. - * Direct link to your pull request on GitHub. - * Estimate the amount of time you spent on the project. - * If there were any outstanding problems, or if you did any extra - work, *briefly* explain. - * Feedback on the project itself, if any. +### Sample Image of a cow +![](renders/cow.PNG) diff --git a/renders/cow.PNG b/renders/cow.PNG new file mode 100644 index 0000000..ff66c25 Binary files /dev/null and b/renders/cow.PNG differ diff --git a/src/main.cpp b/src/main.cpp index a125d7c..61eaa09 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -7,6 +7,7 @@ */ #include "main.hpp" +#include "rasterizeTools.h" //------------------------------- //-------------MAIN-------------- @@ -159,7 +160,7 @@ void initCuda() { // Use device with highest Gflops/s cudaGLSetGLDevice(0); - rasterizeInit(width, height); + rasterizeInit(width, height); // Clean up on program exit atexit(cleanupCuda); diff --git a/src/rasterize.cu b/src/rasterize.cu index 53103b5..1d321ce 100644 --- a/src/rasterize.cu +++ b/src/rasterize.cu @@ -13,28 +13,56 @@ #include #include #include -#include "rasterizeTools.h" +#include +#include struct VertexIn { glm::vec3 pos; glm::vec3 nor; glm::vec3 col; + glm::vec2 uv; // TODO (optional) add other vertex attributes (e.g. 
texture coordinates) }; struct VertexOut { // TODO + glm::vec3 pos; + glm::vec3 nor; + glm::vec3 col; + glm::vec2 uv; + }; struct Triangle { VertexOut v[3]; }; struct Fragment { glm::vec3 color; + glm::vec3 position; + glm::vec3 normal; + int depth; }; +__host__ __device__ inline unsigned int utilhash(unsigned int a) { + a = (a + 0x7ed55d16) + (a << 12); + a = (a ^ 0xc761c23c) ^ (a >> 19); + a = (a + 0x165667b1) + (a << 5); + a = (a + 0xd3a2646c) ^ (a << 9); + a = (a + 0xfd7046c5) + (a << 3); + a = (a ^ 0xb55a4f09) ^ (a >> 16); + return a; +} + + +__host__ __device__ +thrust::default_random_engine makeSeededRandomEngine(int iter, int index, int depth) { + int h = utilhash((1 << 31) | (depth << 22) | iter) ^ utilhash(index); + return thrust::default_random_engine(h); +} + static int width = 0; static int height = 0; static int *dev_bufIdx = NULL; static VertexIn *dev_bufVertex = NULL; +static VertexOut *dev_bufVertex_out = NULL; static Triangle *dev_primitives = NULL; static Fragment *dev_depthbuffer = NULL; static glm::vec3 *dev_framebuffer = NULL; @@ -44,8 +72,7 @@ static int vertCount = 0; /** * Kernel that writes the image to the OpenGL PBO directly. 
*/ -__global__ -void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { +__global__ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; int index = x + (y * w); @@ -64,8 +91,7 @@ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { } // Writes fragment colors to the framebuffer -__global__ -void render(int w, int h, Fragment *depthbuffer, glm::vec3 *framebuffer) { +__global__ void render(int w, int h, Fragment *depthbuffer, glm::vec3 *framebuffer) { int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; int index = x + (y * w); @@ -75,18 +101,150 @@ void render(int w, int h, Fragment *depthbuffer, glm::vec3 *framebuffer) { } } +__global__ void depthBufferClearing(int w, int h, Fragment *fragments) { + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + int index = x + (y * w); + + if ( x < w && y < h) { + fragments[index].depth = INT_MAX; + fragments[index].color = glm::vec3(0.0f); + } +} + +__global__ void vertexShading(int n, glm::mat4 view_projection, + VertexIn *vs_input, VertexOut *vs_output) { + + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + + if (index < n) { + + VertexOut vert_out; + glm::vec4 input_pos = glm::vec4( + vs_input[index].pos.x, + vs_input[index].pos.y, + vs_input[index].pos.z, + 1.0f); + + glm::vec3 transformedPoint = multiplyMV(view_projection, input_pos); + vert_out.pos = transformedPoint; + + glm::vec4 input_normal = glm::vec4( + vs_input[index].nor.x, + vs_input[index].nor.y, + vs_input[index].nor.z, + 1.0f); + + glm::vec3 output_normal = multiplyMV(view_projection,input_normal); + vert_out.nor = output_normal; + + vert_out.col = vs_input[index].col; + + vs_output[index] = vert_out; + + } +} + +__global__ void primitiveAssembling(int n, VertexOut *vs_output, + Triangle *primitives) { + + int index = 
(blockIdx.x * blockDim.x) + threadIdx.x; + + if (index < n) { + primitives[index].v[0] = vs_output[3*index]; + primitives[index].v[1] = vs_output[3*index+1]; + primitives[index].v[2] = vs_output[3*index+2]; + } + +} + +__global__ void rasterizing(int n, int w, int h, + Triangle *primitives, Fragment *fs_input) { + + int index = (blockIdx.x * blockDim.x) + threadIdx.x; + if (index < n) { + Triangle tri = primitives[index]; + glm::vec3 tri_verts[3] = {tri.v[0].pos, tri.v[1].pos, tri.v[2].pos}; + AABB aabb = getAABBForTriangle(tri_verts); + + glm::vec2 pixel_min; + pixel_min.x = (aabb.min.x + 1) * w / 2.0f; + pixel_min.y = (aabb.min.y + 1) * h / 2.0f; + + glm::vec2 pixel_max; + pixel_max.x = (aabb.max.x + 1) * w / 2.0f; + pixel_max.y = (aabb.max.y + 1) * h / 2.0f; + + for (int i = glm::max(0.0f, pixel_min.x); i <= pixel_max.x; i++) { + for (int j = glm::max(0.0f, pixel_min.y); j <= pixel_max.y; j++) { + + //thrust::default_random_engine rng = makeSeededRandomEngine(0, index, 0); + //thrust::uniform_real_distribution u01(0, 1); + + float x = (i/float(w)) * 2.0f - 1; + float y = (j/float(h)) * 2.0f - 1; + + glm::vec3 barycentric = calculateBarycentricCoordinate(tri_verts, + glm::vec2(x,y)); + if (isBarycentricCoordInBounds(barycentric)) { + + int frag_index = j*w + i; + int depth = getZAtCoordinate(barycentric, tri_verts) * INT_MAX; + atomicMin(&fs_input[frag_index].depth, depth); + + if(fs_input[frag_index].depth == depth) { + + Fragment frag; + frag.color = (primitives[index].v[0].col + + primitives[index].v[1].col + + primitives[index].v[2].col) / 3.0f; + + frag.normal = (primitives[index].v[0].nor + + primitives[index].v[1].nor + + primitives[index].v[2].nor) / 3.0f; + + frag.position = barycentric; + frag.depth = depth; + + fs_input[frag_index] = frag; + } + } + } + } + + + } +} + +__global__ void fragmentShading(int w, int h, Fragment *fs, glm::vec3 light_pos) { + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + 
int index = x + (y * w); + + if ( x < w && y < h) { + float diffuseTerm = 0.7f; + glm::vec3 light_color = glm::vec3(1.0f); + fs[index].color *= diffuseTerm * glm::max(0.0f, + glm::dot(glm::normalize(fs[index].normal), + glm::normalize(light_pos - fs[index].position))); + } + +} + /** * Called once at the beginning of the program to allocate memory. */ void rasterizeInit(int w, int h) { width = w; height = h; + cudaFree(dev_depthbuffer); cudaMalloc(&dev_depthbuffer, width * height * sizeof(Fragment)); cudaMemset(dev_depthbuffer, 0, width * height * sizeof(Fragment)); cudaFree(dev_framebuffer); cudaMalloc(&dev_framebuffer, width * height * sizeof(glm::vec3)); cudaMemset(dev_framebuffer, 0, width * height * sizeof(glm::vec3)); + checkCUDAError("rasterizeInit"); } @@ -110,10 +268,14 @@ void rasterizeSetBuffers( bufVertex[i].nor = glm::vec3(bufNor[j + 0], bufNor[j + 1], bufNor[j + 2]); bufVertex[i].col = glm::vec3(bufCol[j + 0], bufCol[j + 1], bufCol[j + 2]); } + cudaFree(dev_bufVertex); cudaMalloc(&dev_bufVertex, vertCount * sizeof(VertexIn)); cudaMemcpy(dev_bufVertex, bufVertex, vertCount * sizeof(VertexIn), cudaMemcpyHostToDevice); + cudaFree(dev_bufVertex_out); + cudaMalloc(&dev_bufVertex_out, vertCount * sizeof(VertexOut)); + cudaFree(dev_primitives); cudaMalloc(&dev_primitives, vertCount / 3 * sizeof(Triangle)); cudaMemset(dev_primitives, 0, vertCount / 3 * sizeof(Triangle)); @@ -130,9 +292,49 @@ void rasterize(uchar4 *pbo) { dim3 blockCount2d((width - 1) / blockSize2d.x + 1, (height - 1) / blockSize2d.y + 1); - // TODO: Execute your rasterization pipeline here - // (See README for rasterization pipeline outline.) 
dim3 blockSize1d(128); + dim3 blockCount1d((vertCount + 128 - 1) / 128); + + //-----RASTERIZATION PIPELINE---------- + + //---Clear Depth Buffer + depthBufferClearing<<>>(width, height, dev_depthbuffer); + checkCUDAError("depth buffer clearing"); + //---Vertex Shader + //view matrix + glm::mat4 view = glm::lookAt( + glm::vec3(0.0f, 1.5f, 5.0f), + glm::vec3(0.0f, 0.0f, -1.0f), + glm::vec3(0.0f, -1.0f, 0.0f)); + + //projection matrix + glm::mat4 projection = glm::perspective( + 20.0f, float(width)/float(height), 1.0f, 100.0f); + + glm::mat4 view_projection = projection * view; + + vertexShading<<>>(vertCount, view_projection, + dev_bufVertex, dev_bufVertex_out); + checkCUDAError("vertex shader"); + + //---Primitive Assembly + primitiveAssembling<<>>(vertCount/3, + dev_bufVertex_out, dev_primitives); + checkCUDAError("primitive assembling"); + + //---Rasterization + rasterizing<<>>(vertCount/3, width, height, + dev_primitives, dev_depthbuffer); + checkCUDAError("triangle rasterizing"); + + //--Fragment Shader + glm::vec3 light_pos = glm::vec3(-3.0f, 5.0f, 10.0f); + fragmentShading<<>>(width, height, dev_depthbuffer, + light_pos); + checkCUDAError("fragment shading"); + + // Copy depthbuffer colors into framebuffer render<<>>(width, height, dev_depthbuffer, dev_framebuffer); // Copy framebuffer into OpenGL buffer for OpenGL previewing @@ -150,12 +352,15 @@ void rasterizeFree() { cudaFree(dev_bufVertex); dev_bufVertex = NULL; + cudaFree(dev_bufVertex_out); + dev_bufVertex_out = NULL; + cudaFree(dev_primitives); dev_primitives = NULL; cudaFree(dev_depthbuffer); dev_depthbuffer = NULL; - + cudaFree(dev_framebuffer); dev_framebuffer = NULL; diff --git a/src/rasterize.h b/src/rasterize.h index a06b339..0807f59 100644 --- a/src/rasterize.h +++ b/src/rasterize.h @@ -9,6 +9,8 @@ #pragma once #include +#include +#include "rasterizeTools.h" void rasterizeInit(int width, int height); void rasterizeSetBuffers( diff --git a/src/rasterizeTools.h b/src/rasterizeTools.h index
46c701e..53cc513 100644 --- a/src/rasterizeTools.h +++ b/src/rasterizeTools.h @@ -17,12 +17,24 @@ struct AABB { glm::vec3 max; }; +struct Camera { + glm::ivec2 resolution; + glm::vec3 position; + glm::vec3 target; + glm::vec3 up; + glm::vec2 fov; + float nearPlane; + float farPlane; +}; + /** * Multiplies a glm::mat4 matrix and a vec4. */ __host__ __device__ static glm::vec3 multiplyMV(glm::mat4 m, glm::vec4 v) { - return glm::vec3(m * v); + //glm::vec4 product = m * v; + //return glm::vec3(product.x, product.y, product.z) / product.w; + return glm::vec3(m*v); } // CHECKITOUT @@ -99,3 +111,4 @@ float getZAtCoordinate(const glm::vec3 barycentricCoord, const glm::vec3 tri[3]) + barycentricCoord.y * tri[1].z + barycentricCoord.z * tri[2].z); } +
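
The commented-out lines in `multiplyMV` above suggest that a perspective divide was considered but left disabled. A minimal sketch of that step follows, under stated assumptions: `Vec3`, `Vec4`, `Mat4`, `mul`, and `multiplyMVWithDivide` are hypothetical stand-ins (not part of the base code or of GLM) so that the example compiles without GLM, and the case `w == 0` is only asserted against, not handled.

```cpp
#include <cassert>

// Hypothetical stand-ins for glm::vec3, glm::vec4 and glm::mat4.
// Mat4 stores columns, matching glm::mat4's column-major layout.
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { Vec4 col[4]; };

// Matrix-vector product m * v for a column-major matrix.
inline Vec4 mul(const Mat4 &m, const Vec4 &v) {
    return Vec4{
        m.col[0].x * v.x + m.col[1].x * v.y + m.col[2].x * v.z + m.col[3].x * v.w,
        m.col[0].y * v.x + m.col[1].y * v.y + m.col[2].y * v.z + m.col[3].y * v.w,
        m.col[0].z * v.x + m.col[1].z * v.y + m.col[2].z * v.z + m.col[3].z * v.w,
        m.col[0].w * v.x + m.col[1].w * v.y + m.col[2].w * v.z + m.col[3].w * v.w};
}

// Transform to clip space, then divide by w to reach normalized device
// coordinates. Assumes w != 0; clipping against the camera plane is not shown.
inline Vec3 multiplyMVWithDivide(const Mat4 &m, const Vec4 &v) {
    Vec4 clip = mul(m, v);
    assert(clip.w != 0.0f);
    return Vec3{clip.x / clip.w, clip.y / clip.w, clip.z / clip.w};
}
```

With a perspective matrix, `w` carries the view-space depth, so the divide is what makes distant geometry shrink. The `rasterizing` kernel maps coordinates from [-1, 1] to pixels, which appears to assume positions that have already been divided this way.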