diff --git a/README.md b/README.md
index f1e4738..24e40ff 100644
--- a/README.md
+++ b/README.md
@@ -3,377 +3,10 @@ CUDA Rasterizer
 
 **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**
 
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
+* Sally Kong
+* Tested on: Windows 8, i7-5500U CPU @ 2.40GHz, GeForce 920M (Personal)
 
-### (TODO: Your README)
+A simplified rasterization pipeline written in CUDA, similar to the OpenGL pipeline. I implemented a vertex shader, primitive assembly, rasterization with an `atomicMin` depth test, and a Lambert fragment shader.
 
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
-
-
-Instructions (delete me)
-========================
-
-This is due Sunday, October 11, evening at midnight.
-
-**Summary:** 
-In this project, you will use CUDA to implement a simplified
-rasterized graphics pipeline, similar to the OpenGL pipeline. You will
-implement vertex shading, primitive assembly, rasterization, fragment shading,
-and a framebuffer. More information about the rasterized graphics pipeline can
-be found in the class slides and in the CIS 560 lecture notes.
-
-The base code provided includes an OBJ loader and much of the I/O and
-bookkeeping code. It also includes some functions that you may find useful,
-described below. The core rasterization pipeline is left for you to implement.
-
-You are not required to use this base code if you don't want
-to. You may also change any part of the base code as you please.
-**This is YOUR project.**
-
-**Recommendation:**
-Every image you save should automatically get a different
-filename. Don't delete all of them! For the benefit of your README, keep a
-bunch of them around so you can pick a few to document your progress.
-
-
-### Contents
-
-* `src/` C++/CUDA source files.
-* `util/` C++ utility files.
-* `objs/` Example OBJ test files (# verts, # tris in buffers after loading)
-  * `tri.obj` (3v, 1t): The simplest possible geometric object.
-  * `cube.obj` (36v, 12t): A small model with low depth-complexity.
-  * `suzanne.obj` (2904 verts, 968 tris): A medium model with low depth-complexity.
-  * `suzanne_smooth.obj` (2904 verts, 968 tris): A medium model with low depth-complexity.
-    This model has normals which must be interpolated.
-  * `cow.obj` (17412 verts, 5804 tris): A large model with low depth-complexity.
-  * `cow_smooth.obj` (17412 verts, 5804 tris): A large model with low depth-complexity.
-    This model has normals which must be interpolated.
-  * `flower.obj` (1920 verts, 640 tris): A medium model with very high depth-complexity.
-  * `sponza.obj` (837,489 verts, 279,163 tris): A huge model with very high depth-complexity.
-* `renders/` Debug render of an example OBJ.
-* `external/` Includes and static libraries for 3rd party libraries.
-
-### Running the code
-
-The main function requires a scene description file. Call the program with
-one as an argument: `cis565_rasterizer objs/cow.obj`.
-(In Visual Studio, `../objs/cow.obj`.)
-
-If you are using Visual Studio, you can set this in the Debugging > Command
-Arguments section in the Project properties. Note that this value is different
-for every different configuration type. Make sure you get the path right; read
-the console for errors.
-
-## Requirements
-
-**Ask on the mailing list for any clarifications.**
-
-In this project, you are given the following code:
-
-* A library for loading standard Alias/Wavefront `.obj` format mesh
-  files and converting them to OpenGL-style buffers of index and vertex data.
-  * This library does NOT read materials, and provides all colors as white by
-    default. You can use another library if you wish.
-* Simple structs for some parts of the pipeline.
-* Depth buffer to framebuffer copy.
-* CUDA-GL interop.
-
-You will need to implement the following features/pipeline stages:
-
-* Vertex shading.
-* (Vertex shader) perspective transformation.
-* Primitive assembly with support for triangles read from buffers of index and
-  vertex data.
-* Rasterization.
-* Fragment shading.
-* A depth buffer for storing and depth testing fragments.
-* Fragment to depth buffer writing (**with** atomics for race avoidance).
-* (Fragment shader) simple lighting scheme, such as Lambert or Blinn-Phong.
-
-See below for more guidance.
-
-You are also required to implement at least 3.0 "points" worth in extra features.
-(point values are given in parentheses):
-
-* (1.0) Tile-based pipeline.
-* Additional pipeline stages.
-   * (1.0) Tessellation shader.
-   * (1.0) Geometry shader, able to output a variable number of primitives per
-     input primitive, optimized using stream compaction (thrust allowed).
-   * (0.5 **if not doing geometry shader**) Backface culling, optimized using
-     stream compaction (thrust allowed).
-   * (1.0) Transform feedback.
-   * (0.5) Scissor test.
-   * (0.5) Blending (when writing into framebuffer).
-* (1.0) Instancing: draw one set of vertex data multiple times, each run
-  through the vertex shader with a different ID.
-* (0.5) Correct color interpolation between points on a primitive.
-* (1.0) UV texture mapping with bilinear texture filtering and perspective
-  correct texture coordinates.
-* Support for rasterizing additional primitives:
-   * (0.5) Lines or line strips.
-   * (0.5) Points.
-* (1.0) Anti-aliasing.
-* (1.0) Occlusion queries.
-* (1.0) Order-independent translucency using a k-buffer.
-* (0.5) **Mouse**-based interactive camera support.
-
-This extra feature list is not comprehensive. If you have a particular idea
-you would like to implement, please **contact us first**.
-
-**IMPORTANT:**
-For each extra feature, please provide the following brief analysis:
-
-* Concise overview write-up of the feature.
-* Performance impact of adding the feature (slower or faster).
-* If you did something to accelerate the feature, what did you do and why?
-* How might this feature be optimized beyond your current implementation?
-
-
-## Base Code Tour
-
-You will be working primarily in two files: `rasterize.cu`, and
-`rasterizeTools.h`. Within these files, areas that you need to complete are
-marked with a `TODO` comment. Areas that are useful to and serve as hints for
-optional features are marked with `TODO (Optional)`. Functions that are useful
-for reference are marked with the comment `CHECKITOUT`. **You should look at
-all TODOs and CHECKITOUTs before starting!** There are not many.
-
-* `src/rasterize.cu` contains the core rasterization pipeline. 
-  * A few pre-made structs are included for you to use, but those marked with
-    TODO will also be needed for a simple rasterizer. As with any part of the
-    base code, you may modify or replace these as you see fit.
-
-* `src/rasterizeTools.h` contains various useful tools
-  * Includes a number of barycentric coordinate related functions that you may
-    find useful in implementing scanline based rasterization.
-
-* `util/utilityCore.hpp` serves as a kitchen-sink of useful functions.
-
-
-## Rasterization Pipeline
-
-Possible pipelines are described below. Pseudo-type-signatures are given.
-Not all of the pseudocode arrays will necessarily actually exist in practice.
-
-### First-Try Pipeline
-
-This describes a minimal version of *one possible* graphics pipeline, similar
-to modern hardware (DX/OpenGL). Yours need not match precisely.  To begin, try
-to write a minimal amount of code as described here. Verify some output after
-implementing each pipeline step. This will reduce the necessary time spent
-debugging.
-
-Start out by testing a single triangle (`tri.obj`).
-
-* Clear the depth buffer with some default value.
-* Vertex shading: 
-  * `VertexIn[n] vs_input -> VertexOut[n] vs_output`
-  * A minimal vertex shader will apply no transformations at all - it draws
-    directly in normalized device coordinates (-1 to 1 in each dimension).
-* Primitive assembly.
-  * `VertexOut[n] vs_output -> Triangle[n/3] primitives`
-  * Start by supporting ONLY triangles. For a triangle defined by indices
-    `(a, b, c)` into `VertexOut` array `vo`, simply copy the appropriate values
-    into a `Triangle` object `(vo[a], vo[b], vo[c])`.
-* Rasterization.
-  * `Triangle[n/3] primitives -> FragmentIn[m] fs_input`
-  * A scanline implementation is simpler to start with.
-* Fragment shading.
-  * `FragmentIn[m] fs_input -> FragmentOut[m] fs_output`
-  * A super-simple test fragment shader: output same color for every fragment.
-    * Also try displaying various debug views (normals, etc.)
-* Fragments to depth buffer.
-  * `FragmentOut[m] -> FragmentOut[width][height]`
-  * Results in race conditions - don't bother to fix these until it works!
-  * Can really be done inside the fragment shader, if you call the fragment
-    shader from the rasterization kernel for every fragment (including those
-    which get occluded). **OR,** this can be done before fragment shading, which
-    may be faster but means the fragment shader cannot change the depth.
-* A depth buffer for storing and depth testing fragments.
-  * `FragmentOut[width][height] depthbuffer`
-  * An array of `fragment` objects.
-  * At the end of a frame, it should contain the fragments drawn to the screen.
-* Fragment to framebuffer writing.
-  * `FragmentOut[width][height] depthbuffer -> vec3[width][height] framebuffer`
-  * Simply copies the colors out of the depth buffer into the framebuffer
-    (to be displayed on the screen).
-
-### A Useful Pipeline
-
-* Clear the depth buffer with some default value.
-* Vertex shading: 
-  * `VertexIn[n] vs_input -> VertexOut[n] vs_output`
-  * Apply some vertex transformation (e.g. model-view-projection matrix using
-    `glm::lookAt ` and `glm::perspective `).
-* Primitive assembly.
-  * `VertexOut[n] vs_output -> Triangle[n/3] primitives`
-  * As above.
-  * Other primitive types are optional.
-* Rasterization.
-  * `Triangle[n/3] primitives -> FragmentIn[m] fs_input`
-  * You may choose to do a tiled rasterization method, which should have lower
-    global memory bandwidth.
-  * A scanline optimization: when rasterizing a triangle, only scan over the
-    box around the triangle (`getAABBForTriangle`).
-* Fragment shading.
-  * `FragmentIn[m] fs_input -> FragmentOut[m] fs_output`
-  * Add a shading method, such as Lambert or Blinn-Phong. Lights can be defined
-    by kernel parameters (like GLSL uniforms).
-* Fragments to depth buffer.
-  * `FragmentOut[m] -> FragmentOut[width][height]`
-  * Can really be done inside the fragment shader, if you call the fragment
-    shader from the rasterization kernel for every fragment (including those
-    which get occluded). **OR,** this can be done before fragment shading, which
-    may be faster but means the fragment shader cannot change the depth.
-    * This result in an optimization: it allows you to do depth tests before
-     spending execution time in complex fragment shader code!
-  * Handle race conditions! Since multiple primitives write fragments to the
-    same fragment in the depth buffer, races must be avoided by using CUDA
-    atomics.
-    * *Approach 1:* Lock the location in the depth buffer during the time that
-      a thread is comparing old and new fragment depths (and possibly writing
-      a new fragment). This should work in all cases, but be slower.
-      See the section below on implementing this.
-    * *Approach 2:* Convert your depth value to a fixed-point `int`, and use
-      `atomicMin` to store it into an `int`-typed depth buffer `intdepth`. After
-      that, the value which is stored at `intdepth[i]` is (usually) that of the
-      fragment which should be stored into the `fragment` depth buffer.
-      * This may result in some rare race conditions (e.g. across blocks).
-    * The `flower.obj` test file is good for testing race conditions.
-* A depth buffer for storing and depth testing fragments.
-  * `FragmentOut[width][height] depthbuffer`
-  * An array of `fragment` objects.
-  * At the end of a frame, it should contain the fragments drawn to the screen.
-* Fragment to framebuffer writing.
-  * `FragmentOut[width][height] depthbuffer -> vec3[width][height] framebuffer`
-  * Simply copies the colors out of the depth buffer into the framebuffer
-    (to be displayed on the screen).
-
-This is a suggested sequence of pipeline steps, but you may choose to alter the
-order of this sequence or merge entire kernels as you see fit.  For example, if
-you decide that doing has benefits, you can choose to merge the vertex shader
-and primitive assembly kernels, or merge the perspective transform into another
-kernel. There is not necessarily a right sequence of kernels, and you may
-choose any sequence that works.  Please document in your README what sequence
-you choose and why.
-
-
-## Resources
-
-### CUDA Mutexes
-
-Adapted from
-[this StackOverflow question](http://stackoverflow.com/questions/21341495/cuda-mutex-and-atomiccas).
-
-```
-__global__ void kernelFunction(...) {
-    // Get a pointer to the mutex, which should be 0 right now.
-    unsigned int *mutex = ...;
-
-    // Loop-wait until this thread is able to execute its critical section.
-    bool isSet;
-    do {
-        isSet = (atomicCAS(mutex, 0, 1) == 0);
-        if (isSet) {
-            // Critical section goes here.
-            // The critical section MUST be inside the wait loop;
-            // if it is afterward, a deadlock will occur.
-        }
-        if (isSet) {
-            mutex = 0;
-        }
-    } while (!isSet);
-}
-```
-
-### Links
-
-The following resources may be useful for this project.
-
-* Line Rasterization slides, MIT EECS 6.837, Teller and Durand
-  * [Slides](http://groups.csail.mit.edu/graphics/classes/6.837/F02/lectures/6.837-7_Line.pdf)
-* High-Performance Software Rasterization on GPUs
-  * [Paper (HPG 2011)](http://www.tml.tkk.fi/~samuli/publications/laine2011hpg_paper.pdf)
-  * [Code](http://code.google.com/p/cudaraster/)
-  * Note that looking over this code for reference with regard to the paper is
-    fine, but we most likely will not grant any requests to actually
-    incorporate any of this code into your project.
-  * [Slides](http://bps11.idav.ucdavis.edu/talks/08-gpuSoftwareRasterLaineAndPantaleoni-BPS2011.pdf)
-* The Direct3D 10 System (SIGGRAPH 2006) - for those interested in doing
-  geometry shaders and transform feedback
-  * [Paper](http://dl.acm.org/citation.cfm?id=1141947)
-  * [Paper, through Penn Libraries proxy](http://proxy.library.upenn.edu:2247/citation.cfm?id=1141947)
-* Multi-Fragment Eﬀects on the GPU using the k-Buﬀer - for those who want to do
-  order-independent transparency using a k-buffer
-  * [Paper](http://www.inf.ufrgs.br/~comba/papers/2007/kbuffer_preprint.pdf)
-* FreePipe: A Programmable, Parallel Rendering Architecture for Efficient
-  Multi-Fragment Effects (I3D 2010)
-  * [Paper](https://sites.google.com/site/hmcen0921/cudarasterizer)
-* Writing A Software Rasterizer In Javascript
-  * [Part 1](http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-1.html)
-  * [Part 2](http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-2.html)
-
-
-## Third-Party Code Policy
-
-* Use of any third-party code must be approved by asking on our Google Group.
-* If it is approved, all students are welcome to use it. Generally, we approve
-  use of third-party code that is not a core part of the project. For example,
-  for the path tracer, we would approve using a third-party library for loading
-  models, but would not approve copying and pasting a CUDA function for doing
-  refraction.
-* Third-party code **MUST** be credited in README.md.
-* Using third-party code without its approval, including using another
-  student's code, is an academic integrity violation, and will, at minimum,
-  result in you receiving an F for the semester.
-
-
-## README
-
-Replace the contents of this README.md in a clear manner with the following:
-
-* A brief description of the project and the specific features you implemented.
-* At least one screenshot of your project running.
-* A 30 second or longer video of your project running.
-* A performance analysis (described below).
-
-### Performance Analysis
-
-The performance analysis is where you will investigate how to make your CUDA
-programs more efficient using the skills you've learned in class. You must have
-performed at least one experiment on your code to investigate the positive or
-negative effects on performance. 
-
-We encourage you to get creative with your tweaks. Consider places in your code
-that could be considered bottlenecks and try to improve them. 
-
-Provide summary of your optimizations (no more than one page), along with
-tables and or graphs to visually explain any performance differences.
-
-* Include a breakdown of time spent in each pipeline stage for a few different
-  models. It is suggested that you use pie charts or 100% stacked bar charts.
-* For optimization steps (like backface culling), include a performance
-  comparison to show the effectiveness.
-
-
-## Submit
-
-If you have modified any of the `CMakeLists.txt` files at all (aside from the
-list of `SOURCE_FILES`), you must test that your project can build in Moore
-100B/C. Beware of any build issues discussed on the Google Group.
-
-1. Open a GitHub pull request so that we can see that you have finished.
-   The title should be "Submission: YOUR NAME".
-   * **ADDITIONALLY:**
-     In the body of the pull request, include a link to your repository.
-2. Send an email to the TA (gmail: kainino1+cis565@) with:
-   * **Subject**: in the form of `[CIS565] Project N: PENNKEY`.
-   * Direct link to your pull request on GitHub.
-   * Estimate the amount of time you spent on the project.
-   * If there were any outstanding problems, or if you did any extra
-     work, *briefly* explain.
-   * Feedback on the project itself, if any.
+### Sample render: `cow.obj`
+![Lambert-shaded render of cow.obj](renders/cow.PNG)
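The Lambert term that `fragmentShading` applies can be sketched on the host in plain C++. `Vec3`, `safeNormalize`, and `lambertFactor` are hypothetical names that stand in for the GLM calls. The zero-length guard is an assumption added here: it makes a background pixel, whose stored normal is zero, shade to black instead of NaN.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for glm::vec3 so the sketch compiles without GLM.
struct Vec3 {
    float x, y, z;
};

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Returns v / |v|, or the zero vector when |v| == 0 (normalizing a zero
// vector would otherwise divide by zero and yield NaN).
inline Vec3 safeNormalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    if (len == 0.0f) return {0.0f, 0.0f, 0.0f};
    return {v.x / len, v.y / len, v.z / len};
}

// Lambert (diffuse) factor: kd * max(0, N . L), where L points from the
// surface position toward the light.
inline float lambertFactor(Vec3 normal, Vec3 position, Vec3 lightPos, float kd) {
    Vec3 n = safeNormalize(normal);
    Vec3 l = safeNormalize(sub(lightPos, position));
    return kd * std::max(0.0f, dot(n, l));
}
```

With the kernel's `diffuseTerm` of 0.7, a surface facing the light directly gets a factor of 0.7, and a surface facing away gets 0.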
diff --git a/renders/cow.PNG b/renders/cow.PNG
new file mode 100644
index 0000000..ff66c25
Binary files /dev/null and b/renders/cow.PNG differ
diff --git a/src/main.cpp b/src/main.cpp
index a125d7c..61eaa09 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -7,6 +7,7 @@
  */
 
 #include "main.hpp"
+#include "rasterizeTools.h"
 
 //-------------------------------
 //-------------MAIN--------------
@@ -159,7 +160,7 @@ void initCuda() {
     // Use device with highest Gflops/s
     cudaGLSetGLDevice(0);
 
     rasterizeInit(width, height);
 
     // Clean up on program exit
     atexit(cleanupCuda);
diff --git a/src/rasterize.cu b/src/rasterize.cu
index 53103b5..1d321ce 100644
--- a/src/rasterize.cu
+++ b/src/rasterize.cu
@@ -13,28 +13,56 @@
 #include <cuda.h>
 #include <thrust/random.h>
 #include <util/checkCUDAError.h>
-#include "rasterizeTools.h"
+#include <glm/glm.hpp>
+#include <glm/gtc/matrix_transform.hpp>
 
 struct VertexIn {
     glm::vec3 pos;
     glm::vec3 nor;
     glm::vec3 col;
+	glm::vec2 uv;
     // TODO (optional) add other vertex attributes (e.g. texture coordinates)
 };
 struct VertexOut {
     // TODO
+	glm::vec3 pos;
+    glm::vec3 nor;
+    glm::vec3 col;
+	glm::vec2 uv;
+
 };
 struct Triangle {
     VertexOut v[3];
 };
 struct Fragment {
     glm::vec3 color;
+	glm::vec3 position;
+	glm::vec3 normal;
+	int depth;
 };
 
+__host__ __device__ inline unsigned int utilhash(unsigned int a) {
+    a = (a + 0x7ed55d16) + (a << 12);
+    a = (a ^ 0xc761c23c) ^ (a >> 19);
+    a = (a + 0x165667b1) + (a << 5);
+    a = (a + 0xd3a2646c) ^ (a << 9);
+    a = (a + 0xfd7046c5) + (a << 3);
+    a = (a ^ 0xb55a4f09) ^ (a >> 16);
+    return a;
+}
+
+
+__host__ __device__
+thrust::default_random_engine makeSeededRandomEngine(int iter, int index, int depth) {
+    int h = utilhash((1 << 31) | (depth << 22) | iter) ^ utilhash(index);
+    return thrust::default_random_engine(h);
+}
+
 static int width = 0;
 static int height = 0;
 static int *dev_bufIdx = NULL;
 static VertexIn *dev_bufVertex = NULL;
+static VertexOut *dev_bufVertex_out = NULL;
 static Triangle *dev_primitives = NULL;
 static Fragment *dev_depthbuffer = NULL;
 static glm::vec3 *dev_framebuffer = NULL;
@@ -44,8 +72,7 @@ static int vertCount = 0;
 /**
  * Kernel that writes the image to the OpenGL PBO directly.
  */
-__global__
-void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) {
+__global__ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) {
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
     int y = (blockIdx.y * blockDim.y) + threadIdx.y;
     int index = x + (y * w);
@@ -64,8 +91,7 @@ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) {
 }
 
 // Writes fragment colors to the framebuffer
-__global__
-void render(int w, int h, Fragment *depthbuffer, glm::vec3 *framebuffer) {
+__global__ void render(int w, int h, Fragment *depthbuffer, glm::vec3 *framebuffer) {
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
     int y = (blockIdx.y * blockDim.y) + threadIdx.y;
     int index = x + (y * w);
@@ -75,18 +101,150 @@ void render(int w, int h, Fragment *depthbuffer, glm::vec3 *framebuffer) {
     }
 }
 
+__global__ void depthBufferClearing(int w, int h, Fragment *fragments) {
+	int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+    int index = x + (y * w);
+
+	if ( x < w && y < h) {
+		fragments[index].depth = INT_MAX;
+		fragments[index].color = glm::vec3(0.0f);
+	}
+}
+
+__global__ void vertexShading(int n, glm::mat4 view_projection,
+	VertexIn *vs_input, VertexOut *vs_output) {
+
+	int index = (blockIdx.x * blockDim.x) + threadIdx.x;
+    
+    if (index < n) {
+
+		VertexOut vert_out;
+		glm::vec4 input_pos = glm::vec4(
+			vs_input[index].pos.x, 
+			vs_input[index].pos.y, 
+			vs_input[index].pos.z, 
+			1.0f);
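+
+		// Clip space -> NDC. multiplyMV() drops w without dividing, so the
+		// perspective divide is done explicitly; the rasterizer expects
+		// x and y in [-1, 1].
+		glm::vec4 clip_pos = view_projection * input_pos;
+		vert_out.pos = glm::vec3(clip_pos) / clip_pos.w;
+
+		// Normals are directions (w = 0) and must not go through the
+		// projection. With no model matrix, the input normal is already in
+		// world space, the same space as light_pos in fragmentShading.
+		vert_out.nor = vs_input[index].nor;
+
+		vert_out.col = vs_input[index].col;
+		vert_out.uv = vs_input[index].uv;
+
+		vs_output[index] = vert_out;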
+		
+	}
+}
+
+__global__ void primitiveAssembling(int n, VertexOut *vs_output,
+	Triangle *primitives) {
+
+	int index = (blockIdx.x * blockDim.x) + threadIdx.x;
+	// Vertices 3k, 3k+1, 3k+2 form triangle k; the index buffer (dev_bufIdx) is not used.
+    if (index < n) {
+		primitives[index].v[0] = vs_output[3*index];
+		primitives[index].v[1] = vs_output[3*index+1];
+		primitives[index].v[2] = vs_output[3*index+2];
+	}
+
+}
+
+__global__ void rasterizing(int n, int w, int h,
+	Triangle *primitives, Fragment *fs_input) {
+
+	int index = (blockIdx.x * blockDim.x) + threadIdx.x;
+    if (index < n) {
+		Triangle tri = primitives[index];
+		glm::vec3 tri_verts[3] = {tri.v[0].pos, tri.v[1].pos, tri.v[2].pos};
+		AABB aabb = getAABBForTriangle(tri_verts);
+
+		// Map the NDC bounding box to pixel indices and clamp it to the
+		// framebuffer, so triangles that cross the screen edge never write
+		// outside the depth buffer.
+		int x_min = glm::max(0, int(glm::floor((aabb.min.x + 1) * w / 2.0f)));
+		int y_min = glm::max(0, int(glm::floor((aabb.min.y + 1) * h / 2.0f)));
+		int x_max = glm::min(w - 1, int(glm::floor((aabb.max.x + 1) * w / 2.0f)));
+		int y_max = glm::min(h - 1, int(glm::floor((aabb.max.y + 1) * h / 2.0f)));
+
+		for (int i = x_min; i <= x_max; i++) {
+			for (int j = y_min; j <= y_max; j++) {
+				// Sample at the pixel center, mapped back to NDC [-1, 1].
+				float x = ((i + 0.5f) / float(w)) * 2.0f - 1.0f;
+				float y = ((j + 0.5f) / float(h)) * 2.0f - 1.0f;
+
+				glm::vec3 barycentric = calculateBarycentricCoordinate(tri_verts,
+					glm::vec2(x, y));
+				if (!isBarycentricCoordInBounds(barycentric)) {
+					continue;
+				}
+
+				// Fixed-point depth: clamp to [-1, 1] and scale by 2^30 so the
+				// float-to-int conversion stays in range (z * INT_MAX does not).
+				float z = glm::clamp(getZAtCoordinate(barycentric, tri_verts), -1.0f, 1.0f);
+				int depth = int(z * float(1 << 30));
+				int frag_index = j * w + i;
+				atomicMin(&fs_input[frag_index].depth, depth);
+
+				// Still racy: another thread may lower the depth between this
+				// check and the write below (a per-pixel lock would prevent it).
+				if (fs_input[frag_index].depth == depth) {
+					// Interpolate per-vertex attributes with the barycentric
+					// weights (screen-space, not perspective-correct).
+					Fragment frag;
+					frag.color = barycentric.x * tri.v[0].col
+						+ barycentric.y * tri.v[1].col + barycentric.z * tri.v[2].col;
+					frag.normal = barycentric.x * tri.v[0].nor
+						+ barycentric.y * tri.v[1].nor + barycentric.z * tri.v[2].nor;
+					// NDC position (VertexOut carries no world-space position).
+					frag.position = barycentric.x * tri.v[0].pos
+						+ barycentric.y * tri.v[1].pos + barycentric.z * tri.v[2].pos;
+					frag.depth = depth;
+
+					fs_input[frag_index] = frag;
+				}
+			}
+		}
+	}
+}
+
+__global__ void fragmentShading(int w, int h, Fragment *fs, glm::vec3 light_pos) {
+	int x = (blockIdx.x * blockDim.x) + threadIdx.x;
+    int y = (blockIdx.y * blockDim.y) + threadIdx.y;
+    int index = x + (y * w);
+
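+	// Skip background pixels: their normal is zero and normalize() would produce NaN.
+	if (x < w && y < h && fs[index].depth != INT_MAX) {
+		float diffuseTerm = 0.7f;
+		glm::vec3 light_color = glm::vec3(1.0f);
+		glm::vec3 light_dir = glm::normalize(light_pos - fs[index].position);
+		float lambert = glm::max(0.0f, glm::dot(glm::normalize(fs[index].normal), light_dir));
+		fs[index].color *= light_color * (diffuseTerm * lambert);
+	}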
+}
+
 /**
  * Called once at the beginning of the program to allocate memory.
  */
 void rasterizeInit(int w, int h) {
     width = w;
     height = h;
+	
     cudaFree(dev_depthbuffer);
     cudaMalloc(&dev_depthbuffer,   width * height * sizeof(Fragment));
     cudaMemset(dev_depthbuffer, 0, width * height * sizeof(Fragment));
     cudaFree(dev_framebuffer);
     cudaMalloc(&dev_framebuffer,   width * height * sizeof(glm::vec3));
     cudaMemset(dev_framebuffer, 0, width * height * sizeof(glm::vec3));
+
     checkCUDAError("rasterizeInit");
 }
 
@@ -110,10 +268,14 @@ void rasterizeSetBuffers(
         bufVertex[i].nor = glm::vec3(bufNor[j + 0], bufNor[j + 1], bufNor[j + 2]);
         bufVertex[i].col = glm::vec3(bufCol[j + 0], bufCol[j + 1], bufCol[j + 2]);
     }
+
     cudaFree(dev_bufVertex);
     cudaMalloc(&dev_bufVertex, vertCount * sizeof(VertexIn));
     cudaMemcpy(dev_bufVertex, bufVertex, vertCount * sizeof(VertexIn), cudaMemcpyHostToDevice);
 
+	cudaFree(dev_bufVertex_out);
+    cudaMalloc(&dev_bufVertex_out, vertCount * sizeof(VertexOut));
+   
     cudaFree(dev_primitives);
     cudaMalloc(&dev_primitives, vertCount / 3 * sizeof(Triangle));
     cudaMemset(dev_primitives, 0, vertCount / 3 * sizeof(Triangle));
@@ -130,9 +292,49 @@ void rasterize(uchar4 *pbo) {
     dim3 blockCount2d((width  - 1) / blockSize2d.x + 1,
                       (height - 1) / blockSize2d.y + 1);
 
-    // TODO: Execute your rasterization pipeline here
-    // (See README for rasterization pipeline outline.)
+	dim3 blockSize1d(128);
+	dim3 blockCount1d((vertCount + 128 - 1) / 128);
+
+    //-----RASTERIZATION PIPELINE----------
+
+	//---Clear Depth Buffer
+	depthBufferClearing<<<blockCount2d, blockSize2d>>>(width, height, dev_depthbuffer);
+	checkCUDAError("depth buffer clearing");
 
+	//---Vertex Shader
+	//view matrix 
+	glm::mat4 view = glm::lookAt(
+			glm::vec3(0.0f, 1.5f, 5.0f), 
+			glm::vec3(0.0f, 0.0f, -1.0f), 
+			glm::vec3(0.0f, -1.0f, 0.0f));
+	
+	//projection matrix (fovy: radians in GLM >= 0.9.6 or with GLM_FORCE_RADIANS, otherwise degrees)
+	glm::mat4 projection = glm::perspective(
+		20.0f, float(width)/float(height), 1.0f, 100.0f);
+
+	glm::mat4 view_projection = projection * view;
+
+	vertexShading<<<blockCount1d, blockSize1d>>>(vertCount, view_projection,
+		dev_bufVertex, dev_bufVertex_out);
+	checkCUDAError("vertex shader");
+
+	//---Primitive Assembly
+	primitiveAssembling<<<blockCount1d, blockSize1d>>>(vertCount/3, 
+		dev_bufVertex_out, dev_primitives);
+	checkCUDAError("primitive assembling");
+
+	//---Rasterization
+	rasterizing<<<blockCount1d, blockSize1d>>>(vertCount/3, width, height,
+		dev_primitives, dev_depthbuffer);
+	checkCUDAError("triangle rasterizing");
+
+	//--Fragment Shader
+	glm::vec3 light_pos = glm::vec3(-3.0f, 5.0f, 10.0f);
+	fragmentShading<<<blockCount2d, blockSize2d>>>(width, height, dev_depthbuffer,
+		light_pos);
+	checkCUDAError("fragment shading");
+   
+	
     // Copy depthbuffer colors into framebuffer
     render<<<blockCount2d, blockSize2d>>>(width, height, dev_depthbuffer, dev_framebuffer);
     // Copy framebuffer into OpenGL buffer for OpenGL previewing
@@ -150,12 +352,15 @@ void rasterizeFree() {
     cudaFree(dev_bufVertex);
     dev_bufVertex = NULL;
 
+	cudaFree(dev_bufVertex_out);
+    dev_bufVertex_out = NULL;
+
     cudaFree(dev_primitives);
     dev_primitives = NULL;
 
     cudaFree(dev_depthbuffer);
     dev_depthbuffer = NULL;
-
+	
     cudaFree(dev_framebuffer);
     dev_framebuffer = NULL;
 
diff --git a/src/rasterize.h b/src/rasterize.h
index a06b339..0807f59 100644
--- a/src/rasterize.h
+++ b/src/rasterize.h
@@ -9,6 +9,8 @@
 #pragma once
 
 #include <glm/glm.hpp>
+#include <glm/gtc/matrix_transform.hpp>
+#include "rasterizeTools.h"
 
 void rasterizeInit(int width, int height);
 void rasterizeSetBuffers(
diff --git a/src/rasterizeTools.h b/src/rasterizeTools.h
index 46c701e..53cc513 100644
--- a/src/rasterizeTools.h
+++ b/src/rasterizeTools.h
@@ -17,12 +17,24 @@ struct AABB {
     glm::vec3 max;
 };
 
+struct Camera {
+    glm::ivec2 resolution;
+    glm::vec3 position;
+    glm::vec3 target;
+    glm::vec3 up;
+    glm::vec2 fov;
+	float nearPlane;
+	float farPlane;
+};
+
 /**
  * Multiplies a glm::mat4 matrix and a vec4.
  */
 __host__ __device__ static
 glm::vec3 multiplyMV(glm::mat4 m, glm::vec4 v) {
-    return glm::vec3(m * v);
+    // No perspective divide: glm::vec3(m * v) simply drops w. Callers that
+    // need NDC coordinates must divide by w themselves.
+    return glm::vec3(m * v);
 }
 
 // CHECKITOUT
@@ -99,3 +111,4 @@ float getZAtCoordinate(const glm::vec3 barycentricCoord, const glm::vec3 tri[3])
            + barycentricCoord.y * tri[1].z
            + barycentricCoord.z * tri[2].z);
 }
+
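`multiplyMV` drops `w` instead of dividing by it, and `getZAtCoordinate` interpolates only z. The host-side sketch below uses hypothetical `Vec3`/`Vec4`/`Mat4` types. Note that `Mat4` is stored row-major here, whereas `glm::mat4` is column-major. The sketch shows three things:

* Normals are transformed with w = 0 so that translation does not move them.
* The perspective divide converts clip space to NDC.
* The same barycentric weights interpolate any per-vertex attribute.

```cpp
// Hypothetical minimal types standing in for glm::vec3 / glm::vec4 / glm::mat4.
struct Vec3 { float x, y, z; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // row-major in this sketch

// Matrix-vector product: result[r] = sum over c of m[r][c] * v[c].
inline Vec4 mulMV(const Mat4& a, Vec4 v) {
    const float in[4] = {v.x, v.y, v.z, v.w};
    float out[4];
    for (int r = 0; r < 4; ++r) {
        out[r] = 0.0f;
        for (int c = 0; c < 4; ++c) out[r] += a.m[r][c] * in[c];
    }
    return {out[0], out[1], out[2], out[3]};
}

// Translation matrix: moves points (w = 1) but leaves directions (w = 0) alone.
inline Mat4 translation(float tx, float ty, float tz) {
    return {{{1.0f, 0.0f, 0.0f, tx},
             {0.0f, 1.0f, 0.0f, ty},
             {0.0f, 0.0f, 1.0f, tz},
             {0.0f, 0.0f, 0.0f, 1.0f}}};
}

// Clip space -> NDC. Only meaningful for points (w != 0).
inline Vec3 perspectiveDivide(Vec4 clip) {
    return {clip.x / clip.w, clip.y / clip.w, clip.z / clip.w};
}

// Screen-space barycentric interpolation of a per-vertex attribute: the same
// weighting getZAtCoordinate applies to z (not perspective-correct).
inline Vec3 interpolate(Vec3 bary, Vec3 a0, Vec3 a1, Vec3 a2) {
    return {bary.x * a0.x + bary.y * a1.x + bary.z * a2.x,
            bary.x * a0.y + bary.y * a1.y + bary.z * a2.y,
            bary.x * a0.z + bary.y * a1.z + bary.z * a2.z};
}
```

Feeding a normal through a matrix with w = 1, as the original vertex shader did, lets the translation column leak into the direction. With w = 0, only the upper 3x3 part applies.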