diff --git a/tf2.0/README.md b/tf2.0/README.md
new file mode 100644
index 000000000..5cbb13bc9
--- /dev/null
+++ b/tf2.0/README.md
@@ -0,0 +1,47 @@
+## Edge Machine Learning: Tensorflow Library
+
+This directory includes TensorFlow implementations of various techniques and
+algorithms developed as part of EdgeML. Currently, the following algorithms are
+available in TensorFlow:
+
+1. [Bonsai](../docs/publications/Bonsai.pdf)
+2. [EMI-RNN](../docs/publications/emi-rnn-nips18.pdf)
+3. [FastRNN & FastGRNN](../docs/publications/FastGRNN.pdf)
+4. [ProtoNN](../docs/publications/ProtoNN.pdf)
+
+The TensorFlow compute graphs for these algorithms are packaged as
+`edgeml.graph`. Trainers for these algorithms are in `edgeml.trainer`. Usage
+directions and examples for these algorithms are provided in the `examples`
+directory. To get started with any of the provided algorithms, please follow
+the notebooks in the `examples` directory.
+
+## Installation
+
+Use pip and the provided requirements files to first install the required
+dependencies before installing the `edgeml` library. Details for CPU-based
+and GPU-based installation are provided below.
+
+It is highly recommended that EdgeML be installed in a virtual environment. Please create
+a new virtual environment using your environment manager ([virtualenv](https://virtualenv.pypa.io/en/stable/userguide/#usage) or [Anaconda](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands)).
+Make sure the new environment is active before running the commands below.
+
+### CPU
+
+```
+pip install -r requirements-cpu.txt
+pip install -e .
+```
+
+Tested on Python 3.5 and Python 2.7 with TensorFlow >= 1.6.0.
+
+### GPU
+
+Install the appropriate CUDA and cuDNN versions [tested with CUDA >= 8.1 and cuDNN >= 6.1].
+
+```
+pip install -r requirements-gpu.txt
+pip install -e .
+```
+
+Copyright (c) Microsoft Corporation. All rights reserved.
+Licensed under the MIT license.
diff --git a/tf2.0/docs/FastCells.md b/tf2.0/docs/FastCells.md
new file mode 100644
index 000000000..213dfe1cd
--- /dev/null
+++ b/tf2.0/docs/FastCells.md
@@ -0,0 +1,57 @@
+# FastRNN and FastGRNN - FastCells
+
+This document aims to explain and elaborate on specific details of the FastCells
+present in `tf2.0/edgeml/graph/rnn.py`. The endpoint use-case scripts with
+3 phase training, along with an example notebook, are present in `tf2.0/examples/FastCells/`.
+The endpoint script can be used to test the RNN architectures on any dataset,
+with budget constraints specified as hyper-parameters in terms of the sparsity and rank
+of the weight matrices.
+
+# FastRNN
+![FastRNN](img/FastRNN.png)
+![FastRNN Equation](img/FastRNN_eq.png)
+
+# FastGRNN
+![FastGRNN Base Architecture](img/FastGRNN.png)
+![FastGRNN Base Equation](img/FastGRNN_eq.png)
+
+# Plug and Play Cells
+
+`FastRNNCell` and `FastGRNNCell`, present in `edgeml.graph.rnn`, are very similar to
+TensorFlow's inbuilt `RNNCell`, `GRUCell`, `BasicLSTMCell`, and `UGRNNCell`, allowing
+any of the standard RNN cells in an architecture to be replaced with FastCells.
+The plug-and-play nature can be seen in the endpoint script for FastCells, where the
+graph building is very similar to that of LSTM/GRU in TensorFlow.
+
+Script: [Endpoint Script](../examples/FastCells/fastcell_example.py)
+
+Example Notebook: [iPython Notebook](../examples/FastCells/fastcell_example.ipynb)
+
+Cells: [FastRNNCell](../edgeml/graph/rnn.py#L206) and [FastGRNNCell](../edgeml/graph/rnn.py#L31).
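+
+As a quick, hedged illustration of this drop-in usage, the sketch below unrolls a
+`FastGRNNCell` with `static_rnn` - the same pattern used by the FastCell trainer in
+`edgeml/trainer/fastTrainer.py` - and attaches a linear classifier on the final hidden
+state. It assumes v1-style graph mode under TF 2.x and the constructor arguments from
+`rnn.py` (hidden size plus optional `wRank`/`uRank`); the dimensions and rank values
+are hypothetical placeholders, not recommendations.
+
+```python
+import tensorflow as tf
+from edgeml.graph.rnn import FastGRNNCell
+
+tf.compat.v1.disable_eager_execution()  # v1-style graphs, as in this directory
+
+timeSteps, inputDims, hiddenDims, numClasses = 6, 16, 32, 5
+X = tf.compat.v1.placeholder(tf.float32, [None, timeSteps, inputDims])
+
+# wRank/uRank smaller than the full dimensions induce the low-rank
+# parameterization described in the Compression section below.
+cell = FastGRNNCell(hiddenDims, wRank=8, uRank=8)
+
+# Unroll over time, exactly as FastTrainer.RNN does.
+x = tf.unstack(X, timeSteps, 1)
+outputs, _ = tf.compat.v1.nn.static_rnn(cell, x, dtype=tf.float32)
+
+# Linear classifier on the final hidden state, as in fastTrainer.py.
+FC = tf.Variable(tf.random.normal([hiddenDims, numClasses]), name='FC')
+FCbias = tf.Variable(tf.random.normal([numClasses]), name='FCbias')
+logits = tf.matmul(outputs[-1], FC) + FCbias
+```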
+
+# 3 phase Fast Training
+
+`FastCells`, similar to `Bonsai`, use a 3-phase training routine to induce the right
+support and sparsity for the weight matrices. With the low-rank parameterization of the weights
+followed by the 3-phase training, we obtain FastRNN and FastGRNN models which are compact
+and can be further compressed by using byte quantization without significant loss in accuracy.
+
+# Compression
+
+1) Low-Rank Parameterization of Weight Matrices (L)
+2) Sparsity (S)
+3) Quantization (Q)
+
+Low rank is directly induced into the FastCells during initialization, and training happens with
+the targeted low-rank versions of the weight matrices. One can use the `wRank` and `uRank` parameters
+of FastCells to achieve this.
+
+Sparsity is passed as a hyper-parameter to `fastTrainer.py` during the 3-phase training, which at
+the end produces a sparse, low-rank model.
+
+Further compression is achieved by byte quantization, which can be performed using the
+`quantizeFastModels.py` script in `tf2.0/examples/FastCells/`. This gives a model-size reduction
+of up to 4x if 8-bit integers are used. Lastly, to facilitate all-integer arithmetic, including
+the non-linearities, one could use `quantTanh` instead of `tanh` and `quantSigm` instead of
+`sigmoid` as the non-linearities in the RNN cells, followed by byte quantization. These
+non-linearities can be set using the appropriate parameters in `FastRNNCell` and `FastGRNNCell`.
diff --git a/tf2.0/docs/img/3PartsGraph.png b/tf2.0/docs/img/3PartsGraph.png
new file mode 100755
index 000000000..66ebdbcf9
Binary files /dev/null and b/tf2.0/docs/img/3PartsGraph.png differ
diff --git a/tf2.0/docs/img/FastGRNN.png b/tf2.0/docs/img/FastGRNN.png
new file mode 100644
index 000000000..2357165e7
Binary files /dev/null and b/tf2.0/docs/img/FastGRNN.png differ
diff --git a/tf2.0/docs/img/FastGRNN_eq.png b/tf2.0/docs/img/FastGRNN_eq.png
new file mode 100644
index 000000000..9df478954
Binary files /dev/null and b/tf2.0/docs/img/FastGRNN_eq.png differ
diff --git a/tf2.0/docs/img/FastRNN.png b/tf2.0/docs/img/FastRNN.png
new file mode 100644
index 000000000..d8826c493
Binary files /dev/null and b/tf2.0/docs/img/FastRNN.png differ
diff --git a/tf2.0/docs/img/FastRNN_eq.png b/tf2.0/docs/img/FastRNN_eq.png
new file mode 100644
index 000000000..bcf52cb29
Binary files /dev/null and b/tf2.0/docs/img/FastRNN_eq.png differ
diff --git a/tf2.0/docs/img/MIML_illustration.png b/tf2.0/docs/img/MIML_illustration.png
new file mode 100755
index 000000000..7c1ab5456
Binary files /dev/null and b/tf2.0/docs/img/MIML_illustration.png differ
diff --git a/tf2.0/edgeml/__init__.py b/tf2.0/edgeml/__init__.py
new file mode 100644
index 000000000..8ac062499
--- /dev/null
+++ b/tf2.0/edgeml/__init__.py
@@ -0,0 +1,13 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT license.
+
+'''
+package edgeml
+
+Provides: Bonsai, ProtoNN and FastCell (FastRNN/FastGRNN) graphs,
+    and trainer routines for them
+'''
+
+# TODO Override the __all__ variable for the package
+# and limit the functions that are exposed.
+# Do not expose functions in utils - can be dangerous
diff --git a/tf2.0/edgeml/graph/__init__.py b/tf2.0/edgeml/graph/__init__.py
new file mode 100644
index 000000000..3d7ff8299
--- /dev/null
+++ b/tf2.0/edgeml/graph/__init__.py
@@ -0,0 +1,2 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT license.
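The forward pass constructed in `bonsai.py` below can be summarized compactly. As a
hedged transcription of the graph code (not necessarily the paper's exact notation):
with projected input and node probabilities

$$
\hat{x} = \frac{Z x}{\hat{d}}, \qquad
p_0 = 1, \qquad
p_i = p_{\lceil i/2 \rceil - 1} \cdot
\frac{1 + (-1)^{i+1} \tanh\!\left(\sigma_I \, T_{\lceil i/2 \rceil - 1} \hat{x}\right)}{2},
$$

the score is

$$
\hat{y}(x) = \sum_{i=0}^{K-1} p_i \, (W_i \hat{x}) \circ \tanh(\sigma \, V_i \hat{x}),
$$

where $\hat{d}$ is `projectionDimension`, $K$ is `totalNodes`, and $\sigma_I$ is the
path-indicator sharpness, which the trainer anneals during training and sets to a large
constant (1e9) at inference so that only one root-to-leaf path contributes.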
diff --git a/tf2.0/edgeml/graph/bonsai.py b/tf2.0/edgeml/graph/bonsai.py
new file mode 100644
index 000000000..10851a1fd
--- /dev/null
+++ b/tf2.0/edgeml/graph/bonsai.py
@@ -0,0 +1,180 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT license.
+
+import tensorflow as tf
+import numpy as np
+import warnings
+
+
+class Bonsai:
+    def __init__(self, numClasses, dataDimension, projectionDimension,
+                 treeDepth, sigma,
+                 isRegression=False, W=None, T=None, V=None, Z=None):
+        '''
+        Expected Dimensions:
+
+        Bonsai Params // Optional
+        W [numClasses*totalNodes, projectionDimension]
+        V [numClasses*totalNodes, projectionDimension]
+        Z [projectionDimension, dataDimension]
+        T [internalNodes, projectionDimension]
+
+        internalNodes = 2**treeDepth - 1
+        totalNodes = 2*internalNodes + 1
+
+        sigma - tanh non-linearity
+        sigmaI - indicator function for node probabilities;
+        sigmaI has to be set to infinity (1e9 in practice)
+        while doing testing/inference
+        numClasses will be reset to 1 in the binary case
+        '''
+        self.dataDimension = dataDimension
+        self.projectionDimension = projectionDimension
+        self.isRegression = isRegression
+
+        if self.isRegression and (numClasses != 1):
+            warnings.warn("Number of classes cannot be greater than 1 for regression")
+            numClasses = 1
+
+        if numClasses == 2:
+            self.numClasses = 1
+        else:
+            self.numClasses = numClasses
+
+        self.treeDepth = treeDepth
+        self.sigma = sigma
+
+        self.internalNodes = 2**self.treeDepth - 1
+        self.totalNodes = 2 * self.internalNodes + 1
+
+        self.W = self.initW(W)
+        self.V = self.initV(V)
+        self.T = self.initT(T)
+        self.Z = self.initZ(Z)
+
+        self.assertInit()
+
+        self.score = None
+        self.X_ = None
+        self.prediction = None
+
+    def initZ(self, Z):
+        if Z is None:
+            Z = tf.random.normal(
+                [self.projectionDimension, self.dataDimension])
+        Z = tf.Variable(Z, name='Z', dtype=tf.float32)
+        return Z
+
+    def initW(self, W):
+        if W is None:
+            W = tf.random.normal(
+                [self.numClasses * self.totalNodes, self.projectionDimension])
+        W = tf.Variable(W, name='W', dtype=tf.float32)
+        return W
+
+    def initV(self, V):
+        if V is None:
+            V = tf.random.normal(
+                [self.numClasses * self.totalNodes, self.projectionDimension])
+        V = tf.Variable(V, name='V', dtype=tf.float32)
+        return V
+
+    def initT(self, T):
+        if T is None:
+            T = tf.random.normal(
+                [self.internalNodes, self.projectionDimension])
+        T = tf.Variable(T, name='T', dtype=tf.float32)
+        return T
+
+    def __call__(self, X, sigmaI):
+        '''
+        Function to build the Bonsai Tree graph
+        Expected Dimensions
+
+        X is [_, self.dataDimension]
+        '''
+        errmsg = "Dimension Mismatch, X is [_, self.dataDimension]"
+        assert (len(X.shape) == 2 and int(
+            X.shape[1]) == self.dataDimension), errmsg
+        if self.score is not None:
+            return self.score, self.X_
+
+        X_ = tf.divide(tf.matmul(self.Z, X, transpose_b=True),
+                       self.projectionDimension)
+
+        W_ = self.W[0:(self.numClasses)]
+        V_ = self.V[0:(self.numClasses)]
+
+        self.__nodeProb = []
+        self.__nodeProb.append(1)
+
+        score_ = self.__nodeProb[0] * tf.multiply(
+            tf.matmul(W_, X_), tf.tanh(self.sigma * tf.matmul(V_, X_)))
+        for i in range(1, self.totalNodes):
+            W_ = self.W[i * self.numClasses:((i + 1) * self.numClasses)]
+            V_ = self.V[i * self.numClasses:((i + 1) * self.numClasses)]
+
+            T_ = tf.reshape(self.T[int(np.ceil(i / 2.0) - 1.0)],
+                            [-1, self.projectionDimension])
+            prob = (1 + ((-1)**(i + 1)) *
+                    tf.tanh(tf.multiply(sigmaI, tf.matmul(T_, X_))))
+
+            prob = tf.divide(prob, 2.0)
+            prob = self.__nodeProb[int(np.ceil(i / 2.0) - 1.0)] * prob
+            self.__nodeProb.append(prob)
+            score_ += self.__nodeProb[i] * tf.multiply(
+                tf.matmul(W_, X_), tf.tanh(self.sigma * tf.matmul(V_, X_)))
+
+        self.score = score_
+        self.X_ = X_
+        return self.score, self.X_
+
+    def getPrediction(self):
+        '''
+        Takes in a score tensor and outputs an integer class for each data point
+        '''
+
+        # Classification.
+        if self.isRegression is False:
+            if self.prediction is not None:
+                return self.prediction
+
+            if self.numClasses > 2:
+                self.prediction = tf.argmax(input=tf.transpose(a=self.score), axis=1)
+            else:
+                self.prediction = tf.argmax(
+                    input=tf.concat([tf.transpose(a=self.score),
+                                     0 * tf.transpose(a=self.score)], 1), axis=1)
+        # Regression.
+        elif self.isRegression is True:
+            # For regression, the scores are the actual predictions; just return them.
+            self.prediction = self.score
+
+        return self.prediction
+
+    def assertInit(self):
+        errmsg = "Number of Classes for regression can only be 1."
+        if self.isRegression is True:
+            assert (self.numClasses == 1), errmsg
+        errRank = "All parameters must have only two dimensions, shape = [a, b]"
+        assert len(self.W.shape) == len(self.Z.shape), errRank
+        assert len(self.W.shape) == len(self.T.shape), errRank
+        assert len(self.W.shape) == 2, errRank
+        msg = "W and V should be of same Dimensions"
+        assert self.W.shape == self.V.shape, msg
+        errW = "W and V are [numClasses*totalNodes, projectionDimension]"
+        assert self.W.shape[0] == self.numClasses * self.totalNodes, errW
+        assert self.W.shape[1] == self.projectionDimension, errW
+        errZ = "Z is [projectionDimension, dataDimension]"
+        assert self.Z.shape[0] == self.projectionDimension, errZ
+        assert self.Z.shape[1] == self.dataDimension, errZ
+        errT = "T is [internalNodes, projectionDimension]"
+        assert self.T.shape[0] == self.internalNodes, errT
+        assert self.T.shape[1] == self.projectionDimension, errT
+        assert int(self.numClasses) > 0, "numClasses should be > 0"
+        msg = "# of features in data should be > 0"
+        assert int(self.dataDimension) > 0, msg
+        msg = "Projection should be > 0 dims"
+        assert int(self.projectionDimension) > 0, msg
+        msg = "treeDepth should be >= 0"
+        assert int(self.treeDepth) >= 0, msg
diff --git a/tf2.0/edgeml/graph/protoNN.py b/tf2.0/edgeml/graph/protoNN.py
new file mode 100644
index 000000000..2ea5b85ff
--- /dev/null
+++ b/tf2.0/edgeml/graph/protoNN.py
@@ -0,0 +1,191 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT license.
+
+import numpy as np
+import tensorflow as tf
+
+
+class ProtoNN:
+    def __init__(self, inputDimension, projectionDimension, numPrototypes,
+                 numOutputLabels, gamma,
+                 W = None, B = None, Z = None):
+        '''
+        Forward computation graph for ProtoNN.
+
+        inputDimension: Input data dimension or feature dimension.
+        projectionDimension: hyperparameter
+        numPrototypes: hyperparameter
+        numOutputLabels: The number of output labels or classes
+        W, B, Z: Numpy matrices that can be used to initialize
+            the projection matrix (W), prototype matrix (B) and prototype labels
+            matrix (Z).
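+        gamma: Hyperparameter controlling the width of the RBF similarity
+            kernel, used as exp(-gamma^2 * ||W'x - b_j||^2) over the
+            prototypes in __call__ below.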
+        Expected Dimensions:
+            W   inputDimension (d) x projectionDimension (d_cap)
+            B   projectionDimension (d_cap) x numPrototypes (m)
+            Z   numOutputLabels (L) x numPrototypes (m)
+        '''
+        with tf.compat.v1.name_scope('protoNN') as ns:
+            self.__nscope = ns
+            self.__d = inputDimension
+            self.__d_cap = projectionDimension
+            self.__m = numPrototypes
+            self.__L = numOutputLabels
+
+            self.__inW = W
+            self.__inB = B
+            self.__inZ = Z
+            self.__inGamma = gamma
+            self.W, self.B, self.Z = None, None, None
+            self.gamma = None
+
+            self.__validInit = False
+            self.__initWBZ()
+            self.__initGamma()
+            self.__validateInit()
+            self.protoNNOut = None
+            self.predictions = None
+            self.accuracy = None
+
+    def __validateInit(self):
+        self.__validInit = False
+        errmsg = "Dimensions mismatch! Should be W[d, d_cap]"
+        errmsg += ", B[d_cap, m] and Z[L, m]"
+        d, d_cap, m, L, _ = self.getHyperParams()
+        assert self.W.shape[0] == d, errmsg
+        assert self.W.shape[1] == d_cap, errmsg
+        assert self.B.shape[0] == d_cap, errmsg
+        assert self.B.shape[1] == m, errmsg
+        assert self.Z.shape[0] == L, errmsg
+        assert self.Z.shape[1] == m, errmsg
+        self.__validInit = True
+
+    def __initWBZ(self):
+        with tf.compat.v1.name_scope(self.__nscope):
+            W = self.__inW
+            if W is None:
+                W = tf.compat.v1.initializers.random_normal()
+                W = W([self.__d, self.__d_cap])
+            self.W = tf.Variable(W, name='W', dtype=tf.float32)
+
+            B = self.__inB
+            if B is None:
+                B = tf.compat.v1.initializers.random_uniform()
+                B = B([self.__d_cap, self.__m])
+            self.B = tf.Variable(B, name='B', dtype=tf.float32)
+
+            Z = self.__inZ
+            if Z is None:
+                Z = tf.compat.v1.initializers.random_normal()
+                Z = Z([self.__L, self.__m])
+            Z = tf.Variable(Z, name='Z', dtype=tf.float32)
+            self.Z = Z
+        return self.W, self.B, self.Z
+
+    def __initGamma(self):
+        with tf.compat.v1.name_scope(self.__nscope):
+            gamma = self.__inGamma
+            self.gamma = tf.constant(gamma, name='gamma')
+
+    def getHyperParams(self):
+        '''
+        Returns the model hyperparameters:
+            [inputDimension, projectionDimension,
+            numPrototypes, numOutputLabels, gamma]
+        '''
+        d = self.__d
+        dcap = self.__d_cap
+        m = self.__m
+        L = self.__L
+        return d, dcap, m, L, self.gamma
+
+    def getModelMatrices(self):
+        '''
+        Returns TensorFlow tensors of the model matrices, which
+        can then be evaluated to obtain the corresponding numpy arrays.
+
+        These can then be exported as part of other implementations of
+        ProtoNN, for instance a C++ implementation or a pure python
+        implementation.
+        Returns
+            [ProjectionMatrix (W), prototypeMatrix (B),
+            prototypeLabelsMatrix (Z), gamma]
+        '''
+        return self.W, self.B, self.Z, self.gamma
+
+    def __call__(self, X, Y=None):
+        '''
+        This method is responsible for construction of the forward computation
+        graph. The end point of the computation graph, or in other words the
+        output operator for the forward computation, is returned. Additionally,
+        if the argument Y is provided, a classification accuracy operator with
+        Y as target will also be created. For this, Y is assumed to be in one-hot
+        encoded format and the class with the maximum prediction score is
+        compared to the encoded class in Y. This accuracy operator is returned
+        by the getAccuracyOp() method. If a different accuracyOp is required, it
+        can be defined by overriding the createAccOp(protoNNScoresOut, Y)
+        method.
+
+        X: Input tensor or placeholder of shape [-1, inputDimension]
+        Y: Optional tensor or placeholder for targets (labels or classes).
+            Expected shape is [-1, numOutputLabels].
+        returns: The forward computation outputs, self.protoNNOut
+        '''
+        # This assert should never fail if the constructor ran __validateInit.
+        assert self.__validInit is True, "Initialization failed!"
+        if self.protoNNOut is not None:
+            return self.protoNNOut
+
+        W, B, Z, gamma = self.W, self.B, self.Z, self.gamma
+        with tf.compat.v1.name_scope(self.__nscope):
+            WX = tf.matmul(X, W)
+            # Reshape WX so that broadcasting against B can work
+            dim = [-1, WX.shape.as_list()[1], 1]
+            WX = tf.reshape(WX, dim)
+            dim = [1, B.shape.as_list()[0], -1]
+            B_ = tf.reshape(B, dim)
+            l2sim = B_ - WX
+            l2sim = tf.pow(l2sim, 2)
+            l2sim = tf.reduce_sum(input_tensor=l2sim, axis=1, keepdims=True)
+            self.l2sim = l2sim
+            gammal2sim = (-1 * gamma * gamma) * l2sim
+            M = tf.exp(gammal2sim)
+            dim = [1] + Z.shape.as_list()
+            Z_ = tf.reshape(Z, dim)
+            y = tf.multiply(Z_, M)
+            y = tf.reduce_sum(input_tensor=y, axis=2, name='protoNNScoreOut')
+            self.protoNNOut = y
+            self.predictions = tf.argmax(input=y, axis=1, name='protoNNPredictions')
+            if Y is not None:
+                self.createAccOp(self.protoNNOut, Y)
+        return y
+
+    def createAccOp(self, outputs, target):
+        '''
+        Define an accuracy operation on ProtoNN's output scores and targets.
+        Here a simple classification accuracy operator is defined. More
+        complicated operators (for multiple label problems and so forth) can be
+        defined by overriding this method
+        '''
+        assert self.predictions is not None
+        target = tf.argmax(input=target, axis=1)
+        correctPrediction = tf.equal(self.predictions, target)
+        acc = tf.reduce_mean(input_tensor=tf.cast(correctPrediction, tf.float32),
+                             name='protoNNAccuracy')
+        self.accuracy = acc
+
+    def getPredictionsOp(self):
+        '''
+        The predictions operator is defined as argmax(protoNNScores) for each
+        prediction.
+        '''
+        return self.predictions
+
+    def getAccuracyOp(self):
+        '''
+        Returns the accuracyOp as defined by createAccOp. It defaults to
+        multi-class classification accuracy.
+        '''
+        msg = "Accuracy operator not defined in graph. Did you provide Y as an"
+        msg += " argument to __call__?"
+        assert self.accuracy is not None, msg
+        return self.accuracy
diff --git a/tf2.0/edgeml/trainer/__init__.py b/tf2.0/edgeml/trainer/__init__.py
new file mode 100644
index 000000000..3d7ff8299
--- /dev/null
+++ b/tf2.0/edgeml/trainer/__init__.py
@@ -0,0 +1,2 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT license.
diff --git a/tf2.0/edgeml/trainer/bonsaiTrainer.py b/tf2.0/edgeml/trainer/bonsaiTrainer.py
new file mode 100644
index 000000000..2e86663ae
--- /dev/null
+++ b/tf2.0/edgeml/trainer/bonsaiTrainer.py
@@ -0,0 +1,560 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT license.
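+# BonsaiTrainer drives the 3-phase schedule used throughout this directory:
+# dense training for the first third of the batches, iterative hard
+# thresholding (IHT) through the middle third, and sparse retraining on the
+# fixed support for the final third (see train() below).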
+
+from __future__ import print_function
+import tensorflow as tf
+import edgeml.utils as utils
+import numpy as np
+import os
+import sys
+
+
+class BonsaiTrainer:
+
+    def __init__(self, bonsaiObj, lW, lT, lV, lZ, sW, sT, sV, sZ,
+                 learningRate, X, Y, useMCHLoss=False, outFile=None, regLoss='huber'):
+        '''
+        bonsaiObj - Initialised Bonsai Object and Graph
+        lW, lT, lV and lZ are regularisers for the Bonsai params
+        sW, sT, sV and sZ are sparsity factors for the Bonsai params
+        learningRate - learning rate for the optimizer
+        X is the Data Placeholder - Dims [_, dataDimension]
+        Y - Label placeholder for loss computation
+        useMCHLoss - Choice between multi-class hinge loss and cross-entropy:
+        useMCHLoss - True - MultiClass - multiClassHingeLoss
+        useMCHLoss - False - MultiClass - crossEntropyLoss
+        regLoss - Regression loss to use, 'huber' (default) or 'l2'
+        '''
+
+        self.bonsaiObj = bonsaiObj
+        self.regressionLoss = regLoss
+
+        self.lW = lW
+        self.lV = lV
+        self.lT = lT
+        self.lZ = lZ
+
+        self.sW = sW
+        self.sV = sV
+        self.sT = sT
+        self.sZ = sZ
+
+        self.Y = Y
+        self.X = X
+
+        self.useMCHLoss = useMCHLoss
+
+        if outFile is not None:
+            print("Outfile : ", outFile)
+            self.outFile = open(outFile, 'w')
+        else:
+            self.outFile = sys.stdout
+
+        self.learningRate = learningRate
+
+        self.assertInit()
+
+        self.sigmaI = tf.compat.v1.placeholder(tf.float32, name='sigmaI')
+
+        self.score, self.X_ = self.bonsaiObj(self.X, self.sigmaI)
+
+        self.loss, self.marginLoss, self.regLoss = self.lossGraph()
+
+        self.trainStep = self.trainGraph()
+        '''
+        self.accuracy -> 'MAE' for Regression.
+        self.accuracy -> 'Accuracy' for Classification.
+        '''
+        self.accuracy = self.accuracyGraph()
+        self.prediction = self.bonsaiObj.getPrediction()
+
+        if self.sW > 0.99 and self.sV > 0.99 and self.sZ > 0.99 and self.sT > 0.99:
+            self.isDenseTraining = True
+        else:
+            self.isDenseTraining = False
+
+        self.hardThrsd()
+        self.sparseTraining()
+
+    def lossGraph(self):
+        '''
+        Loss Graph for the given Bonsai Obj
+        '''
+        self.regLoss = 0.5 * (self.lZ * tf.square(tf.norm(tensor=self.bonsaiObj.Z)) +
+                              self.lW * tf.square(tf.norm(tensor=self.bonsaiObj.W)) +
+                              self.lV * tf.square(tf.norm(tensor=self.bonsaiObj.V)) +
+                              self.lT * tf.square(tf.norm(tensor=self.bonsaiObj.T)))
+
+        # Loss functions for classification.
+        if self.bonsaiObj.isRegression is False:
+            if (self.bonsaiObj.numClasses > 2):
+                if self.useMCHLoss is True:
+                    self.batch_th = tf.compat.v1.placeholder(tf.int64, name='batch_th')
+                    self.marginLoss = utils.multiClassHingeLoss(
+                        tf.transpose(a=self.score), self.Y,
+                        self.batch_th)
+                else:
+                    self.marginLoss = utils.crossEntropyLoss(
+                        tf.transpose(a=self.score), self.Y)
+                self.loss = self.marginLoss + self.regLoss
+            else:
+                self.marginLoss = tf.reduce_mean(input_tensor=tf.nn.relu(
+                    1.0 - (2 * self.Y - 1) * tf.transpose(a=self.score)))
+                self.loss = self.marginLoss + self.regLoss
+
+        # Loss functions for regression.
+        elif self.bonsaiObj.isRegression is True:
+            if (self.regressionLoss == 'huber'):
+                # Huber loss, since it is more robust to outliers.
+                self.marginLoss = tf.compat.v1.losses.huber_loss(
+                    self.Y, tf.transpose(a=self.score))
+                self.loss = self.marginLoss + self.regLoss
+            elif (self.regressionLoss == 'l2'):
+                # L2 loss function.
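+                # Note: tf.nn.l2_loss returns sum(t ** 2) / 2 and is not
+                # averaged over the batch, unlike the mean-reduced Huber loss
+                # above, so the two regression losses are on different scales.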
+ self.marginLoss = tf.nn.l2_loss( + self.Y - tf.transpose(a=self.score)) + self.loss = self.marginLoss + self.regLoss + + return self.loss, self.marginLoss, self.regLoss + + def trainGraph(self): + ''' + Train Graph for the loss generated by Bonsai + ''' + self.bonsaiObj.TrainStep = tf.compat.v1.train.AdamOptimizer( + self.learningRate).minimize(self.loss) + + return self.bonsaiObj.TrainStep + + def accuracyGraph(self): + ''' + Accuracy Graph to evaluate accuracy when needed + ''' + if(self.bonsaiObj.isRegression is False): + if (self.bonsaiObj.numClasses > 2): + correctPrediction = tf.equal( + tf.argmax(input=tf.transpose(a=self.score), axis=1), tf.argmax(input=self.Y, axis=1)) + self.accuracy = tf.reduce_mean( + input_tensor=tf.cast(correctPrediction, tf.float32)) + else: + y_ = self.Y * 2 - 1 + correctPrediction = tf.multiply(tf.transpose(a=self.score), y_) + correctPrediction = tf.nn.relu(correctPrediction) + correctPrediction = tf.math.ceil(tf.tanh(correctPrediction)) + self.accuracy = tf.reduce_mean( + input_tensor=tf.cast(correctPrediction, tf.float32)) + + elif (self.bonsaiObj.isRegression is True): + # Accuracy for regression , in terms of mean absolute error. + self.accuracy = utils.mean_absolute_error(tf.reshape( + self.score, [-1, 1]), tf.reshape(self.Y, [-1, 1])) + return self.accuracy + + def hardThrsd(self): + ''' + Set up for hard Thresholding Functionality + ''' + self.__Wth = tf.compat.v1.placeholder(tf.float32, name='Wth') + self.__Vth = tf.compat.v1.placeholder(tf.float32, name='Vth') + self.__Zth = tf.compat.v1.placeholder(tf.float32, name='Zth') + self.__Tth = tf.compat.v1.placeholder(tf.float32, name='Tth') + + self.__Woph = self.bonsaiObj.W.assign(self.__Wth) + self.__Voph = self.bonsaiObj.V.assign(self.__Vth) + self.__Toph = self.bonsaiObj.T.assign(self.__Tth) + self.__Zoph = self.bonsaiObj.Z.assign(self.__Zth) + + self.hardThresholdGroup = tf.group( + self.__Woph, self.__Voph, self.__Toph, self.__Zoph) + + def sparseTraining(self): + ''' + Set up for Sparse Retraining Functionality + ''' + self.__Wops = self.bonsaiObj.W.assign(self.__Wth) + self.__Vops = self.bonsaiObj.V.assign(self.__Vth) + self.__Zops = self.bonsaiObj.Z.assign(self.__Zth) + self.__Tops = self.bonsaiObj.T.assign(self.__Tth) + + self.sparseRetrainGroup = tf.group( + self.__Wops, self.__Vops, self.__Tops, self.__Zops) + + def runHardThrsd(self, sess): + ''' + Function to run the IHT routine on Bonsai Obj + ''' + currW = self.bonsaiObj.W.eval() + currV = self.bonsaiObj.V.eval() + currZ = self.bonsaiObj.Z.eval() + currT = self.bonsaiObj.T.eval() + + self.__thrsdW = utils.hardThreshold(currW, self.sW) + self.__thrsdV = utils.hardThreshold(currV, self.sV) + self.__thrsdZ = utils.hardThreshold(currZ, self.sZ) + self.__thrsdT = utils.hardThreshold(currT, self.sT) + + fd_thrsd = {self.__Wth: self.__thrsdW, self.__Vth: self.__thrsdV, + self.__Zth: self.__thrsdZ, self.__Tth: self.__thrsdT} + sess.run(self.hardThresholdGroup, feed_dict=fd_thrsd) + + def runSparseTraining(self, sess): + ''' + Function to run the Sparse Retraining routine on Bonsai Obj + ''' + currW = self.bonsaiObj.W.eval() + currV = self.bonsaiObj.V.eval() + currZ = self.bonsaiObj.Z.eval() + currT = self.bonsaiObj.T.eval() + + newW = utils.copySupport(self.__thrsdW, currW) + newV = utils.copySupport(self.__thrsdV, currV) + newZ = utils.copySupport(self.__thrsdZ, currZ) + newT = utils.copySupport(self.__thrsdT, currT) + + fd_st = {self.__Wth: newW, self.__Vth: newV, + self.__Zth: newZ, self.__Tth: newT} + sess.run(self.sparseRetrainGroup, 
feed_dict=fd_st) + + def assertInit(self): + err = "sparsity must be between 0 and 1" + assert self.sW >= 0 and self.sW <= 1, "W " + err + assert self.sV >= 0 and self.sV <= 1, "V " + err + assert self.sZ >= 0 and self.sZ <= 1, "Z " + err + assert self.sT >= 0 and self.sT <= 1, "T " + err + errMsg = "Dimension Mismatch, Y has to be [_, " + \ + str(self.bonsaiObj.numClasses) + "]" + errCont = " numClasses are 1 in case of Binary case by design" + assert (len(self.Y.shape) == 2 and + self.Y.shape[1] == self.bonsaiObj.numClasses), errMsg + errCont + + def saveParams(self, currDir): + ''' + Function to save Parameter matrices into a given folder + ''' + paramDir = currDir + '/' + np.save(paramDir + "W.npy", self.bonsaiObj.W.eval()) + np.save(paramDir + "V.npy", self.bonsaiObj.V.eval()) + np.save(paramDir + "T.npy", self.bonsaiObj.T.eval()) + np.save(paramDir + "Z.npy", self.bonsaiObj.Z.eval()) + hyperParamDict = {'dataDim': self.bonsaiObj.dataDimension, + 'projDim': self.bonsaiObj.projectionDimension, + 'numClasses': self.bonsaiObj.numClasses, + 'depth': self.bonsaiObj.treeDepth, + 'sigma': self.bonsaiObj.sigma} + hyperParamFile = paramDir + 'hyperParam.npy' + np.save(hyperParamFile, hyperParamDict) + + def saveParamsForSeeDot(self, currDir): + ''' + Function to save Parameter matrices into a given folder for SeeDot compiler + ''' + seeDotDir = currDir + '/SeeDot/' + + if os.path.isdir(seeDotDir) is False: + try: + os.mkdir(seeDotDir) + except OSError: + print("Creation of the directory %s failed" % + seeDotDir) + + np.savetxt(seeDotDir + "W", + utils.restructreMatrixBonsaiSeeDot(self.bonsaiObj.W.eval(), + self.bonsaiObj.numClasses, + self.bonsaiObj.totalNodes), + delimiter="\t") + np.savetxt(seeDotDir + "V", + utils.restructreMatrixBonsaiSeeDot(self.bonsaiObj.V.eval(), + self.bonsaiObj.numClasses, + self.bonsaiObj.totalNodes), + delimiter="\t") + np.savetxt(seeDotDir + "T", self.bonsaiObj.T.eval(), delimiter="\t") + np.savetxt(seeDotDir + "Z", self.bonsaiObj.Z.eval(), delimiter="\t") + np.savetxt(seeDotDir + "Sigma", + np.array([self.bonsaiObj.sigma]), delimiter="\t") + + def loadModel(self, currDir): + ''' + Load the Saved model and load it to the model using constructor + Returns two dict one for params and other for hyperParams + ''' + paramDir = currDir + '/' + paramDict = {} + paramDict['W'] = np.load(paramDir + "W.npy") + paramDict['V'] = np.load(paramDir + "V.npy") + paramDict['T'] = np.load(paramDir + "T.npy") + paramDict['Z'] = np.load(paramDir + "Z.npy") + hyperParamDict = np.load(paramDir + "hyperParam.npy").item() + return paramDict, hyperParamDict + + # Function to get aimed model size + def getModelSize(self): + ''' + Function to get aimed model size + ''' + nnzZ, sizeZ, sparseZ = utils.countnnZ(self.bonsaiObj.Z, self.sZ) + nnzW, sizeW, sparseW = utils.countnnZ(self.bonsaiObj.W, self.sW) + nnzV, sizeV, sparseV = utils.countnnZ(self.bonsaiObj.V, self.sV) + nnzT, sizeT, sparseT = utils.countnnZ(self.bonsaiObj.T, self.sT) + + totalnnZ = (nnzZ + nnzT + nnzV + nnzW) + totalSize = (sizeZ + sizeW + sizeV + sizeT) + hasSparse = (sparseW or sparseV or sparseT or sparseZ) + return totalnnZ, totalSize, hasSparse + + def train(self, batchSize, totalEpochs, sess, + Xtrain, Xtest, Ytrain, Ytest, dataDir, currDir): + ''' + The Dense - IHT - Sparse Retrain Routine for Bonsai Training + ''' + resultFile = open(dataDir + '/TFBonsaiResults.txt', 'a+') + numIters = Xtrain.shape[0] / batchSize + + totalBatches = numIters * totalEpochs + + bonsaiObjSigmaI = 1 + + counter = 0 + if 
self.bonsaiObj.numClasses > 2: + trimlevel = 15 + else: + trimlevel = 5 + ihtDone = 0 + if (self.bonsaiObj.isRegression is True): + maxTestAcc = 100000007 + else: + maxTestAcc = -10000 + if self.isDenseTraining is True: + ihtDone = 1 + bonsaiObjSigmaI = 1 + itersInPhase = 0 + + header = '*' * 20 + for i in range(totalEpochs): + print("\nEpoch Number: " + str(i), file=self.outFile) + + ''' + trainAcc -> For Regression, it is 'Mean Absolute Error'. + trainAcc -> For Classification, it is 'Accuracy'. + ''' + trainAcc = 0.0 + trainLoss = 0.0 + + numIters = int(numIters) + for j in range(numIters): + + if counter == 0: + msg = " Dense Training Phase Started " + print("\n%s%s%s\n" % + (header, msg, header), file=self.outFile) + + # Updating the indicator sigma + if ((counter == 0) or (counter == int(totalBatches / 3.0)) or + (counter == int(2 * totalBatches / 3.0))) and (self.isDenseTraining is False): + bonsaiObjSigmaI = 1 + itersInPhase = 0 + + elif (itersInPhase % 100 == 0): + indices = np.random.choice(Xtrain.shape[0], 100) + batchX = Xtrain[indices, :] + batchY = Ytrain[indices, :] + batchY = np.reshape( + batchY, [-1, self.bonsaiObj.numClasses]) + + _feed_dict = {self.X: batchX} + Xcapeval = self.X_.eval(feed_dict=_feed_dict) + Teval = self.bonsaiObj.T.eval() + + sum_tr = 0.0 + for k in range(0, self.bonsaiObj.internalNodes): + sum_tr += (np.sum(np.abs(np.dot(Teval[k], Xcapeval)))) + + if(self.bonsaiObj.internalNodes > 0): + sum_tr /= (100 * self.bonsaiObj.internalNodes) + sum_tr = 0.1 / sum_tr + else: + sum_tr = 0.1 + sum_tr = min( + 1000, sum_tr * (2**(float(itersInPhase) / + (float(totalBatches) / 30.0)))) + + bonsaiObjSigmaI = sum_tr + + itersInPhase += 1 + batchX = Xtrain[j * batchSize:(j + 1) * batchSize] + batchY = Ytrain[j * batchSize:(j + 1) * batchSize] + batchY = np.reshape( + batchY, [-1, self.bonsaiObj.numClasses]) + + if self.bonsaiObj.numClasses > 2: + if self.useMCHLoss is True: + _feed_dict = {self.X: batchX, self.Y: batchY, + self.batch_th: batchY.shape[0], + self.sigmaI: bonsaiObjSigmaI} + else: + _feed_dict = {self.X: batchX, self.Y: batchY, + self.sigmaI: bonsaiObjSigmaI} + else: + _feed_dict = {self.X: batchX, self.Y: batchY, + self.sigmaI: bonsaiObjSigmaI} + + # Mini-batch training + _, batchLoss, batchAcc = sess.run( + [self.trainStep, self.loss, self.accuracy], + feed_dict=_feed_dict) + + # Classification. + if (self.bonsaiObj.isRegression is False): + trainAcc += batchAcc + trainLoss += batchLoss + # Regression. 
+ else: + trainAcc += np.mean(batchAcc) + trainLoss += np.mean(batchLoss) + + # Training routine involving IHT and sparse retraining + if (counter >= int(totalBatches / 3.0) and + (counter < int(2 * totalBatches / 3.0)) and + counter % trimlevel == 0 and + self.isDenseTraining is False): + self.runHardThrsd(sess) + if ihtDone == 0: + msg = " IHT Phase Started " + print("\n%s%s%s\n" % + (header, msg, header), file=self.outFile) + ihtDone = 1 + elif ((ihtDone == 1 and counter >= int(totalBatches / 3.0) and + (counter < int(2 * totalBatches / 3.0)) and + counter % trimlevel != 0 and + self.isDenseTraining is False) or + (counter >= int(2 * totalBatches / 3.0) and + self.isDenseTraining is False)): + self.runSparseTraining(sess) + if counter == int(2 * totalBatches / 3.0): + msg = " Sparse Retraining Phase Started " + print("\n%s%s%s\n" % + (header, msg, header), file=self.outFile) + counter += 1 + try: + if (self.bonsaiObj.isRegression is True): + print("\nRegression Train Loss: " + str(trainLoss / numIters) + + "\nTraining MAE (Regression): " + + str(trainAcc / numIters), + file=self.outFile) + else: + print("\nClassification Train Loss: " + str(trainLoss / numIters) + + "\nTraining accuracy (Classification): " + + str(trainAcc / numIters), + file=self.outFile) + except: + continue + + oldSigmaI = bonsaiObjSigmaI + bonsaiObjSigmaI = 1e9 + + if self.bonsaiObj.numClasses > 2: + if self.useMCHLoss is True: + _feed_dict = {self.X: Xtest, self.Y: Ytest, + self.batch_th: Ytest.shape[0], + self.sigmaI: bonsaiObjSigmaI} + else: + _feed_dict = {self.X: Xtest, self.Y: Ytest, + self.sigmaI: bonsaiObjSigmaI} + else: + _feed_dict = {self.X: Xtest, self.Y: Ytest, + self.sigmaI: bonsaiObjSigmaI} + + # This helps in direct testing instead of extracting the model out + + testAcc, testLoss, regTestLoss, pred = sess.run( + [self.accuracy, self.loss, self.regLoss, self.prediction], feed_dict=_feed_dict) + + if ihtDone == 0: + if (self.bonsaiObj.isRegression is False): + maxTestAcc = -10000 + maxTestAccEpoch = i + elif (self.bonsaiObj.isRegression is True): + maxTestAcc = testAcc + maxTestAccEpoch = i + + else: + if (self.bonsaiObj.isRegression is False): + if maxTestAcc <= testAcc: + maxTestAccEpoch = i + maxTestAcc = testAcc + self.saveParams(currDir) + self.saveParamsForSeeDot(currDir) + elif (self.bonsaiObj.isRegression is True): + print("Minimum Training MAE : ", np.mean(maxTestAcc)) + if maxTestAcc >= testAcc: + # For regression , we're more interested in the minimum + # MAE. 
+ maxTestAccEpoch = i + maxTestAcc = testAcc + self.saveParams(currDir) + self.saveParamsForSeeDot(currDir) + + if (self.bonsaiObj.isRegression is True): + print("Testing MAE %g" % np.mean(testAcc), file=self.outFile) + else: + print("Test accuracy %g" % np.mean(testAcc), file=self.outFile) + + if (self.bonsaiObj.isRegression is True): + testAcc = np.mean(testAcc) + else: + testAcc = testAcc + maxTestAcc = maxTestAcc + + print("MarginLoss + RegLoss: " + str(testLoss - regTestLoss) + + " + " + str(regTestLoss) + " = " + str(testLoss) + "\n", + file=self.outFile) + self.outFile.flush() + + bonsaiObjSigmaI = oldSigmaI + + # sigmaI has to be set to infinity to ensure + # only a single path is used in inference + bonsaiObjSigmaI = 1e9 + print("\nNon-Zero : " + str(self.getModelSize()[0]) + " Model Size: " + + str(float(self.getModelSize()[1]) / 1024.0) + " KB hasSparse: " + + str(self.getModelSize()[2]) + "\n", file=self.outFile) + + if (self.bonsaiObj.isRegression is True): + maxTestAcc = np.mean(maxTestAcc) + + if (self.bonsaiObj.isRegression is True): + print("For Regression, Minimum MAE at compressed" + + " model size(including early stopping): " + + str(maxTestAcc) + " at Epoch: " + + str(maxTestAccEpoch + 1) + "\nFinal Test" + + " MAE: " + str(testAcc), file=self.outFile) + + resultFile.write("MinTestMAE: " + str(maxTestAcc) + + " at Epoch(totalEpochs): " + + str(maxTestAccEpoch + 1) + + "(" + str(totalEpochs) + ")" + " ModelSize: " + + str(float(self.getModelSize()[1]) / 1024.0) + + " KB hasSparse: " + str(self.getModelSize()[2]) + + " Param Directory: " + + str(os.path.abspath(currDir)) + "\n") + + elif (self.bonsaiObj.isRegression is False): + print("For Classification, Maximum Test accuracy at compressed" + + " model size(including early stopping): " + + str(maxTestAcc) + " at Epoch: " + + str(maxTestAccEpoch + 1) + "\nFinal Test" + + " Accuracy: " + str(testAcc), file=self.outFile) + + resultFile.write("MaxTestAcc: " + str(maxTestAcc) + + " at Epoch(totalEpochs): " + + str(maxTestAccEpoch + 1) + + "(" + str(totalEpochs) + ")" + " ModelSize: " + + str(float(self.getModelSize()[1]) / 1024.0) + + " KB hasSparse: " + str(self.getModelSize()[2]) + + " Param Directory: " + + str(os.path.abspath(currDir)) + "\n") + print("The Model Directory: " + currDir + "\n") + + resultFile.close() + self.outFile.flush() + + if self.outFile is not sys.stdout: + self.outFile.close() diff --git a/tf2.0/edgeml/trainer/fastTrainer.py b/tf2.0/edgeml/trainer/fastTrainer.py new file mode 100644 index 000000000..bb1f51b10 --- /dev/null +++ b/tf2.0/edgeml/trainer/fastTrainer.py @@ -0,0 +1,527 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. 
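+# FastTrainer applies the same dense -> IHT -> sparse-retraining schedule as
+# BonsaiTrainer, but to the W and U matrices of a FastRNN/FastGRNN cell
+# (see train() below and docs/FastCells.md).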
+
+from __future__ import print_function
+import os
+import sys
+import tensorflow as tf
+import edgeml.utils as utils
+import numpy as np
+from tensorflow.python.framework import graph_util
+
+
+class FastTrainer:
+
+    def __init__(self, FastObj, X, Y, sW=1.0, sU=1.0, learningRate=0.01,
+                 outFile=None):
+        '''
+        FastObj - Can be either FastRNN or FastGRNN with proper initialisations
+        sW and sU are the sparsity factors for the Fast parameters
+        X is the Data Placeholder - Dims [_, timesteps, input_dims]
+        Y is the label placeholder for loss computation - Dims [_, num_classes]
+        learningRate is the initial learning rate
+        '''
+
+        self.FastObj = FastObj
+        self.history = []
+
+        self.sW = sW
+        self.sU = sU
+
+        self.Y = Y
+        self.X = X
+
+        self.numClasses = int(self.Y.shape[1])
+        self.timeSteps = int(self.X.shape[1])
+        self.inputDims = int(self.X.shape[2])
+
+        self.learningRate = learningRate
+
+        self.assertInit()
+
+        if outFile is not None:
+            self.outFile = open(outFile, 'w')
+        else:
+            self.outFile = sys.stdout
+
+        self.lr = tf.compat.v1.placeholder("float", name="lr")
+
+        self.logits, self.finalHiddenState, self.predictions = self.computeGraph()
+
+        self.lossOp = self.lossGraph(self.logits, self.Y)
+        self.trainOp = self.trainGraph(self.lossOp, self.lr)
+
+        self.correctPredictions, self.accuracy = self.accuracyGraph(
+            self.predictions, self.Y)
+
+        self.numMatrices = self.FastObj.num_weight_matrices
+        self.totalMatrices = self.numMatrices[0] + self.numMatrices[1]
+
+        self.FastParams = self.FastObj.getVars()
+
+        if self.sW > 0.99 and self.sU > 0.99:
+            self.isDenseTraining = True
+        else:
+            self.isDenseTraining = False
+
+        self.hardThrsdGraph()
+        self.sparseTrainingGraph()
+
+    def RNN(self, x, timeSteps, FastObj):
+        '''
+        Unrolls the cell over the time steps and returns the final hidden state
+        '''
+        x = tf.unstack(x, timeSteps, 1)
+        outputs, states = tf.compat.v1.nn.static_rnn(FastObj, x, dtype=tf.float32)
+        return outputs[-1]
+
+    def computeGraph(self):
+        '''
+        Compute graph to unroll and predict on the FastObj
+        '''
+        finalHiddenState = self.RNN(self.X, self.timeSteps, self.FastObj)
+
+        logits = self.classifier(finalHiddenState)
+        predictions = tf.nn.softmax(logits, name='predictions')
+
+        return logits, finalHiddenState, predictions
+
+    def classifier(self, feats):
+        '''
+        Can be replaced by any classifier
+        TODO: Make this a separate class if needed
+        '''
+        self.FC = tf.Variable(tf.random.normal(
+            [self.FastObj.output_size, self.numClasses]), name='FC')
+        self.FCbias = tf.Variable(tf.random.normal(
+            [self.numClasses]), name='FCbias')
+
+        return tf.matmul(feats, self.FC) + self.FCbias
+
+    def lossGraph(self, logits, Y):
+        '''
+        Loss Graph for the given FastObj
+        '''
+        lossOp = utils.crossEntropyLoss(logits, Y)
+        return lossOp
+
+    def trainGraph(self, lossOp, lr):
+        '''
+        Train Graph for the loss generated by the FastObj
+        '''
+        optimizer = tf.compat.v1.train.AdamOptimizer(lr)
+        trainOp = optimizer.minimize(lossOp)
+        return trainOp
+
+    def accuracyGraph(self, predictions, Y):
+        '''
+        Accuracy Graph to evaluate accuracy when needed
+        '''
+        correctPredictions = tf.equal(
+            tf.argmax(input=predictions, axis=1), tf.argmax(input=Y, axis=1))
+        accuracy = tf.reduce_mean(input_tensor=tf.cast(correctPredictions, tf.float32))
+        return correctPredictions, accuracy
+
+    def assertInit(self):
+        err = "sparsity must be between 0 and 1"
+        assert self.sW >= 0 and self.sW <= 1, "W " + err
+        assert self.sU >= 0 and self.sU <= 1, "U " + err
+
+    def hardThrsdGraph(self):
+        '''
+        Set up for hard Thresholding Functionality
+        '''
+        self.paramPlaceholders = []
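+        # Placeholders 0 .. numMatrices[0]-1 feed the thresholded W matrices;
+        # placeholders numMatrices[0] .. totalMatrices-1 feed the U matrices.
+        # sparseTrainingGraph() below reuses the same placeholders.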
+ self.htOps = [] + for i in range(0, self.numMatrices[0]): + self.paramPlaceholders.append(tf.compat.v1.placeholder( + tf.float32, name="Wth_" + str(i))) + for i in range(self.numMatrices[0], self.totalMatrices): + self.paramPlaceholders.append(tf.compat.v1.placeholder( + tf.float32, name="Uth_" + str(i))) + + for i in range(0, self.numMatrices[0]): + self.htOps.append( + self.FastParams[i].assign(self.paramPlaceholders[i])) + for i in range(self.numMatrices[0], self.totalMatrices): + self.htOps.append( + self.FastParams[i].assign(self.paramPlaceholders[i])) + + self.hardThresholdGroup = tf.group(*self.htOps) + + def sparseTrainingGraph(self): + ''' + Set up for Sparse Retraining Functionality + ''' + self.stOps = [] + + for i in range(0, self.numMatrices[0]): + self.stOps.append( + self.FastParams[i].assign(self.paramPlaceholders[i])) + for i in range(self.numMatrices[0], self.totalMatrices): + self.stOps.append( + self.FastParams[i].assign(self.paramPlaceholders[i])) + + self.sparseRetrainGroup = tf.group(*self.stOps) + + def runHardThrsd(self, sess): + ''' + Function to run the IHT routine on FastObj + ''' + self.thrsdParams = [] + for i in range(0, self.numMatrices[0]): + self.thrsdParams.append( + utils.hardThreshold(self.FastParams[i].eval(), self.sW)) + for i in range(self.numMatrices[0], self.totalMatrices): + self.thrsdParams.append( + utils.hardThreshold(self.FastParams[i].eval(), self.sU)) + + fd_thrsd = {} + for i in range(0, self.totalMatrices): + fd_thrsd[self.paramPlaceholders[i]] = self.thrsdParams[i] + sess.run(self.hardThresholdGroup, feed_dict=fd_thrsd) + + def runSparseTraining(self, sess): + ''' + Function to run the Sparse Retraining routine on FastObj + ''' + self.reTrainParams = [] + for i in range(0, self.totalMatrices): + self.reTrainParams.append( + utils.copySupport(self.thrsdParams[i], self.FastParams[i].eval())) + + fd_st = {} + for i in range(0, self.totalMatrices): + fd_st[self.paramPlaceholders[i]] = self.reTrainParams[i] + sess.run(self.sparseRetrainGroup, feed_dict=fd_st) + + def getModelSize(self): + ''' + Function to get aimed model size + ''' + totalnnZ = 0 + totalSize = 0 + hasSparse = False + for i in range(0, self.numMatrices[0]): + nnz, size, sparseFlag = utils.countnnZ(self.FastParams[i], self.sW) + totalnnZ += nnz + totalSize += size + hasSparse = hasSparse or sparseFlag + + for i in range(self.numMatrices[0], self.totalMatrices): + nnz, size, sparseFlag = utils.countnnZ(self.FastParams[i], self.sU) + totalnnZ += nnz + totalSize += size + hasSparse = hasSparse or sparseFlag + for i in range(self.totalMatrices, len(self.FastParams)): + nnz, size, sparseFlag = utils.countnnZ(self.FastParams[i], 1.0) + totalnnZ += nnz + totalSize += size + hasSparse = hasSparse or sparseFlag + + # Replace this with classifier class call + nnz, size, sparseFlag = utils.countnnZ(self.FC, 1.0) + totalnnZ += nnz + totalSize += size + hasSparse = hasSparse or sparseFlag + + nnz, size, sparseFlag = utils.countnnZ(self.FCbias, 1.0) + totalnnZ += nnz + totalSize += size + hasSparse = hasSparse or sparseFlag + + return totalnnZ, totalSize, hasSparse + + def saveParams(self, currDir): + ''' + Function to save Parameter matrices + ''' + if self.numMatrices[0] == 1: + np.save(os.path.join(currDir, "W.npy"), self.FastParams[0].eval()) + elif self.FastObj.wRank is None: + if self.numMatrices[0] == 2: + np.save(os.path.join(currDir, "W1.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "W2.npy"), + self.FastParams[1].eval()) + if self.numMatrices[0] == 3: + 
np.save(os.path.join(currDir, "W1.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "W2.npy"), + self.FastParams[1].eval()) + np.save(os.path.join(currDir, "W3.npy"), + self.FastParams[2].eval()) + if self.numMatrices[0] == 4: + np.save(os.path.join(currDir, "W1.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "W2.npy"), + self.FastParams[1].eval()) + np.save(os.path.join(currDir, "W3.npy"), + self.FastParams[2].eval()) + np.save(os.path.join(currDir, "W4.npy"), + self.FastParams[3].eval()) + elif self.FastObj.wRank is not None: + if self.numMatrices[0] == 3: + np.save(os.path.join(currDir, "W.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "W1.npy"), + self.FastParams[1].eval()) + np.save(os.path.join(currDir, "W2.npy"), + self.FastParams[2].eval()) + if self.numMatrices[0] == 4: + np.save(os.path.join(currDir, "W.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "W1.npy"), + self.FastParams[1].eval()) + np.save(os.path.join(currDir, "W2.npy"), + self.FastParams[2].eval()) + np.save(os.path.join(currDir, "W3.npy"), + self.FastParams[3].eval()) + if self.numMatrices[0] == 5: + np.save(os.path.join(currDir, "W.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "W1.npy"), + self.FastParams[1].eval()) + np.save(os.path.join(currDir, "W2.npy"), + self.FastParams[2].eval()) + np.save(os.path.join(currDir, "W3.npy"), + self.FastParams[3].eval()) + np.save(os.path.join(currDir, "W4.npy"), + self.FastParams[4].eval()) + + if self.numMatrices[1] == 1: + np.save(os.path.join(currDir, "U.npy"), self.FastParams[0].eval()) + elif self.FastObj.uRank is None: + if self.numMatrices[1] == 2: + np.save(os.path.join(currDir, "U1.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "U2.npy"), + self.FastParams[1].eval()) + if self.numMatrices[1] == 3: + np.save(os.path.join(currDir, "U1.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "U2.npy"), + self.FastParams[1].eval()) + np.save(os.path.join(currDir, "U3.npy"), + self.FastParams[2].eval()) + if self.numMatrices[1] == 4: + np.save(os.path.join(currDir, "U1.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "U2.npy"), + self.FastParams[1].eval()) + np.save(os.path.join(currDir, "U3.npy"), + self.FastParams[2].eval()) + np.save(os.path.join(currDir, "U4.npy"), + self.FastParams[3].eval()) + elif self.FastObj.uRank is not None: + if self.numMatrices[1] == 3: + np.save(os.path.join(currDir, "U.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "U1.npy"), + self.FastParams[1].eval()) + np.save(os.path.join(currDir, "U2.npy"), + self.FastParams[2].eval()) + if self.numMatrices[1] == 4: + np.save(os.path.join(currDir, "U.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "U1.npy"), + self.FastParams[1].eval()) + np.save(os.path.join(currDir, "U2.npy"), + self.FastParams[2].eval()) + np.save(os.path.join(currDir, "U3.npy"), + self.FastParams[3].eval()) + if self.numMatrices[1] == 5: + np.save(os.path.join(currDir, "U.npy"), + self.FastParams[0].eval()) + np.save(os.path.join(currDir, "U1.npy"), + self.FastParams[1].eval()) + np.save(os.path.join(currDir, "U2.npy"), + self.FastParams[2].eval()) + np.save(os.path.join(currDir, "U3.npy"), + self.FastParams[3].eval()) + np.save(os.path.join(currDir, "U4.npy"), + self.FastParams[4].eval()) + + if self.FastObj.cellType == "FastGRNN": + np.save(os.path.join(currDir, "Bg.npy"), + self.FastParams[self.totalMatrices].eval()) + 
np.save(os.path.join(currDir, "Bh.npy"), + self.FastParams[self.totalMatrices + 1].eval()) + np.save(os.path.join(currDir, "zeta.npy"), + self.FastParams[self.totalMatrices + 2].eval()) + np.save(os.path.join(currDir, "nu.npy"), + self.FastParams[self.totalMatrices + 3].eval()) + elif self.FastObj.cellType == "FastRNN": + np.save(os.path.join(currDir, "B.npy"), + self.FastParams[self.totalMatrices].eval()) + np.save(os.path.join(currDir, "alpha.npy"), self.FastParams[ + self.totalMatrices + 1].eval()) + np.save(os.path.join(currDir, "beta.npy"), + self.FastParams[self.totalMatrices + 2].eval()) + elif self.FastObj.cellType == "UGRNNLR": + np.save(os.path.join(currDir, "Bg.npy"), + self.FastParams[self.totalMatrices].eval()) + np.save(os.path.join(currDir, "Bh.npy"), + self.FastParams[self.totalMatrices + 1].eval()) + elif self.FastObj.cellType == "GRULR": + np.save(os.path.join(currDir, "Br.npy"), + self.FastParams[self.totalMatrices].eval()) + np.save(os.path.join(currDir, "Bg.npy"), + self.FastParams[self.totalMatrices + 1].eval()) + np.save(os.path.join(currDir, "Bh.npy"), + self.FastParams[self.totalMatrices + 2].eval()) + elif self.FastObj.cellType == "LSTMLR": + np.save(os.path.join(currDir, "Bf.npy"), + self.FastParams[self.totalMatrices].eval()) + np.save(os.path.join(currDir, "Bi.npy"), + self.FastParams[self.totalMatrices + 1].eval()) + np.save(os.path.join(currDir, "Bc.npy"), + self.FastParams[self.totalMatrices + 2].eval()) + np.save(os.path.join(currDir, "Bo.npy"), + self.FastParams[self.totalMatrices + 3].eval()) + + np.save(os.path.join(currDir, "FC.npy"), self.FC.eval()) + np.save(os.path.join(currDir, "FCbias.npy"), self.FCbias.eval()) + + def train(self, batchSize, totalEpochs, sess, + Xtrain, Xtest, Ytrain, Ytest, + decayStep, decayRate, dataDir, currDir): + ''' + The Dense - IHT - Sparse Retrain Routine for FastCell Training + ''' + fileName = str(self.FastObj.cellType) + 'Results.txt' + resultFile = open(os.path.join(dataDir, fileName), 'a+') + numIters = int(np.ceil(float(Xtrain.shape[0]) / float(batchSize))) + totalBatches = numIters * totalEpochs + + counter = 0 + trimlevel = 15 + ihtDone = 0 + maxTestAcc = -10000 + if self.isDenseTraining is True: + ihtDone = 1 + maxTestAcc = -10000 + header = '*' * 20 + + Xtest = Xtest.reshape((-1, self.timeSteps, self.inputDims)) + Xtrain = Xtrain.reshape((-1, self.timeSteps, self.inputDims)) + + self.history = [] + + for i in range(0, totalEpochs): + print("\nEpoch Number: " + str(i), file=self.outFile) + + if i % decayStep == 0 and i != 0: + self.learningRate = self.learningRate * decayRate + + shuffled = list(range(Xtrain.shape[0])) + np.random.shuffle(shuffled) + trainAcc = 0.0 + trainLoss = 0.0 + + numIters = int(numIters) + for j in range(0, numIters): + + if counter == 0: + msg = " Dense Training Phase Started " + print("\n%s%s%s\n" % + (header, msg, header), file=self.outFile) + + k = shuffled[j * batchSize:(j + 1) * batchSize] + batchX = Xtrain[k] + batchY = Ytrain[k] + + # Mini-batch training + _, batchLoss, batchAcc = sess.run([self.trainOp, self.lossOp, self.accuracy], feed_dict={ + self.X: batchX, self.Y: batchY, self.lr: self.learningRate}) + + trainAcc += batchAcc + trainLoss += batchLoss + + # Training routine involving IHT and sparse retraining + if (counter >= int(totalBatches / 3.0) and + (counter < int(2 * totalBatches / 3.0)) and + counter % trimlevel == 0 and + self.isDenseTraining is False): + self.runHardThrsd(sess) + if ihtDone == 0: + msg = " IHT Phase Started " + print("\n%s%s%s\n" % + (header, msg, 
+                              header), file=self.outFile)
+                        ihtDone = 1
+                elif ((ihtDone == 1 and counter >= int(totalBatches / 3.0) and
+                       (counter < int(2 * totalBatches / 3.0)) and
+                       counter % trimlevel != 0 and
+                       self.isDenseTraining is False) or
+                        (counter >= int(2 * totalBatches / 3.0) and
+                            self.isDenseTraining is False)):
+                    self.runSparseTraining(sess)
+                    if counter == int(2 * totalBatches / 3.0):
+                        msg = " Sparse Retraining Phase Started "
+                        print("\n%s%s%s\n" %
+                              (header, msg, header), file=self.outFile)
+                counter += 1
+
+            trainLoss /= numIters
+            trainAcc /= numIters
+            print("Train Loss: " + str(trainLoss) +
+                  " Train Accuracy: " + str(trainAcc),
+                  file=self.outFile)
+
+            testAcc, testLoss = sess.run([self.accuracy, self.lossOp], feed_dict={
+                self.X: Xtest, self.Y: Ytest})
+
+            self.history += [
+                {
+                    "epoch": i,
+                    "trainAcc": trainAcc,
+                    "trainLoss": trainLoss,
+                    "testAcc": testAcc,
+                    "testLoss": testLoss
+                }
+            ]
+
+            if ihtDone == 0:
+                maxTestAcc = -10000
+                maxTestAccEpoch = i
+            else:
+                if maxTestAcc <= testAcc:
+                    maxTestAccEpoch = i
+                    maxTestAcc = testAcc
+                    self.saveParams(currDir)
+
+            print("Test Loss: " + str(testLoss) +
+                  " Test Accuracy: " + str(testAcc), file=self.outFile)
+            self.outFile.flush()
+
+        print("\nMaximum Test accuracy at compressed" +
+              " model size (including early stopping): " +
+              str(maxTestAcc) + " at Epoch: " +
+              str(maxTestAccEpoch + 1) + "\nFinal Test" +
+              " Accuracy: " + str(testAcc), file=self.outFile)
+        print("\n\nNon-Zeros: " + str(self.getModelSize()[0]) +
+              " Model Size: " + str(float(self.getModelSize()[1]) / 1024.0) +
+              " KB hasSparse: " + str(self.getModelSize()[2]) + "\n",
+              file=self.outFile)
+
+        resultFile.write("MaxTestAcc: " + str(maxTestAcc) +
+                         " at Epoch(totalEpochs): " +
+                         str(maxTestAccEpoch + 1) +
+                         "(" + str(totalEpochs) + ")" + " ModelSize: " +
+                         str(float(self.getModelSize()[1]) / 1024.0) +
+                         " KB hasSparse: " + str(self.getModelSize()[2]) +
+                         " Param Directory: " +
+                         str(os.path.abspath(currDir)) + "\n")
+
+        print("The Model Directory: " + currDir + "\n")
+
+        # Create the output directory for the tensorflow model
+        # (no files are written here).
+        model_dir = os.path.join(currDir, "model")
+        os.makedirs(model_dir, exist_ok=True)
+
+        resultFile.close()
+        self.outFile.flush()
+        if self.outFile is not sys.stdout:
+            self.outFile.close()
+
+    def getAccuracyLog(self):
+        return self.history
diff --git a/tf2.0/edgeml/trainer/protoNNTrainer.py b/tf2.0/edgeml/trainer/protoNNTrainer.py
new file mode 100644
index 000000000..27de4d7de
--- /dev/null
+++ b/tf2.0/edgeml/trainer/protoNNTrainer.py
@@ -0,0 +1,219 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT license.
+
+from __future__ import print_function
+import tensorflow as tf
+import numpy as np
+import sys
+import edgeml.utils as utils
+
+
+class ProtoNNTrainer:
+    def __init__(self, protoNNObj, regW, regB, regZ,
+                 sparcityW, sparcityB, sparcityZ,
+                 learningRate, X, Y, lossType='l2'):
+        '''
+        A wrapper for the various techniques used for training ProtoNN. This
+        subsumes both the responsibility of loss graph construction and
+        performing training. The original training routine that is part of the
+        C++ implementation of EdgeML used iterative hard thresholding (IHT),
+        gamma estimation through the median heuristic, and other tricks for
+        training ProtoNN. This module implements the same in TensorFlow
+        and python.
+
+        protoNNObj: An instance of the ProtoNN class defining the forward
+            computation graph. The loss functions and training routines will be
+            attached to this instance.
+ regW, regB, regZ: Regularization constants for W, B, and + Z matrices of protoNN. + sparcityW, sparcityB, sparcityZ: Sparsity constraints + for W, B and Z matrices. A value between 0 (exclusive) and 1 + (inclusive) is expected. A value of 1 indicates dense training. + learningRate: Initial learning rate for ADAM optimizer. + X, Y : Placeholders for data and labels. + X [-1, featureDimension] + Y [-1, num Labels] + lossType: ['l2', 'xentropy'] + ''' + self.protoNNObj = protoNNObj + self.__regW = regW + self.__regB = regB + self.__regZ = regZ + self.__sW = sparcityW + self.__sB = sparcityB + self.__sZ = sparcityZ + self.__lR = learningRate + self.X = X + self.Y = Y + self.sparseTraining = True + if (sparcityW == 1.0) and (sparcityB == 1.0) and (sparcityZ == 1.0): + self.sparseTraining = False + print("Sparse training disabled.", file=sys.stderr) + # Define placeholders for sparse training + self.W_th = None + self.B_th = None + self.Z_th = None + self.__lossType = lossType + self.__validInit = False + self.__validInit = self.__validateInit() + self.__protoNNOut = protoNNObj(X, Y) + self.loss = self.__lossGraph() + self.trainStep = self.__trainGraph() + self.__hthOp = self.__getHardThresholdOp() + self.accuracy = protoNNObj.getAccuracyOp() + + def __validateInit(self): + self.__validInit = False + msg = "Sparsity value should be between" + msg += " 0 and 1 (both inclusive)." + assert self.__sW >= 0. and self.__sW <= 1., 'W:' + msg + assert self.__sB >= 0. and self.__sB <= 1., 'B:' + msg + assert self.__sZ >= 0. and self.__sZ <= 1., 'Z:' + msg + d, dcap, m, L, _ = self.protoNNObj.getHyperParams() + msg = 'Y should be of dimension [-1, num labels/classes]' + msg += ' specified as part of ProtoNN object.' + assert (len(self.Y.shape)) == 2, msg + assert (self.Y.shape[1] == L), msg + msg = 'X should be of dimension [-1, featureDimension]' + msg += ' specified as part of ProtoNN object.' 
+ assert (len(self.X.shape) == 2), msg
+ assert (self.X.shape[1] == d), msg
+ self.__validInit = True
+ msg = 'Values can be \'l2\' or \'xentropy\''
+ if self.__lossType not in ['l2', 'xentropy']:
+ raise ValueError(msg)
+ return True
+
+ def __lossGraph(self):
+ pnnOut = self.__protoNNOut
+ l1, l2, l3 = self.__regW, self.__regB, self.__regZ
+ W, B, Z, _ = self.protoNNObj.getModelMatrices()
+ if self.__lossType == 'l2':
+ with tf.compat.v1.name_scope('protonn-l2-loss'):
+ loss_0 = tf.nn.l2_loss(self.Y - pnnOut)
+ reg = l1 * tf.nn.l2_loss(W) + l2 * tf.nn.l2_loss(B)
+ reg += l3 * tf.nn.l2_loss(Z)
+ loss = loss_0 + reg
+ elif self.__lossType == 'xentropy':
+ with tf.compat.v1.name_scope('protonn-xentropy-loss'):
+ loss_0 = tf.nn.softmax_cross_entropy_with_logits(logits=pnnOut,
+ labels=tf.stop_gradient(self.Y))
+ loss_0 = tf.reduce_mean(input_tensor=loss_0)
+ reg = l1 * tf.nn.l2_loss(W) + l2 * tf.nn.l2_loss(B)
+ reg += l3 * tf.nn.l2_loss(Z)
+ loss = loss_0 + reg
+ return loss
+
+ def __trainGraph(self):
+ with tf.compat.v1.name_scope('protonn-gradient-adam'):
+ trainStep = tf.compat.v1.train.AdamOptimizer(self.__lR)
+ trainStep = trainStep.minimize(self.loss)
+ return trainStep
+
+ def __getHardThresholdOp(self):
+ W, B, Z, _ = self.protoNNObj.getModelMatrices()
+ self.W_th = tf.compat.v1.placeholder(tf.float32, name='W_th')
+ self.B_th = tf.compat.v1.placeholder(tf.float32, name='B_th')
+ self.Z_th = tf.compat.v1.placeholder(tf.float32, name='Z_th')
+ with tf.compat.v1.name_scope('hard-threshold-assignments'):
+ # hard_thrsd_W = W.assign(self.W_th)
+ # hard_thrsd_B = B.assign(self.B_th)
+ # hard_thrsd_Z = Z.assign(self.Z_th)
+ # Code changes for tf 1.11
+ hard_thrsd_W = tf.compat.v1.assign(W, self.W_th)
+ hard_thrsd_B = tf.compat.v1.assign(B, self.B_th)
+ hard_thrsd_Z = tf.compat.v1.assign(Z, self.Z_th)
+ hard_thrsd_op = tf.group(hard_thrsd_W, hard_thrsd_B, hard_thrsd_Z)
+ return hard_thrsd_op
+
+ def train(self, batchSize, totalEpochs, sess,
+ x_train, x_val, y_train, y_val, noInit=False,
+ redirFile=None, printStep=10, valStep=3):
+ '''
+ Performs dense training of ProtoNN followed by iterative hard
+ thresholding to enforce sparsity constraints.
+
+ batchSize: Batch size per update
+ totalEpochs: The number of epochs to run training for. One epoch is
+ defined as one pass over the entire training data.
+ sess: The Tensorflow session to use for running various graph
+ operators.
+ x_train, x_val, y_train, y_val: The numpy arrays containing train and
+ validation data. x data is assumed to be of shape [-1,
+ featureDimension] while y should have shape [-1, numberLabels].
+ noInit: By default, all the tensors of the computation graph are
+ initialized at the start of the training session. Set noInit=True to
+ disable this behaviour.
+ printStep: Number of batches between echoing of loss and train accuracy.
+ valStep: Number of epochs between evaluations on the validation set.
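+
+ Example (a minimal sketch; `protoNN`, the placeholders X and Y, and
+ the numpy arrays are assumed to be constructed as described above,
+ and all names here are illustrative):
+
+ >>> trainer = ProtoNNTrainer(protoNN, regW=0.0, regB=0.0, regZ=0.0,
+ ... sparcityW=1.0, sparcityB=1.0,
+ ... sparcityZ=1.0, learningRate=0.05,
+ ... X=X, Y=Y, lossType='xentropy')
+ >>> with tf.compat.v1.Session() as sess:
+ ... trainer.train(32, 10, sess, x_train, x_val, y_train, y_val)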
+ '''
+ d, d_cap, m, L, gamma = self.protoNNObj.getHyperParams()
+ assert batchSize >= 1, 'Batch size should be a positive integer'
+ assert totalEpochs >= 1, 'Total epochs should be a positive integer'
+ assert x_train.ndim == 2, 'Expected training data to be of rank 2'
+ assert x_train.shape[1] == d, 'Expected x_train to be [-1, %d]' % d
+ assert x_val.ndim == 2, 'Expected validation data to be of rank 2'
+ assert x_val.shape[1] == d, 'Expected x_val to be [-1, %d]' % d
+ assert y_train.ndim == 2, 'Expected training labels to be of rank 2'
+ assert y_train.shape[1] == L, 'Expected y_train to be [-1, %d]' % L
+ assert y_val.ndim == 2, 'Expected validation labels to be of rank 2'
+ assert y_val.shape[1] == L, 'Expected y_val to be [-1, %d]' % L
+
+ # Numpy will throw asserts for arrays
+ if sess is None:
+ raise ValueError('sess must be a valid Tensorflow session.')
+
+ trainNumBatches = int(np.ceil(len(x_train) / batchSize))
+ valNumBatches = int(np.ceil(len(x_val) / batchSize))
+ x_train_batches = np.array_split(x_train, trainNumBatches)
+ y_train_batches = np.array_split(y_train, trainNumBatches)
+ x_val_batches = np.array_split(x_val, valNumBatches)
+ y_val_batches = np.array_split(y_val, valNumBatches)
+ if not noInit:
+ sess.run(tf.compat.v1.global_variables_initializer())
+ X, Y = self.X, self.Y
+ W, B, Z, _ = self.protoNNObj.getModelMatrices()
+ for epoch in range(totalEpochs):
+ for i in range(len(x_train_batches)):
+ batch_x = x_train_batches[i]
+ batch_y = y_train_batches[i]
+ feed_dict = {
+ X: batch_x,
+ Y: batch_y
+ }
+ sess.run(self.trainStep, feed_dict=feed_dict)
+ if i % printStep == 0:
+ loss, acc = sess.run([self.loss, self.accuracy],
+ feed_dict=feed_dict)
+ msg = "Epoch: %3d Batch: %3d" % (epoch, i)
+ msg += " Loss: %3.5f Accuracy: %2.5f" % (loss, acc)
+ print(msg, file=redirFile)
+
+ # Perform hard thresholding to enforce the sparsity constraints
+ if self.sparseTraining:
+ W_, B_, Z_ = sess.run([W, B, Z])
+ fd_thrsd = {
+ self.W_th: utils.hardThreshold(W_, self.__sW),
+ self.B_th: utils.hardThreshold(B_, self.__sB),
+ self.Z_th: utils.hardThreshold(Z_, self.__sZ)
+ }
+ sess.run(self.__hthOp, feed_dict=fd_thrsd)
+
+ if (epoch + 1) % valStep == 0:
+ acc = 0.0
+ loss = 0.0
+ for j in range(len(x_val_batches)):
+ batch_x = x_val_batches[j]
+ batch_y = y_val_batches[j]
+ feed_dict = {
+ X: batch_x,
+ Y: batch_y
+ }
+ acc_, loss_ = sess.run([self.accuracy, self.loss],
+ feed_dict=feed_dict)
+ acc += acc_
+ loss += loss_
+ acc /= len(y_val_batches)
+ loss /= len(y_val_batches)
+ print("Validation Loss: %2.5f Accuracy: %2.5f" % (loss, acc),
+ file=redirFile)
+
diff --git a/tf2.0/edgeml/utils.py b/tf2.0/edgeml/utils.py
new file mode 100644
index 000000000..b3ff5adb4
--- /dev/null
+++ b/tf2.0/edgeml/utils.py
@@ -0,0 +1,339 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT license.
+
+from __future__ import print_function
+import tensorflow as tf
+import numpy as np
+import scipy.cluster
+import scipy.spatial
+import os
+
+
+def medianHeuristic(data, projectionDimension, numPrototypes, W_init=None):
+ '''
+ This method can be used to estimate gamma for ProtoNN. An approximation to
+ the median heuristic is used here.
+ 1. First the data is collapsed into the projectionDimension by W_init. If
+ W_init is not provided, it is initialized from a random normal(0, 1). Hence
+ data normalization is essential.
+ 2. Prototypes are computed by running k-means clustering on the projected
+ data.
+ 3. The median distance is then estimated by calculating the median distance
+ between prototypes and projected data points.
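+ The returned gamma is set to 1 / (2.5 * medianDistance), matching the
+ computation at the end of this function.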
+
+ data needs to be [-1, numFeats]
+ If using this method to initialize gamma, please use the returned W and B
+ as well.
+
+ TODO: Return estimate of Z (prototype labels) based on cluster centroids
+ and labels
+
+ TODO: Clustering fails due to singularity error if projecting upwards
+
+ W [d, d_cap]
+ B [d_cap, m]
+ returns gamma, W, B
+ '''
+ assert data.ndim == 2
+ X = data
+ featDim = data.shape[1]
+ if projectionDimension > featDim:
+ print("Warning: Projection dimension > feature dimension. Gamma")
+ print("\testimation through the median heuristic could fail.")
+ print("\tTo retain the projection dimension, provide")
+ print("\ta value for gamma.")
+
+ if W_init is None:
+ W_init = np.random.normal(size=[featDim, projectionDimension])
+ W = W_init
+ XW = np.matmul(X, W)
+ assert XW.shape[1] == projectionDimension
+ assert XW.shape[0] == len(X)
+ # Requires an [N x d_cap] data matrix of N observations of d_cap
+ # dimensions and the number of centroids m. Returns an [m x d_cap]
+ # matrix of centroids and the per-observation cluster assignment.
+ B, centers = scipy.cluster.vq.kmeans2(XW, numPrototypes)
+ # Requires two matrices of shape number of observations x dimension of
+ # observation space. Distances[i,j] is the distance between XW[i] and B[j]
+ distances = scipy.spatial.distance.cdist(XW, B, metric='euclidean')
+ distances = np.reshape(distances, [-1])
+ gamma = np.median(distances)
+ gamma = 1 / (2.5 * gamma)
+ return gamma.astype('float32'), W.astype('float32'), B.T.astype('float32')
+
+
+def multiClassHingeLoss(logits, label, batch_th):
+ '''
+ MultiClassHingeLoss to match the C++ version - no TF internal version
+ '''
+ flatLogits = tf.reshape(logits, [-1, ])
+ label_ = tf.argmax(input=label, axis=1)
+
+ correctId = tf.range(0, batch_th) * label.shape[1] + label_
+ correctLogit = tf.gather(flatLogits, correctId)
+
+ maxLabel = tf.argmax(input=logits, axis=1)
+ top2, _ = tf.nn.top_k(logits, k=2, sorted=True)
+
+ wrongMaxLogit = tf.where(
+ tf.equal(maxLabel, label_), top2[:, 1], top2[:, 0])
+
+ return tf.reduce_mean(input_tensor=tf.nn.relu(1. + wrongMaxLogit - correctLogit))
+
+
+def crossEntropyLoss(logits, label):
+ '''
+ Cross entropy loss for the multiclass case in joint training for
+ faster convergence
+ '''
+ return tf.reduce_mean(
+ input_tensor=tf.nn.softmax_cross_entropy_with_logits(logits=logits,
+ labels=tf.stop_gradient(label)))
+
+
+def mean_absolute_error(logits, label):
+ '''
+ Function to compute the mean absolute error.
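+ Computes mean(|logits - label|) over all elements; logits and label
+ are expected to have the same shape.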
+ ''' + return tf.reduce_mean(input_tensor=tf.abs(tf.subtract(logits, label))) + + +def hardThreshold(A, s): + ''' + Hard thresholding function on Tensor A with sparsity s + ''' + A_ = np.copy(A) + A_ = A_.ravel() + if len(A_) > 0: + th = np.percentile(np.abs(A_), (1 - s) * 100.0, interpolation='higher') + A_[np.abs(A_) < th] = 0.0 + A_ = A_.reshape(A.shape) + return A_ + + +def copySupport(src, dest): + ''' + copy support of src tensor to dest tensor + ''' + support = np.nonzero(src) + dest_ = dest + dest = np.zeros(dest_.shape) + dest[support] = dest_[support] + return dest + + +def countnnZ(A, s, bytesPerVar=4): + ''' + Returns # of non-zeros and representative size of the tensor + Uses dense for s >= 0.5 - 4 byte + Else uses sparse - 8 byte + ''' + params = 1 + hasSparse = False + for i in range(0, len(A.shape)): + params *= int(A.shape[i]) + if s < 0.5: + nnZ = np.ceil(params * s) + hasSparse = True + return nnZ, nnZ * 2 * bytesPerVar, hasSparse + else: + nnZ = params + return nnZ, nnZ * bytesPerVar, hasSparse + + +def getConfusionMatrix(predicted, target, numClasses): + ''' + Returns a confusion matrix for a multiclass classification + problem. `predicted` is a 1-D array of integers representing + the predicted classes and `target` is the target classes. + + confusion[i][j]: Number of elements of class j + predicted as class i + Labels are assumed to be in range(0, numClasses) + Use`printFormattedConfusionMatrix` to echo the confusion matrix + in a user friendly form. + ''' + assert(predicted.ndim == 1) + assert(target.ndim == 1) + arr = np.zeros([numClasses, numClasses]) + + for i in range(len(predicted)): + arr[predicted[i]][target[i]] += 1 + return arr + + +def printFormattedConfusionMatrix(matrix): + ''' + Given a 2D confusion matrix, prints it in a human readable way. 
+ The confusion matrix is expected to be a 2D numpy array with + square dimensions + ''' + assert(matrix.ndim == 2) + assert(matrix.shape[0] == matrix.shape[1]) + RECALL = 'Recall' + PRECISION = 'PRECISION' + print("|%s|" % ('True->'), end='') + for i in range(matrix.shape[0]): + print("%7d|" % i, end='') + print("%s|" % 'Precision') + + print("|%s|" % ('-' * len(RECALL)), end='') + for i in range(matrix.shape[0]): + print("%s|" % ('-' * 7), end='') + print("%s|" % ('-' * len(PRECISION))) + + precisionlist = np.sum(matrix, axis=1) + recalllist = np.sum(matrix, axis=0) + precisionlist = [matrix[i][i] / x if x != + 0 else -1 for i, x in enumerate(precisionlist)] + recalllist = [matrix[i][i] / x if x != + 0 else -1 for i, x in enumerate(recalllist)] + for i in range(matrix.shape[0]): + # len recall = 6 + print("|%6d|" % (i), end='') + for j in range(matrix.shape[0]): + print("%7d|" % (matrix[i][j]), end='') + print("%s" % (" " * (len(PRECISION) - 7)), end='') + if precisionlist[i] != -1: + print("%1.5f|" % precisionlist[i]) + else: + print("%7s|" % "nan") + + print("|%s|" % ('-' * len(RECALL)), end='') + for i in range(matrix.shape[0]): + print("%s|" % ('-' * 7), end='') + print("%s|" % ('-' * len(PRECISION))) + print("|%s|" % ('Recall'), end='') + + for i in range(matrix.shape[0]): + if recalllist[i] != -1: + print("%1.5f|" % (recalllist[i]), end='') + else: + print("%7s|" % "nan", end='') + + print('%s|' % (' ' * len(PRECISION))) + + +def getPrecisionRecall(cmatrix, label=1): + trueP = cmatrix[label][label] + denom = np.sum(cmatrix, axis=0)[label] + if denom == 0: + denom = 1 + recall = trueP / denom + denom = np.sum(cmatrix, axis=1)[label] + if denom == 0: + denom = 1 + precision = trueP / denom + return precision, recall + + +def getMacroPrecisionRecall(cmatrix): + # TP + FP + precisionlist = np.sum(cmatrix, axis=1) + # TP + FN + recalllist = np.sum(cmatrix, axis=0) + precisionlist__ = [cmatrix[i][i] / x if x != + 0 else 0 for i, x in enumerate(precisionlist)] + recalllist__ = [cmatrix[i][i] / x if x != + 0 else 0 for i, x in enumerate(recalllist)] + precision = np.sum(precisionlist__) + precision /= len(precisionlist__) + recall = np.sum(recalllist__) + recall /= len(recalllist__) + return precision, recall + + +def getMicroPrecisionRecall(cmatrix): + # TP + FP + precisionlist = np.sum(cmatrix, axis=1) + # TP + FN + recalllist = np.sum(cmatrix, axis=0) + num = 0.0 + for i in range(len(cmatrix)): + num += cmatrix[i][i] + + precision = num / np.sum(precisionlist) + recall = num / np.sum(recalllist) + return precision, recall + + +def getMacroMicroFScore(cmatrix): + ''' + Returns macro and micro f-scores. 
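+ The macro F-score averages the per-class F1 values, while the micro
+ F-score is the F1 of the aggregate precision (pi) and recall (rho).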
+ Refer: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.104.8244&rep=rep1&type=pdf
+ '''
+ precisionlist = np.sum(cmatrix, axis=1)
+ recalllist = np.sum(cmatrix, axis=0)
+ precisionlist__ = [cmatrix[i][i] / x if x !=
+ 0 else 0 for i, x in enumerate(precisionlist)]
+ recalllist__ = [cmatrix[i][i] / x if x !=
+ 0 else 0 for i, x in enumerate(recalllist)]
+ macro = 0.0
+ for i in range(len(precisionlist)):
+ denom = precisionlist__[i] + recalllist__[i]
+ numer = precisionlist__[i] * recalllist__[i] * 2
+ if denom == 0:
+ denom = 1
+ macro += numer / denom
+ macro /= len(precisionlist)
+
+ num = 0.0
+ for i in range(len(precisionlist)):
+ num += cmatrix[i][i]
+
+ denom1 = np.sum(precisionlist)
+ denom2 = np.sum(recalllist)
+ pi = num / denom1
+ rho = num / denom2
+ denom = pi + rho
+ if denom == 0:
+ denom = 1
+ micro = 2 * pi * rho / denom
+ return macro, micro
+
+
+def restructreMatrixBonsaiSeeDot(A, nClasses, nNodes):
+ '''
+ Restructures a matrix from [nNodes*nClasses, Proj] to
+ [nClasses*nNodes, Proj] for SeeDot
+ '''
+ tempMatrix = np.zeros(A.shape)
+ rowIndex = 0
+
+ for i in range(0, nClasses):
+ for j in range(0, nNodes):
+ tempMatrix[rowIndex] = A[j * nClasses + i]
+ rowIndex += 1
+
+ return tempMatrix
+
+
+class GraphManager:
+ '''
+ Manages saving and restoring graphs. Designed to be used with EMI-RNN,
+ though it is general enough to be useful otherwise as well.
+ '''
+
+ def __init__(self):
+ pass
+
+ def checkpointModel(self, saver, sess, modelPrefix,
+ globalStep=1000, redirFile=None):
+ saver.save(sess, modelPrefix, global_step=globalStep)
+ print('Model saved to %s, global_step %d' % (modelPrefix, globalStep),
+ file=redirFile)
+
+ def loadCheckpoint(self, sess, modelPrefix, globalStep,
+ redirFile=None):
+ metaname = modelPrefix + '-%d.meta' % globalStep
+ basename = os.path.basename(metaname)
+ fileList = os.listdir(os.path.dirname(modelPrefix))
+ fileList = [x for x in fileList if x.startswith(basename)]
+ assert len(fileList) > 0, 'Checkpoint file not found'
+ msg = 'Too many or too few checkpoint files for globalStep: %d' % globalStep
+ assert len(fileList) == 1, msg
+ chkpt = basename + '/' + fileList[0]
+ saver = tf.compat.v1.train.import_meta_graph(metaname)
+ metaname = metaname[:-5]
+ saver.restore(sess, metaname)
+ graph = tf.compat.v1.get_default_graph()
+ return graph
diff --git a/tf2.0/examples/Bonsai/README.md b/tf2.0/examples/Bonsai/README.md
new file mode 100644
index 000000000..91cb00213
--- /dev/null
+++ b/tf2.0/examples/Bonsai/README.md
@@ -0,0 +1,67 @@
+# EdgeML Bonsai on a sample public dataset
+
+This directory includes an example notebook and a general execution script for
+Bonsai, developed as part of EdgeML. We also include a sample cleanup script and
+use case for the USPS10 public dataset.
+
+`edgeml.graph.bonsai` implements the Bonsai prediction graph in tensorflow.
+The three-phase training routine for Bonsai is decoupled from the forward graph
+to facilitate a plug and play behaviour wherein Bonsai can be combined with or
+used as a final layer classifier for other architectures (RNNs, CNNs).
+
+Note that `bonsai_example.py` assumes that data is in a specific format. It is
+assumed that train and test data is contained in two files, `train.npy` and
+`test.npy`, each containing a 2D numpy array of dimension `[numberOfExamples,
+numberOfFeatures + 1]`. The first column of each matrix is assumed to contain
+label information. For an N-Class problem, we assume the labels are integers
+from 0 through N-1; a snippet for assembling data in this format is shown below.
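+
+For example, a compatible `train.npy` could be assembled as follows (a minimal
+sketch; the dimensions and class count here are made up for illustration):
+
+```python
+import numpy as np
+
+# Hypothetical data: 100 examples, 256 features, 10 classes.
+labels = np.random.randint(0, 10, size=(100, 1))
+features = np.random.rand(100, 256)
+
+# First column carries the integer label; the remaining columns are features.
+np.save('train.npy', np.concatenate([labels, features], axis=1))
+```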
+
+`bonsai_example.py` also supports univariate regression; see the help options
+of the script for usage. Multivariate regression would require restructuring
+the input data format, and could further help in extending Bonsai to
+multi-label classification and multivariate regression. Lastly, the training
+data, `train.npy`, is assumed to be well shuffled,
+as the training routine doesn't shuffle internally.
+
+**Tested With:** Tensorflow >1.6 with Python 2 and Python 3
+
+## Download and clean up sample dataset
+
+We validate the code using the USPS dataset.
+The download and cleanup of the dataset to match the above-mentioned format is
+done by the scripts [fetch_usps.py](fetch_usps.py) and
+[process_usps.py](process_usps.py)
+
+```
+python fetch_usps.py
+python process_usps.py
+```
+
+## Sample command for Bonsai on USPS10
+The following sample run on usps10 should validate your library:
+
+```bash
+python bonsai_example.py -dir usps10/ -d 3 -p 28 -rW 0.001 -rZ 0.0001 -rV 0.001 -rT 0.001 -sZ 0.2 -sW 0.3 -sV 0.3 -sT 0.62 -e 100 -s 1
+```
+This command should produce a final output that reads roughly as follows (the
+numbers might not match exactly due to version differences):
+```
+Maximum Test accuracy at compressed model size(including early stopping): 0.94369704 at Epoch: 66
+Final Test Accuracy: 0.93024415
+
+Non-Zeros: 4156.0 Model Size: 31.703125 KB hasSparse: True
+```
+
+The usps10 directory will now have a consolidated results file called
+`TFBonsaiResults.txt` and a directory `TFBonsaiResults` containing the model
+from each run of the code on the usps10 dataset.
+
+## Byte Quantization (Q) for model compression
+If you wish to quantize the generated model to use byte quantized integers, use
+`quantizeBonsaiModels.py`. Usage instructions:
+
+```
+python quantizeBonsaiModels.py -h
+```
+
+This will generate quantized models, with a `q` prefix on every stored
+parameter, in a new directory `QuantizedTFBonsaiModel` inside the model
+directory. One can use this model further on edge devices.
+
+
+Copyright (c) Microsoft Corporation. All rights reserved.
+
+Licensed under the MIT license.
diff --git a/tf2.0/examples/Bonsai/bonsai_example.ipynb b/tf2.0/examples/Bonsai/bonsai_example.ipynb
new file mode 100644
index 000000000..1935fd2b9
--- /dev/null
+++ b/tf2.0/examples/Bonsai/bonsai_example.ipynb
@@ -0,0 +1,1135 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Bonsai in Tensorflow\n",
+ "\n",
+ "This is a simple notebook that illustrates the usage of the Tensorflow implementation of Bonsai. We are using the USPS dataset. Please refer to `fetch_usps.py` and run it for downloading and cleaning up the dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2018-08-15T12:06:06.056404Z",
+ "start_time": "2018-08-15T12:06:05.112969Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Copyright (c) Microsoft Corporation.
All rights reserved.\n", + "# Licensed under the MIT license.\n", + "\n", + "import helpermethods\n", + "import tensorflow as tf\n", + "import numpy as np\n", + "import sys\n", + "import os\n", + "\n", + "#Provide the GPU number to be used\n", + "os.environ['CUDA_VISIBLE_DEVICES'] =''\n", + "\n", + "#Bonsai imports\n", + "from edgeml.trainer.bonsaiTrainer import BonsaiTrainer\n", + "from edgeml.graph.bonsai import Bonsai\n", + "\n", + "# Fixing seeds for reproducibility\n", + "tf.set_random_seed(42)\n", + "np.random.seed(42)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USPS Data\n", + "\n", + "It is assumed that the USPS data has already been downloaded and set up with the help of [fetch_usps.py](fetch_usps.py) and is present in the `./usps10` subdirectory." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "ExecuteTime": { + "end_time": "2018-08-15T12:06:06.104645Z", + "start_time": "2018-08-15T12:06:06.058368Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Feature Dimension: 257\n", + "Num classes: 10\n" + ] + } + ], + "source": [ + "#Loading and Pre-processing dataset for Bonsai\n", + "dataDir = \"usps10/\"\n", + "(dataDimension, numClasses, Xtrain, Ytrain, Xtest, Ytest, mean, std) = helpermethods.preProcessData(dataDir, isRegression=False)\n", + "print(\"Feature Dimension: \", dataDimension)\n", + "print(\"Num classes: \", numClasses)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Model Parameters\n", + "\n", + "Note that Bonsai is designed for low-memory setting and the best results are obtained when operating in that setting. Use the sparsity, projection dimension and tree depth to vary the model size." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "ExecuteTime": { + "end_time": "2018-08-15T12:06:06.123318Z", + "start_time": "2018-08-15T12:06:06.106847Z" + } + }, + "outputs": [], + "source": [ + "sigma = 1.0 #Sigmoid parameter for tanh\n", + "depth = 3 #Depth of Bonsai Tree\n", + "projectionDimension = 28 #Lower Dimensional space for Bonsai to work on\n", + "\n", + "#Regularizers for Bonsai Parameters\n", + "regZ = 0.0001\n", + "regW = 0.001\n", + "regV = 0.001\n", + "regT = 0.001\n", + "\n", + "totalEpochs = 100\n", + "\n", + "learningRate = 0.01\n", + "\n", + "outFile = None\n", + "\n", + "#Sparsity for Bonsai Parameters. x => 100*x % are non-zeros\n", + "sparZ = 0.2\n", + "sparW = 0.3\n", + "sparV = 0.3\n", + "sparT = 0.62\n", + "\n", + "batchSize = np.maximum(100, int(np.ceil(np.sqrt(Ytrain.shape[0]))))\n", + "\n", + "useMCHLoss = True #only for Multiclass cases True: Multiclass-Hing Loss, False: Cross Entropy. 
\n",
+ "\n",
+ "#Bonsai uses one classifier for Binary, thus this condition\n",
+ "if numClasses == 2:\n",
+ " numClasses = 1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Placeholders for data feeding during training and inference"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2018-08-15T12:06:06.220274Z",
+ "start_time": "2018-08-15T12:06:06.125219Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "X = tf.placeholder(\"float32\", [None, dataDimension])\n",
+ "Y = tf.placeholder(\"float32\", [None, numClasses])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Creating a directory for the current model in the data directory using a timestamp"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2018-08-15T12:06:06.264985Z",
+ "start_time": "2018-08-15T12:06:06.222170Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "currDir = helpermethods.createTimeStampDir(dataDir)\n",
+ "helpermethods.dumpCommand(sys.argv, currDir)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Bonsai Graph Object\n",
+ "\n",
+ "Instantiating the Bonsai Graph which will be used for training and inference."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2018-08-15T12:06:06.341168Z",
+ "start_time": "2018-08-15T12:06:06.266877Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "bonsaiObj = Bonsai(numClasses, dataDimension, projectionDimension, depth, sigma)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Bonsai Trainer Object\n",
+ "\n",
+ "Instantiating the Bonsai Trainer which will be used for 3 phase training."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2018-08-15T12:06:07.973584Z",
+ "start_time": "2018-08-15T12:06:06.342945Z"
+ }
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "C:\\Users\\t-vekusu\\AppData\\Local\\Continuum\\anaconda3\\envs\\tensorflow\\lib\\site-packages\\tensorflow\\python\\ops\\gradients_impl.py:100: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.\n",
+ " \"Converting sparse IndexedSlices to a dense Tensor of unknown shape. \"\n"
+ ]
+ }
+ ],
+ "source": [
+ "bonsaiTrainer = BonsaiTrainer(bonsaiObj, regW, regT, regV, regZ, sparW, sparT, sparV, sparZ,\n",
+ " learningRate, X, Y, useMCHLoss, outFile)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Session declaration and variable initialization. \n",
+ "An InteractiveSession doesn't hog the entire GPU."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "ExecuteTime": {
+ "end_time": "2018-08-15T12:06:15.577425Z",
+ "start_time": "2018-08-15T12:06:07.976090Z"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "sess = tf.InteractiveSession()\n",
+ "sess.run(tf.global_variables_initializer())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Bonsai Training Routine\n",
+ "\n",
+ "The method to run the 3 phase training, which reports the best early stopping model and accuracy, along with saving of the parameters."
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "ExecuteTime": { + "end_time": "2018-08-15T12:07:02.500241Z", + "start_time": "2018-08-15T12:06:15.579618Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Epoch Number: 0\n", + "\n", + "******************** Dense Training Phase Started ********************\n", + "\n", + "\n", + "Classification Train Loss: 6.388934433460236\n", + "Training accuracy (Classification): 0.6250000005174015\n", + "Test accuracy 0.726956\n", + "MarginLoss + RegLoss: 1.4466879 + 3.6487768 = 5.0954647\n", + "\n", + "\n", + "Epoch Number: 1\n", + "\n", + "Classification Train Loss: 3.6885906954606376\n", + "Training accuracy (Classification): 0.8623611107468605\n", + "Test accuracy 0.758346\n", + "MarginLoss + RegLoss: 1.0173264 + 2.778634 = 3.7959604\n", + "\n", + "\n", + "Epoch Number: 2\n", + "\n", + "Classification Train Loss: 2.667721450328827\n", + "Training accuracy (Classification): 0.9184722271230485\n", + "Test accuracy 0.7429\n", + "MarginLoss + RegLoss: 0.92546654 + 2.095467 = 3.0209336\n", + "\n", + "\n", + "Epoch Number: 3\n", + "\n", + "Classification Train Loss: 1.9921080254846149\n", + "Training accuracy (Classification): 0.941944446000788\n", + "Test accuracy 0.767314\n", + "MarginLoss + RegLoss: 0.7603649 + 1.5889603 = 2.3493252\n", + "\n", + "\n", + "Epoch Number: 4\n", + "\n", + "Classification Train Loss: 1.5233625107341342\n", + "Training accuracy (Classification): 0.9563888907432556\n", + "Test accuracy 0.791231\n", + "MarginLoss + RegLoss: 0.6496898 + 1.2271981 = 1.8768879\n", + "\n", + "\n", + "Epoch Number: 5\n", + "\n", + "Classification Train Loss: 1.1950715631246567\n", + "Training accuracy (Classification): 0.9650000035762787\n", + "Test accuracy 0.810164\n", + "MarginLoss + RegLoss: 0.54003507 + 0.97295314 = 1.5129882\n", + "\n", + "\n", + "Epoch Number: 6\n", + "\n", + "Classification Train Loss: 0.9672323316335678\n", + "Training accuracy (Classification): 0.968333340353436\n", + "Test accuracy 0.855007\n", + "MarginLoss + RegLoss: 0.44149697 + 0.79325426 = 1.2347512\n", + "\n", + "\n", + "Epoch Number: 7\n", + "\n", + "Classification Train Loss: 0.8014380658666292\n", + "Training accuracy (Classification): 0.9722222313284874\n", + "Test accuracy 0.874938\n", + "MarginLoss + RegLoss: 0.37062877 + 0.6628879 = 1.0335166\n", + "\n", + "\n", + "Epoch Number: 8\n", + "\n", + "Classification Train Loss: 0.684503066043059\n", + "Training accuracy (Classification): 0.976111119820012\n", + "Test accuracy 0.899851\n", + "MarginLoss + RegLoss: 0.3099702 + 0.5688073 = 0.8787775\n", + "\n", + "\n", + "Epoch Number: 9\n", + "\n", + "Classification Train Loss: 0.5987317487597466\n", + "Training accuracy (Classification): 0.9794444565971693\n", + "Test accuracy 0.907324\n", + "MarginLoss + RegLoss: 0.2689218 + 0.49965328 = 0.7685751\n", + "\n", + "\n", + "Epoch Number: 10\n", + "\n", + "Classification Train Loss: 0.5343128165437115\n", + "Training accuracy (Classification): 0.9804166778922081\n", + "Test accuracy 0.9143\n", + "MarginLoss + RegLoss: 0.24538836 + 0.44663915 = 0.6920275\n", + "\n", + "\n", + "Epoch Number: 11\n", + "\n", + "Classification Train Loss: 0.48874612069792217\n", + "Training accuracy (Classification): 0.9801388987236552\n", + "Test accuracy 0.916293\n", + "MarginLoss + RegLoss: 0.23703864 + 0.40629783 = 0.6433365\n", + "\n", + "\n", + "Epoch Number: 12\n", + "\n", + "Classification Train Loss: 0.44733552055226433\n", + "Training accuracy 
(Classification): 0.98097223126226\n", + "Test accuracy 0.918286\n", + "MarginLoss + RegLoss: 0.23851919 + 0.37269312 = 0.6112123\n", + "\n", + "\n", + "Epoch Number: 13\n", + "\n", + "Classification Train Loss: 0.4165669356783231\n", + "Training accuracy (Classification): 0.9822222317258517\n", + "Test accuracy 0.917289\n", + "MarginLoss + RegLoss: 0.23061273 + 0.345445 = 0.57605773\n", + "\n", + "\n", + "Epoch Number: 14\n", + "\n", + "Classification Train Loss: 0.39181090601616436\n", + "Training accuracy (Classification): 0.9812500087751282\n", + "Test accuracy 0.92277\n", + "MarginLoss + RegLoss: 0.2121576 + 0.32245666 = 0.53461426\n", + "\n", + "\n", + "Epoch Number: 15\n", + "\n", + "Classification Train Loss: 0.36949437111616135\n", + "Training accuracy (Classification): 0.9820833446251022\n", + "Test accuracy 0.926258\n", + "MarginLoss + RegLoss: 0.19854721 + 0.30341443 = 0.50196165\n", + "\n", + "\n", + "Epoch Number: 16\n", + "\n", + "Classification Train Loss: 0.3469446731938256\n", + "Training accuracy (Classification): 0.9831944538487328\n", + "Test accuracy 0.927255\n", + "MarginLoss + RegLoss: 0.19628116 + 0.28535655 = 0.48163772\n", + "\n", + "\n", + "Epoch Number: 17\n", + "\n", + "Classification Train Loss: 0.329777576857143\n", + "Training accuracy (Classification): 0.984166675971614\n", + "Test accuracy 0.92277\n", + "MarginLoss + RegLoss: 0.20166817 + 0.26965213 = 0.4713203\n", + "\n", + "\n", + "Epoch Number: 18\n", + "\n", + "Classification Train Loss: 0.317672994815641\n", + "Training accuracy (Classification): 0.9815277879436811\n", + "Test accuracy 0.925262\n", + "MarginLoss + RegLoss: 0.20086277 + 0.2559616 = 0.45682436\n", + "\n", + "\n", + "Epoch Number: 19\n", + "\n", + "Classification Train Loss: 0.3000084459781647\n", + "Training accuracy (Classification): 0.9843055655558904\n", + "Test accuracy 0.931739\n", + "MarginLoss + RegLoss: 0.18073215 + 0.24324338 = 0.42397553\n", + "\n", + "\n", + "Epoch Number: 20\n", + "\n", + "Classification Train Loss: 0.2897499371320009\n", + "Training accuracy (Classification): 0.9827777867515882\n", + "Test accuracy 0.921276\n", + "MarginLoss + RegLoss: 0.20172484 + 0.23221089 = 0.43393573\n", + "\n", + "\n", + "Epoch Number: 21\n", + "\n", + "Classification Train Loss: 0.2821065636558665\n", + "Training accuracy (Classification): 0.9812500096029706\n", + "Test accuracy 0.928749\n", + "MarginLoss + RegLoss: 0.18990344 + 0.22147894 = 0.41138238\n", + "\n", + "\n", + "Epoch Number: 22\n", + "\n", + "Classification Train Loss: 0.2660716378854381\n", + "Training accuracy (Classification): 0.9844444559680091\n", + "Test accuracy 0.928251\n", + "MarginLoss + RegLoss: 0.17955597 + 0.21111046 = 0.39066643\n", + "\n", + "\n", + "Epoch Number: 23\n", + "\n", + "Classification Train Loss: 0.2567368100086848\n", + "Training accuracy (Classification): 0.9852777885066138\n", + "Test accuracy 0.928251\n", + "MarginLoss + RegLoss: 0.18770447 + 0.20248988 = 0.39019436\n", + "\n", + "\n", + "Epoch Number: 24\n", + "\n", + "Classification Train Loss: 0.25224825532899964\n", + "Training accuracy (Classification): 0.9823611204822859\n", + "Test accuracy 0.932735\n", + "MarginLoss + RegLoss: 0.18552671 + 0.19460817 = 0.38013488\n", + "\n", + "\n", + "Epoch Number: 25\n", + "\n", + "Classification Train Loss: 0.24661735258996487\n", + "Training accuracy (Classification): 0.9804166762365235\n", + "Test accuracy 0.931241\n", + "MarginLoss + RegLoss: 0.18796808 + 0.18610859 = 0.37407666\n", + "\n", + "\n", + "Epoch Number: 26\n", + "\n", + 
"Classification Train Loss: 0.23342499737110403\n", + "Training accuracy (Classification): 0.9829166763358645\n", + "Test accuracy 0.932735\n", + "MarginLoss + RegLoss: 0.17906994 + 0.17793566 = 0.3570056\n", + "\n", + "\n", + "Epoch Number: 27\n", + "\n", + "Classification Train Loss: 0.22210048822065195\n", + "Training accuracy (Classification): 0.9851388972666528\n", + "Test accuracy 0.934728\n", + "MarginLoss + RegLoss: 0.17679122 + 0.16876754 = 0.34555876\n", + "\n", + "\n", + "Epoch Number: 28\n", + "\n", + "Classification Train Loss: 0.2189549288402001\n", + "Training accuracy (Classification): 0.9831944538487328\n", + "Test accuracy 0.932237\n", + "MarginLoss + RegLoss: 0.19115414 + 0.16296963 = 0.35412377\n", + "\n", + "\n", + "Epoch Number: 29\n", + "\n", + "Classification Train Loss: 0.21842483865718046\n", + "Training accuracy (Classification): 0.9805555658208\n", + "Test accuracy 0.936722\n", + "MarginLoss + RegLoss: 0.17462157 + 0.15921564 = 0.3338372\n", + "\n", + "\n", + "Epoch Number: 30\n", + "\n", + "Classification Train Loss: 0.21449942576388517\n", + "Training accuracy (Classification): 0.9804166754086813\n", + "Test accuracy 0.939711\n", + "MarginLoss + RegLoss: 0.17741902 + 0.15273981 = 0.33015883\n", + "\n", + "\n", + "Epoch Number: 31\n", + "\n", + "Classification Train Loss: 0.20739994280868107\n", + "Training accuracy (Classification): 0.9825000100665622\n", + "Test accuracy 0.933732\n", + "MarginLoss + RegLoss: 0.17381513 + 0.1498537 = 0.32366884\n", + "\n", + "\n", + "Epoch Number: 32\n", + "\n", + "Classification Train Loss: 0.20110303929282558\n", + "Training accuracy (Classification): 0.9840277888708644\n", + "Test accuracy 0.93423\n", + "MarginLoss + RegLoss: 0.18619148 + 0.14583017 = 0.33202165\n", + "\n", + "\n", + "Epoch Number: 33\n", + "\n", + "******************** IHT Phase Started ********************\n", + "\n", + "\n", + "Classification Train Loss: 0.21433907147083017\n", + "Training accuracy (Classification): 0.9801388987236552\n", + "Test accuracy 0.927255\n", + "MarginLoss + RegLoss: 0.19979775 + 0.12088289 = 0.32068065\n", + "\n", + "\n", + "Epoch Number: 34\n", + "\n", + "Classification Train Loss: 0.1990115779141585\n", + "Training accuracy (Classification): 0.980694454577234\n", + "Test accuracy 0.933234\n", + "MarginLoss + RegLoss: 0.17835513 + 0.12438774 = 0.30274287\n", + "\n", + "\n", + "Epoch Number: 35\n", + "\n", + "Classification Train Loss: 0.20429682172834873\n", + "Training accuracy (Classification): 0.9788888974322213\n", + "Test accuracy 0.929248\n", + "MarginLoss + RegLoss: 0.19013074 + 0.12853864 = 0.31866938\n", + "\n", + "\n", + "Epoch Number: 36\n", + "\n", + "Classification Train Loss: 0.19357945707937083\n", + "Training accuracy (Classification): 0.9816666767001152\n", + "Test accuracy 0.932735\n", + "MarginLoss + RegLoss: 0.18534705 + 0.12509713 = 0.31044418\n", + "\n", + "\n", + "Epoch Number: 37\n", + "\n", + "Classification Train Loss: 0.18653404754069117\n", + "Training accuracy (Classification): 0.9818055638008647\n", + "Test accuracy 0.929746\n", + "MarginLoss + RegLoss: 0.18708317 + 0.12236847 = 0.30945164\n", + "\n", + "\n", + "Epoch Number: 38\n", + "\n", + "Classification Train Loss: 0.18141362298693922\n", + "Training accuracy (Classification): 0.9815277871158388\n", + "Test accuracy 0.933234\n", + "MarginLoss + RegLoss: 0.18262453 + 0.11991154 = 0.30253607\n", + "\n", + "\n", + "Epoch Number: 39\n", + "\n", + "Classification Train Loss: 0.17729416727605793\n", + "Training accuracy (Classification): 
0.9820833429694176\n", + "Test accuracy 0.932735\n", + "MarginLoss + RegLoss: 0.1798804 + 0.11748926 = 0.29736966\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Epoch Number: 40\n", + "\n", + "Classification Train Loss: 0.17237282171845436\n", + "Training accuracy (Classification): 0.9837500088744693\n", + "Test accuracy 0.937718\n", + "MarginLoss + RegLoss: 0.17473482 + 0.11479883 = 0.28953364\n", + "\n", + "\n", + "Epoch Number: 41\n", + "\n", + "Classification Train Loss: 0.16901198805620274\n", + "Training accuracy (Classification): 0.9837500097023116\n", + "Test accuracy 0.93423\n", + "MarginLoss + RegLoss: 0.17860568 + 0.112817116 = 0.29142278\n", + "\n", + "\n", + "Epoch Number: 42\n", + "\n", + "Classification Train Loss: 0.16710670509686074\n", + "Training accuracy (Classification): 0.9833333442608515\n", + "Test accuracy 0.936722\n", + "MarginLoss + RegLoss: 0.17501548 + 0.11118551 = 0.286201\n", + "\n", + "\n", + "Epoch Number: 43\n", + "\n", + "Classification Train Loss: 0.16463725310232905\n", + "Training accuracy (Classification): 0.9836111209458775\n", + "Test accuracy 0.93423\n", + "MarginLoss + RegLoss: 0.17687047 + 0.10897398 = 0.28584445\n", + "\n", + "\n", + "Epoch Number: 44\n", + "\n", + "Classification Train Loss: 0.16215091271118987\n", + "Training accuracy (Classification): 0.9843055663837327\n", + "Test accuracy 0.935227\n", + "MarginLoss + RegLoss: 0.17832607 + 0.107886344 = 0.2862124\n", + "\n", + "\n", + "Epoch Number: 45\n", + "\n", + "Classification Train Loss: 0.16012930932144323\n", + "Training accuracy (Classification): 0.9841666767994562\n", + "Test accuracy 0.937718\n", + "MarginLoss + RegLoss: 0.17309293 + 0.10644325 = 0.2795362\n", + "\n", + "\n", + "Epoch Number: 46\n", + "\n", + "Classification Train Loss: 0.1574974125251174\n", + "Training accuracy (Classification): 0.9850000101659033\n", + "Test accuracy 0.93722\n", + "MarginLoss + RegLoss: 0.17099261 + 0.10526536 = 0.27625796\n", + "\n", + "\n", + "Epoch Number: 47\n", + "\n", + "Classification Train Loss: 0.15617641361637247\n", + "Training accuracy (Classification): 0.9856944539480739\n", + "Test accuracy 0.937718\n", + "MarginLoss + RegLoss: 0.16866577 + 0.104043506 = 0.27270928\n", + "\n", + "\n", + "Epoch Number: 48\n", + "\n", + "Classification Train Loss: 0.15530151346077523\n", + "Training accuracy (Classification): 0.9838889001144303\n", + "Test accuracy 0.940209\n", + "MarginLoss + RegLoss: 0.16514857 + 0.10232182 = 0.2674704\n", + "\n", + "\n", + "Epoch Number: 49\n", + "\n", + "Classification Train Loss: 0.15294318615148464\n", + "Training accuracy (Classification): 0.9862500089738104\n", + "Test accuracy 0.939711\n", + "MarginLoss + RegLoss: 0.16788226 + 0.10096101 = 0.26884326\n", + "\n", + "\n", + "Epoch Number: 50\n", + "\n", + "Classification Train Loss: 0.15095406781054205\n", + "Training accuracy (Classification): 0.9861111202173762\n", + "Test accuracy 0.940209\n", + "MarginLoss + RegLoss: 0.17100953 + 0.10046519 = 0.27147472\n", + "\n", + "\n", + "Epoch Number: 51\n", + "\n", + "Classification Train Loss: 0.1513558304351237\n", + "Training accuracy (Classification): 0.9844444543123245\n", + "Test accuracy 0.941704\n", + "MarginLoss + RegLoss: 0.1662268 + 0.100100346 = 0.26632714\n", + "\n", + "\n", + "Epoch Number: 52\n", + "\n", + "Classification Train Loss: 0.14914156941490042\n", + "Training accuracy (Classification): 0.9852777876787715\n", + "Test accuracy 0.941206\n", + "MarginLoss + RegLoss: 0.16318396 + 0.099286705 = 
0.26247066\n", + "\n", + "\n", + "Epoch Number: 53\n", + "\n", + "Classification Train Loss: 0.1497938595712185\n", + "Training accuracy (Classification): 0.9851388997501798\n", + "Test accuracy 0.932735\n", + "MarginLoss + RegLoss: 0.17166732 + 0.09957267 = 0.27124\n", + "\n", + "\n", + "Epoch Number: 54\n", + "\n", + "Classification Train Loss: 0.15218847369154295\n", + "Training accuracy (Classification): 0.985277786023087\n", + "Test accuracy 0.938715\n", + "MarginLoss + RegLoss: 0.17181182 + 0.09915227 = 0.2709641\n", + "\n", + "\n", + "Epoch Number: 55\n", + "\n", + "Classification Train Loss: 0.14960632245573732\n", + "Training accuracy (Classification): 0.9855555668473244\n", + "Test accuracy 0.943697\n", + "MarginLoss + RegLoss: 0.16333821 + 0.09872535 = 0.26206356\n", + "\n", + "\n", + "Epoch Number: 56\n", + "\n", + "Classification Train Loss: 0.15064662312053972\n", + "Training accuracy (Classification): 0.9852777885066138\n", + "Test accuracy 0.942202\n", + "MarginLoss + RegLoss: 0.16303498 + 0.09878391 = 0.2618189\n", + "\n", + "\n", + "Epoch Number: 57\n", + "\n", + "Classification Train Loss: 0.15265570394694805\n", + "Training accuracy (Classification): 0.9831944555044174\n", + "Test accuracy 0.940708\n", + "MarginLoss + RegLoss: 0.16671813 + 0.09886683 = 0.26558495\n", + "\n", + "\n", + "Epoch Number: 58\n", + "\n", + "Classification Train Loss: 0.15230748295370075\n", + "Training accuracy (Classification): 0.984166675971614\n", + "Test accuracy 0.938715\n", + "MarginLoss + RegLoss: 0.16594657 + 0.097650595 = 0.26359716\n", + "\n", + "\n", + "Epoch Number: 59\n", + "\n", + "Classification Train Loss: 0.1514456778143843\n", + "Training accuracy (Classification): 0.9843055647280481\n", + "Test accuracy 0.938216\n", + "MarginLoss + RegLoss: 0.16204405 + 0.09645542 = 0.25849947\n", + "\n", + "\n", + "Epoch Number: 60\n", + "\n", + "Classification Train Loss: 0.15362831794967255\n", + "Training accuracy (Classification): 0.9829166771637069\n", + "Test accuracy 0.933732\n", + "MarginLoss + RegLoss: 0.17626402 + 0.09787459 = 0.2741386\n", + "\n", + "\n", + "Epoch Number: 61\n", + "\n", + "Classification Train Loss: 0.15526858448154396\n", + "Training accuracy (Classification): 0.9813889024986161\n", + "Test accuracy 0.933732\n", + "MarginLoss + RegLoss: 0.17297557 + 0.09806729 = 0.27104285\n", + "\n", + "\n", + "Epoch Number: 62\n", + "\n", + "Classification Train Loss: 0.1579084157322844\n", + "Training accuracy (Classification): 0.9816666767001152\n", + "Test accuracy 0.936223\n", + "MarginLoss + RegLoss: 0.17195764 + 0.098572396 = 0.27053005\n", + "\n", + "\n", + "Epoch Number: 63\n", + "\n", + "Classification Train Loss: 0.1566090847675999\n", + "Training accuracy (Classification): 0.9826389013065232\n", + "Test accuracy 0.93423\n", + "MarginLoss + RegLoss: 0.17155647 + 0.10033124 = 0.27188772\n", + "\n", + "\n", + "Epoch Number: 64\n", + "\n", + "Classification Train Loss: 0.1548497351921267\n", + "Training accuracy (Classification): 0.9837500105301539\n", + "Test accuracy 0.941704\n", + "MarginLoss + RegLoss: 0.16137016 + 0.099378176 = 0.26074833\n", + "\n", + "\n", + "Epoch Number: 65\n", + "\n", + "Classification Train Loss: 0.15319975931197405\n", + "Training accuracy (Classification): 0.9829166746801801\n", + "Test accuracy 0.939213\n", + "MarginLoss + RegLoss: 0.16549328 + 0.09872568 = 0.26421896\n", + "\n", + "\n", + "Epoch Number: 66\n", + "\n", + "Classification Train Loss: 0.1565150058724814\n", + "Training accuracy (Classification): 0.9819444542129835\n", + "Test 
accuracy 0.935725\n", + "MarginLoss + RegLoss: 0.17288828 + 0.09988601 = 0.27277428\n", + "\n", + "\n", + "Epoch Number: 67\n", + "\n", + "******************** Sparse Retraining Phase Started ********************\n", + "\n", + "\n", + "Classification Train Loss: 0.15831943404757315\n", + "Training accuracy (Classification): 0.9829166779915491\n", + "Test accuracy 0.935725\n", + "MarginLoss + RegLoss: 0.17936754 + 0.101812266 = 0.28117982\n", + "\n", + "\n", + "Epoch Number: 68\n", + "\n", + "Classification Train Loss: 0.15614786164628136\n", + "Training accuracy (Classification): 0.9838889009422727\n", + "Test accuracy 0.931739\n", + "MarginLoss + RegLoss: 0.17960551 + 0.101831324 = 0.28143683\n", + "\n", + "\n", + "Epoch Number: 69\n", + "\n", + "Classification Train Loss: 0.1662438316270709\n", + "Training accuracy (Classification): 0.9827777884072728\n", + "Test accuracy 0.931739\n", + "MarginLoss + RegLoss: 0.19018382 + 0.10729199 = 0.2974758\n", + "\n", + "\n", + "Epoch Number: 70\n", + "\n", + "Classification Train Loss: 0.16005917576452097\n", + "Training accuracy (Classification): 0.9844444518287977\n", + "Test accuracy 0.929248\n", + "MarginLoss + RegLoss: 0.19133526 + 0.10547125 = 0.2968065\n", + "\n", + "\n", + "Epoch Number: 71\n", + "\n", + "Classification Train Loss: 0.15785305326183638\n", + "Training accuracy (Classification): 0.985000009338061\n", + "Test accuracy 0.933732\n", + "MarginLoss + RegLoss: 0.18749763 + 0.10477199 = 0.29226962\n", + "\n", + "\n", + "Epoch Number: 72\n", + "\n", + "Classification Train Loss: 0.15456503671076563\n", + "Training accuracy (Classification): 0.9843055663837327\n", + "Test accuracy 0.935227\n", + "MarginLoss + RegLoss: 0.1811654 + 0.10317116 = 0.28433657\n", + "\n", + "\n", + "Epoch Number: 73\n", + "\n", + "Classification Train Loss: 0.15287091862410307\n", + "Training accuracy (Classification): 0.9848611205816269\n", + "Test accuracy 0.934728\n", + "MarginLoss + RegLoss: 0.17708676 + 0.101716325 = 0.27880308\n", + "\n", + "\n", + "Epoch Number: 74\n", + "\n", + "Classification Train Loss: 0.15090375486761332\n", + "Training accuracy (Classification): 0.9855555643637975\n", + "Test accuracy 0.934728\n", + "MarginLoss + RegLoss: 0.17898533 + 0.10174509 = 0.28073043\n", + "\n", + "\n", + "Epoch Number: 75\n", + "\n", + "Classification Train Loss: 0.15054931139780414\n", + "Training accuracy (Classification): 0.9848611197537847\n", + "Test accuracy 0.93722\n", + "MarginLoss + RegLoss: 0.17272809 + 0.101017065 = 0.27374515\n", + "\n", + "\n", + "Epoch Number: 76\n", + "\n", + "Classification Train Loss: 0.14770951929191747\n", + "Training accuracy (Classification): 0.9855555651916398\n", + "Test accuracy 0.936722\n", + "MarginLoss + RegLoss: 0.17685911 + 0.09888628 = 0.2757454\n", + "\n", + "\n", + "Epoch Number: 77\n", + "\n", + "Classification Train Loss: 0.14727520239022043\n", + "Training accuracy (Classification): 0.9841666767994562\n", + "Test accuracy 0.935725\n", + "MarginLoss + RegLoss: 0.1720485 + 0.09774725 = 0.26979575\n", + "\n", + "\n", + "Epoch Number: 78\n", + "\n", + "Classification Train Loss: 0.1471475510754519\n", + "Training accuracy (Classification): 0.9858333418766657\n", + "Test accuracy 0.940209\n", + "MarginLoss + RegLoss: 0.16558117 + 0.09803399 = 0.26361516\n", + "\n", + "\n", + "Epoch Number: 79\n", + "\n", + "Classification Train Loss: 0.14565238232413927\n", + "Training accuracy (Classification): 0.9861111210452186\n", + "Test accuracy 0.937718\n", + "MarginLoss + RegLoss: 0.17031503 + 0.09688788 = 
0.2672029\n", + "\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Epoch Number: 80\n", + "\n", + "Classification Train Loss: 0.14349345521380505\n", + "Training accuracy (Classification): 0.9861111185616918\n", + "Test accuracy 0.941206\n", + "MarginLoss + RegLoss: 0.16280341 + 0.09526416 = 0.25806758\n", + "\n", + "\n", + "Epoch Number: 81\n", + "\n", + "Classification Train Loss: 0.14298133655554718\n", + "Training accuracy (Classification): 0.9848611205816269\n", + "Test accuracy 0.935725\n", + "MarginLoss + RegLoss: 0.16992427 + 0.095204785 = 0.26512906\n", + "\n", + "\n", + "Epoch Number: 82\n", + "\n", + "Classification Train Loss: 0.1410345918395453\n", + "Training accuracy (Classification): 0.9854166756073633\n", + "Test accuracy 0.937718\n", + "MarginLoss + RegLoss: 0.16711517 + 0.09361006 = 0.26072523\n", + "\n", + "\n", + "Epoch Number: 83\n", + "\n", + "Classification Train Loss: 0.14173460192978382\n", + "Training accuracy (Classification): 0.9858333418766657\n", + "Test accuracy 0.935227\n", + "MarginLoss + RegLoss: 0.17255607 + 0.09335034 = 0.26590642\n", + "\n", + "\n", + "Epoch Number: 84\n", + "\n", + "Classification Train Loss: 0.1413275660533044\n", + "Training accuracy (Classification): 0.985000009338061\n", + "Test accuracy 0.939213\n", + "MarginLoss + RegLoss: 0.1691187 + 0.09220875 = 0.26132745\n", + "\n", + "\n", + "Epoch Number: 85\n", + "\n", + "Classification Train Loss: 0.1399904629215598\n", + "Training accuracy (Classification): 0.9863888977302445\n", + "Test accuracy 0.937718\n", + "MarginLoss + RegLoss: 0.16878359 + 0.09304918 = 0.26183277\n", + "\n", + "\n", + "Epoch Number: 86\n", + "\n", + "Classification Train Loss: 0.14306676108390093\n", + "Training accuracy (Classification): 0.9848611214094691\n", + "Test accuracy 0.933732\n", + "MarginLoss + RegLoss: 0.17234829 + 0.09307802 = 0.2654263\n", + "\n", + "\n", + "Epoch Number: 87\n", + "\n", + "Classification Train Loss: 0.14483444765210152\n", + "Training accuracy (Classification): 0.9838888976309035\n", + "Test accuracy 0.932237\n", + "MarginLoss + RegLoss: 0.17103034 + 0.093002975 = 0.26403332\n", + "\n", + "\n", + "Epoch Number: 88\n", + "\n", + "Classification Train Loss: 0.1426364007509417\n", + "Training accuracy (Classification): 0.9854166772630479\n", + "Test accuracy 0.938216\n", + "MarginLoss + RegLoss: 0.17191838 + 0.09332408 = 0.26524246\n", + "\n", + "\n", + "Epoch Number: 89\n", + "\n", + "Classification Train Loss: 0.1419605797984534\n", + "Training accuracy (Classification): 0.9854166756073633\n", + "Test accuracy 0.93722\n", + "MarginLoss + RegLoss: 0.16863512 + 0.09229554 = 0.26093066\n", + "\n", + "\n", + "Epoch Number: 90\n", + "\n", + "Classification Train Loss: 0.1416015759524372\n", + "Training accuracy (Classification): 0.984166675971614\n", + "Test accuracy 0.935227\n", + "MarginLoss + RegLoss: 0.17089692 + 0.0915688 = 0.26246572\n", + "\n", + "\n", + "Epoch Number: 91\n", + "\n", + "Classification Train Loss: 0.1449494053506189\n", + "Training accuracy (Classification): 0.9843055663837327\n", + "Test accuracy 0.933234\n", + "MarginLoss + RegLoss: 0.17210826 + 0.092280895 = 0.26438916\n", + "\n", + "\n", + "Epoch Number: 92\n", + "\n", + "Classification Train Loss: 0.14661915486471522\n", + "Training accuracy (Classification): 0.9826388971673118\n", + "Test accuracy 0.935725\n", + "MarginLoss + RegLoss: 0.17449446 + 0.092357084 = 0.26685154\n", + "\n", + "\n", + "Epoch Number: 93\n", + "\n", + "Classification Train Loss: 
0.1467396484480964\n", + "Training accuracy (Classification): 0.9831944546765752\n", + "Test accuracy 0.935227\n", + "MarginLoss + RegLoss: 0.17004617 + 0.09433146 = 0.26437762\n", + "\n", + "\n", + "Epoch Number: 94\n", + "\n", + "Classification Train Loss: 0.1460545692178938\n", + "Training accuracy (Classification): 0.9841666767994562\n", + "Test accuracy 0.935227\n", + "MarginLoss + RegLoss: 0.17442052 + 0.09421773 = 0.26863825\n", + "\n", + "\n", + "Epoch Number: 95\n", + "\n", + "Classification Train Loss: 0.14522172489927876\n", + "Training accuracy (Classification): 0.9843055639002058\n", + "Test accuracy 0.936223\n", + "MarginLoss + RegLoss: 0.16918503 + 0.09473272 = 0.26391774\n", + "\n", + "\n", + "Epoch Number: 96\n", + "\n", + "Classification Train Loss: 0.14685245561930868\n", + "Training accuracy (Classification): 0.9838888992865881\n", + "Test accuracy 0.93423\n", + "MarginLoss + RegLoss: 0.1715351 + 0.09685955 = 0.26839465\n", + "\n", + "\n", + "Epoch Number: 97\n", + "\n", + "Classification Train Loss: 0.15079948357823822\n", + "Training accuracy (Classification): 0.9830555634366142\n", + "Test accuracy 0.935227\n", + "MarginLoss + RegLoss: 0.1724481 + 0.0967999 = 0.269248\n", + "\n", + "\n", + "Epoch Number: 98\n", + "\n", + "Classification Train Loss: 0.15230303982065785\n", + "Training accuracy (Classification): 0.9816666767001152\n", + "Test accuracy 0.932237\n", + "MarginLoss + RegLoss: 0.17799449 + 0.09676037 = 0.27475485\n", + "\n", + "\n", + "Epoch Number: 99\n", + "\n", + "Classification Train Loss: 0.1494007593848639\n", + "Training accuracy (Classification): 0.9838888976309035\n", + "Test accuracy 0.932735\n", + "MarginLoss + RegLoss: 0.17286898 + 0.096531555 = 0.26940054\n", + "\n", + "\n", + "Non-Zero : 4156.0 Model Size: 31.703125 KB hasSparse: True\n", + "\n", + "For Classification, Maximum Test accuracy at compressed model size(including early stopping): 0.94369704 at Epoch: 56\n", + "Final Test Accuracy: 0.93273544\n", + "The Model Directory: usps10//TFBonsaiResults/16_20_53_15_02_19\n", + "\n" + ] + } + ], + "source": [ + "bonsaiTrainer.train(batchSize, totalEpochs, sess,\n", + " Xtrain, Xtest, Ytrain, Ytest, dataDir, currDir)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/tf2.0/examples/Bonsai/bonsai_example.py b/tf2.0/examples/Bonsai/bonsai_example.py new file mode 100644 index 000000000..2fc29e7c4 --- /dev/null +++ b/tf2.0/examples/Bonsai/bonsai_example.py @@ -0,0 +1,115 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. + +import helpermethods +import tensorflow as tf +import numpy as np +import sys +from edgeml.trainer.bonsaiTrainer import BonsaiTrainer +from edgeml.graph.bonsai import Bonsai + +tf.compat.v1.disable_eager_execution() + +def main(): + # Fixing seeds for reproducibility + tf.compat.v1.set_random_seed(42) + np.random.seed(42) + + # Hyper Param pre-processing + args = helpermethods.getArgs() + + # Set 'isRegression' to be True, for regression. Default is 'False'. 
+ isRegression = args.regression + + sigma = args.sigma + depth = args.depth + + projectionDimension = args.proj_dim + regZ = args.rZ + regT = args.rT + regW = args.rW + regV = args.rV + + totalEpochs = args.epochs + + learningRate = args.learning_rate + + dataDir = args.data_dir + + outFile = args.output_file + + (dataDimension, numClasses, Xtrain, Ytrain, Xtest, Ytest, + mean, std) = helpermethods.preProcessData(dataDir, isRegression) + + sparZ = args.sZ + + if numClasses > 2: + sparW = 0.2 + sparV = 0.2 + sparT = 0.2 + else: + sparW = 1 + sparV = 1 + sparT = 1 + + if args.sW is not None: + sparW = args.sW + if args.sV is not None: + sparV = args.sV + if args.sT is not None: + sparT = args.sT + + if args.batch_size is None: + batchSize = np.maximum(100, int(np.ceil(np.sqrt(Ytrain.shape[0])))) + else: + batchSize = args.batch_size + + useMCHLoss = True + + if numClasses == 2: + numClasses = 1 + + X = tf.compat.v1.placeholder("float32", [None, dataDimension]) + Y = tf.compat.v1.placeholder("float32", [None, numClasses]) + + currDir = helpermethods.createTimeStampDir(dataDir) + + helpermethods.dumpCommand(sys.argv, currDir) + helpermethods.saveMeanStd(mean, std, currDir) + + # numClasses = 1 for binary case + bonsaiObj = Bonsai(numClasses, dataDimension, + projectionDimension, depth, sigma, isRegression) + + bonsaiTrainer = BonsaiTrainer(bonsaiObj, + regW, regT, regV, regZ, + sparW, sparT, sparV, sparZ, + learningRate, X, Y, useMCHLoss, outFile) + + sess = tf.compat.v1.InteractiveSession() + + sess.run(tf.compat.v1.global_variables_initializer()) + + bonsaiTrainer.train(batchSize, totalEpochs, sess, + Xtrain, Xtest, Ytrain, Ytest, dataDir, currDir) + + sess.close() + sys.stdout.close() + + +if __name__ == '__main__': + main() + +# For the following command: +# Data - Curet +# python2 bonsai_example.py -dir ./curet/ -d 2 -p 22 -rW 0.00001 -rZ 0.0000001 -rV 0.00001 -rT 0.000001 -sZ 0.4 -sW 0.5 -sV 0.5 -sT 1 -e 300 -s 0.1 -b 20 +# Final Output - useMCHLoss = True +# Maximum Test accuracy at compressed model size(including early stopping): 0.93727726 at Epoch: 297 +# Final Test Accuracy: 0.9337135 +# Non-Zeros: 24231.0 Model Size: 115.65625 KB hasSparse: True + +# Data - usps2 +# python2 bonsai_example.py -dir /mnt/c/Users/t-vekusu/Downloads/datasets/usps-binary/ -d 2 -p 22 -rW 0.00001 -rZ 0.0000001 -rV 0.00001 -rT 0.000001 -sZ 0.4 -sW 0.5 -sV 0.5 -sT 1 -e 300 -s 0.1 -b 20 +# Maximum Test accuracy at compressed model size(including early stopping): 0.9521674 at Epoch: 246 +# Final Test Accuracy: 0.94170403 +# Non-Zeros: 2636.0 Model Size: 19.1328125 KB hasSparse: True diff --git a/tf2.0/examples/Bonsai/fetch_usps.py b/tf2.0/examples/Bonsai/fetch_usps.py new file mode 100644 index 000000000..c1b2e0726 --- /dev/null +++ b/tf2.0/examples/Bonsai/fetch_usps.py @@ -0,0 +1,64 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. +# +# Setting up the USPS Data. + +import subprocess +import os +import numpy as np +from sklearn.datasets import load_svmlight_file +import sys + +def downloadData(workingDir, downloadDir, linkTrain, linkTest): + def runcommand(command): + p = subprocess.Popen(command.split(), stdout=subprocess.PIPE) + output, error = p.communicate() + assert(p.returncode == 0), 'Command failed: %s' % command + + path = workingDir + '/' + downloadDir + path = os.path.abspath(path) + try: + os.mkdir(path) + except OSError: + print("Could not create %s. 
Make sure the path does" % path)
+        print("not already exist and you have permissions to create it.")
+        return False
+    cwd = os.getcwd()
+    os.chdir(path)
+    print("Downloading data")
+    command = 'wget %s' % linkTrain
+    runcommand(command)
+    command = 'wget %s' % linkTest
+    runcommand(command)
+    print("Extracting data")
+    command = 'bzip2 -d usps.bz2'
+    runcommand(command)
+    command = 'bzip2 -d usps.t.bz2'
+    runcommand(command)
+    command = 'mv usps train.txt'
+    runcommand(command)
+    command = 'mv usps.t test.txt'
+    runcommand(command)
+    os.chdir(cwd)
+    return True
+
+if __name__ == '__main__':
+    workingDir = './'
+    downloadDir = 'usps10'
+    linkTrain = 'http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.bz2'
+    linkTest = 'http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.t.bz2'
+    failureMsg = '''
+Download Failed!
+To perform the download manually:
+\t1. Create a new empty directory named `usps10`.
+\t2. Download the data from the following links into the usps10 directory.
+\t\tTrain: %s
+\t\tTest: %s
+\t3. Extract the downloaded files.
+\t4. Rename `usps` to `train.txt` and,
+\t5. Rename `usps.t` to `test.txt`.
+''' % (linkTrain, linkTest)
+
+    if not downloadData(workingDir, downloadDir, linkTrain, linkTest):
+        exit(failureMsg)
+    print("Done")
diff --git a/tf2.0/examples/Bonsai/helpermethods.py b/tf2.0/examples/Bonsai/helpermethods.py
new file mode 100644
index 000000000..febe0613e
--- /dev/null
+++ b/tf2.0/examples/Bonsai/helpermethods.py
@@ -0,0 +1,270 @@
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT license.
+
+'''
+ Functions to check the sanity of input arguments
+ for the example script.
+'''
+import argparse
+import datetime
+import os
+import numpy as np
+
+
+def checkIntPos(value):
+    ivalue = int(value)
+    if ivalue <= 0:
+        raise argparse.ArgumentTypeError(
+            "%s is an invalid positive int value" % value)
+    return ivalue
+
+
+def checkIntNneg(value):
+    ivalue = int(value)
+    if ivalue < 0:
+        raise argparse.ArgumentTypeError(
+            "%s is an invalid non-neg int value" % value)
+    return ivalue
+
+
+def checkFloatNneg(value):
+    fvalue = float(value)
+    if fvalue < 0:
+        raise argparse.ArgumentTypeError(
+            "%s is an invalid non-neg float value" % value)
+    return fvalue
+
+
+def checkFloatPos(value):
+    fvalue = float(value)
+    if fvalue <= 0:
+        raise argparse.ArgumentTypeError(
+            "%s is an invalid positive float value" % value)
+    return fvalue
+
+
+def str2bool(v):
+    if v.lower() in ('yes', 'true', 't', 'y', '1'):
+        return True
+    elif v.lower() in ('no', 'false', 'f', 'n', '0'):
+        return False
+    else:
+        raise argparse.ArgumentTypeError('Boolean value expected.')
+
+
+def getArgs():
+    '''
+    Function to parse arguments for the Bonsai algorithm
+    '''
+    parser = argparse.ArgumentParser(
+        description='HyperParams for Bonsai Algorithm')
+    parser.add_argument('-dir', '--data-dir', required=True,
+                        help='Data directory containing ' +
+                        'train.npy and test.npy')
+
+    parser.add_argument('-d', '--depth', type=checkIntNneg, default=2,
+                        help='Depth of Bonsai Tree ' +
+                        '(default: 2 try: [0, 1, 3])')
+    parser.add_argument('-p', '--proj-dim', type=checkIntPos, default=10,
+                        help='Projection Dimension ' +
+                        '(default: 10 try: [5, 10, 30])')
+    parser.add_argument('-s', '--sigma', type=float, default=1.0,
+                        help='Parameter for sigmoid sharpness ' +
+                        '(default: 1.0 try: [3.0, 0.05, 0.1])')
+    parser.add_argument('-e', '--epochs', type=checkIntPos, default=42,
+                        help='Total Epochs (default: 42 try: [100, 150, 60])')
+    parser.add_argument('-b', '--batch-size', type=checkIntPos,
+                        help='Batch Size to be used ' +
+                        '(default: max(100, sqrt(train_samples)))')
+    parser.add_argument('-lr', '--learning-rate', type=checkFloatPos,
+                        default=0.01, help='Initial learning rate for the ' +
+                        'Adam optimizer (default: 0.01)')
+
+    parser.add_argument('-rW', type=float, default=0.0001,
+                        help='Regularizer for predictor parameter W ' +
+                        '(default: 0.0001 try: [0.01, 0.001, 0.00001])')
+    parser.add_argument('-rV', type=float, default=0.0001,
+                        help='Regularizer for predictor parameter V ' +
+                        '(default: 0.0001 try: [0.01, 0.001, 0.00001])')
+    parser.add_argument('-rT', type=float, default=0.0001,
+                        help='Regularizer for branching parameter Theta ' +
+                        '(default: 0.0001 try: [0.01, 0.001, 0.00001])')
+    parser.add_argument('-rZ', type=float, default=0.00001,
+                        help='Regularizer for projection parameter Z ' +
+                        '(default: 0.00001 try: [0.001, 0.0001, 0.000001])')
+
+    parser.add_argument('-sW', type=checkFloatPos,
+                        help='Sparsity for predictor parameter W ' +
+                        '(default: 1.0 for binary classification, else 0.2; ' +
+                        'try: [0.1, 0.3, 0.5])')
+    parser.add_argument('-sV', type=checkFloatPos,
+                        help='Sparsity for predictor parameter V ' +
+                        '(default: 1.0 for binary classification, else 0.2; ' +
+                        'try: [0.1, 0.3, 0.5])')
+    parser.add_argument('-sT', type=checkFloatPos,
+                        help='Sparsity for branching parameter Theta ' +
+                        '(default: 1.0 for binary classification, else 0.2; ' +
+                        'try: [0.1, 0.3, 0.5])')
+    parser.add_argument('-sZ', type=checkFloatPos, default=0.2,
+                        help='Sparsity for projection parameter Z ' +
+                        '(default: 0.2 try: [0.1, 0.3, 0.5])')
+    parser.add_argument('-oF', '--output-file', default=None,
+                        help='Output file for dumping the program output ' +
+                        '(default: stdout)')
+
+    parser.add_argument('-regression', type=str2bool, default=False,
+                        help='Boolean argument which controls whether to perform ' +
+                        'regression or classification ' +
+                        '(default: False, i.e., classification; values: [True, False])')
+
+    return parser.parse_args()
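+
+
+# For reference, a typical (hypothetical) invocation using the arguments above:
+#   python bonsai_example.py -dir ./usps10/ -d 2 -p 10 -sZ 0.2 -e 100
+# where ./usps10 contains train.npy and test.npy in the format expected by
+# preProcessData below.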
+
+
+def getQuantArgs():
+    '''
+    Function to parse arguments for model quantization
+    '''
+    parser = argparse.ArgumentParser(
+        description='Arguments for quantizing Fast models. ' +
+        'Works only for piece-wise linear non-linearities, ' +
+        'like relu, quantTanh, quantSigm (check rnn.py for the definitions)')
+    parser.add_argument('-dir', '--model-dir', required=True,
+                        help='Model directory containing ' +
+                        '*.npy weight files dumped from the trained model')
+    parser.add_argument('-m', '--max-val', type=checkIntNneg, default=127,
+                        help='The maximum possible value in the quantized ' +
+                        'model, essentially the byte complexity; ' +
+                        '127 => 1 byte (the default)')
+
+    return parser.parse_args()
+
+
+def createTimeStampDir(dataDir):
+    '''
+    Creates a directory with the timestamp as its name
+    '''
+    if os.path.isdir(dataDir + '/TFBonsaiResults') is False:
+        try:
+            os.mkdir(dataDir + '/TFBonsaiResults')
+        except OSError:
+            print("Creation of the directory %s failed" %
+                  (dataDir + '/TFBonsaiResults'))
+
+    currDir = 'TFBonsaiResults/' + datetime.datetime.now().strftime("%H_%M_%S_%d_%m_%y")
+    if os.path.isdir(dataDir + '/' + currDir) is False:
+        try:
+            os.mkdir(dataDir + '/' + currDir)
+        except OSError:
+            print("Creation of the directory %s failed" %
+                  (dataDir + '/' + currDir))
+        else:
+            return (dataDir + '/' + currDir)
+    return None
+
+
+def preProcessData(dataDir, isRegression=False):
+    '''
+    Function to pre-process input data
+    Expects a .npy file of form [lbl feats] for each datapoint
+    Outputs train and test set datapoints appended with 1 for bias induction
+    dataDimension, numClasses are inferred directly
+    '''
+    train = np.load(dataDir + '/train.npy')
+    test = np.load(dataDir + '/test.npy')
+
+    dataDimension = int(train.shape[1]) - 1
+
+    Xtrain = train[:, 1:dataDimension + 1]
+    Ytrain_ = train[:, 0]
+
+    Xtest = test[:, 1:dataDimension + 1]
+    Ytest_ = test[:, 0]
+
+    # Mean-var normalisation
+    mean = np.mean(Xtrain, 0)
+    std = np.std(Xtrain, 0)
+    std[std[:] < 0.000001] = 1
+    Xtrain = (Xtrain - mean) / std
+    Xtest = (Xtest - mean) / std
+    # End mean-var normalisation
+
+    # Classification.
+    if (isRegression == False):
+        numClasses = max(Ytrain_) - min(Ytrain_) + 1
+        numClasses = int(max(numClasses, max(Ytest_) - min(Ytest_) + 1))
+
+        lab = Ytrain_.astype('uint8')
+        lab = np.array(lab) - min(lab)
+
+        lab_ = np.zeros((Xtrain.shape[0], numClasses))
+        lab_[np.arange(Xtrain.shape[0]), lab] = 1
+        if (numClasses == 2):
+            Ytrain = np.reshape(lab, [-1, 1])
+        else:
+            Ytrain = lab_
+
+        lab = Ytest_.astype('uint8')
+        lab = np.array(lab) - min(lab)
+
+        lab_ = np.zeros((Xtest.shape[0], numClasses))
+        lab_[np.arange(Xtest.shape[0]), lab] = 1
+        if (numClasses == 2):
+            Ytest = np.reshape(lab, [-1, 1])
+        else:
+            Ytest = lab_
+
+    elif (isRegression == True):
+        # The number of classes is always 1, for regression.
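+        # Targets are used as-is (real-valued); no label shifting or
+        # one-hot encoding is applied.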
+ numClasses = 1 + Ytrain = Ytrain_ + Ytest = Ytest_ + + trainBias = np.ones([Xtrain.shape[0], 1]) + Xtrain = np.append(Xtrain, trainBias, axis=1) + testBias = np.ones([Xtest.shape[0], 1]) + Xtest = np.append(Xtest, testBias, axis=1) + + mean = np.append(mean, np.array([0])) + std = np.append(std, np.array([1])) + + if (isRegression == False): + return dataDimension + 1, numClasses, Xtrain, Ytrain, Xtest, Ytest, mean, std + elif (isRegression == True): + return dataDimension + 1, numClasses, Xtrain, Ytrain.reshape((-1, 1)), Xtest, Ytest.reshape((-1, 1)), mean, std + + +def dumpCommand(list, currDir): + ''' + Dumps the current command to a file for further use + ''' + commandFile = open(currDir + '/command.txt', 'w') + command = "python" + + command = command + " " + ' '.join(list) + commandFile.write(command) + + commandFile.flush() + commandFile.close() + + +def saveMeanStd(mean, std, currDir): + ''' + Function to save Mean and Std vectors + ''' + np.save(currDir + '/mean.npy', mean) + np.save(currDir + '/std.npy', std) + saveMeanStdSeeDot(mean, std, currDir + "/SeeDot") + + +def saveMeanStdSeeDot(mean, std, seeDotDir): + ''' + Function to save Mean and Std vectors + ''' + if os.path.isdir(seeDotDir) is False: + try: + os.mkdir(seeDotDir) + except OSError: + print("Creation of the directory %s failed" % + seeDotDir) + np.savetxt(seeDotDir + '/Mean', mean, delimiter="\t") + np.savetxt(seeDotDir + '/Std', std, delimiter="\t") diff --git a/tf2.0/examples/Bonsai/process_usps.py b/tf2.0/examples/Bonsai/process_usps.py new file mode 100644 index 000000000..252ba11e2 --- /dev/null +++ b/tf2.0/examples/Bonsai/process_usps.py @@ -0,0 +1,54 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. +# +# Processing the USPS Data. It is assumed that the data is already +# downloaded. + +import subprocess +import os +import numpy as np +from sklearn.datasets import load_svmlight_file +import sys + +def processData(workingDir, downloadDir): + def loadLibSVMFile(file): + data = load_svmlight_file(file) + features = data[0] + labels = data[1] + retMat = np.zeros([features.shape[0], features.shape[1] + 1]) + retMat[:, 0] = labels + retMat[:, 1:] = features.todense() + return retMat + + path = workingDir + '/' + downloadDir + path = os.path.abspath(path) + trf = path + '/train.txt' + tsf = path + '/test.txt' + assert os.path.isfile(trf), 'File not found: %s' % trf + assert os.path.isfile(tsf), 'File not found: %s' % tsf + train = loadLibSVMFile(trf) + test = loadLibSVMFile(tsf) + + # Convert the labels from 0 to numClasses-1 + y_train = train[:, 0] + y_test = test[:, 0] + + lab = y_train.astype('uint8') + lab = np.array(lab) - min(lab) + train[:, 0] = lab + + lab = y_test.astype('uint8') + lab = np.array(lab) - min(lab) + test[:, 0] = lab + + np.save(path + '/train.npy', train) + np.save(path + '/test.npy', test) + +if __name__ == '__main__': + # Configuration + workingDir = './' + downloadDir = 'usps10' + # End config + print("Processing data") + processData(workingDir, downloadDir) + print("Done") diff --git a/tf2.0/examples/Bonsai/quantizeBonsaiModels.py b/tf2.0/examples/Bonsai/quantizeBonsaiModels.py new file mode 100644 index 000000000..6ff9f737c --- /dev/null +++ b/tf2.0/examples/Bonsai/quantizeBonsaiModels.py @@ -0,0 +1,72 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. 
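+#
+# Quantizes the weight matrices (*.npy files) of a trained Bonsai model to
+# low-bitwidth integers, using a single symmetric scale factor shared across
+# all parameters.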
+
+import helpermethods
+import os
+import numpy as np
+
+
+def min_max(A, name):
+    print(name + " has max: " + str(np.max(A)) + " min: " + str(np.min(A)))
+    return np.max([np.abs(np.max(A)), np.abs(np.min(A))])
+
+
+def quantizeFastModels(modelDir, maxValue=127, scalarScaleFactor=1000):
+    # scalarScaleFactor is currently unused here.
+    ls = os.listdir(modelDir)
+    paramNameList = []
+    paramWeightList = []
+    paramLimitList = []
+
+    for file in ls:
+        if file.endswith("npy"):
+            if file.startswith("mean") or file.startswith("std") or file.startswith("hyperParam"):
+                continue
+            else:
+                paramNameList.append(file)
+                temp = np.load(modelDir + "/" + file)
+                paramWeightList.append(temp)
+                paramLimitList.append(min_max(temp, file))
+
+    paramLimit = np.max(paramLimitList)
+
+    paramScaleFactor = np.round((2.0 * maxValue + 1.0) / (2.0 * paramLimit))
+
+    quantParamWeights = []
+    for param in paramWeightList:
+        temp = np.round(paramScaleFactor * param)
+        temp[temp[:] > maxValue] = maxValue
+        temp[temp[:] < -maxValue] = -1 * (maxValue + 1)
+
+        if maxValue <= 127:
+            temp = temp.astype('int8')
+        elif maxValue <= 32767:
+            temp = temp.astype('int16')
+        else:
+            temp = temp.astype('int32')
+
+        quantParamWeights.append(temp)
+
+    quantModelDir = modelDir + '/QuantizedTFBonsaiModel'
+    if os.path.isdir(quantModelDir) is False:
+        try:
+            os.mkdir(quantModelDir)
+        except OSError:
+            print("Creation of the directory %s failed" % quantModelDir)
+
+    np.save(quantModelDir + "/paramScaleFactor.npy",
+            paramScaleFactor.astype('int32'))
+
+    for i in range(len(paramNameList)):
+        np.save(quantModelDir + "/q" + paramNameList[i], quantParamWeights[i])
+
+    print("\n\nQuantized Model Dir: " + quantModelDir)
+
+
+def main():
+    args = helpermethods.getQuantArgs()
+    quantizeFastModels(args.model_dir, int(args.max_val))
+
+
+if __name__ == '__main__':
+    main()
diff --git a/tf2.0/examples/FastCells/README.md b/tf2.0/examples/FastCells/README.md
new file mode 100644
index 000000000..52b12e6b2
--- /dev/null
+++ b/tf2.0/examples/FastCells/README.md
@@ -0,0 +1,77 @@
+# EdgeML FastCells on a sample public dataset
+
+This directory includes an example notebook and a general execution script for
+FastCells (FastRNN & FastGRNN) developed as part of EdgeML, along with modified
+UGRNN, GRU and LSTM cells that support the LSQ training routine.
+We also include a sample cleanup script and use-case on the USPS10 public dataset.
+
+`edgeml.graph.rnn` implements the custom RNN cells of **FastRNN** ([`FastRNNCell`](../../edgeml/graph/rnn.py#L215)) and **FastGRNN** ([`FastGRNNCell`](../../edgeml/graph/rnn.py#L40)) with
+multiple additional features like low-rank parameterisation, custom
+non-linearities etc. Similar to Bonsai and ProtoNN, the three-phase training
+routine for FastRNN and FastGRNN is decoupled from the custom cells to
+facilitate plug and play behaviour of the custom RNN cells in other
+architectures (NMT, Encoder-Decoder etc.) in place of the inbuilt `RNNCell`, `GRUCell`, `BasicLSTMCell` etc.
+`edgeml.graph.rnn` also contains modified RNN cells of **UGRNN** ([`UGRNNLRCell`](../../edgeml/graph/rnn.py#L862)),
+**GRU** ([`GRULRCell`](../../edgeml/graph/rnn.py#L635)) and **LSTM** ([`LSTMLRCell`](../../edgeml/graph/rnn.py#L376)). These cells can also be substituted for FastCells wherever feasible.
+
+For training FastCells, `edgeml.trainer.fastTrainer` implements the three-phase
+FastCell training routine in Tensorflow. A simple example,
+`examples/fastcell_example.py`, is provided to illustrate its usage.
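+
+Below is a condensed sketch of this plug and play usage, distilled from the
+example notebook later in this patch (the dimensions and hyper-parameter values
+are illustrative only, and graph mode is enabled explicitly since these
+examples use the TF1-style API):
+
+```python
+import tensorflow as tf
+from edgeml.graph.rnn import FastGRNNCell
+from edgeml.trainer.fastTrainer import FastTrainer
+
+tf.compat.v1.disable_eager_execution()  # TF1-style graph mode, as in the examples
+
+# Illustrative sizes: 16 timesteps of 16 features each, 10 classes.
+timeSteps, inputDims, numClasses, hiddenDims = 16, 16, 10, 32
+
+X = tf.compat.v1.placeholder("float", [None, timeSteps, inputDims])
+Y = tf.compat.v1.placeholder("float", [None, numClasses])
+
+# Any of FastRNNCell / UGRNNLRCell / GRULRCell / LSTMLRCell can be swapped in here.
+fastCell = FastGRNNCell(hiddenDims, gate_non_linearity="sigmoid",
+                        update_non_linearity="tanh", wRank=None, uRank=None)
+
+# sW/sU set the fraction of non-zeros in the W and U matrices (1.0 => dense).
+trainer = FastTrainer(fastCell, X, Y, sW=1.0, sU=1.0,
+                      learningRate=0.01, outFile=None)
+```
+
+The three-phase training itself is then driven by the trainer, as in the
+notebook and the endpoint script.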
+
+Note that `fastcell_example.py` assumes that data is in a specific format. It
+is assumed that train and test data is contained in two files, `train.npy` and
+`test.npy`, each containing a 2D numpy array of dimension `[numberOfExamples,
+numberOfFeatures]`. `numberOfFeatures` is `timesteps x inputDims`, flattened
+across the timestep dimension, i.e., the input of the 1st timestep followed by
+the second, and so on. For an N-class problem, we assume the labels are
+integers from 0 through N-1. Lastly, the training data, `train.npy`, is assumed
+to be well shuffled, as the training routine doesn't shuffle internally. A
+minimal sketch of producing such files is shown below.
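+
+The following is a minimal sketch (synthetic data; all sizes hypothetical) of
+writing `train.npy` and `test.npy` in this layout, with the label in the first
+column as produced by the USPS processing script shown earlier in this patch:
+
+```python
+import numpy as np
+
+# Hypothetical sizes: 1000 examples, 16 timesteps of 16 features, 10 classes.
+numExamples, timeSteps, inputDims, numClasses = 1000, 16, 16, 10
+
+# Features are flattened across the timestep dimension: timestep 1 first.
+feats = np.random.randn(numExamples, timeSteps * inputDims)
+labels = np.random.randint(0, numClasses, numExamples)  # integers in [0, N-1]
+
+# Label in the first column, features in the remaining columns.
+data = np.concatenate([labels.reshape(-1, 1), feats], axis=1)
+np.random.shuffle(data)  # the training routine does not shuffle internally
+np.save('train.npy', data)  # and similarly for test.npy
+```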
+
+**Tested With:** Tensorflow >1.6 with Python 2 and Python 3
+
+## Download and clean up sample dataset
+
+We validate the code using the USPS dataset. The download and cleanup of the
+dataset to match the above-mentioned format are done by the scripts
+[fetch_usps.py](fetch_usps.py) and [process_usps.py](process_usps.py):
+
+```
+python fetch_usps.py
+python process_usps.py
+```
+
+
+## Sample command for FastCells on USPS10
+The following sample run on usps10 should validate your library:
+
+Note: Even though usps10 is not a time-series dataset, it can be treated as a
+time series where each row arrives one timestep at a time, so the number of
+timesteps = 16 and inputDims = 16.
+
+```bash
+python fastcell_example.py -dir usps10/ -id 16 -hd 32
+```
+This command should give you a final output screen which reads roughly like the
+following (the numbers might not be exact due to various version mismatches):
+
+```
+Maximum Test accuracy at compressed model size(including early stopping): 0.9407075 at Epoch: 262
+Final Test Accuracy: 0.93721974
+
+Non-Zeros: 1932 Model Size: 7.546875 KB hasSparse: False
+```
+The `usps10/` directory will now have a consolidated results file called
+`FastRNNResults.txt` or `FastGRNNResults.txt`, depending on the choice of the
+RNN cell, and a directory `FastRNNResults` or `FastGRNNResults` holding the
+corresponding models from each run of the code on the `usps10` dataset.
+
+## Byte Quantization (Q) for model compression
+If you wish to quantize the generated model to use byte-quantized integers, use
+`quantizeFastModels.py`. Usage instructions:
+
+```
+python quantizeFastModels.py -h
+```
+
+This will generate quantized models, with a `q` prefix added to every param
+file, stored in a new directory `QuantizedFastModel` inside the model directory.
+One can use this model further on edge devices.
+
+Copyright (c) Microsoft Corporation. All rights reserved.
+
+Licensed under the MIT license.
diff --git a/tf2.0/examples/FastCells/fastcell_example.ipynb b/tf2.0/examples/FastCells/fastcell_example.ipynb
new file mode 100644
index 000000000..d1d59ee80
--- /dev/null
+++ b/tf2.0/examples/FastCells/fastcell_example.ipynb
@@ -0,0 +1,1557 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# FastRNN and FastGRNN in Tensorflow\n",
+    "\n",
+    "This is a simple notebook that illustrates the usage of the Tensorflow implementation of FastRNN and FastGRNN. We are using the USPS dataset. Please refer to `fetch_usps.py` and run it for downloading and cleaning up the dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Copyright (c) Microsoft Corporation. All rights reserved.\n",
+    "# Licensed under the MIT license.\n",
+    "\n",
+    "import helpermethods\n",
+    "import tensorflow as tf\n",
+    "import numpy as np\n",
+    "import sys\n",
+    "import os\n",
+    "\n",
+    "# Provide the GPU number to be used\n",
+    "os.environ['CUDA_VISIBLE_DEVICES'] = ''\n",
+    "\n",
+    "# FastRNN and FastGRNN imports\n",
+    "from edgeml.trainer.fastTrainer import FastTrainer\n",
+    "from edgeml.graph.rnn import FastGRNNCell\n",
+    "from edgeml.graph.rnn import FastRNNCell\n",
+    "from edgeml.graph.rnn import UGRNNLRCell\n",
+    "from edgeml.graph.rnn import GRULRCell\n",
+    "from edgeml.graph.rnn import LSTMLRCell\n",
+    "\n",
+    "# These examples use the TF1-style graph mode\n",
+    "tf.compat.v1.disable_eager_execution()\n",
+    "\n",
+    "# Fixing seeds for reproducibility\n",
+    "tf.compat.v1.set_random_seed(42)\n",
+    "np.random.seed(42)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# USPS Data\n",
+    "\n",
+    "It is assumed that the USPS data has already been downloaded and processed with [fetch_usps.py](fetch_usps.py) and [process_usps.py](process_usps.py), and is present in the `./usps10` subdirectory.\n",
+    "\n",
+    "Note: Even though usps10 is not a time-series dataset, it can be treated as a time series where each row arrives one timestep at a time.\n",
+    "So the number of timesteps = 16 and inputDims = 16"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Feature Dimension: 256\n",
+      "Num classes: 10\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Loading and pre-processing the dataset for FastCells\n",
+    "dataDir = \"usps10\"\n",
+    "(dataDimension, numClasses, Xtrain, Ytrain, Xtest, Ytest, mean, std) = helpermethods.preProcessData(dataDir)\n",
+    "print(\"Feature Dimension: \", dataDimension)\n",
+    "print(\"Num classes: \", numClasses)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Model Parameters\n",
+    "\n",
+    "FastRNN and FastGRNN work for most of the hyper-parameters with which you could achieve decent accuracies on LSTM/GRU. Over and above that, you can use low-rank, sparsity and quantization to reduce model size up to 45x when compared to LSTM/GRU."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cell = \"FastGRNN\" # Choose between FastGRNN, FastRNN, UGRNN, GRU and LSTM\n",
+    "\n",
+    "inputDims = 16  # features taken in by the RNN in one timestep\n",
+    "hiddenDims = 32  # hidden state of the RNN\n",
+    "\n",
+    "totalEpochs = 300\n",
+    "batchSize = 100\n",
+    "\n",
+    "learningRate = 0.01\n",
+    "decayStep = 200\n",
+    "decayRate = 0.1\n",
+    "\n",
+    "outFile = None  # provide your file, if you need all the logging info in a file\n",
+    "\n",
+    "# Low-rank parameterisation for weight matrices. None => full rank\n",
+    "wRank = None\n",
+    "uRank = None\n",
+    "\n",
+    "# Sparsity of the weight matrices. x => 100*x % are non-zeros\n",
+    "sW = 1.0\n",
+    "sU = 1.0\n",
+    "\n",
+    "# Non-linearities for the RNN architecture. Can choose from \"tanh, sigmoid, relu, quantTanh, quantSigm\"\n",
+    "update_non_linearity = \"tanh\"\n",
+    "gate_non_linearity = \"sigmoid\"\n",
+    "\n",
+    "assert dataDimension % inputDims == 0, \"Infeasible per-step input; the number of timesteps must be an integer\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Placeholders for data feeding during training and inference"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X = tf.compat.v1.placeholder(\"float\", [None, int(dataDimension / inputDims), inputDims])\n",
+    "Y = tf.compat.v1.placeholder(\"float\", [None, numClasses])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Creating a directory for the current model in the data directory using a timestamp"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "currDir = helpermethods.createTimeStampDir(dataDir, cell)\n",
+    "helpermethods.dumpCommand(sys.argv, currDir)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# FastCell Graph Object\n",
+    "\n",
+    "Instantiating the FastCell graph using the modular RNN cells, which will be used for training and inference.\n",
+    "\n",
+    "Note: RNN cells in `edgeml.graph.rnn` can be used anywhere in place of LSTM/GRU in a plug & play fashion."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Create the appropriate RNN cell object based on the choice above\n",
+    "if cell == \"FastGRNN\":\n",
+    "    FastCell = FastGRNNCell(hiddenDims, gate_non_linearity=gate_non_linearity,\n",
+    "                            update_non_linearity=update_non_linearity,\n",
+    "                            wRank=wRank, uRank=uRank)\n",
+    "elif cell == \"FastRNN\":\n",
+    "    FastCell = FastRNNCell(hiddenDims, update_non_linearity=update_non_linearity,\n",
+    "                           wRank=wRank, uRank=uRank)\n",
+    "elif cell == \"UGRNN\":\n",
+    "    FastCell = UGRNNLRCell(hiddenDims, update_non_linearity=update_non_linearity,\n",
+    "                           wRank=wRank, uRank=uRank)\n",
+    "elif cell == \"GRU\":\n",
+    "    FastCell = GRULRCell(hiddenDims, update_non_linearity=update_non_linearity,\n",
+    "                         wRank=wRank, uRank=uRank)\n",
+    "elif cell == \"LSTM\":\n",
+    "    FastCell = LSTMLRCell(hiddenDims, update_non_linearity=update_non_linearity,\n",
+    "                          wRank=wRank, uRank=uRank)\n",
+    "else:\n",
+    "    sys.exit('Exiting: No such cell as ' + cell)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# FastCell Trainer Object\n",
+    "\n",
+    "Instantiating the FastCell trainer, which will be used for the 3 phase training"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "FastCellTrainer = FastTrainer(FastCell, X, Y, sW=sW, sU=sU, learningRate=learningRate, outFile=outFile)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Session declaration and variable initialization. An interactive session doesn't clog the entire GPU."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sess = tf.compat.v1.InteractiveSession()\n",
+    "sess.run(tf.compat.v1.global_variables_initializer())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# FastCell Training Routine\n",
+    "\n",
+    "The method to run the 3 phase training, followed by giving out the best early-stopping model and accuracy, along with saving the parameters. Broadly, the three phases go from fully dense training, to inducing sparsity in the weight matrices, to fine-tuning with the sparse support held fixed."
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Epoch Number: 0\n", + "\n", + "******************** Dense Training Phase Started ********************\n", + "\n", + "Train Loss: 1.3531070024999854 Train Accuracy: 0.565881378744563\n", + "Test Loss: 0.8334901 Test Accuracy: 0.7349278\n", + "\n", + "Epoch Number: 1\n", + "Train Loss: 0.5264064224615489 Train Accuracy: 0.8227005854044875\n", + "Test Loss: 0.52811986 Test Accuracy: 0.83557546\n", + "\n", + "Epoch Number: 2\n", + "Train Loss: 0.3170111432467421 Train Accuracy: 0.8997546287432109\n", + "Test Loss: 0.41971388 Test Accuracy: 0.87593424\n", + "\n", + "Epoch Number: 3\n", + "Train Loss: 0.22838621382435706 Train Accuracy: 0.9285217539904869\n", + "Test Loss: 0.37176716 Test Accuracy: 0.8943697\n", + "\n", + "Epoch Number: 4\n", + "Train Loss: 0.17584358977332507 Train Accuracy: 0.9436173479850978\n", + "Test Loss: 0.3482268 Test Accuracy: 0.9013453\n", + "\n", + "Epoch Number: 5\n", + "Train Loss: 0.1554100387921072 Train Accuracy: 0.9503703141865665\n", + "Test Loss: 0.36468038 Test Accuracy: 0.8963627\n", + "\n", + "Epoch Number: 6\n", + "Train Loss: 0.13128593576791353 Train Accuracy: 0.9591509887616928\n", + "Test Loss: 0.36238122 Test Accuracy: 0.9028401\n", + "\n", + "Epoch Number: 7\n", + "Train Loss: 0.11856559077150201 Train Accuracy: 0.9623016780369902\n", + "Test Loss: 0.37148365 Test Accuracy: 0.9003488\n", + "\n", + "Epoch Number: 8\n", + "Train Loss: 0.11480801381579 Train Accuracy: 0.9623016764039862\n", + "Test Loss: 0.40140042 Test Accuracy: 0.8938714\n", + "\n", + "Epoch Number: 9\n", + "Train Loss: 0.11065655635440186 Train Accuracy: 0.9653153754260442\n", + "Test Loss: 0.3517686 Test Accuracy: 0.90981567\n", + "\n", + "Epoch Number: 10\n", + "Train Loss: 0.09199796772676788 Train Accuracy: 0.9716302948455288\n", + "Test Loss: 0.3499246 Test Accuracy: 0.9147982\n", + "\n", + "Epoch Number: 11\n", + "Train Loss: 0.07985301451017596 Train Accuracy: 0.9762742788824317\n", + "Test Loss: 0.3625236 Test Accuracy: 0.91529644\n", + "\n", + "Epoch Number: 12\n", + "Train Loss: 0.07171525779397112 Train Accuracy: 0.9787535806224771\n", + "Test Loss: 0.35705435 Test Accuracy: 0.91429996\n", + "\n", + "Epoch Number: 13\n", + "Train Loss: 0.077431221046064 Train Accuracy: 0.9755893504782899\n", + "Test Loss: 0.38592914 Test Accuracy: 0.9093174\n", + "\n", + "Epoch Number: 14\n", + "Train Loss: 0.07726132686007513 Train Accuracy: 0.9744799128950459\n", + "Test Loss: 0.38768652 Test Accuracy: 0.9123069\n", + "\n", + "Epoch Number: 15\n", + "Train Loss: 0.06339540997239416 Train Accuracy: 0.9798494748873253\n", + "Test Loss: 0.36402556 Test Accuracy: 0.9197808\n", + "\n", + "Epoch Number: 16\n", + "Train Loss: 0.0624726844173282 Train Accuracy: 0.9810823528733972\n", + "Test Loss: 0.3556986 Test Accuracy: 0.9192825\n", + "\n", + "Epoch Number: 17\n", + "Train Loss: 0.05848091944082551 Train Accuracy: 0.9821376008530186\n", + "Test Loss: 0.3734596 Test Accuracy: 0.922272\n", + "\n", + "Epoch Number: 18\n", + "Train Loss: 0.06179975296613084 Train Accuracy: 0.9775207050859112\n", + "Test Loss: 0.37375587 Test Accuracy: 0.9147982\n", + "\n", + "Epoch Number: 19\n", + "Train Loss: 0.060816061236474615 Train Accuracy: 0.980534406557475\n", + "Test Loss: 0.36386096 Test Accuracy: 0.92077726\n", + "\n", + "Epoch Number: 20\n", + "Train Loss: 0.05517878877126599 Train Accuracy: 0.9829866126792072\n", + "Test Loss: 
0.38278854 Test Accuracy: 0.92077726\n", + "\n", + "Epoch Number: 21\n", + "Train Loss: 0.04950164187036148 Train Accuracy: 0.9835481072125369\n", + "Test Loss: 0.38189712 Test Accuracy: 0.91878426\n", + "\n", + "Epoch Number: 22\n", + "Train Loss: 0.04603105507893105 Train Accuracy: 0.984219489848777\n", + "Test Loss: 0.39881724 Test Accuracy: 0.9123069\n", + "\n", + "Epoch Number: 23\n", + "Train Loss: 0.04120528124183519 Train Accuracy: 0.985726339359806\n", + "Test Loss: 0.41953668 Test Accuracy: 0.91131043\n", + "\n", + "Epoch Number: 24\n", + "Train Loss: 0.04223672329282312 Train Accuracy: 0.9858497748636219\n", + "Test Loss: 0.37599987 Test Accuracy: 0.9227703\n", + "\n", + "Epoch Number: 25\n", + "Train Loss: 0.044115278812457026 Train Accuracy: 0.9849044190694208\n", + "Test Loss: 0.39963064 Test Accuracy: 0.92127556\n", + "\n", + "Epoch Number: 26\n", + "Train Loss: 0.060125956299064094 Train Accuracy: 0.9792608863686862\n", + "Test Loss: 0.39676014 Test Accuracy: 0.91131043\n", + "\n", + "Epoch Number: 27\n", + "Train Loss: 0.058513890084338514 Train Accuracy: 0.9795484101935609\n", + "Test Loss: 0.3695973 Test Accuracy: 0.9217738\n", + "\n", + "Epoch Number: 28\n", + "Train Loss: 0.04882802803401057 Train Accuracy: 0.9824115707449717\n", + "Test Loss: 0.4062322 Test Accuracy: 0.9128052\n", + "\n", + "Epoch Number: 29\n", + "Train Loss: 0.04246805129853422 Train Accuracy: 0.9854659160522565\n", + "Test Loss: 0.36979795 Test Accuracy: 0.92526156\n", + "\n", + "Epoch Number: 30\n", + "Train Loss: 0.05128337493906283 Train Accuracy: 0.9843700242369142\n", + "Test Loss: 0.4025077 Test Accuracy: 0.9172895\n", + "\n", + "Epoch Number: 31\n", + "Train Loss: 0.04524477290895398 Train Accuracy: 0.9840825028615455\n", + "Test Loss: 0.36316648 Test Accuracy: 0.9227703\n", + "\n", + "Epoch Number: 32\n", + "Train Loss: 0.04791155387966396 Train Accuracy: 0.9839319660239023\n", + "Test Loss: 0.38224837 Test Accuracy: 0.9197808\n", + "\n", + "Epoch Number: 33\n", + "Train Loss: 0.04305804770261253 Train Accuracy: 0.98493151713724\n", + "Test Loss: 0.3597 Test Accuracy: 0.9217738\n", + "\n", + "Epoch Number: 34\n", + "Train Loss: 0.03439056758819888 Train Accuracy: 0.9891509944445467\n", + "Test Loss: 0.36144 Test Accuracy: 0.92326856\n", + "\n", + "Epoch Number: 35\n", + "Train Loss: 0.025825574640057063 Train Accuracy: 0.9935481017583037\n", + "Test Loss: 0.3576532 Test Accuracy: 0.9287494\n", + "\n", + "Epoch Number: 36\n", + "Train Loss: 0.020732127933775726 Train Accuracy: 0.9947809772948696\n", + "Test Loss: 0.3529356 Test Accuracy: 0.92825115\n", + "\n", + "Epoch Number: 37\n", + "Train Loss: 0.02256068464189972 Train Accuracy: 0.9938356215006685\n", + "Test Loss: 0.3675873 Test Accuracy: 0.93223715\n", + "\n", + "Epoch Number: 38\n", + "Train Loss: 0.04096006839025817 Train Accuracy: 0.9857398875772136\n", + "Test Loss: 0.36569017 Test Accuracy: 0.9267564\n", + "\n", + "Epoch Number: 39\n", + "Train Loss: 0.04014190110339694 Train Accuracy: 0.9867123389897281\n", + "Test Loss: 0.34677818 Test Accuracy: 0.9262581\n", + "\n", + "Epoch Number: 40\n", + "Train Loss: 0.031071233378136404 Train Accuracy: 0.9899864605028336\n", + "Test Loss: 0.363686 Test Accuracy: 0.92775285\n", + "\n", + "Epoch Number: 41\n", + "Train Loss: 0.02729316997303538 Train Accuracy: 0.9908219265611204\n", + "Test Loss: 0.35555694 Test Accuracy: 0.9312407\n", + "\n", + "Epoch Number: 42\n", + "Train Loss: 0.021803765849542026 Train Accuracy: 0.992191786635412\n", + "Test Loss: 0.35095477 Test Accuracy: 
0.93223715\n", + "\n", + "Epoch Number: 43\n", + "Train Loss: 0.04842862480460373 Train Accuracy: 0.9833975695583919\n", + "Test Loss: 0.42905322 Test Accuracy: 0.91679126\n", + "\n", + "Epoch Number: 44\n", + "Train Loss: 0.04453416636264692 Train Accuracy: 0.9834111210418074\n", + "Test Loss: 0.406023 Test Accuracy: 0.920279\n", + "\n", + "Epoch Number: 45\n", + "Train Loss: 0.038877726283740914 Train Accuracy: 0.9870962010671015\n", + "Test Loss: 0.39293337 Test Accuracy: 0.91878426\n", + "\n", + "Epoch Number: 46\n", + "Train Loss: 0.034626684416315126 Train Accuracy: 0.9884796118083066\n", + "Test Loss: 0.36277694 Test Accuracy: 0.9237668\n", + "\n", + "Epoch Number: 47\n", + "Train Loss: 0.02302065390889367 Train Accuracy: 0.9934111139545702\n", + "Test Loss: 0.38474992 Test Accuracy: 0.9247633\n", + "\n", + "Epoch Number: 48\n", + "Train Loss: 0.023432086993723292 Train Accuracy: 0.9943564705652733\n", + "Test Loss: 0.370669 Test Accuracy: 0.9237668\n", + "\n", + "Epoch Number: 49\n", + "Train Loss: 0.024380253930097726 Train Accuracy: 0.9921782384180042\n", + "Test Loss: 0.40583202 Test Accuracy: 0.9227703\n", + "\n", + "Epoch Number: 50\n", + "Train Loss: 0.023330659918129854 Train Accuracy: 0.9926027467806046\n", + "Test Loss: 0.4097609 Test Accuracy: 0.92575985\n", + "\n", + "Epoch Number: 51\n", + "Train Loss: 0.018314683679108545 Train Accuracy: 0.9943835661835867\n", + "Test Loss: 0.38972235 Test Accuracy: 0.9342302\n", + "\n", + "Epoch Number: 52\n", + "Train Loss: 0.029633181783600315 Train Accuracy: 0.9905344043692498\n", + "Test Loss: 0.37864792 Test Accuracy: 0.9247633\n", + "\n", + "Epoch Number: 53\n", + "Train Loss: 0.030011002424058235 Train Accuracy: 0.9905479509536534\n", + "Test Loss: 0.3964535 Test Accuracy: 0.9192825\n", + "\n", + "Epoch Number: 54\n", + "Train Loss: 0.03564942483343694 Train Accuracy: 0.9888499256682722\n", + "Test Loss: 0.38546467 Test Accuracy: 0.92326856\n", + "\n", + "Epoch Number: 55\n", + "Train Loss: 0.0320119748230105 Train Accuracy: 0.9893015280161819\n", + "Test Loss: 0.41079342 Test Accuracy: 0.91679126\n", + "\n", + "Epoch Number: 56\n", + "Train Loss: 0.027233783602204225 Train Accuracy: 0.9919042677095492\n", + "Test Loss: 0.40080228 Test Accuracy: 0.9217738\n", + "\n", + "Epoch Number: 57\n", + "Train Loss: 0.0170260006386455 Train Accuracy: 0.9949044152481915\n", + "Test Loss: 0.42503983 Test Accuracy: 0.9292476\n", + "\n", + "Epoch Number: 58\n", + "Train Loss: 0.020110745480513736 Train Accuracy: 0.9946575393415478\n", + "Test Loss: 0.38848647 Test Accuracy: 0.9217738\n", + "\n", + "Epoch Number: 59\n", + "Train Loss: 0.015590530762780611 Train Accuracy: 0.9949179634655991\n", + "Test Loss: 0.4031199 Test Accuracy: 0.92775285\n", + "\n", + "Epoch Number: 60\n", + "Train Loss: 0.022963548624530844 Train Accuracy: 0.992863170088154\n", + "Test Loss: 0.42644864 Test Accuracy: 0.9197808\n", + "\n", + "Epoch Number: 61\n", + "Train Loss: 0.024166807283532536 Train Accuracy: 0.9914933091973606\n", + "Test Loss: 0.4117787 Test Accuracy: 0.9247633\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Epoch Number: 62\n", + "Train Loss: 0.01902851595364715 Train Accuracy: 0.9946575393415478\n", + "Test Loss: 0.43569365 Test Accuracy: 0.918286\n", + "\n", + "Epoch Number: 63\n", + "Train Loss: 0.022098659849502402 Train Accuracy: 0.9919178142939529\n", + "Test Loss: 0.4453173 Test Accuracy: 0.92575985\n", + "\n", + "Epoch Number: 64\n", + "Train Loss: 0.02353779313932747 Train Accuracy: 
0.9930001562588835\n", + "Test Loss: 0.43414015 Test Accuracy: 0.91429996\n", + "\n", + "Epoch Number: 65\n", + "Train Loss: 0.016468530626048986 Train Accuracy: 0.9947809764783676\n", + "Test Loss: 0.43052217 Test Accuracy: 0.9217738\n", + "\n", + "Epoch Number: 66\n", + "Train Loss: 0.016379667304086257 Train Accuracy: 0.9958904148781136\n", + "Test Loss: 0.4004999 Test Accuracy: 0.92825115\n", + "\n", + "Epoch Number: 67\n", + "Train Loss: 0.012232361819072026 Train Accuracy: 0.9971232904146795\n", + "Test Loss: 0.40298688 Test Accuracy: 0.93273544\n", + "\n", + "Epoch Number: 68\n", + "Train Loss: 0.008708359920403806 Train Accuracy: 0.998493152121975\n", + "Test Loss: 0.42018083 Test Accuracy: 0.9272546\n", + "\n", + "Epoch Number: 69\n", + "Train Loss: 0.009453040786081134 Train Accuracy: 0.9979452074390568\n", + "Test Loss: 0.42367473 Test Accuracy: 0.9287494\n", + "\n", + "Epoch Number: 70\n", + "Train Loss: 0.02633900548393634 Train Accuracy: 0.9916031989332748\n", + "Test Loss: 0.4314282 Test Accuracy: 0.91778773\n", + "\n", + "Epoch Number: 71\n", + "Train Loss: 0.05996186181277751 Train Accuracy: 0.9832605866536702\n", + "Test Loss: 0.40858173 Test Accuracy: 0.9227703\n", + "\n", + "Epoch Number: 72\n", + "Train Loss: 0.03984937108479032 Train Accuracy: 0.9866987883228145\n", + "Test Loss: 0.41035435 Test Accuracy: 0.9247633\n", + "\n", + "Epoch Number: 73\n", + "Train Loss: 0.024671344705283232 Train Accuracy: 0.991356323026631\n", + "Test Loss: 0.42347214 Test Accuracy: 0.92575985\n", + "\n", + "Epoch Number: 74\n", + "Train Loss: 0.0261542204694108 Train Accuracy: 0.9923287736226435\n", + "Test Loss: 0.40737543 Test Accuracy: 0.92127556\n", + "\n", + "Epoch Number: 75\n", + "Train Loss: 0.021734511994833304 Train Accuracy: 0.9932741294168446\n", + "Test Loss: 0.38865966 Test Accuracy: 0.93024415\n", + "\n", + "Epoch Number: 76\n", + "Train Loss: 0.017603178070028862 Train Accuracy: 0.9942330326119514\n", + "Test Loss: 0.41292053 Test Accuracy: 0.9272546\n", + "\n", + "Epoch Number: 77\n", + "Train Loss: 0.01774944902368987 Train Accuracy: 0.9949179634655991\n", + "Test Loss: 0.3975856 Test Accuracy: 0.9287494\n", + "\n", + "Epoch Number: 78\n", + "Train Loss: 0.026556726039565895 Train Accuracy: 0.9926027451476006\n", + "Test Loss: 0.37759724 Test Accuracy: 0.9262581\n", + "\n", + "Epoch Number: 79\n", + "Train Loss: 0.03763930009971436 Train Accuracy: 0.9888905711369972\n", + "Test Loss: 0.44735578 Test Accuracy: 0.9172895\n", + "\n", + "Epoch Number: 80\n", + "Train Loss: 0.029481777800119496 Train Accuracy: 0.9901369957074727\n", + "Test Loss: 0.41876832 Test Accuracy: 0.9217738\n", + "\n", + "Epoch Number: 81\n", + "Train Loss: 0.02179017917999411 Train Accuracy: 0.9924522115759653\n", + "Test Loss: 0.41835007 Test Accuracy: 0.920279\n", + "\n", + "Epoch Number: 82\n", + "Train Loss: 0.0234184642127007 Train Accuracy: 0.9921646918336006\n", + "Test Loss: 0.41416502 Test Accuracy: 0.9272546\n", + "\n", + "Epoch Number: 83\n", + "Train Loss: 0.02082834580166852 Train Accuracy: 0.993260580382935\n", + "Test Loss: 0.4422068 Test Accuracy: 0.9172895\n", + "\n", + "Epoch Number: 84\n", + "Train Loss: 0.022050149352465156 Train Accuracy: 0.9939726084879\n", + "Test Loss: 0.3987477 Test Accuracy: 0.92825115\n", + "\n", + "Epoch Number: 85\n", + "Train Loss: 0.026048276352549405 Train Accuracy: 0.9927261847339265\n", + "Test Loss: 0.38845396 Test Accuracy: 0.9272546\n", + "\n", + "Epoch Number: 86\n", + "Train Loss: 0.01715031830029409 Train Accuracy: 0.9952054840244658\n", + 
"Test Loss: 0.3792558 Test Accuracy: 0.9267564\n", + "\n", + "Epoch Number: 87\n", + "Train Loss: 0.014544817494636732 Train Accuracy: 0.9964248113436242\n", + "Test Loss: 0.41980278 Test Accuracy: 0.9242651\n", + "\n", + "Epoch Number: 88\n", + "Train Loss: 0.006491333439193462 Train Accuracy: 0.9987671244634341\n", + "Test Loss: 0.39655796 Test Accuracy: 0.9317389\n", + "\n", + "Epoch Number: 89\n", + "Train Loss: 0.004307456604007325 Train Accuracy: 0.9993150691463523\n", + "Test Loss: 0.40233433 Test Accuracy: 0.92825115\n", + "\n", + "Epoch Number: 90\n", + "Train Loss: 0.0027800448436596215 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.403938 Test Accuracy: 0.9287494\n", + "\n", + "Epoch Number: 91\n", + "Train Loss: 0.002242555237002033 Train Accuracy: 0.9995890414878114\n", + "Test Loss: 0.4070075 Test Accuracy: 0.9307424\n", + "\n", + "Epoch Number: 92\n", + "Train Loss: 0.0022119151863703277 Train Accuracy: 0.9995890414878114\n", + "Test Loss: 0.41036773 Test Accuracy: 0.9307424\n", + "\n", + "Epoch Number: 93\n", + "Train Loss: 0.001824945211809205 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.41361076 Test Accuracy: 0.9317389\n", + "\n", + "Epoch Number: 94\n", + "Train Loss: 0.001808816738895536 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.41818038 Test Accuracy: 0.93223715\n", + "\n", + "Epoch Number: 95\n", + "Train Loss: 0.0015898832340871482 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.4229942 Test Accuracy: 0.9312407\n", + "\n", + "Epoch Number: 96\n", + "Train Loss: 0.001751650427787067 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.42656386 Test Accuracy: 0.9332337\n", + "\n", + "Epoch Number: 97\n", + "Train Loss: 0.0015788370674023125 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.43016008 Test Accuracy: 0.93223715\n", + "\n", + "Epoch Number: 98\n", + "Train Loss: 0.0016806908206988688 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.43488127 Test Accuracy: 0.9297459\n", + "\n", + "Epoch Number: 99\n", + "Train Loss: 0.0015810940553009996 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.43672302 Test Accuracy: 0.9307424\n", + "\n", + "Epoch Number: 100\n", + "Train Loss: 0.001932646346052037 Train Accuracy: 0.9995890414878114\n", + "Test Loss: 0.4472658 Test Accuracy: 0.9292476\n", + "\n", + "Epoch Number: 101\n", + "Train Loss: 0.03748205324996914 Train Accuracy: 0.9887535817002597\n", + "Test Loss: 0.44853446 Test Accuracy: 0.9147982\n", + "\n", + "Epoch Number: 102\n", + "Train Loss: 0.11106032654898215 Train Accuracy: 0.9678774721001926\n", + "Test Loss: 0.36246482 Test Accuracy: 0.9227703\n", + "\n", + "Epoch Number: 103\n", + "Train Loss: 0.05241849743569755 Train Accuracy: 0.9829866134957091\n", + "Test Loss: 0.3570766 Test Accuracy: 0.92127556\n", + "\n", + "Epoch Number: 104\n", + "Train Loss: 0.028517094278095723 Train Accuracy: 0.9917672799058157\n", + "Test Loss: 0.37065578 Test Accuracy: 0.9292476\n", + "\n", + "Epoch Number: 105\n", + "Train Loss: 0.017652338973488915 Train Accuracy: 0.9942330326119514\n", + "Test Loss: 0.37107578 Test Accuracy: 0.93223715\n", + "\n", + "Epoch Number: 106\n", + "Train Loss: 0.01568616884250245 Train Accuracy: 0.9952054840244658\n", + "Test Loss: 0.35663217 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 107\n", + "Train Loss: 0.017838217169748084 Train Accuracy: 0.9951919358070582\n", + "Test Loss: 0.41287652 Test Accuracy: 0.9247633\n", + "\n", + "Epoch Number: 108\n", + "Train Loss: 0.010033470020734717 Train Accuracy: 0.9976712350975977\n", + 
"Test Loss: 0.4009026 Test Accuracy: 0.92775285\n", + "\n", + "Epoch Number: 109\n", + "Train Loss: 0.008324489026961925 Train Accuracy: 0.9980686453923787\n", + "Test Loss: 0.4033402 Test Accuracy: 0.9297459\n", + "\n", + "Epoch Number: 110\n", + "Train Loss: 0.00771069963668371 Train Accuracy: 0.9986165900752969\n", + "Test Loss: 0.41503954 Test Accuracy: 0.93223715\n", + "\n", + "Epoch Number: 111\n", + "Train Loss: 0.017172634716413608 Train Accuracy: 0.9951919358070582\n", + "Test Loss: 0.43192545 Test Accuracy: 0.9237668\n", + "\n", + "Epoch Number: 112\n", + "Train Loss: 0.03227482749984842 Train Accuracy: 0.9877946801381569\n", + "Test Loss: 0.44748837 Test Accuracy: 0.91579473\n", + "\n", + "Epoch Number: 113\n", + "Train Loss: 0.03210181032135215 Train Accuracy: 0.9895890493915506\n", + "Test Loss: 0.4282059 Test Accuracy: 0.92127556\n", + "\n", + "Epoch Number: 114\n", + "Train Loss: 0.01357129268182365 Train Accuracy: 0.9964383595610318\n", + "Test Loss: 0.394982 Test Accuracy: 0.92775285\n", + "\n", + "Epoch Number: 115\n", + "Train Loss: 0.019958787539508194 Train Accuracy: 0.9939455112365827\n", + "Test Loss: 0.44919127 Test Accuracy: 0.92326856\n", + "\n", + "Epoch Number: 116\n", + "Train Loss: 0.01951833989204877 Train Accuracy: 0.9941095938421276\n", + "Test Loss: 0.39846456 Test Accuracy: 0.9262581\n", + "\n", + "Epoch Number: 117\n", + "Train Loss: 0.013109919135873397 Train Accuracy: 0.9961643872195727\n", + "Test Loss: 0.3964593 Test Accuracy: 0.93273544\n", + "\n", + "Epoch Number: 118\n", + "Train Loss: 0.008196171877063708 Train Accuracy: 0.9978082212683272\n", + "Test Loss: 0.39881837 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 119\n", + "Train Loss: 0.0053880620705544285 Train Accuracy: 0.9991780829756227\n", + "Test Loss: 0.3994898 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 120\n", + "Train Loss: 0.004016872985792436 Train Accuracy: 0.9990410968048932\n", + "Test Loss: 0.40483266 Test Accuracy: 0.93821627\n", + "\n", + "Epoch Number: 121\n", + "Train Loss: 0.003189845641752807 Train Accuracy: 0.9994520553170818\n", + "Test Loss: 0.4129637 Test Accuracy: 0.9317389\n", + "\n", + "Epoch Number: 122\n", + "Train Loss: 0.0019790745577105157 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.41090903 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 123\n", + "Train Loss: 0.001803544777754873 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.41632676 Test Accuracy: 0.9362232\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Epoch Number: 124\n", + "Train Loss: 0.0015932139348516189 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.4208521 Test Accuracy: 0.9342302\n", + "\n", + "Epoch Number: 125\n", + "Train Loss: 0.0016064488660697252 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.4273708 Test Accuracy: 0.93721974\n", + "\n", + "Epoch Number: 126\n", + "Train Loss: 0.0015048502509048438 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.43023735 Test Accuracy: 0.9337319\n", + "\n", + "Epoch Number: 127\n", + "Train Loss: 0.0014419755101050824 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.4389877 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 128\n", + "Train Loss: 0.0013684726919028398 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.44143116 Test Accuracy: 0.9342302\n", + "\n", + "Epoch Number: 129\n", + "Train Loss: 0.0013124902181690943 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.44684827 Test Accuracy: 0.93721974\n", + "\n", + "Epoch Number: 
130\n", + "Train Loss: 0.001271455863264249 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.44386175 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 131\n", + "Train Loss: 0.0013829727382247642 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.45779392 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 132\n", + "Train Loss: 0.0019963769629288285 Train Accuracy: 0.9995890414878114\n", + "Test Loss: 0.45586306 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 133\n", + "Train Loss: 0.0016972724874966382 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.4626169 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 134\n", + "Train Loss: 0.08464051221679954 Train Accuracy: 0.9767951271305345\n", + "Test Loss: 0.5116203 Test Accuracy: 0.89287496\n", + "\n", + "Epoch Number: 135\n", + "Train Loss: 0.06996250499282287 Train Accuracy: 0.9749044163586342\n", + "Test Loss: 0.40689448 Test Accuracy: 0.9242651\n", + "\n", + "Epoch Number: 136\n", + "Train Loss: 0.03188256850970067 Train Accuracy: 0.9891509952610487\n", + "Test Loss: 0.38704544 Test Accuracy: 0.9262581\n", + "\n", + "Epoch Number: 137\n", + "Train Loss: 0.024611471771032945 Train Accuracy: 0.9905479534031594\n", + "Test Loss: 0.37838554 Test Accuracy: 0.92526156\n", + "\n", + "Epoch Number: 138\n", + "Train Loss: 0.010400510262315199 Train Accuracy: 0.9975342489268682\n", + "Test Loss: 0.37540606 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 139\n", + "Train Loss: 0.01360704386503155 Train Accuracy: 0.9961643864030707\n", + "Test Loss: 0.3835624 Test Accuracy: 0.9332337\n", + "\n", + "Epoch Number: 140\n", + "Train Loss: 0.011857587645589437 Train Accuracy: 0.9971097421972719\n", + "Test Loss: 0.38600543 Test Accuracy: 0.9337319\n", + "\n", + "Epoch Number: 141\n", + "Train Loss: 0.009779149474263548 Train Accuracy: 0.9979452074390568\n", + "Test Loss: 0.3920699 Test Accuracy: 0.9292476\n", + "\n", + "Epoch Number: 142\n", + "Train Loss: 0.011075850638356826 Train Accuracy: 0.9968493180732204\n", + "Test Loss: 0.39335945 Test Accuracy: 0.9337319\n", + "\n", + "Epoch Number: 143\n", + "Train Loss: 0.007224232131018852 Train Accuracy: 0.9982191797805159\n", + "Test Loss: 0.39289063 Test Accuracy: 0.93721974\n", + "\n", + "Epoch Number: 144\n", + "Train Loss: 0.00687252213119542 Train Accuracy: 0.9986165900752969\n", + "Test Loss: 0.41478387 Test Accuracy: 0.9342302\n", + "\n", + "Epoch Number: 145\n", + "Train Loss: 0.0036116211602547246 Train Accuracy: 0.9995890414878114\n", + "Test Loss: 0.38912663 Test Accuracy: 0.9412058\n", + "\n", + "Epoch Number: 146\n", + "Train Loss: 0.002582093849076494 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.38982612 Test Accuracy: 0.93921274\n", + "\n", + "Epoch Number: 147\n", + "Train Loss: 0.0020956868141943793 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.3942846 Test Accuracy: 0.9407075\n", + "\n", + "Epoch Number: 148\n", + "Train Loss: 0.0018172568598943954 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.40042797 Test Accuracy: 0.93971103\n", + "\n", + "Epoch Number: 149\n", + "Train Loss: 0.004224563434110852 Train Accuracy: 0.9993150691463523\n", + "Test Loss: 0.41389707 Test Accuracy: 0.9387145\n", + "\n", + "Epoch Number: 150\n", + "Train Loss: 0.0033937980600693327 Train Accuracy: 0.9991780829756227\n", + "Test Loss: 0.4309551 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 151\n", + "Train Loss: 0.016683471513902513 Train Accuracy: 0.995753428707384\n", + "Test Loss: 0.5002462 Test Accuracy: 0.92326856\n", + "\n", + "Epoch 
Number: 152\n", + "Train Loss: 0.07921012418587016 Train Accuracy: 0.9773566224803664\n", + "Test Loss: 0.49312237 Test Accuracy: 0.91380167\n", + "\n", + "Epoch Number: 153\n", + "Train Loss: 0.06044871790321824 Train Accuracy: 0.9809589198190872\n", + "Test Loss: 0.39408478 Test Accuracy: 0.918286\n", + "\n", + "Epoch Number: 154\n", + "Train Loss: 0.03367851439015047 Train Accuracy: 0.9889041193544048\n", + "Test Loss: 0.37989596 Test Accuracy: 0.9307424\n", + "\n", + "Epoch Number: 155\n", + "Train Loss: 0.017698209322925196 Train Accuracy: 0.9939319638356771\n", + "Test Loss: 0.37573016 Test Accuracy: 0.92775285\n", + "\n", + "Epoch Number: 156\n", + "Train Loss: 0.010081476129253383 Train Accuracy: 0.9975342489268682\n", + "Test Loss: 0.38967404 Test Accuracy: 0.9332337\n", + "\n", + "Epoch Number: 157\n", + "Train Loss: 0.00447188632057227 Train Accuracy: 0.9995890414878114\n", + "Test Loss: 0.38149583 Test Accuracy: 0.9332337\n", + "\n", + "Epoch Number: 158\n", + "Train Loss: 0.0025207580036120105 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.38982794 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 159\n", + "Train Loss: 0.0020615914665092394 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.38763282 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 160\n", + "Train Loss: 0.0017625982677787286 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.39443132 Test Accuracy: 0.93821627\n", + "\n", + "Epoch Number: 161\n", + "Train Loss: 0.0015912198259061432 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.39672822 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 162\n", + "Train Loss: 0.0015430196065994693 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.40416223 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 163\n", + "Train Loss: 0.0014482194389143245 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.40405902 Test Accuracy: 0.93721974\n", + "\n", + "Epoch Number: 164\n", + "Train Loss: 0.001342231735877361 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.41373003 Test Accuracy: 0.9342302\n", + "\n", + "Epoch Number: 165\n", + "Train Loss: 0.001386067051797697 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.41112974 Test Accuracy: 0.9387145\n", + "\n", + "Epoch Number: 166\n", + "Train Loss: 0.001193268498392259 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.42294186 Test Accuracy: 0.9332337\n", + "\n", + "Epoch Number: 167\n", + "Train Loss: 0.0012895042981317727 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.41742194 Test Accuracy: 0.9387145\n", + "\n", + "Epoch Number: 168\n", + "Train Loss: 0.001108311818077784 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.4331248 Test Accuracy: 0.9332337\n", + "\n", + "Epoch Number: 169\n", + "Train Loss: 0.001382393570304274 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.42247245 Test Accuracy: 0.93971103\n", + "\n", + "Epoch Number: 170\n", + "Train Loss: 0.0012338795040954184 Train Accuracy: 0.9995890414878114\n", + "Test Loss: 0.45070615 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 171\n", + "Train Loss: 0.002104504165289069 Train Accuracy: 0.9995890414878114\n", + "Test Loss: 0.4341877 Test Accuracy: 0.93921274\n", + "\n", + "Epoch Number: 172\n", + "Train Loss: 0.004313505504460534 Train Accuracy: 0.9984796039045674\n", + "Test Loss: 0.46030065 Test Accuracy: 0.9297459\n", + "\n", + "Epoch Number: 173\n", + "Train Loss: 0.093728030188175 Train Accuracy: 0.9727126351774555\n", + "Test Loss: 0.42030093 Test Accuracy: 0.91629297\n", + 
"\n", + "Epoch Number: 174\n", + "Train Loss: 0.05609176247635831 Train Accuracy: 0.9824522162136966\n", + "Test Loss: 0.41325808 Test Accuracy: 0.92127556\n", + "\n", + "Epoch Number: 175\n", + "Train Loss: 0.02312381442338037 Train Accuracy: 0.9921782392345063\n", + "Test Loss: 0.4045683 Test Accuracy: 0.9242651\n", + "\n", + "Epoch Number: 176\n", + "Train Loss: 0.014690037730647481 Train Accuracy: 0.9953424693786934\n", + "Test Loss: 0.3937127 Test Accuracy: 0.9307424\n", + "\n", + "Epoch Number: 177\n", + "Train Loss: 0.00893136704809428 Train Accuracy: 0.9982191797805159\n", + "Test Loss: 0.39721966 Test Accuracy: 0.92775285\n", + "\n", + "Epoch Number: 178\n", + "Train Loss: 0.007236258619168314 Train Accuracy: 0.9989041106341636\n", + "Test Loss: 0.38853252 Test Accuracy: 0.92775285\n", + "\n", + "Epoch Number: 179\n", + "Train Loss: 0.004083093933518721 Train Accuracy: 0.9995890414878114\n", + "Test Loss: 0.37919173 Test Accuracy: 0.9312407\n", + "\n", + "Epoch Number: 180\n", + "Train Loss: 0.002494418898705801 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.39079925 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 181\n", + "Train Loss: 0.001869901260624206 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.3961389 Test Accuracy: 0.9312407\n", + "\n", + "Epoch Number: 182\n", + "Train Loss: 0.0017469667874225607 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.4000474 Test Accuracy: 0.93721974\n", + "\n", + "Epoch Number: 183\n", + "Train Loss: 0.0012739899573043908 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.40910247 Test Accuracy: 0.93273544\n", + "\n", + "Epoch Number: 184\n", + "Train Loss: 0.0013601894353672684 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.41040978 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 185\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Train Loss: 0.0011997123000495875 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.41469583 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 186\n", + "Train Loss: 0.0014707065065397741 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.41498712 Test Accuracy: 0.94020927\n", + "\n", + "Epoch Number: 187\n", + "Train Loss: 0.002836034407166203 Train Accuracy: 0.9993150691463523\n", + "Test Loss: 0.45857498 Test Accuracy: 0.9247633\n", + "\n", + "Epoch Number: 188\n", + "Train Loss: 0.04335543119080671 Train Accuracy: 0.9872467313727288\n", + "Test Loss: 0.4854505 Test Accuracy: 0.91081214\n", + "\n", + "Epoch Number: 189\n", + "Train Loss: 0.0711721541576904 Train Accuracy: 0.9769863102534045\n", + "Test Loss: 0.38205332 Test Accuracy: 0.92575985\n", + "\n", + "Epoch Number: 190\n", + "Train Loss: 0.028694037625515093 Train Accuracy: 0.9897260347457781\n", + "Test Loss: 0.3982458 Test Accuracy: 0.9217738\n", + "\n", + "Epoch Number: 191\n", + "Train Loss: 0.020338214381298125 Train Accuracy: 0.9945205531708182\n", + "Test Loss: 0.39259014 Test Accuracy: 0.9317389\n", + "\n", + "Epoch Number: 192\n", + "Train Loss: 0.015414321870058265 Train Accuracy: 0.995753428707384\n", + "Test Loss: 0.44486946 Test Accuracy: 0.9247633\n", + "\n", + "Epoch Number: 193\n", + "Train Loss: 0.018497141398655326 Train Accuracy: 0.9928767191220637\n", + "Test Loss: 0.39597595 Test Accuracy: 0.9242651\n", + "\n", + "Epoch Number: 194\n", + "Train Loss: 0.015433995104203485 Train Accuracy: 0.9949315116830069\n", + "Test Loss: 0.38098228 Test Accuracy: 0.9332337\n", + "\n", + "Epoch Number: 195\n", + "Train Loss: 0.006958263319712101 Train Accuracy: 
0.9980821936097863\n", + "Test Loss: 0.3999225 Test Accuracy: 0.93273544\n", + "\n", + "Epoch Number: 196\n", + "Train Loss: 0.005096633062213149 Train Accuracy: 0.9991780829756227\n", + "Test Loss: 0.37073624 Test Accuracy: 0.937718\n", + "\n", + "Epoch Number: 197\n", + "Train Loss: 0.00343738832432866 Train Accuracy: 0.9995890414878114\n", + "Test Loss: 0.37505808 Test Accuracy: 0.9332337\n", + "\n", + "Epoch Number: 198\n", + "Train Loss: 0.0018556024091818996 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.37871873 Test Accuracy: 0.9342302\n", + "\n", + "Epoch Number: 199\n", + "Train Loss: 0.0015186609035319559 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.3814098 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 200\n", + "Train Loss: 0.0010581495521800619 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.38004857 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 201\n", + "Train Loss: 0.0009942382828440925 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.37956578 Test Accuracy: 0.93721974\n", + "\n", + "Epoch Number: 202\n", + "Train Loss: 0.0009491439842561592 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.37938344 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 203\n", + "Train Loss: 0.0009144399371333038 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.37940732 Test Accuracy: 0.93721974\n", + "\n", + "Epoch Number: 204\n", + "Train Loss: 0.0008879157705376027 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.37957668 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 205\n", + "Train Loss: 0.0008668735051648819 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.3798471 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 206\n", + "Train Loss: 0.0008491342837447046 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.38018626 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 207\n", + "Train Loss: 0.0008333472782994735 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.38057283 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 208\n", + "Train Loss: 0.0008187590400953076 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.38099325 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 209\n", + "Train Loss: 0.0008049521249183135 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.3814406 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 210\n", + "Train Loss: 0.0007916802561551664 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.38191074 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 211\n", + "Train Loss: 0.0007788011856450482 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.38240153 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 212\n", + "Train Loss: 0.0007662154095843817 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.3829116 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 213\n", + "Train Loss: 0.0007538625193849104 Train Accuracy: 1.0\n", + "Test Loss: 0.38343957 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 214\n", + "Train Loss: 0.0007416940372347934 Train Accuracy: 1.0\n", + "Test Loss: 0.38398492 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 215\n", + "Train Loss: 0.0007296792061661357 Train Accuracy: 1.0\n", + "Test Loss: 0.3845468 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 216\n", + "Train Loss: 0.0007177960753125101 Train Accuracy: 1.0\n", + "Test Loss: 0.38512433 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 217\n", + "Train Loss: 0.0007060262115834893 Train Accuracy: 1.0\n", + "Test Loss: 0.3857176 Test 
Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 218\n", + "Train Loss: 0.0006943566685587117 Train Accuracy: 1.0\n", + "Test Loss: 0.38632545 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 219\n", + "Train Loss: 0.0006827760203932859 Train Accuracy: 1.0\n", + "Test Loss: 0.38694763 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 220\n", + "Train Loss: 0.0006712808288483918 Train Accuracy: 1.0\n", + "Test Loss: 0.38758373 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 221\n", + "Train Loss: 0.0006598622694831622 Train Accuracy: 1.0\n", + "Test Loss: 0.38823336 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 222\n", + "Train Loss: 0.0006485197959030812 Train Accuracy: 1.0\n", + "Test Loss: 0.3888961 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 223\n", + "Train Loss: 0.0006372484367353561 Train Accuracy: 1.0\n", + "Test Loss: 0.38957155 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 224\n", + "Train Loss: 0.0006260495690297183 Train Accuracy: 1.0\n", + "Test Loss: 0.3902598 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 225\n", + "Train Loss: 0.0006149213456896402 Train Accuracy: 1.0\n", + "Test Loss: 0.39096028 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 226\n", + "Train Loss: 0.0006038654383913014 Train Accuracy: 1.0\n", + "Test Loss: 0.39167294 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 227\n", + "Train Loss: 0.0005928805896897532 Train Accuracy: 1.0\n", + "Test Loss: 0.3923978 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 228\n", + "Train Loss: 0.0005819705751093028 Train Accuracy: 1.0\n", + "Test Loss: 0.3931346 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 229\n", + "Train Loss: 0.0005711371570264232 Train Accuracy: 1.0\n", + "Test Loss: 0.39388332 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 230\n", + "Train Loss: 0.0005603803631495556 Train Accuracy: 1.0\n", + "Test Loss: 0.39464444 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 231\n", + "Train Loss: 0.0005497033376890962 Train Accuracy: 1.0\n", + "Test Loss: 0.3954174 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 232\n", + "Train Loss: 0.0005391083086828051 Train Accuracy: 1.0\n", + "Test Loss: 0.3962025 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 233\n", + "Train Loss: 0.0005285999931439706 Train Accuracy: 1.0\n", + "Test Loss: 0.39699972 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 234\n", + "Train Loss: 0.0005181774882558249 Train Accuracy: 1.0\n", + "Test Loss: 0.39780933 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 235\n", + "Train Loss: 0.0005078478582755165 Train Accuracy: 1.0\n", + "Test Loss: 0.39863133 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 236\n", + "Train Loss: 0.000497609365046541 Train Accuracy: 1.0\n", + "Test Loss: 0.39946586 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 237\n", + "Train Loss: 0.0004874655870443261 Train Accuracy: 1.0\n", + "Test Loss: 0.40031332 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 238\n", + "Train Loss: 0.0004774212549370395 Train Accuracy: 1.0\n", + "Test Loss: 0.40117365 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 239\n", + "Train Loss: 0.0004674753974341749 Train Accuracy: 1.0\n", + "Test Loss: 0.40204704 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 240\n", + "Train Loss: 0.0004576308943198879 Train Accuracy: 1.0\n", + "Test Loss: 0.40293333 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 241\n", + "Train Loss: 0.0004478914650039084 Train Accuracy: 1.0\n", + "Test Loss: 0.40383312 Test Accuracy: 
0.93472844\n", + "\n", + "Epoch Number: 242\n", + "Train Loss: 0.00043825685871487534 Train Accuracy: 1.0\n", + "Test Loss: 0.40474612 Test Accuracy: 0.9337319\n", + "\n", + "Epoch Number: 243\n", + "Train Loss: 0.0004287295363211928 Train Accuracy: 1.0\n", + "Test Loss: 0.40567285 Test Accuracy: 0.9337319\n", + "\n", + "Epoch Number: 244\n", + "Train Loss: 0.00041931199118389734 Train Accuracy: 1.0\n", + "Test Loss: 0.4066128 Test Accuracy: 0.9337319\n", + "\n", + "Epoch Number: 245\n", + "Train Loss: 0.0004100034349081298 Train Accuracy: 1.0\n", + "Test Loss: 0.40756655 Test Accuracy: 0.9342302\n", + "\n", + "Epoch Number: 246\n", + "Train Loss: 0.000400808029740328 Train Accuracy: 1.0\n", + "Test Loss: 0.408534 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 247\n", + "Train Loss: 0.00039172325889685205 Train Accuracy: 1.0\n", + "Test Loss: 0.40951535 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 248\n", + "Train Loss: 0.00038275338309874424 Train Accuracy: 1.0\n", + "Test Loss: 0.41051057 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 249\n", + "Train Loss: 0.00037389971720124915 Train Accuracy: 1.0\n", + "Test Loss: 0.4115201 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 250\n", + "Train Loss: 0.0003651603550031424 Train Accuracy: 1.0\n", + "Test Loss: 0.4125443 Test Accuracy: 0.9352267\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Epoch Number: 251\n", + "Train Loss: 0.0003565376773372943 Train Accuracy: 1.0\n", + "Test Loss: 0.4135833 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 252\n", + "Train Loss: 0.0003480334934253845 Train Accuracy: 1.0\n", + "Test Loss: 0.41463742 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 253\n", + "Train Loss: 0.0003396494167359316 Train Accuracy: 1.0\n", + "Test Loss: 0.41570726 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 254\n", + "Train Loss: 0.0003313843925540935 Train Accuracy: 1.0\n", + "Test Loss: 0.41679344 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 255\n", + "Train Loss: 0.00032324064602718165 Train Accuracy: 1.0\n", + "Test Loss: 0.4178963 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 256\n", + "Train Loss: 0.00031521800582134485 Train Accuracy: 1.0\n", + "Test Loss: 0.4190162 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 257\n", + "Train Loss: 0.0003073199977859064 Train Accuracy: 1.0\n", + "Test Loss: 0.42015436 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 258\n", + "Train Loss: 0.00029954845147535896 Train Accuracy: 1.0\n", + "Test Loss: 0.42131123 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 259\n", + "Train Loss: 0.00029190532083316924 Train Accuracy: 1.0\n", + "Test Loss: 0.42248726 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 260\n", + "Train Loss: 0.0002843899977686879 Train Accuracy: 1.0\n", + "Test Loss: 0.42368302 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 261\n", + "Train Loss: 0.0002770060498932415 Train Accuracy: 1.0\n", + "Test Loss: 0.42489874 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 262\n", + "Train Loss: 0.00026975655114941605 Train Accuracy: 1.0\n", + "Test Loss: 0.4261356 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 263\n", + "Train Loss: 0.0002626420347752011 Train Accuracy: 1.0\n", + "Test Loss: 0.4273931 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 264\n", + "Train Loss: 0.0002556652377267075 Train Accuracy: 1.0\n", + "Test Loss: 0.4286726 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 265\n", + "Train Loss: 
0.0002488270239491488 Train Accuracy: 1.0\n", + "Test Loss: 0.42997336 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 266\n", + "Train Loss: 0.00024212878793462064 Train Accuracy: 1.0\n", + "Test Loss: 0.4312961 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 267\n", + "Train Loss: 0.00023557050247303505 Train Accuracy: 1.0\n", + "Test Loss: 0.43264046 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 268\n", + "Train Loss: 0.000229153466771344 Train Accuracy: 1.0\n", + "Test Loss: 0.43400633 Test Accuracy: 0.9342302\n", + "\n", + "Epoch Number: 269\n", + "Train Loss: 0.00022287752135650396 Train Accuracy: 1.0\n", + "Test Loss: 0.43539396 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 270\n", + "Train Loss: 0.00021674406400974203 Train Accuracy: 1.0\n", + "Test Loss: 0.43680328 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 271\n", + "Train Loss: 0.0002107478593307075 Train Accuracy: 1.0\n", + "Test Loss: 0.43823338 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 272\n", + "Train Loss: 0.00020489141274058605 Train Accuracy: 1.0\n", + "Test Loss: 0.43968424 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 273\n", + "Train Loss: 0.00019917354314214844 Train Accuracy: 1.0\n", + "Test Loss: 0.4411549 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 274\n", + "Train Loss: 0.00019359330177045594 Train Accuracy: 1.0\n", + "Test Loss: 0.44264516 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 275\n", + "Train Loss: 0.00018814832334126 Train Accuracy: 1.0\n", + "Test Loss: 0.4441542 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 276\n", + "Train Loss: 0.00018283885997147554 Train Accuracy: 1.0\n", + "Test Loss: 0.4456814 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 277\n", + "Train Loss: 0.00017766496358951234 Train Accuracy: 1.0\n", + "Test Loss: 0.4472254 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 278\n", + "Train Loss: 0.00017262536239784771 Train Accuracy: 1.0\n", + "Test Loss: 0.44878626 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 279\n", + "Train Loss: 0.00016772304865896818 Train Accuracy: 1.0\n", + "Test Loss: 0.4503633 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 280\n", + "Train Loss: 0.00016296197600863645 Train Accuracy: 1.0\n", + "Test Loss: 0.451956 Test Accuracy: 0.93721974\n", + "\n", + "Epoch Number: 281\n", + "Train Loss: 0.00015834992364158995 Train Accuracy: 1.0\n", + "Test Loss: 0.45356566 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 282\n", + "Train Loss: 0.00015390685763272253 Train Accuracy: 1.0\n", + "Test Loss: 0.45519423 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 283\n", + "Train Loss: 0.0001496767374241967 Train Accuracy: 1.0\n", + "Test Loss: 0.4568483 Test Accuracy: 0.9367215\n", + "\n", + "Epoch Number: 284\n", + "Train Loss: 0.00014578019630017192 Train Accuracy: 1.0\n", + "Test Loss: 0.45854405 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 285\n", + "Train Loss: 0.00014262170305001957 Train Accuracy: 1.0\n", + "Test Loss: 0.46033803 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 286\n", + "Train Loss: 0.00014241321601630636 Train Accuracy: 1.0\n", + "Test Loss: 0.46250543 Test Accuracy: 0.9362232\n", + "\n", + "Epoch Number: 287\n", + "Train Loss: 0.00039112994681377196 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.47080484 Test Accuracy: 0.937718\n", + "\n", + "Epoch Number: 288\n", + "Train Loss: 0.002160546216233243 Train Accuracy: 0.9993150691463523\n", + "Test Loss: 0.4985566 Test Accuracy: 0.9332337\n", + "\n", + 
"Epoch Number: 289\n", + "Train Loss: 0.0015827084215531725 Train Accuracy: 0.9997260276585409\n", + "Test Loss: 0.460808 Test Accuracy: 0.9342302\n", + "\n", + "Epoch Number: 290\n", + "Train Loss: 0.0013418768471824914 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.4653932 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 291\n", + "Train Loss: 0.00042524092328078463 Train Accuracy: 0.9998630138292705\n", + "Test Loss: 0.46051893 Test Accuracy: 0.93472844\n", + "\n", + "Epoch Number: 292\n", + "Train Loss: 0.0002157161274326053 Train Accuracy: 1.0\n", + "Test Loss: 0.45942155 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 293\n", + "Train Loss: 0.00018626744945135688 Train Accuracy: 1.0\n", + "Test Loss: 0.45938873 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 294\n", + "Train Loss: 0.000173948956017577 Train Accuracy: 1.0\n", + "Test Loss: 0.4597626 Test Accuracy: 0.935725\n", + "\n", + "Epoch Number: 295\n", + "Train Loss: 0.00016522929556779436 Train Accuracy: 1.0\n", + "Test Loss: 0.4602842 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 296\n", + "Train Loss: 0.00015823588222552295 Train Accuracy: 1.0\n", + "Test Loss: 0.46087208 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 297\n", + "Train Loss: 0.00015232920922655517 Train Accuracy: 1.0\n", + "Test Loss: 0.4614972 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 298\n", + "Train Loss: 0.00014718409047792156 Train Accuracy: 1.0\n", + "Test Loss: 0.46214733 Test Accuracy: 0.9352267\n", + "\n", + "Epoch Number: 299\n", + "Train Loss: 0.00014260701286667889 Train Accuracy: 1.0\n", + "Test Loss: 0.46281824 Test Accuracy: 0.9352267\n", + "\n", + "Maximum Test accuracy at compressed model size(including early stopping): 0.9412058 at Epoch: 146\n", + "Final Test Accuracy: 0.9352267\n", + "\n", + "\n", + "Non-Zeros: 1932 Model Size: 7.546875 KB hasSparse: False\n", + "\n", + "The Model Directory: usps10\\FastGRNNResults/23_51_17_15_03_19\n", + "\n" + ] + } + ], + "source": [ + "FastCellTrainer.train(batchSize, totalEpochs, sess, Xtrain, Xtest,\n", + " Ytrain, Ytest, decayStep, decayRate, dataDir, currDir)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Model Quantization\n", + "\n", + "Byte Quantization for the trained FastModels, to reduce the model size by 4x. 
If one uses piece-wise linear approximations for the non-linearities (quantTanh in place of tanh and quantSigm in place of sigmoid), prediction can be carried out in pure integer arithmetic after model quantization." + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Bg.npy has max: 4.9833384 min: -0.6077357\n", + "Bh.npy has max: 2.8973198 min: -0.16004847\n", + "FC.npy has max: 4.9540076 min: -5.963999\n", + "FCbias.npy has max: 2.540496 min: -1.7358814\n", + "U.npy has max: 2.2965062 min: -2.670992\n", + "W.npy has max: 1.3919494 min: -1.2454427\n", + "\n", + "\n", + "Quantized Model Dir: usps10\\FastGRNNResults/23_51_17_15_03_19\\QuantizedFastModel\n" + ] + } + ], + "source": [ + "# Model quantization\n", + "model_dir = currDir  # the model directory is printed at the end of training; use that here, or use currDir\n", + "\n", + "import quantizeFastModels\n", + "quantizeFastModels.quantizeFastModels(model_dir)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.5.5" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/tf2.0/examples/FastCells/fastcell_example.py b/tf2.0/examples/FastCells/fastcell_example.py new file mode 100644 index 000000000..1d5468101 --- /dev/null +++ b/tf2.0/examples/FastCells/fastcell_example.py @@ -0,0 +1,99 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license.
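For intuition, here is a minimal standalone sketch of the symmetric byte quantization scheme that `quantizeFastModels.py` (shown later in this directory) applies to the weight matrices: a shared scale factor is derived from the largest absolute weight, and the scaled weights are rounded and clipped into the signed 8-bit range. The example matrix is made up for illustration, not taken from the trained model above.

```
import numpy as np

def byte_quantize(weights, max_value=127):
    # Shared scale factor: maps the largest |weight| near the edge of the
    # signed byte range, mirroring the arithmetic in quantizeFastModels.py.
    limit = np.max(np.abs(weights))
    scale = np.round((2.0 * max_value + 1.0) / (2.0 * limit))
    quantized = np.clip(np.round(scale * weights),
                        -(max_value + 1), max_value).astype('int8')
    return quantized, scale

# Illustrative weights only.
W = np.array([[1.39, -1.24], [0.20, -0.75]])
qW, scale = byte_quantize(W)
print(qW)     # int8 entries: 1 byte each instead of 4, hence the ~4x saving
print(scale)  # kept alongside the model to undo the scaling at prediction
```

Since only rounding, clipping, and scaling are involved, pairing this with quantTanh/quantSigm keeps the entire prediction path in integer arithmetic.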
+ +import helpermethods +import tensorflow as tf +import numpy as np +import sys + +from edgeml.trainer.fastTrainer import FastTrainer +from edgeml.graph.rnn import FastGRNNCell +from edgeml.graph.rnn import FastRNNCell +from edgeml.graph.rnn import UGRNNLRCell +from edgeml.graph.rnn import GRULRCell +from edgeml.graph.rnn import LSTMLRCell + +tf.compat.v1.disable_eager_execution() + +def main(): + # Fixing seeds for reproducibility + tf.compat.v1.set_random_seed(42) + np.random.seed(42) + + # Hyper Param pre-processing + args = helpermethods.getArgs() + + dataDir = args.data_dir + cell = args.cell + inputDims = args.input_dim + hiddenDims = args.hidden_dim + + totalEpochs = args.epochs + learningRate = args.learning_rate + outFile = args.output_file + batchSize = args.batch_size + decayStep = args.decay_step + decayRate = args.decay_rate + + wRank = args.wRank + uRank = args.uRank + + sW = args.sW + sU = args.sU + + update_non_linearity = args.update_nl + gate_non_linearity = args.gate_nl + + (dataDimension, numClasses, Xtrain, Ytrain, Xtest, Ytest, + mean, std) = helpermethods.preProcessData(dataDir) + + assert dataDimension % inputDims == 0, "Infeasible per step input, " + \ + "Timesteps have to be integer" + + X = tf.compat.v1.placeholder( + "float", [None, int(dataDimension / inputDims), inputDims]) + Y = tf.compat.v1.placeholder("float", [None, numClasses]) + + currDir = helpermethods.createTimeStampDir(dataDir, cell) + + helpermethods.dumpCommand(sys.argv, currDir) + helpermethods.saveMeanStd(mean, std, currDir) + + if cell == "FastGRNN": + FastCell = FastGRNNCell(hiddenDims, + gate_non_linearity=gate_non_linearity, + update_non_linearity=update_non_linearity, + wRank=wRank, uRank=uRank) + elif cell == "FastRNN": + FastCell = FastRNNCell(hiddenDims, + update_non_linearity=update_non_linearity, + wRank=wRank, uRank=uRank) + elif cell == "UGRNN": + FastCell = UGRNNLRCell(hiddenDims, + update_non_linearity=update_non_linearity, + wRank=wRank, uRank=uRank) + elif cell == "GRU": + FastCell = GRULRCell(hiddenDims, + update_non_linearity=update_non_linearity, + wRank=wRank, uRank=uRank) + elif cell == "LSTM": + FastCell = LSTMLRCell(hiddenDims, + update_non_linearity=update_non_linearity, + wRank=wRank, uRank=uRank) + else: + sys.exit('Exiting: No Such Cell as ' + cell) + + FastCellTrainer = FastTrainer( + FastCell, X, Y, sW=sW, sU=sU, + learningRate=learningRate, outFile=outFile) + + sess = tf.compat.v1.InteractiveSession() + sess.run(tf.compat.v1.global_variables_initializer()) + + FastCellTrainer.train(batchSize, totalEpochs, sess, Xtrain, Xtest, + Ytrain, Ytest, decayStep, decayRate, + dataDir, currDir) + + +if __name__ == '__main__': + main() diff --git a/tf2.0/examples/FastCells/fetch_usps.py b/tf2.0/examples/FastCells/fetch_usps.py new file mode 100644 index 000000000..a5c314369 --- /dev/null +++ b/tf2.0/examples/FastCells/fetch_usps.py @@ -0,0 +1,66 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. +# +# Setting up the USPS Data. + +import bz2 +import os +import subprocess +import sys + +import requests +import numpy as np +from sklearn.datasets import load_svmlight_file +from helpermethods import download_file, decompress + + + +def downloadData(workingDir, downloadDir, linkTrain, linkTest): + path = workingDir + '/' + downloadDir + path = os.path.abspath(path) + try: + os.makedirs(path, exist_ok=True) + except OSError: + print("Could not create %s. 
Make sure the path does" % path) + print("not already exist and you have permissions to create it.") + return False + + training_data_bz2 = download_file(linkTrain, path) + test_data_bz2 = download_file(linkTest, path) + + training_data = decompress(training_data_bz2) + test_data = decompress(test_data_bz2) + + train = os.path.join(path, "train.txt") + test = os.path.join(path, "test.txt") + if os.path.isfile(train): + os.remove(train) + if os.path.isfile(test): + os.remove(test) + + os.rename(training_data, train) + os.rename(test_data, test) + os.remove(training_data_bz2) + os.remove(test_data_bz2) + return True + +if __name__ == '__main__': + workingDir = './' + downloadDir = 'usps10' + linkTrain = 'http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.bz2' + linkTest = 'http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.t.bz2' + failureMsg = ''' +Download Failed! +To manually perform the download +\t1. Create a new empty directory named `usps10`. +\t2. Download the data from the following links into the usps10 directory. +\t\tTest: %s +\t\tTrain: %s +\t3. Extract the downloaded files. +\t4. Rename `usps` to `train.txt` and, +\t5. Rename `usps.t` to `test.txt +''' % (linkTrain, linkTest) + + if not downloadData(workingDir, downloadDir, linkTrain, linkTest): + exit(failureMsg) + print("Done: see ", downloadDir) diff --git a/tf2.0/examples/FastCells/helpermethods.py b/tf2.0/examples/FastCells/helpermethods.py new file mode 100644 index 000000000..a052330f3 --- /dev/null +++ b/tf2.0/examples/FastCells/helpermethods.py @@ -0,0 +1,273 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. + +''' + Functions to check sanity of input arguments + for the example script. +''' +import argparse +import bz2 +import datetime +import json +import os + +import numpy as np +import requests + + +def decompress(filepath): + print("extracting: ", filepath) + zipfile = bz2.BZ2File(filepath) # open the file + data = zipfile.read() # get the decompressed data + newfilepath = os.path.splitext(filepath)[0] # assuming the filepath ends with .bz2 + with open(newfilepath, 'wb') as f: + f.write(data) # write a uncompressed file + return newfilepath + + +def download_file(url, local_folder=None): + """Downloads file pointed to by `url`. + If `local_folder` is not supplied, downloads to the current folder. 
+ """ + filename = os.path.basename(url) + if local_folder: + filename = os.path.join(local_folder, filename) + + # Download the file + print("Downloading: " + url) + response = requests.get(url, stream=True) + if response.status_code != 200: + raise Exception("download file failed with status code: %d, fetching url '%s'" % (response.status_code, url)) + + # Write the file to disk + with open(filename, "wb") as handle: + handle.write(response.content) + return filename + + +def checkIntPos(value): + ivalue = int(value) + if ivalue <= 0: + raise argparse.ArgumentTypeError( + "%s is an invalid positive int value" % value) + return ivalue + + +def checkIntNneg(value): + ivalue = int(value) + if ivalue < 0: + raise argparse.ArgumentTypeError( + "%s is an invalid non-neg int value" % value) + return ivalue + + +def checkFloatNneg(value): + fvalue = float(value) + if fvalue < 0: + raise argparse.ArgumentTypeError( + "%s is an invalid non-neg float value" % value) + return fvalue + + +def checkFloatPos(value): + fvalue = float(value) + if fvalue <= 0: + raise argparse.ArgumentTypeError( + "%s is an invalid positive float value" % value) + return fvalue + + +def getArgs(): + ''' + Function to parse arguments for FastCells + ''' + parser = argparse.ArgumentParser( + description='HyperParams for Fast(G)RNN') + parser.add_argument('-dir', '--data-dir', required=True, + help='Data directory containing' + + 'train.npy and test.npy') + + parser.add_argument('-c', '--cell', type=str, default="FastGRNN", + help='Choose between [FastGRNN, FastRNN, UGRNN' + + ', GRU, LSTM], default: FastGRNN') + + parser.add_argument('-id', '--input-dim', type=checkIntNneg, required=True, + help='Input Dimension of RNN, each timestep will ' + + 'feed input-dim features to RNN. ' + + 'Total Feature length = Input Dim * Total Timestep') + parser.add_argument('-hd', '--hidden-dim', type=checkIntNneg, + required=True, help='Hidden Dimension of RNN') + + parser.add_argument('-e', '--epochs', type=checkIntPos, default=300, + help='Total Epochs (default: 300 try:[100, 150, 600])') + parser.add_argument('-b', '--batch-size', type=checkIntPos, default=100, + help='Batch Size to be used (default: 100)') + parser.add_argument('-lr', '--learning-rate', type=checkFloatPos, + default=0.01, help='Initial Learning rate for ' + + 'Adam Optimizer (default: 0.01)') + + parser.add_argument('-rW', '--wRank', type=checkIntPos, default=None, + help='Rank for the low-rank parameterisation of W, ' + + 'None => Full Rank') + parser.add_argument('-rU', '--uRank', type=checkIntPos, default=None, + help='Rank for the low-rank parameterisation of U, ' + + 'None => Full Rank') + + parser.add_argument('-sW', type=checkFloatPos, default=1.0, + help='Sparsity for predictor parameter W(and both ' + + 'W1 and W2 in low-rank) ' + + '(default: 1.0(Dense) try: [0.1, 0.2, 0.3])') + parser.add_argument('-sU', type=checkFloatPos, default=1.0, + help='Sparsity for predictor parameter U(and both ' + + 'U1 and U2 in low-rank) ' + + '(default: 1.0(Dense) try: [0.1, 0.2, 0.3])') + + parser.add_argument('-unl', '--update-nl', type=str, default="tanh", + help='Update non linearity. Choose between ' + + '[tanh, sigmoid, relu, quantTanh, quantSigm]. ' + + 'default => tanh. Can add more in edgeml/graph/rnn.py') + parser.add_argument('-gnl', '--gate-nl', type=str, default="sigmoid", + help='Gate non linearity. Choose between ' + + '[tanh, sigmoid, relu, quantTanh, quantSigm]. ' + + 'default => sigmoid. Can add more in ' + + 'edgeml/graph/rnn.py. 
Only Applicable to FastGRNN') + + parser.add_argument('-dS', '--decay-step', type=checkIntPos, default=200, + help='The interval (in epochs) after which the ' + + 'learning rate should decay. ' + + 'Default is 200 for 300 epochs') + + parser.add_argument('-dR', '--decay-rate', type=checkFloatPos, default=0.1, + help='The factor by which learning rate ' + + 'should decay after each interval. Default 0.1') + + parser.add_argument('-oF', '--output-file', default=None, + help='Output file for dumping the program output, ' + + '(default: stdout)') + + return parser.parse_args() + + +def getQuantArgs(): + ''' + Function to parse arguments for Model Quantisation + ''' + parser = argparse.ArgumentParser( + description='Arguments for quantizing Fast models. ' + + 'Works only for piece-wise linear non-linearities, ' + + 'like relu, quantTanh, quantSigm (check rnn.py for the definitions)') + parser.add_argument('-dir', '--model-dir', required=True, + help='model directory containing' + + '*.npy weight files dumped from the trained model') + parser.add_argument('-m', '--max-val', type=checkIntNneg, default=127, + help='this represents the maximum possible value ' + + 'in model, essentially the byte complexity, ' + + '127=> 1 byte is default') + parser.add_argument('-s', '--scalar-scale', type=checkIntNneg, + default=1000, help='maximum granularity/decimals ' + + 'you wish to get when quantising simple sclars ' + + 'involved. Default is 1000') + + return parser.parse_args() + + +def createTimeStampDir(dataDir, cell): + ''' + Creates a Directory with timestamp as it's name + ''' + if os.path.isdir(os.path.join(dataDir, str(cell) + 'Results')) is False: + try: + os.mkdir(os.path.join(dataDir, str(cell) + 'Results')) + except OSError: + print("Creation of the directory %s failed" % + os.path.join(dataDir, str(cell) + 'Results')) + + currDir = os.path.join(str(cell) + 'Results', + datetime.datetime.now().strftime("%Y-%m-%dT%H-%M-%S")) + if os.path.isdir(os.path.join(dataDir, currDir)) is False: + try: + os.mkdir(os.path.join(dataDir, currDir)) + except OSError: + print("Creation of the directory %s failed" % + os.path.join(dataDir, currDir)) + else: + return (os.path.join(dataDir, currDir)) + return None + + +def preProcessData(dataDir): + ''' + Function to pre-process input data + + Expects a .npy file of form [lbl feats] for each datapoint, + feats is timesteps*inputDims, flattened across timestep dimension. + So input of 1st timestep followed by second and so on. 
+ + Outputs train and test set datapoints + dataDimension, numClasses are inferred directly + ''' + train = np.load(os.path.join(dataDir, 'train.npy')) + test = np.load(os.path.join(dataDir, 'test.npy')) + + dataDimension = int(train.shape[1]) - 1 + + Xtrain = train[:, 1:dataDimension + 1] + Ytrain_ = train[:, 0] + numClasses = max(Ytrain_) - min(Ytrain_) + 1 + + Xtest = test[:, 1:dataDimension + 1] + Ytest_ = test[:, 0] + + numClasses = int(max(numClasses, max(Ytest_) - min(Ytest_) + 1)) + + # Mean Var Normalisation + mean = np.mean(Xtrain, 0) + std = np.std(Xtrain, 0) + std[std[:] < 0.000001] = 1 + Xtrain = (Xtrain - mean) / std + + Xtest = (Xtest - mean) / std + # End Mean Var normalisation + + lab = Ytrain_.astype('uint8') + lab = np.array(lab) - min(lab) + + lab_ = np.zeros((Xtrain.shape[0], numClasses)) + lab_[np.arange(Xtrain.shape[0]), lab] = 1 + Ytrain = lab_ + + lab = Ytest_.astype('uint8') + lab = np.array(lab) - min(lab) + + lab_ = np.zeros((Xtest.shape[0], numClasses)) + lab_[np.arange(Xtest.shape[0]), lab] = 1 + Ytest = lab_ + + return dataDimension, numClasses, Xtrain, Ytrain, Xtest, Ytest, mean, std + + +def dumpCommand(list, currDir): + ''' + Dumps the current command to a file for further use + ''' + commandFile = open(os.path.join(currDir, 'command.txt'), 'w') + command = "python" + + command = command + " " + ' '.join(list) + commandFile.write(command) + + commandFile.flush() + commandFile.close() + + +def saveMeanStd(mean, std, currDir): + ''' + Function to save Mean and Std vectors + ''' + np.save(os.path.join(currDir, 'mean.npy'), mean) + np.save(os.path.join(currDir, 'std.npy'), std) + + +def saveJSon(data, filename): + with open(filename, "w") as outfile: + json.dump(data, outfile, indent=2) diff --git a/tf2.0/examples/FastCells/process_usps.py b/tf2.0/examples/FastCells/process_usps.py new file mode 100644 index 000000000..7ff763b00 --- /dev/null +++ b/tf2.0/examples/FastCells/process_usps.py @@ -0,0 +1,41 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. +# +# Processing the USPS Data. It is assumed that the data is already +# downloaded. + +import subprocess +import os +import numpy as np +from sklearn.datasets import load_svmlight_file +import sys + +def processData(workingDir, downloadDir): + def loadLibSVMFile(file): + data = load_svmlight_file(file) + features = data[0] + labels = data[1] + retMat = np.zeros([features.shape[0], features.shape[1] + 1]) + retMat[:, 0] = labels + retMat[:, 1:] = features.todense() + return retMat + + path = workingDir + '/' + downloadDir + path = os.path.abspath(path) + trf = path + '/train.txt' + tsf = path + '/test.txt' + assert os.path.isfile(trf), 'File not found: %s' % trf + assert os.path.isfile(tsf), 'File not found: %s' % tsf + train = loadLibSVMFile(trf) + test = loadLibSVMFile(tsf) + np.save(path + '/train.npy', train) + np.save(path + '/test.npy', test) + +if __name__ == '__main__': + # Configuration + workingDir = './' + downloadDir = 'usps10' + # End config + print("Processing data") + processData(workingDir, downloadDir) + print("Done") diff --git a/tf2.0/examples/FastCells/quantizeFastModels.py b/tf2.0/examples/FastCells/quantizeFastModels.py new file mode 100644 index 000000000..746f6f9f4 --- /dev/null +++ b/tf2.0/examples/FastCells/quantizeFastModels.py @@ -0,0 +1,135 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. 
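To make the `[lbl feats]` layout expected by `preProcessData` above concrete, the following sketch builds a toy `train.npy` and then recovers the per-timestep view that the example script feeds to the RNN; all shapes here are made up for illustration.

```
import numpy as np

# Hypothetical toy dimensions: 5 examples, 8 timesteps, 2 features per step.
numExamples, timesteps, inputDims = 5, 8, 2

labels = np.random.randint(0, 3, size=(numExamples, 1))      # integer class ids
feats = np.random.randn(numExamples, timesteps * inputDims)  # flattened over time
np.save('train.npy', np.hstack([labels, feats]))             # each row = [lbl feats]

# fastcell_example.py infers dataDimension = numColumns - 1 and reshapes each
# flattened row back into a [timesteps, inputDims] sequence:
train = np.load('train.npy')
dataDimension = train.shape[1] - 1
X = train[:, 1:].reshape(-1, dataDimension // inputDims, inputDims)
print(X.shape)  # (5, 8, 2)
```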
+ +import helpermethods +import os +import numpy as np + + +def sigmoid(x): + return 1 / (1 + np.exp(-x)) + + +def min_max(A, name): + print(name + " has max: " + str(np.max(A)) + " min: " + str(np.min(A))) + return np.max([np.abs(np.max(A)), np.abs(np.min(A))]) + + +def quantizeFastModels(modelDir, maxValue=127, scalarScaleFactor=1000): + ls = os.listdir(modelDir) + paramNameList = [] + paramWeightList = [] + paramLimitList = [] + + classifierNameList = [] + classifierWeightList = [] + classifierLimitList = [] + + scalarNameList = [] + scalarWeightList = [] + + for file in ls: + if file.endswith("npy"): + if file.startswith("W"): + paramNameList.append(file) + temp = np.load(os.path.join(modelDir, file)) + paramWeightList.append(temp) + paramLimitList.append(min_max(temp, file)) + elif file.startswith("U"): + paramNameList.append(file) + temp = np.load(os.path.join(modelDir, file)) + paramWeightList.append(temp) + paramLimitList.append(min_max(temp, file)) + elif file.startswith("B"): + paramNameList.append(file) + temp = np.load(os.path.join(modelDir, file)) + paramWeightList.append(temp) + paramLimitList.append(min_max(temp, file)) + elif file.startswith("FC"): + classifierNameList.append(file) + temp = np.load(os.path.join(modelDir, file)) + classifierWeightList.append(temp) + classifierLimitList.append(min_max(temp, file)) + elif file.startswith("mean") or file.startswith("std"): + continue + else: + scalarNameList.append(file) + scalarWeightList.append(np.load(os.path.join(modelDir, file))) + + paramLimit = np.max(paramLimitList) + classifierLimit = np.max(classifierLimitList) + + paramScaleFactor = np.round((2.0 * maxValue + 1.0) / (2.0 * paramLimit)) + classifierScaleFactor = (2.0 * maxValue + 1.0) / (2.0 * classifierLimit) + + quantParamWeights = [] + for param in paramWeightList: + temp = np.round(paramScaleFactor * param) + temp[temp[:] > maxValue] = maxValue + temp[temp[:] < -maxValue] = -1 * (maxValue + 1) + + if maxValue <= 127: + temp = temp.astype('int8') + elif maxValue <= 32767: + temp = temp.astype('int16') + else: + temp = temp.astype('int32') + + quantParamWeights.append(temp) + + quantClassifierWeights = [] + for param in classifierWeightList: + temp = np.round(classifierScaleFactor * param) + temp[temp[:] > maxValue] = maxValue + temp[temp[:] < -maxValue] = -1 * (maxValue + 1) + + if maxValue <= 127: + temp = temp.astype('int8') + elif maxValue <= 32767: + temp = temp.astype('int16') + else: + temp = temp.astype('int32') + + quantClassifierWeights.append(temp) + + quantScalarWeights = [] + for scalar in scalarWeightList: + quantScalarWeights.append( + np.round(scalarScaleFactor * sigmoid(scalar)).astype('int32')) + + quantModelDir = os.path.join(modelDir, 'QuantizedFastModel') + if not os.path.isdir(quantModelDir): + try: + os.makedirs(quantModelDir, exist_ok=True) + except OSError: + print("Creation of the directory %s failed" % quantModelDir) + + np.save(os.path.join(quantModelDir, "paramScaleFactor.npy"), + paramScaleFactor.astype('int32')) + np.save(os.path.join(quantModelDir, "classifierScaleFactor.npy"), + classifierScaleFactor) + np.save(os.path.join(quantModelDir, "scalarScaleFactor"), scalarScaleFactor) + + for i in range(0, len(scalarNameList)): + np.save(os.path.join(quantModelDir, "q" + + scalarNameList[i]), quantScalarWeights[i]) + + for i in range(len(classifierNameList)): + np.save(os.path.join(quantModelDir, "q" + + classifierNameList[i]), quantClassifierWeights[i]) + + for i in range(len(paramNameList)): + np.save(os.path.join(quantModelDir, "q" + 
paramNameList[i]), + quantParamWeights[i]) + + print("\n\nQuantized Model Dir: " + quantModelDir) + + +def main(): + args = helpermethods.getQuantArgs() + quantizeFastModels(args.model_dir, int( + args.max_val), int(args.scalar_scale)) + + +if __name__ == '__main__': + main() diff --git a/tf2.0/examples/ProtoNN/README.md b/tf2.0/examples/ProtoNN/README.md new file mode 100644 index 000000000..d0137ac4e --- /dev/null +++ b/tf2.0/examples/ProtoNN/README.md @@ -0,0 +1,54 @@ +# Tensorflow ProtoNN Examples + +This directory includes an example [notebook](protoNN_example.ipynb) and a +command line execution script of ProtoNN developed as part of EdgeML. The +example is based on the USPS dataset. + +`edgeml.graph.protoNN` implements the ProtoNN prediction graph in Tensorflow. +The training routine for ProtoNN is decoupled from the forward graph to +facilitate a plug and play behaviour wherein ProtoNN can be combined with or +used as a final layer classifier for other architectures (RNNs, CNNs). The +training routine is implemented in `edgeml.trainer.protoNNTrainer`. + +Note that `protoNN_example.py` assumes the data to be in a specific format. It +is assumed that train and test data is contained in two files, `train.npy` and +`test.npy`, each containing a 2D numpy array of dimension `[numberOfExamples, +numberOfFeatures + 1]`. The first column of each matrix is assumed to contain +label information. For an N-Class problem, we assume the labels are integers +from 0 through N-1. + +**Tested With:** Tensorflow >1.6 with Python 2 and Python 3 + +## Fetching Data + +The script [fetch_usps.py](fetch_usps.py) can be used to automatically +download the data, and [process_usps.py](process_usps.py) can be used to +process it into the required format. To run these scripts, please use: + + python fetch_usps.py + python process_usps.py + + +## Running the ProtoNN execution script + +Along with the example notebook, a command line execution script for ProtoNN is +provided in `protoNN_example.py`. After the USPS data has been set up, this +script can be used with the following command: + +``` +python protoNN_example.py \ + --data-dir ./usps10 \ + --projection-dim 60 \ + --num-prototypes 80 \ + --gamma 0.0015 \ + --learning-rate 0.1 \ + --epochs 200 \ + --val-step 10 \ + --output-dir ./ +``` + +You can expect a test set accuracy of about 92.5%. + +Copyright (c) Microsoft Corporation. All rights reserved. +Licensed under the MIT license. diff --git a/tf2.0/examples/ProtoNN/fetch_usps.py b/tf2.0/examples/ProtoNN/fetch_usps.py new file mode 100644 index 000000000..c1b2e0726 --- /dev/null +++ b/tf2.0/examples/ProtoNN/fetch_usps.py @@ -0,0 +1,64 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. +# +# Setting up the USPS Data. + +import subprocess +import os +import numpy as np +from sklearn.datasets import load_svmlight_file +import sys + +def downloadData(workingDir, downloadDir, linkTrain, linkTest): + def runcommand(command): + p = subprocess.Popen(command.split(), stdout=subprocess.PIPE) + output, error = p.communicate() + assert(p.returncode == 0), 'Command failed: %s' % command + + path = workingDir + '/' + downloadDir + path = os.path.abspath(path) + try: + os.mkdir(path) + except OSError: + print("Could not create %s. 
Make sure the path does" % path) + print("not already exist and you have permissions to create it.") + return False + cwd = os.getcwd() + os.chdir(path) + print("Downloading data") + command = 'wget %s' % linkTrain + runcommand(command) + command = 'wget %s' % linkTest + runcommand(command) + print("Extracting data") + command = 'bzip2 -d usps.bz2' + runcommand(command) + command = 'bzip2 -d usps.t.bz2' + runcommand(command) + command = 'mv usps train.txt' + runcommand(command) + command = 'mv usps.t test.txt' + runcommand(command) + os.chdir(cwd) + return True + +if __name__ == '__main__': + workingDir = './' + downloadDir = 'usps10' + linkTrain = 'http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.bz2' + linkTest = 'http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/usps.t.bz2' + failureMsg = ''' +Download Failed! +To manually perform the download +\t1. Create a new empty directory named `usps10`. +\t2. Download the data from the following links into the usps10 directory. +\t\tTest: %s +\t\tTrain: %s +\t3. Extract the downloaded files. +\t4. Rename `usps` to `train.txt`, and +\t5. Rename `usps.t` to `test.txt`. +''' % (linkTrain, linkTest) + + if not downloadData(workingDir, downloadDir, linkTrain, linkTest): + exit(failureMsg) + print("Done") diff --git a/tf2.0/examples/ProtoNN/helpermethods.py b/tf2.0/examples/ProtoNN/helpermethods.py new file mode 100644 index 000000000..1bd382825 --- /dev/null +++ b/tf2.0/examples/ProtoNN/helpermethods.py @@ -0,0 +1,206 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. + +from __future__ import print_function +import sys +import os +import numpy as np +import tensorflow as tf +import edgeml.utils as utils +import argparse + + +def getModelSize(matrixList, sparcityList, expected=True, bytesPerVar=4): + ''' + expected: Expected size according to the parameters set. The number of + zeros could actually be more than what is required to satisfy the + sparsity constraint. + ''' + nnzList, sizeList, isSparseList = [], [], [] + hasSparse = False + for i in range(len(matrixList)): + A, s = matrixList[i], sparcityList[i] + assert A.ndim == 2 + assert s >= 0 + assert s <= 1 + nnz, size, sparse = utils.countnnZ(A, s, bytesPerVar=bytesPerVar) + nnzList.append(nnz) + sizeList.append(size) + hasSparse = (hasSparse or sparse) + + totalnnZ = np.sum(nnzList) + totalSize = np.sum(sizeList) + if expected: + return totalnnZ, totalSize, hasSparse + numNonZero = 0 + totalSize = 0 + hasSparse = False + for i in range(len(matrixList)): + A, s = matrixList[i], sparcityList[i] + numNonZero_ = np.count_nonzero(A) + numNonZero += numNonZero_ + hasSparse = (hasSparse or (s < 0.5)) + if s <= 0.5: + totalSize += numNonZero_ * 2 * bytesPerVar + else: + totalSize += A.size * bytesPerVar + return numNonZero, totalSize, hasSparse + + +def getGamma(gammaInit, projectionDim, dataDim, numPrototypes, x_train): + if gammaInit is None: + print("Using median heuristic to estimate gamma.") + gamma, W, B = utils.medianHeuristic(x_train, projectionDim, + numPrototypes) + print("Gamma estimate is: %f" % gamma) + return W, B, gamma + return None, None, gammaInit + +def to_onehot(y, numClasses, minlabel = None): + ''' + If the y labelling does not contain the minimum label info, use `minlabel` to + provide this value. 
+ ''' + lab = y.astype('uint8') + if minlabel is None: + minlabel = np.min(lab) + minlabel = int(minlabel) + lab = np.array(lab) - minlabel + lab_ = np.zeros((y.shape[0], numClasses)) + lab_[np.arange(y.shape[0]), lab] = 1 + return lab_ + +def preprocessData(train, test): + ''' + Loads data from the dataDir and does some initial preprocessing + steps. Data is assumed to be contained in two files, + train.npy and test.npy. Each containing a 2D numpy array of dimension + [numberOfExamples, numberOfFeatures + 1]. The first column of each + matrix is assumed to contain label information. + + For an N-Class problem, we assume the labels are integers from 0 through + N-1. + ''' + dataDimension = int(train.shape[1]) - 1 + x_train = train[:, 1:dataDimension + 1] + y_train_ = train[:, 0] + x_test = test[:, 1:dataDimension + 1] + y_test_ = test[:, 0] + + numClasses = max(y_train_) - min(y_train_) + 1 + numClasses = max(numClasses, max(y_test_) - min(y_test_) + 1) + numClasses = int(numClasses) + + # mean-var + mean = np.mean(x_train, 0) + std = np.std(x_train, 0) + std[std[:] < 0.000001] = 1 + x_train = (x_train - mean) / std + x_test = (x_test - mean) / std + + # one hot y-train + lab = y_train_.astype('uint8') + lab = np.array(lab) - min(lab) + lab_ = np.zeros((x_train.shape[0], numClasses)) + lab_[np.arange(x_train.shape[0]), lab] = 1 + y_train = lab_ + + # one hot y-test + lab = y_test_.astype('uint8') + lab = np.array(lab) - min(lab) + lab_ = np.zeros((x_test.shape[0], numClasses)) + lab_[np.arange(x_test.shape[0]), lab] = 1 + y_test = lab_ + + return dataDimension, numClasses, x_train, y_train, x_test, y_test + + + +def getProtoNNArgs(): + def checkIntPos(value): + ivalue = int(value) + if ivalue <= 0: + raise argparse.ArgumentTypeError( + "%s is an invalid positive int value" % value) + return ivalue + + def checkIntNneg(value): + ivalue = int(value) + if ivalue < 0: + raise argparse.ArgumentTypeError( + "%s is an invalid non-neg int value" % value) + return ivalue + + def checkFloatNneg(value): + fvalue = float(value) + if fvalue < 0: + raise argparse.ArgumentTypeError( + "%s is an invalid non-neg float value" % value) + return fvalue + + def checkFloatPos(value): + fvalue = float(value) + if fvalue <= 0: + raise argparse.ArgumentTypeError( + "%s is an invalid positive float value" % value) + return fvalue + + ''' + Parse protoNN commandline arguments + ''' + parser = argparse.ArgumentParser( + description='Hyperparameters for ProtoNN Algorithm') + + msg = 'Data directory containing train and test data. The ' + msg += 'data is assumed to be saved as 2-D numpy matrices with ' + msg += 'names `train.npy` and `test.npy`, of dimensions\n' + msg += '\t[numberOfInstances, numberOfFeatures + 1].\n' + msg += 'The first column of each file is assumed to contain label information.' + msg += ' For a N-class problem, labels are assumed to be integers from 0 to' + msg += ' N-1 (inclusive).' + parser.add_argument('-d', '--data-dir', required=True, help=msg) + parser.add_argument('-l', '--projection-dim', type=checkIntPos, default=10, + help='Projection Dimension.') + parser.add_argument('-p', '--num-prototypes', type=checkIntPos, default=20, + help='Number of prototypes.') + parser.add_argument('-g', '--gamma', type=checkFloatPos, default=None, + help='Gamma for Gaussian kernel. 
If not provided, ' + 'the median heuristic will be used to estimate gamma.') + + parser.add_argument('-e', '--epochs', type=checkIntPos, default=100, + help='Total training epochs.') + parser.add_argument('-b', '--batch-size', type=checkIntPos, default=32, + help='Batch size for each pass.') + parser.add_argument('-r', '--learning-rate', type=checkFloatPos, + default=0.001, + help='Initial Learning rate for ADAM Optimizer.') + + parser.add_argument('-rW', type=float, default=0.000, + help='Coefficient for l2 regularizer for predictor' + + ' parameter W ' + '(default = 0.0).') + parser.add_argument('-rB', type=float, default=0.00, + help='Coefficient for l2 regularizer for predictor' + + ' parameter B ' + '(default = 0.0).') + parser.add_argument('-rZ', type=float, default=0.00, + help='Coefficient for l2 regularizer for predictor' + + ' parameter Z ' + + '(default = 0.0).') + + parser.add_argument('-sW', type=float, default=1.000, + help='Sparsity constraint for predictor parameter W ' + + '(default = 1.0, i.e. dense matrix).') + parser.add_argument('-sB', type=float, default=1.00, + help='Sparsity constraint for predictor parameter B ' + + '(default = 1.0, i.e. dense matrix).') + parser.add_argument('-sZ', type=float, default=1.00, + help='Sparsity constraint for predictor parameter Z ' + + '(default = 1.0, i.e. dense matrix).') + parser.add_argument('-pS', '--print-step', type=int, default=200, + help='The number of update steps between print ' + + 'calls to console.') + parser.add_argument('-vS', '--val-step', type=int, default=3, + help='The number of epochs between validation ' + + 'performance evaluation.') + parser.add_argument('-o', '--output-dir', type=str, default='./', + help='Output directory to dump model matrices.') + return parser.parse_args() diff --git a/tf2.0/examples/ProtoNN/process_usps.py b/tf2.0/examples/ProtoNN/process_usps.py new file mode 100644 index 000000000..dee4d1bbb --- /dev/null +++ b/tf2.0/examples/ProtoNN/process_usps.py @@ -0,0 +1,51 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license. +# +# Processing the USPS Data. It is assumed that the data is already +# downloaded. 
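Once the script below has been run, the matrices it writes can be sanity-checked with a few lines; this is a sketch that assumes the default `usps10` download directory, and the printed shapes depend on the dataset.

```
import numpy as np

# Matrices written by process_usps.py: column 0 holds the 0-indexed label,
# the remaining columns hold the mean-variance normalized features.
train = np.load('./usps10/train.npy')
test = np.load('./usps10/test.npy')

print(train.shape, test.shape)  # (numberOfExamples, numberOfFeatures + 1)
print(np.unique(train[:, 0]))   # expected: 0.0 ... 9.0 for the 10 USPS classes
```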
+ +import subprocess +import os +import numpy as np +from sklearn.datasets import load_svmlight_file +import sys +from helpermethods import preprocessData + +def processData(workingDir, downloadDir): + def loadLibSVMFile(file): + data = load_svmlight_file(file) + features = data[0] + labels = data[1] + retMat = np.zeros([features.shape[0], features.shape[1] + 1]) + retMat[:, 0] = labels + retMat[:, 1:] = features.todense() + return retMat + + path = workingDir + '/' + downloadDir + path = os.path.abspath(path) + trf = path + '/train.txt' + tsf = path + '/test.txt' + assert os.path.isfile(trf), 'File not found: %s' % trf + assert os.path.isfile(tsf), 'File not found: %s' % tsf + train = loadLibSVMFile(trf) + test = loadLibSVMFile(tsf) + np.save(path + '/train_unnormalized.npy', train) + np.save(path + '/test_unnormalized.npy', test) + _, _, x_train, y_train, x_test, y_test = preprocessData(train, test) + + y_ = np.expand_dims(np.argmax(y_train, axis=1), axis=1) + train = np.concatenate([y_, x_train], axis=1) + np.save(path + '/train.npy', train) + y_ = np.expand_dims(np.argmax(y_test, axis=1), axis=1) + test = np.concatenate([y_, x_test], axis=1) + np.save(path + '/test.npy', test) + + +if __name__ == '__main__': + # Configuration + workingDir = './' + downloadDir = 'usps10' + # End config + print("Processing data") + processData(workingDir, downloadDir) + print("Done") diff --git a/tf2.0/examples/ProtoNN/protoNN_example.ipynb b/tf2.0/examples/ProtoNN/protoNN_example.ipynb new file mode 100644 index 000000000..9581b97e9 --- /dev/null +++ b/tf2.0/examples/ProtoNN/protoNN_example.ipynb @@ -0,0 +1,449 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ProtoNN in Tensorflow\n", + "\n", + "This is a simple notebook that illustrates the usage of the Tensorflow implementation of ProtoNN. We are using the USPS dataset. Please refer to `fetch_usps.py` and `process_usps.py` for more details on downloading the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "ExecuteTime": { + "end_time": "2018-08-15T13:06:10.223951Z", + "start_time": "2018-08-15T13:06:09.303454Z" + } + }, + "outputs": [], + "source": [ + "# Copyright (c) Microsoft Corporation. All rights reserved.\n", + "# Licensed under the MIT license.\n", + "\n", + "from __future__ import print_function\n", + "import sys\n", + "import os\n", + "import numpy as np\n", + "import tensorflow as tf\n", + "\n", + "from edgeml.trainer.protoNNTrainer import ProtoNNTrainer\n", + "from edgeml.graph.protoNN import ProtoNN\n", + "import edgeml.utils as utils\n", + "import helpermethods as helper" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# USPS Data\n", + "\n", + "It is assumed that the USPS data has already been downloaded and set up with the help of [fetch_usps.py](fetch_usps.py) and is placed in the `./usps10` subdirectory." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "ExecuteTime": { + "end_time": "2018-08-15T13:06:10.271026Z", + "start_time": "2018-08-15T13:06:10.225900Z" + } + }, + "outputs": [], + "source": [ + "# Load data\n", + "DATA_DIR = './usps10'\n", + "train, test = np.load(DATA_DIR + '/train.npy'), np.load(DATA_DIR + '/test.npy')\n", + "x_train, y_train = train[:, 1:], train[:, 0]\n", + "x_test, y_test = test[:, 1:], test[:, 0]\n", + "\n", + "numClasses = max(y_train) - min(y_train) + 1\n", + "numClasses = max(numClasses, max(y_test) - min(y_test) + 1)\n", + "numClasses = int(numClasses)\n", + "\n", + "y_train = helper.to_onehot(y_train, numClasses)\n", + "y_test = helper.to_onehot(y_test, numClasses)\n", + "dataDimension = x_train.shape[1]\n", + "numClasses = y_train.shape[1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Model Parameters\n", + "\n", + "Note that ProtoNN is very sensitive to the value of the hyperparameter $\\gamma$, here stored in the variable `GAMMA`. If `GAMMA` is set to `None`, the median heuristic will be used to estimate a good value of $\\gamma$ through the `helper.getGamma()` method. This method also returns the corresponding `W` and `B` matrices, which should be used to initialize ProtoNN (as is done here)." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "ExecuteTime": { + "end_time": "2018-08-15T13:06:10.279204Z", + "start_time": "2018-08-15T13:06:10.272880Z" + } + }, + "outputs": [], + "source": [ + "PROJECTION_DIM = 60\n", + "NUM_PROTOTYPES = 60\n", + "REG_W = 0.000005\n", + "REG_B = 0.0\n", + "REG_Z = 0.00005\n", + "SPAR_W = 0.8\n", + "SPAR_B = 1.0\n", + "SPAR_Z = 1.0\n", + "LEARNING_RATE = 0.05\n", + "NUM_EPOCHS = 200\n", + "BATCH_SIZE = 32\n", + "GAMMA = 0.0015" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "ExecuteTime": { + "end_time": "2018-08-15T13:06:10.307632Z", + "start_time": "2018-08-15T13:06:10.280955Z" + } + }, + "outputs": [], + "source": [ + "W, B, gamma = helper.getGamma(GAMMA, PROJECTION_DIM, dataDimension,\n", + "                              NUM_PROTOTYPES, x_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "ExecuteTime": { + "end_time": "2018-08-15T13:07:22.641991Z", + "start_time": "2018-08-15T13:06:10.309353Z" + }, + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch: 0 Batch: 0 Loss: 5.85158 Accuracy: 0.03125\n", + "Epoch: 1 Batch: 0 Loss: 1.53823 Accuracy: 0.65625\n", + "Epoch: 2 Batch: 0 Loss: 0.81371 Accuracy: 0.87500\n", + "Epoch: 3 Batch: 0 Loss: 0.51246 Accuracy: 0.87500\n", + "Epoch: 4 Batch: 0 Loss: 0.41875 Accuracy: 0.93750\n", + "Epoch: 5 Batch: 0 Loss: 0.36797 Accuracy: 0.96875\n", + "Epoch: 6 Batch: 0 Loss: 0.32868 Accuracy: 0.96875\n", + "Epoch: 7 Batch: 0 Loss: 0.30316 Accuracy: 0.96875\n", + "Epoch: 8 Batch: 0 Loss: 0.29075 Accuracy: 0.96875\n", + "Epoch: 9 Batch: 0 Loss: 0.28370 Accuracy: 0.96875\n", + "Test Loss: 0.50615 Accuracy: 0.89497\n", + "Epoch: 10 Batch: 0 Loss: 0.28014 Accuracy: 0.96875\n", + "Epoch: 11 Batch: 0 Loss: 0.27734 Accuracy: 0.96875\n", + "Epoch: 12 Batch: 0 Loss: 0.27511 Accuracy: 0.96875\n", + "Epoch: 13 Batch: 0 Loss: 0.27126 Accuracy: 0.96875\n", + "Epoch: 14 Batch: 0 Loss: 0.26776 Accuracy: 0.96875\n", + "Epoch: 15 Batch: 0 Loss: 0.26506 Accuracy: 0.96875\n", + "Epoch: 16 Batch: 0 Loss: 0.26371 Accuracy: 0.96875\n", + "Epoch: 17 Batch: 0 Loss: 0.26249 Accuracy: 0.96875\n", + "Epoch: 18 Batch: 0 Loss: 0.26094 Accuracy: 
0.96875\n", + "Epoch: 19 Batch: 0 Loss: 0.25879 Accuracy: 0.96875\n", + "Test Loss: 0.54362 Accuracy: 0.89494\n", + "Epoch: 20 Batch: 0 Loss: 0.25642 Accuracy: 0.96875\n", + "Epoch: 21 Batch: 0 Loss: 0.25328 Accuracy: 0.96875\n", + "Epoch: 22 Batch: 0 Loss: 0.25015 Accuracy: 0.96875\n", + "Epoch: 23 Batch: 0 Loss: 0.24684 Accuracy: 0.96875\n", + "Epoch: 24 Batch: 0 Loss: 0.24365 Accuracy: 0.96875\n", + "Epoch: 25 Batch: 0 Loss: 0.24023 Accuracy: 0.96875\n", + "Epoch: 26 Batch: 0 Loss: 0.23747 Accuracy: 0.96875\n", + "Epoch: 27 Batch: 0 Loss: 0.23460 Accuracy: 0.96875\n", + "Epoch: 28 Batch: 0 Loss: 0.23170 Accuracy: 0.96875\n", + "Epoch: 29 Batch: 0 Loss: 0.22903 Accuracy: 0.96875\n", + "Test Loss: 0.54884 Accuracy: 0.89391\n", + "Epoch: 30 Batch: 0 Loss: 0.22662 Accuracy: 0.96875\n", + "Epoch: 31 Batch: 0 Loss: 0.22448 Accuracy: 0.96875\n", + "Epoch: 32 Batch: 0 Loss: 0.22245 Accuracy: 0.96875\n", + "Epoch: 33 Batch: 0 Loss: 0.22068 Accuracy: 0.96875\n", + "Epoch: 34 Batch: 0 Loss: 0.21904 Accuracy: 0.96875\n", + "Epoch: 35 Batch: 0 Loss: 0.21723 Accuracy: 0.96875\n", + "Epoch: 36 Batch: 0 Loss: 0.21582 Accuracy: 0.96875\n", + "Epoch: 37 Batch: 0 Loss: 0.21409 Accuracy: 0.96875\n", + "Epoch: 38 Batch: 0 Loss: 0.21246 Accuracy: 0.96875\n", + "Epoch: 39 Batch: 0 Loss: 0.21095 Accuracy: 0.96875\n", + "Test Loss: 0.52917 Accuracy: 0.90091\n", + "Epoch: 40 Batch: 0 Loss: 0.20928 Accuracy: 0.96875\n", + "Epoch: 41 Batch: 0 Loss: 0.20770 Accuracy: 0.96875\n", + "Epoch: 42 Batch: 0 Loss: 0.20633 Accuracy: 0.96875\n", + "Epoch: 43 Batch: 0 Loss: 0.20512 Accuracy: 0.96875\n", + "Epoch: 44 Batch: 0 Loss: 0.20377 Accuracy: 0.96875\n", + "Epoch: 45 Batch: 0 Loss: 0.20240 Accuracy: 0.96875\n", + "Epoch: 46 Batch: 0 Loss: 0.20124 Accuracy: 0.96875\n", + "Epoch: 47 Batch: 0 Loss: 0.20002 Accuracy: 0.96875\n", + "Epoch: 48 Batch: 0 Loss: 0.19910 Accuracy: 0.96875\n", + "Epoch: 49 Batch: 0 Loss: 0.19808 Accuracy: 0.96875\n", + "Test Loss: 0.50988 Accuracy: 0.90292\n", + "Epoch: 50 Batch: 0 Loss: 0.19705 Accuracy: 0.96875\n", + "Epoch: 51 Batch: 0 Loss: 0.19629 Accuracy: 1.00000\n", + "Epoch: 52 Batch: 0 Loss: 0.19560 Accuracy: 1.00000\n", + "Epoch: 53 Batch: 0 Loss: 0.19483 Accuracy: 1.00000\n", + "Epoch: 54 Batch: 0 Loss: 0.19404 Accuracy: 1.00000\n", + "Epoch: 55 Batch: 0 Loss: 0.19351 Accuracy: 1.00000\n", + "Epoch: 56 Batch: 0 Loss: 0.19279 Accuracy: 1.00000\n", + "Epoch: 57 Batch: 0 Loss: 0.19250 Accuracy: 1.00000\n", + "Epoch: 58 Batch: 0 Loss: 0.19207 Accuracy: 1.00000\n", + "Epoch: 59 Batch: 0 Loss: 0.19169 Accuracy: 1.00000\n", + "Test Loss: 0.48988 Accuracy: 0.90443\n", + "Epoch: 60 Batch: 0 Loss: 0.19146 Accuracy: 1.00000\n", + "Epoch: 61 Batch: 0 Loss: 0.19119 Accuracy: 1.00000\n", + "Epoch: 62 Batch: 0 Loss: 0.19095 Accuracy: 1.00000\n", + "Epoch: 63 Batch: 0 Loss: 0.19077 Accuracy: 1.00000\n", + "Epoch: 64 Batch: 0 Loss: 0.19066 Accuracy: 1.00000\n", + "Epoch: 65 Batch: 0 Loss: 0.19071 Accuracy: 1.00000\n", + "Epoch: 66 Batch: 0 Loss: 0.19066 Accuracy: 1.00000\n", + "Epoch: 67 Batch: 0 Loss: 0.19071 Accuracy: 1.00000\n", + "Epoch: 68 Batch: 0 Loss: 0.19083 Accuracy: 1.00000\n", + "Epoch: 69 Batch: 0 Loss: 0.19090 Accuracy: 1.00000\n", + "Test Loss: 0.47286 Accuracy: 0.90841\n", + "Epoch: 70 Batch: 0 Loss: 0.19108 Accuracy: 1.00000\n", + "Epoch: 71 Batch: 0 Loss: 0.19110 Accuracy: 1.00000\n", + "Epoch: 72 Batch: 0 Loss: 0.19109 Accuracy: 1.00000\n", + "Epoch: 73 Batch: 0 Loss: 0.19118 Accuracy: 1.00000\n", + "Epoch: 74 Batch: 0 Loss: 0.19115 Accuracy: 1.00000\n", + "Epoch: 75 Batch: 0 Loss: 
0.19122 Accuracy: 1.00000\n", + "Epoch: 76 Batch: 0 Loss: 0.19099 Accuracy: 1.00000\n", + "Epoch: 77 Batch: 0 Loss: 0.19087 Accuracy: 1.00000\n", + "Epoch: 78 Batch: 0 Loss: 0.19070 Accuracy: 1.00000\n", + "Epoch: 79 Batch: 0 Loss: 0.19069 Accuracy: 0.96875\n", + "Test Loss: 0.46010 Accuracy: 0.91289\n", + "Epoch: 80 Batch: 0 Loss: 0.19055 Accuracy: 0.96875\n", + "Epoch: 81 Batch: 0 Loss: 0.19055 Accuracy: 0.96875\n", + "Epoch: 82 Batch: 0 Loss: 0.19013 Accuracy: 0.96875\n", + "Epoch: 83 Batch: 0 Loss: 0.19005 Accuracy: 0.96875\n", + "Epoch: 84 Batch: 0 Loss: 0.18991 Accuracy: 0.96875\n", + "Epoch: 85 Batch: 0 Loss: 0.18985 Accuracy: 0.96875\n", + "Epoch: 86 Batch: 0 Loss: 0.18961 Accuracy: 0.96875\n", + "Epoch: 87 Batch: 0 Loss: 0.18926 Accuracy: 0.96875\n", + "Epoch: 88 Batch: 0 Loss: 0.18901 Accuracy: 0.96875\n", + "Epoch: 89 Batch: 0 Loss: 0.18866 Accuracy: 0.96875\n", + "Test Loss: 0.45069 Accuracy: 0.91588\n", + "Epoch: 90 Batch: 0 Loss: 0.18821 Accuracy: 0.96875\n", + "Epoch: 91 Batch: 0 Loss: 0.18801 Accuracy: 0.96875\n", + "Epoch: 92 Batch: 0 Loss: 0.18799 Accuracy: 0.96875\n", + "Epoch: 93 Batch: 0 Loss: 0.18779 Accuracy: 0.96875\n", + "Epoch: 94 Batch: 0 Loss: 0.18743 Accuracy: 0.96875\n", + "Epoch: 95 Batch: 0 Loss: 0.18732 Accuracy: 0.96875\n", + "Epoch: 96 Batch: 0 Loss: 0.18720 Accuracy: 0.96875\n", + "Epoch: 97 Batch: 0 Loss: 0.18696 Accuracy: 0.96875\n", + "Epoch: 98 Batch: 0 Loss: 0.18674 Accuracy: 0.96875\n", + "Epoch: 99 Batch: 0 Loss: 0.18637 Accuracy: 0.96875\n", + "Test Loss: 0.44313 Accuracy: 0.91739\n", + "Epoch: 100 Batch: 0 Loss: 0.18625 Accuracy: 0.96875\n", + "Epoch: 101 Batch: 0 Loss: 0.18607 Accuracy: 0.96875\n", + "Epoch: 102 Batch: 0 Loss: 0.18596 Accuracy: 0.96875\n", + "Epoch: 103 Batch: 0 Loss: 0.18585 Accuracy: 0.96875\n", + "Epoch: 104 Batch: 0 Loss: 0.18578 Accuracy: 0.96875\n", + "Epoch: 105 Batch: 0 Loss: 0.18561 Accuracy: 0.96875\n", + "Epoch: 106 Batch: 0 Loss: 0.18538 Accuracy: 0.96875\n", + "Epoch: 107 Batch: 0 Loss: 0.18525 Accuracy: 0.96875\n", + "Epoch: 108 Batch: 0 Loss: 0.18512 Accuracy: 0.96875\n", + "Epoch: 109 Batch: 0 Loss: 0.18494 Accuracy: 0.96875\n", + "Test Loss: 0.43641 Accuracy: 0.91939\n", + "Epoch: 110 Batch: 0 Loss: 0.18505 Accuracy: 0.96875\n", + "Epoch: 111 Batch: 0 Loss: 0.18486 Accuracy: 0.96875\n", + "Epoch: 112 Batch: 0 Loss: 0.18481 Accuracy: 0.96875\n", + "Epoch: 113 Batch: 0 Loss: 0.18461 Accuracy: 0.96875\n", + "Epoch: 114 Batch: 0 Loss: 0.18439 Accuracy: 0.96875\n", + "Epoch: 115 Batch: 0 Loss: 0.18411 Accuracy: 0.96875\n", + "Epoch: 116 Batch: 0 Loss: 0.18385 Accuracy: 0.96875\n", + "Epoch: 117 Batch: 0 Loss: 0.18367 Accuracy: 0.96875\n", + "Epoch: 118 Batch: 0 Loss: 0.18353 Accuracy: 0.96875\n", + "Epoch: 119 Batch: 0 Loss: 0.18333 Accuracy: 0.96875\n", + "Test Loss: 0.43042 Accuracy: 0.92185\n", + "Epoch: 120 Batch: 0 Loss: 0.18315 Accuracy: 0.96875\n", + "Epoch: 121 Batch: 0 Loss: 0.18304 Accuracy: 0.96875\n", + "Epoch: 122 Batch: 0 Loss: 0.18268 Accuracy: 0.96875\n", + "Epoch: 123 Batch: 0 Loss: 0.18227 Accuracy: 0.96875\n", + "Epoch: 124 Batch: 0 Loss: 0.18209 Accuracy: 0.96875\n", + "Epoch: 125 Batch: 0 Loss: 0.18206 Accuracy: 0.96875\n", + "Epoch: 126 Batch: 0 Loss: 0.18184 Accuracy: 0.96875\n", + "Epoch: 127 Batch: 0 Loss: 0.18185 Accuracy: 0.96875\n", + "Epoch: 128 Batch: 0 Loss: 0.18159 Accuracy: 0.96875\n", + "Epoch: 129 Batch: 0 Loss: 0.18152 Accuracy: 0.96875\n", + "Test Loss: 0.42608 Accuracy: 0.92284\n", + "Epoch: 130 Batch: 0 Loss: 0.18125 Accuracy: 0.96875\n", + "Epoch: 131 Batch: 0 Loss: 0.18102 
Accuracy: 0.96875\n", + "Epoch: 132 Batch: 0 Loss: 0.18075 Accuracy: 0.96875\n", + "Epoch: 133 Batch: 0 Loss: 0.18055 Accuracy: 0.96875\n", + "Epoch: 134 Batch: 0 Loss: 0.18025 Accuracy: 0.96875\n", + "Epoch: 135 Batch: 0 Loss: 0.18015 Accuracy: 0.96875\n", + "Epoch: 136 Batch: 0 Loss: 0.18000 Accuracy: 0.96875\n", + "Epoch: 137 Batch: 0 Loss: 0.17977 Accuracy: 0.96875\n", + "Epoch: 138 Batch: 0 Loss: 0.17963 Accuracy: 0.96875\n", + "Epoch: 139 Batch: 0 Loss: 0.17950 Accuracy: 0.96875\n", + "Test Loss: 0.42305 Accuracy: 0.92238\n", + "Epoch: 140 Batch: 0 Loss: 0.17935 Accuracy: 0.96875\n", + "Epoch: 141 Batch: 0 Loss: 0.17908 Accuracy: 0.96875\n", + "Epoch: 142 Batch: 0 Loss: 0.17903 Accuracy: 0.96875\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch: 143 Batch: 0 Loss: 0.17900 Accuracy: 0.96875\n", + "Epoch: 144 Batch: 0 Loss: 0.17877 Accuracy: 0.96875\n", + "Epoch: 145 Batch: 0 Loss: 0.17863 Accuracy: 0.96875\n", + "Epoch: 146 Batch: 0 Loss: 0.17844 Accuracy: 0.96875\n", + "Epoch: 147 Batch: 0 Loss: 0.17825 Accuracy: 1.00000\n", + "Epoch: 148 Batch: 0 Loss: 0.17808 Accuracy: 1.00000\n", + "Epoch: 149 Batch: 0 Loss: 0.17798 Accuracy: 1.00000\n", + "Test Loss: 0.42054 Accuracy: 0.92288\n", + "Epoch: 150 Batch: 0 Loss: 0.17788 Accuracy: 1.00000\n", + "Epoch: 151 Batch: 0 Loss: 0.17773 Accuracy: 1.00000\n", + "Epoch: 152 Batch: 0 Loss: 0.17753 Accuracy: 1.00000\n", + "Epoch: 153 Batch: 0 Loss: 0.17743 Accuracy: 1.00000\n", + "Epoch: 154 Batch: 0 Loss: 0.17736 Accuracy: 1.00000\n", + "Epoch: 155 Batch: 0 Loss: 0.17724 Accuracy: 1.00000\n", + "Epoch: 156 Batch: 0 Loss: 0.17721 Accuracy: 1.00000\n", + "Epoch: 157 Batch: 0 Loss: 0.17721 Accuracy: 1.00000\n", + "Epoch: 158 Batch: 0 Loss: 0.17713 Accuracy: 1.00000\n", + "Epoch: 159 Batch: 0 Loss: 0.17701 Accuracy: 1.00000\n", + "Test Loss: 0.41836 Accuracy: 0.92337\n", + "Epoch: 160 Batch: 0 Loss: 0.17695 Accuracy: 1.00000\n", + "Epoch: 161 Batch: 0 Loss: 0.17691 Accuracy: 1.00000\n", + "Epoch: 162 Batch: 0 Loss: 0.17687 Accuracy: 1.00000\n", + "Epoch: 163 Batch: 0 Loss: 0.17687 Accuracy: 1.00000\n", + "Epoch: 164 Batch: 0 Loss: 0.17682 Accuracy: 1.00000\n", + "Epoch: 165 Batch: 0 Loss: 0.17686 Accuracy: 1.00000\n", + "Epoch: 166 Batch: 0 Loss: 0.17686 Accuracy: 1.00000\n", + "Epoch: 167 Batch: 0 Loss: 0.17687 Accuracy: 1.00000\n", + "Epoch: 168 Batch: 0 Loss: 0.17688 Accuracy: 1.00000\n", + "Epoch: 169 Batch: 0 Loss: 0.17688 Accuracy: 1.00000\n", + "Test Loss: 0.41637 Accuracy: 0.92536\n", + "Epoch: 170 Batch: 0 Loss: 0.17685 Accuracy: 1.00000\n", + "Epoch: 171 Batch: 0 Loss: 0.17684 Accuracy: 1.00000\n", + "Epoch: 172 Batch: 0 Loss: 0.17688 Accuracy: 1.00000\n", + "Epoch: 173 Batch: 0 Loss: 0.17695 Accuracy: 1.00000\n", + "Epoch: 174 Batch: 0 Loss: 0.17693 Accuracy: 1.00000\n", + "Epoch: 175 Batch: 0 Loss: 0.17693 Accuracy: 1.00000\n", + "Epoch: 176 Batch: 0 Loss: 0.17697 Accuracy: 1.00000\n", + "Epoch: 177 Batch: 0 Loss: 0.17707 Accuracy: 1.00000\n", + "Epoch: 178 Batch: 0 Loss: 0.17715 Accuracy: 1.00000\n", + "Epoch: 179 Batch: 0 Loss: 0.17728 Accuracy: 1.00000\n", + "Test Loss: 0.41443 Accuracy: 0.92585\n", + "Epoch: 180 Batch: 0 Loss: 0.17732 Accuracy: 1.00000\n", + "Epoch: 181 Batch: 0 Loss: 0.17738 Accuracy: 1.00000\n", + "Epoch: 182 Batch: 0 Loss: 0.17747 Accuracy: 1.00000\n", + "Epoch: 183 Batch: 0 Loss: 0.17750 Accuracy: 1.00000\n", + "Epoch: 184 Batch: 0 Loss: 0.17762 Accuracy: 1.00000\n", + "Epoch: 185 Batch: 0 Loss: 0.17775 Accuracy: 1.00000\n", + "Epoch: 186 Batch: 0 Loss: 0.17791 Accuracy: 
1.00000\n", + "Epoch: 187 Batch: 0 Loss: 0.17795 Accuracy: 1.00000\n", + "Epoch: 188 Batch: 0 Loss: 0.17811 Accuracy: 1.00000\n", + "Epoch: 189 Batch: 0 Loss: 0.17818 Accuracy: 1.00000\n", + "Test Loss: 0.41260 Accuracy: 0.92536\n", + "Epoch: 190 Batch: 0 Loss: 0.17835 Accuracy: 1.00000\n", + "Epoch: 191 Batch: 0 Loss: 0.17847 Accuracy: 1.00000\n", + "Epoch: 192 Batch: 0 Loss: 0.17855 Accuracy: 1.00000\n", + "Epoch: 193 Batch: 0 Loss: 0.17863 Accuracy: 1.00000\n", + "Epoch: 194 Batch: 0 Loss: 0.17880 Accuracy: 1.00000\n", + "Epoch: 195 Batch: 0 Loss: 0.17885 Accuracy: 1.00000\n", + "Epoch: 196 Batch: 0 Loss: 0.17900 Accuracy: 1.00000\n", + "Epoch: 197 Batch: 0 Loss: 0.17912 Accuracy: 1.00000\n", + "Epoch: 198 Batch: 0 Loss: 0.17927 Accuracy: 1.00000\n", + "Epoch: 199 Batch: 0 Loss: 0.17948 Accuracy: 1.00000\n", + "Test Loss: 0.41087 Accuracy: 0.92536\n" + ] + } + ], + "source": [ + "# Setup input and train protoNN\n", + "# (uses the TF1-style graph API via tf.compat.v1, as in protoNN_example.py)\n", + "X = tf.compat.v1.placeholder(tf.float32, [None, dataDimension], name='X')\n", + "Y = tf.compat.v1.placeholder(tf.float32, [None, numClasses], name='Y')\n", + "protoNN = ProtoNN(dataDimension, PROJECTION_DIM,\n", + "                  NUM_PROTOTYPES, numClasses,\n", + "                  gamma, W=W, B=B)\n", + "trainer = ProtoNNTrainer(protoNN, REG_W, REG_B, REG_Z,\n", + "                         SPAR_W, SPAR_B, SPAR_Z,\n", + "                         LEARNING_RATE, X, Y, lossType='xentropy')\n", + "sess = tf.compat.v1.Session()\n", + "trainer.train(BATCH_SIZE, NUM_EPOCHS, sess, x_train, x_test, y_train, y_test,\n", + "              printStep=600, valStep=10)\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Model Evaluation" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "ExecuteTime": { + "end_time": "2018-08-15T13:07:22.671507Z", + "start_time": "2018-08-15T13:07:22.645050Z" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Final test accuracy 0.92526156\n", + "Model size constraint (Bytes): 78240\n", + "Number of non-zeros: 19560\n", + "Actual model size: 78240\n", + "Actual non-zeros: 16488\n" + ] + } + ], + "source": [ + "acc = sess.run(protoNN.accuracy, feed_dict={X: x_test, Y: y_test})\n", + "# W, B, Z are tensorflow graph nodes\n", + "W, B, Z, _ = protoNN.getModelMatrices()\n", + "matrixList = sess.run([W, B, Z])\n", + "sparcityList = [SPAR_W, SPAR_B, SPAR_Z]\n", + "# By default, getModelSize reports the size implied by the sparsity budget\n", + "nnz, size, sparse = helper.getModelSize(matrixList, sparcityList)\n", + "print(\"Final test accuracy\", acc)\n", + "print(\"Model size constraint (Bytes): \", size)\n", + "print(\"Number of non-zeros: \", nnz)\n", + "# With expected=False, it reports the non-zeros actually present after training\n", + "nnz, size, sparse = helper.getModelSize(matrixList, sparcityList, expected=False)\n", + "print(\"Actual model size: \", size)\n", + "print(\"Actual non-zeros: \", nnz)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.3" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/tf2.0/examples/ProtoNN/protoNN_example.py b/tf2.0/examples/ProtoNN/protoNN_example.py new file mode 100644 index 000000000..9b49c6542 --- /dev/null +++ b/tf2.0/examples/ProtoNN/protoNN_example.py @@ -0,0 +1,88 @@ +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT license.
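+
+# This script mirrors the accompanying ProtoNN notebook: it loads a dataset
+# stored as train.npy / test.npy (label in column 0, features in the remaining
+# columns), trains ProtoNN with the configured projection dimension, number of
+# prototypes, and per-matrix sparsity targets, and saves the learned W, B, Z
+# matrices and gamma to the output directory. If gamma is set to None,
+# helper.getGamma() falls back to the median heuristic to estimate it, and
+# also returns the W and B matrices used to initialize ProtoNN (see the
+# notebook for details).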
+ +from __future__ import print_function +import sys +import os +import numpy as np +import tensorflow as tf +from edgeml.trainer.protoNNTrainer import ProtoNNTrainer +from edgeml.graph.protoNN import ProtoNN +import edgeml.utils as utils +import helpermethods as helper + +tf.compat.v1.disable_eager_execution() + +def main(): + config = helper.getProtoNNArgs() + # Get hyper parameters + DATA_DIR = config.data_dir + PROJECTION_DIM = config.projection_dim + NUM_PROTOTYPES = config.num_prototypes + REG_W = config.rW + REG_B = config.rB + REG_Z = config.rZ + SPAR_W = config.sW + SPAR_B = config.sB + SPAR_Z = config.sZ + LEARNING_RATE = config.learning_rate + NUM_EPOCHS = config.epochs + BATCH_SIZE = config.batch_size + PRINT_STEP = config.print_step + VAL_STEP = config.val_step + OUT_DIR = config.output_dir + + # Load data + train = np.load(DATA_DIR + '/train.npy') + test = np.load(DATA_DIR + '/test.npy') + x_train, y_train = train[:, 1:], train[:, 0] + x_test, y_test = test[:, 1:], test[:, 0] + # Convert y to one-hot + minval = min(min(y_train), min(y_test)) + numClasses = max(y_train) - min(y_train) + 1 + numClasses = max(numClasses, max(y_test) - min(y_test) + 1) + numClasses = int(numClasses) + y_train = helper.to_onehot(y_train, numClasses, minlabel=minval) + y_test = helper.to_onehot(y_test, numClasses, minlabel=minval) + dataDimension = x_train.shape[1] + + W, B, gamma = helper.getGamma(config.gamma, PROJECTION_DIM, dataDimension, + NUM_PROTOTYPES, x_train) + + # Setup input and train protoNN + X = tf.compat.v1.placeholder(tf.float32, [None, dataDimension], name='X') + Y = tf.compat.v1.placeholder(tf.float32, [None, numClasses], name='Y') + protoNN = ProtoNN(dataDimension, PROJECTION_DIM, + NUM_PROTOTYPES, numClasses, + gamma, W=W, B=B) + trainer = ProtoNNTrainer(protoNN, REG_W, REG_B, REG_Z, + SPAR_W, SPAR_B, SPAR_Z, + LEARNING_RATE, X, Y, lossType='xentropy') + sess = tf.compat.v1.Session() + trainer.train(BATCH_SIZE, NUM_EPOCHS, sess, x_train, x_test, + y_train, y_test, printStep=PRINT_STEP, valStep=VAL_STEP) + + # Print some summary metrics + acc = sess.run(protoNN.accuracy, feed_dict={X: x_test, Y: y_test}) + # W, B, Z are tensorflow graph nodes + W, B, Z, gamma = protoNN.getModelMatrices() + matrixList = sess.run([W, B, Z]) + gamma = sess.run(gamma) + sparcityList = [SPAR_W, SPAR_B, SPAR_Z] + nnz, size, sparse = helper.getModelSize(matrixList, sparcityList) + print("Final test accuracy", acc) + print("Model size constraint (Bytes): ", size) + print("Number of non-zeros: ", nnz) + nnz, size, sparse = helper.getModelSize(matrixList, sparcityList, + expected=False) + print("Actual model size: ", size) + print("Actual non-zeros: ", nnz) + print("Saving model matrices to: ", OUT_DIR) + np.save(OUT_DIR + '/W.npy', matrixList[0]) + np.save(OUT_DIR + '/B.npy', matrixList[1]) + np.save(OUT_DIR + '/Z.npy', matrixList[2]) + np.save(OUT_DIR + '/gamma.npy', gamma) + + +if __name__ == '__main__': + main() diff --git a/tf2.0/requirements-cpu.txt b/tf2.0/requirements-cpu.txt new file mode 100644 index 000000000..25fdc1788 --- /dev/null +++ b/tf2.0/requirements-cpu.txt @@ -0,0 +1,7 @@ +jupyter==1.0.0 +numpy==1.14.5 +pandas==0.23.4 +scikit-learn==0.19.2 +scipy==1.1.0 +tensorflow==1.10.1 +requests \ No newline at end of file diff --git a/tf2.0/requirements-gpu.txt b/tf2.0/requirements-gpu.txt new file mode 100644 index 000000000..f181ccc0a --- /dev/null +++ b/tf2.0/requirements-gpu.txt @@ -0,0 +1,7 @@ +jupyter==1.0.0 +numpy==1.14.5 +pandas==0.23.4 +scikit-learn==0.19.2 +scipy==1.1.0 
+tensorflow-gpu==1.10.1 +requests \ No newline at end of file diff --git a/tf2.0/setup.py b/tf2.0/setup.py new file mode 100644 index 000000000..dfb6fac46 --- /dev/null +++ b/tf2.0/setup.py @@ -0,0 +1,9 @@ +from distutils.core import setup + +setup( + name='edgeml', + version='0.2', + packages=['edgeml', ], + license='MIT License', + long_description=open('../License.txt').read(), +)