# EdgeML FastCells on a sample public dataset

This directory includes example notebooks and scripts of
FastCells (FastRNN & FastGRNN) along with modified
UGRNN, GRU and LSTM to support the LSQ training routine.
There is also a sample cleanup and train/test script for the USPS10 public dataset.

[`edgeml_pytorch.graph.rnn`](../../../pytorch/pytorch_edgeml/graph/rnn.py)
provides two RNN cells **FastRNNCell** and **FastGRNNCell** with additional
features like low-rank parameterisation and custom non-linearities. Akin to
Bonsai and ProtoNN, the three-phase training routine for FastRNN and FastGRNN
is decoupled from the custom cells to facilitate plug-and-play behaviour of
the custom RNN cells in other architectures (NMT, Encoder-Decoder etc.).
Additionally, numerically equivalent CUDA-based implementations FastRNNCuda
and FastGRNNCuda are provided for faster training.
`edgeml_pytorch.graph.rnn` also contains modified RNN cells of **UGRNNCell**,
**GRUCell**, and **LSTMCell**, which can be substituted for Fast(G)RNN,
as well as unrolled RNNs which are equivalent to `nn.LSTM` and `nn.GRU`.
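
Below is a minimal sketch of stepping one of these cells over a sequence by
hand; the constructor and `forward` arguments here are assumptions based on the
description above, so check [`rnn.py`](../../../pytorch/pytorch_edgeml/graph/rnn.py)
for the exact signatures:

```python
import torch

from edgeml_pytorch.graph.rnn import FastGRNNCell

inputDims, hiddenDims = 16, 32
# wRank/uRank (assumed argument names) enable the low-rank
# parameterisation of the W and U matrices.
cell = FastGRNNCell(inputDims, hiddenDims, wRank=8, uRank=8)

x = torch.randn(64, 10, inputDims)  # [batchSize, timeSteps, inputDims]
h = torch.zeros(64, hiddenDims)     # initial hidden state
for t in range(x.shape[1]):
    h = cell(x[:, t, :], h)         # one hidden-state update per timestep
```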

Note that all the cells and wrappers, when used independently from `fastcell_example.py`
or `edgeml_pytorch.trainer.fastTrainer`, take in data in a batch-first format, i.e.,
[batchSize, timeSteps, inputDims] by default, but can also support the [timeSteps,
batchSize, inputDims] format if the `batch_first` argument is set to False.
`fastcell_example.py` automatically adjusts to the correct format across tf, c++ and pytorch.

For training FastCells, `edgeml_pytorch.trainer.fastTrainer` implements the three-phase
FastCell training routine in PyTorch. A simple example `fastcell_example.py` is provided
to illustrate its usage. Note that `fastcell_example.py` assumes that data is in a specific format.
It is assumed that train and test data is contained in two files, `train.npy` and
`test.npy`, each containing a 2D numpy array of dimension `[numberOfExamples,
numberOfFeatures]`. `numberOfFeatures` is `timesteps x inputDims`, flattened
across the timestep dimension, with the input of the first timestep followed by the second
and so on. For an N-class problem, we assume the labels are integers from 0
through N-1. Lastly, the training data, `train.npy`, is assumed to be well shuffled,
as the training routine doesn't shuffle internally.
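
For instance, the flattening described above looks like this (a toy sketch with
made-up shapes; only the `[numberOfExamples, timesteps x inputDims]` layout
comes from the description above):

```python
import numpy as np

numberOfExamples, timesteps, inputDims = 1000, 16, 16
X = np.random.randn(numberOfExamples, timesteps, inputDims)

# C-order reshape flattens across the timestep dimension: timestep 0's
# inputDims values come first, then timestep 1's, and so on.
features = X.reshape(numberOfExamples, timesteps * inputDims)
assert features.shape == (numberOfExamples, 256)  # numberOfFeatures = 16 x 16
```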

**Tested With:** PyTorch = 1.1 with Python 3.6

## Download and clean up sample dataset

To validate the code with the USPS dataset, first download and format the dataset to match
the required format using the scripts [fetch_usps.py](fetch_usps.py) and
[process_usps.py](process_usps.py):

```
python fetch_usps.py
python process_usps.py
```

Note: Even though usps10 is not a time-series dataset, it can be regarded as a time-series
dataset where each time step sees a new row. So the number of timesteps = 16 and inputDims = 16.
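
Concretely, each flattened 256-dimensional USPS example is viewed as 16 timesteps
of 16 features each (a toy illustration):

```python
import numpy as np

features = np.arange(256.0)     # one flattened USPS example (16 x 16 pixels)
seq = features.reshape(16, 16)  # -> [timeSteps=16, inputDims=16]: one image row per step
```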

## Sample command for FastCells on USPS10

The following is a sample run on usps10:

```bash
python fastcell_example.py -dir usps10/ -id 16 -hd 32
```

This command should give you a final output that reads roughly similar to
(might not be exact numbers due to various version mismatches):

```
Maximum Test accuracy at compressed model size(including early stopping): 0.9407075 at Epoch: 262
Final Test Accuracy: 0.93721974

Non-Zeros: 1932 Model Size: 7.546875 KB hasSparse: False
```

The `usps10/` directory will now have a consolidated results file called `FastRNNResults.txt` or
`FastGRNNResults.txt`, depending on the choice of the RNN cell, along with a directory `FastRNNResults` or
`FastGRNNResults` containing the corresponding models from each run of the code on the `usps10` dataset.

Note that the scalars like `alpha`, `beta`, `zeta` and `nu` correspond to the values before
the application of the sigmoid function.
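
For example, per the FastRNN update rule from the paper,
`h_t = sigmoid(alpha) * h_tilde + sigmoid(beta) * h_(t-1)`, the stored raw
scalars map to the effective gating values like this (a small illustration;
the raw values below are made up):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

alpha_raw, beta_raw = -3.0, 3.0  # pre-sigmoid values, as stored by training
alpha, beta = sigmoid(alpha_raw), sigmoid(beta_raw)
print(alpha, beta)  # effective scalars in (0, 1) used in the update rule
```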

## Byte Quantization (Q) for model compression

If you wish to quantize the generated model, use `quantizeFastModels.py`. Usage instructions:

```
python quantizeFastModels.py -h
```

This will generate quantized models with a suffix of `q` before every param stored in a
new directory `QuantizedFastModel` inside the model directory.

Note that the scalars like `qalpha`, `qbeta`, `qzeta` and `qnu` correspond to values
after the application of the sigmoid function and quantization;
they can be directly plugged into the inference pipelines.

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT license.