
Commit

Merge pull request #4 from CrowdStrike/feature/update_project_dependencies_and_format

Update dependencies and bump to 0.4.0
makr11st authored Sep 15, 2023
2 parents 2fdfe16 + 7e48f48 commit a11f63e
Showing 20 changed files with 2,225 additions and 430 deletions.
18 changes: 17 additions & 1 deletion CHANGELOG.md

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.4.0] 2023-09-12

### Changed

- Update `tensorflow` to `2.13`
- Update generated template Rust code dependencies
- Update Rust edition to 2021
- Use `once_cell` instead of `lazy_static`

## [0.3.0] 2023-07-12

### Changed

- Modify the package to make it a Python wheel that is buildable with Poetry
- Move to tox and pytest

## [0.2.0] 2022-11-08

### Fixed
149 changes: 66 additions & 83 deletions README.md
## Special thanks to `Marian Radu` for his support during the development of the package

## Training Workflow

![training overview](images/training_overview.png)

## General Information
![conversion mechanism](images/conversion_mechanism.png)

A Python package that converts a TensorFlow model (`.pb` or `.h5` format) into pure Rust code.
This package depends on the [`tf-layers`](https://github.com/CrowdStrike/tf-layers) Rust crate.

Currently, this package supports models that contain the following layers (the list is expected to grow as further architectures are added):

* InputLayer - input layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/InputLayer>
* Multiply - multiply layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/Multiply>
* Reshape - reshape layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/Reshape>
* Conv1D - 1D convolutional layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv1D>
* Embedding - embedding layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding>
* Dense - dense layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense>
* Flatten - flatten layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten>
* Concatenate - concatenate layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/Concatenate>
* GlobalAveragePooling - global average pooling layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling1D>
* MaxPooling - maxpooling layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool1D>
* AveragePooling - averagepooling layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling1D>
* BatchNormalization - batchnormalization layer. For further information check: <https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization>
* Add - addition layer.
* Mean - mean layer over a specified axis.
* Activation - different types of activation supported (can be used as an independent layer or inside different NN layers such as Dense, Conv1D, etc). Support available for:
* Linear(Identity)
* Relu
* ThresholdedRelu
* Selu
* Sigmoid
* Softmax
* SoftPlus
* SoftSign
* Tanh
* `Note1`: Some layers might not implement all of the functionality of their TensorFlow counterparts.
* `Note2`: It is mandatory to use an `InputLayer` for each input that the model expects. It is also mandatory that `InputLayer's` `dtype` be exactly specified (default is `float`).
For instance, if an `InputLayer` is followed by an `EmbeddingLayer`, then the type of that particular `InputLayer` must be set to int - e.g. "int64".
Another requirement is to have the `output_shape` of each layer specified (the only unspecified dimension should be the batch size).
This is usually done by setting the `input_shape` parameter when initializing the `InputLayer`.
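
For illustration, here is a minimal sketch of a model that satisfies these requirements (the layer names, shapes, and sizes below are arbitrary):

```python
import tensorflow as tf

# Each model input gets an explicit InputLayer with a fixed shape and dtype.
# This input feeds an Embedding layer, so its dtype must be an integer type.
tokens = tf.keras.layers.Input(shape=(100,), dtype="int64", name="tokens")
embedded = tf.keras.layers.Embedding(input_dim=5000, output_dim=32)(tokens)
flattened = tf.keras.layers.Flatten()(embedded)
output = tf.keras.layers.Dense(1, activation="sigmoid")(flattened)

model = tf.keras.Model(inputs=tokens, outputs=output)
model.save("tf_model/")  # SavedModel (.pb) format accepted by the converter
```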


## Requirements

This project targets the Python 3.8 interpreter. You will need to install
`graphviz`, which can be done with the command:

```bash
brew install graphviz
```

To set up a virtualenv with poetry, execute the following commands in the
project root:

```bash
poetry install
poetry shell
```


## Configuration arguments

### --path_to_tf_model

The path (relative or absolute) to the TensorFlow model to be converted into pure Rust code. It is mandatory.

### --path_to_save

The path (relative or absolute) where the generated Rust code will be saved. It is mandatory.

### --model_name

The model name. A struct named `<model_name>Model` will be created in Rust; e.g. `model_name = Mnist` generates `MnistModel`. It is mandatory.

### --binary_classification

Set this flag to `true` if the model is a binary classifier and to `false` otherwise (regression or multiclass classifiers). Default is `true`.

### --enable_inplace

Set this flag to `true` if you want the generated Rust model to use in-place operations whenever possible (in the `predict_from_array` function). Default is `true`.

### --enable_memdrop

Set this flag to `true` if you want the generated Rust model to free the memory of intermediate layer results as soon as possible (rather than at the end of the `predict_from_array` function). Default is `true`.

### --path_to_fv

Set the path to an `.npz` archive containing the feature vectors (FVs) for a set of samples. The keys of the arrays should match the keys produced by `perform_fx` from NeuralBrain (which must be the same as the `InputLayer` names used when building the model). The expected predictions should also be saved in `features.npz` under the key `predictions`. This flag is optional.
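
Such an archive can be produced with `numpy.savez`; a minimal sketch, where the input name `tokens` and all shapes are hypothetical and must match your model:

```python
import numpy as np

# Hypothetical input named "tokens" -- the key must match the InputLayer name.
tokens = np.random.randint(0, 5000, size=(16, 100), dtype=np.int64)
# Expected model outputs, stored under the mandatory "predictions" key.
predictions = np.random.rand(16, 1).astype(np.float32)

np.savez("features.npz", tokens=tokens, predictions=predictions)
```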

## Output Files

![generated files](images/generated_files.png)

* saved_model_from_tensorflow:
  * computation_graph.json: The computational dependencies.
  * model_architecture.json: Different parameters for the actual NN layers (stride, pool_size, kernel_size, activation type, etc.).
  * model_overview.png: A graph image describing the model.
  * model_weights.npz: The model's weights.
* rust_generated_code:
  * build.rs: A Rust build file used to serialize the model by reading from model_weights.npz.
  * Cargo.toml: The place where all the imports (and much more) are specified.
* rust_generated_code/model:
  * model_weights.npz: Model weights saved in a format that can be used by Rust.
  * thresholds.json: The thresholds for the `low`, `bottom`, `medium`, `high` confidence levels.
* rust_generated_code/src:
  * model.rs: A Rust structure encapsulating all the logic behind prediction.
  * lib.rs: The file containing the tests.
* rust_generated_code/testdata:
  * features.npz: The features to be passed to the model (1D numpy ndarray).
* rust_generated_code/benches:
  * benchmarks.rs: The file in charge of benchmarks.
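
The generated `model_weights.npz` can be inspected from Python before building; a small sketch, with the path taken from the layout above:

```python
import numpy as np

# List every weight array stored in the generated archive.
weights = np.load("rust_generated_code/model/model_weights.npz")
for name in weights.files:
    print(name, weights[name].shape, weights[name].dtype)
```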

### In order to assess the performance of the model, run `cargo bench`

### In order to test the predictions and check that the translation went as expected, run `cargo test`

### Note: all these commands need to be executed in the `rust_generated_code/` directory

## Usage

To convert a TensorFlow model, use a command line like the following:

```bash
python3 -m tf2rust \
--path_to_tf_model tests/data/mnist/tf_model/ \
--path_to_save tests/data/generated_classifiers/mnist \
--model_name MNist \
    ...
```

If the model was saved with custom metric objects, re-save it without the optimizer before converting it. A sketch (imports added; the key truncated to `'au` in the source is assumed to be `'auc'`):

```python
from tensorflow.keras.models import load_model, save_model

# Stub out the custom metrics so the model loads, then drop the optimizer state.
model = load_model('new_model.h5', custom_objects={'tpr': None, 'tnr': None, 'auc': None})
save_model(model=model, filepath='tf_model/', include_optimizer=False)
```

## Running the tests

When we migrate towards Dockerising this, we'll also switch to tox (this should not pose any difficulties).

We have currently set up integration tests, which do the following:

* Given model artifacts, generate Rust code
* Check that the Rust code matches what we expect to be generated
* Compile the Rust code and see that all tests pass
  * Tests take in FVs and the DVs generated by the TensorFlow model
  * We check that inference with the Rust model yields the same results as the initial TensorFlow model

Run the integration tests using the following command (add `-s` and `--verbose` for debugging):

```bash
pytest
```

## Next steps