diff --git a/tutorials/notebooks/mct_features_notebooks/README.md b/tutorials/notebooks/mct_features_notebooks/README.md index eb0d7bb42..f94a2626b 100644 --- a/tutorials/notebooks/mct_features_notebooks/README.md +++ b/tutorials/notebooks/mct_features_notebooks/README.md @@ -10,11 +10,12 @@ These techniques are essential for further optimizing models and achieving super
Post-Training Quantization (PTQ) - | Tutorial | Included Features | - |------------------------------|-----------------------------------------------------------------------------------------------------| - | [MobileNetV2](../imx500_notebooks/keras/example_keras_mobilenetv2_for_imx500.ipynb) | ✅ PTQ | - | [Mixed-Precision MobileNetV2](keras/example_keras_mobilenet_mixed_precision.ipynb) | ✅ PTQ
✅ Mixed-Precision | - | [Nanodet-Plus](../imx500_notebooks/keras/example_keras_nanodet_plus_for_imx500.ipynb) | ✅ PTQ | + | Tutorial | Included Features | + |--------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------| + | [Basic Post-Training Quantization (PTQ)](keras/example_keras_post-training_quantization.ipynb) | ✅ PTQ | + | [MobileNetV2](../imx500_notebooks/keras/example_keras_mobilenetv2_for_imx500.ipynb) | ✅ PTQ | + | [Mixed-Precision MobileNetV2](keras/example_keras_mobilenet_mixed_precision.ipynb) | ✅ PTQ
✅ Mixed-Precision | + | [Nanodet-Plus](../imx500_notebooks/keras/example_keras_nanodet_plus_for_imx500.ipynb) | ✅ PTQ | | [EfficientDetLite0](../imx500_notebooks/keras/example_keras_effdet_lite0_for_imx500.ipynb) | ✅ PTQ
✅ [sony-custom-layers](https://github.com/sony/custom_layers) integration |
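For orientation, the new "Basic Post-Training Quantization (PTQ)" entry boils down to a single MCT call. Below is a minimal sketch of that flow, assuming the `mct.ptq.keras_post_training_quantization` entry point and a random-data generator standing in for the real calibration images used in the notebook:

```python
import numpy as np
import model_compression_toolkit as mct
from tensorflow.keras.applications import MobileNetV2

# Stand-in representative dataset: a few random batches in place of real calibration images.
def representative_dataset_gen():
    for _ in range(10):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

float_model = MobileNetV2()

# Default 8-bit post-training quantization with MCT.
quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(
    float_model, representative_dataset_gen)
```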
diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb index 7a275b22e..f558321f9 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_threshold_search.ipynb @@ -210,7 +210,7 @@ { "cell_type": "markdown", "source": [ - "This functions generates a `tf.data.Dataset` from image files in a directory." + "These functions generate a `tf.data.Dataset` from image files in a directory." ], "metadata": { "collapsed": false @@ -308,8 +308,8 @@ "## Post-Training Quantization using MCT\n", "In this step, we load the model and apply post-training quantization using two threshold error calculation methods: **\"No Clipping\"** and **MSE**.\n", "\n", - "- **\"No Clipping\"** selects the lowest power-of-two threshold that ensures no data is lost.\n", - "- **MSE** selects a power-of-two threshold that minimizes the difference between the original float distribution and the quantized distribution.\n", + "- **\"No Clipping\"** selects the lowest power-of-two threshold that ensures no data is lost (clipped).\n", + "- **MSE** selects a power-of-two threshold that minimizes the mean square error between the original float distribution and the quantized distribution.\n", "\n", "- As a result, the \"No Clipping\" method typically results in a larger threshold, as we will demonstrate later in this tutorial.\n", "\n", @@ -335,9 +335,9 @@ "]\n", "\n", "# If you are curious you can add any of the below quantization methods as well.\n", - "#QuantizationErrorMethod.MAE\n", - "#QuantizationErrorMethod.KL\n", - "#QuantizationErrorMethod.LP\n", + "# QuantizationErrorMethod.MAE\n", + "# QuantizationErrorMethod.KL\n", + "# QuantizationErrorMethod.LP\n", "\n", "# Iterate and build the QuantizationConfig objects\n", "for error_method in error_methods:\n", @@ -574,7 +574,7 @@ "for method, threshold in optimal_thresholds_project.items():\n", " plt.axvline(threshold, linestyle='--', linewidth=2, label=f'{method}: {threshold:.2f}')\n", "\n", - "plt.title('Activation Distribution with Optimal Quantization Thresholds Prohject BN layer')\n", + "plt.title('Activation Distribution with Optimal Quantization Thresholds Project BN layer')\n", "plt.xlabel('Activation Value')\n", "plt.ylabel('Frequency')\n", "plt.legend()\n", diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb index 80302e0bb..acbf5a1aa 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb @@ -194,7 +194,7 @@ { "cell_type": "markdown", "source": [ - "This functions generates a `tf.data.Dataset` from image files in a directory." + "These functions generate a `tf.data.Dataset` from image files in a directory." 
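Taken together, the threshold-search cells above amount to building one `QuantizationConfig` per activation error method and running PTQ with each. A minimal sketch of that loop, assuming `float_model` and `representative_dataset_gen` are defined as in the earlier cells and the `mct.core.QuantizationErrorMethod` spellings below:

```python
import model_compression_toolkit as mct

# One configuration per activation threshold-selection method under comparison.
error_methods = [
    mct.core.QuantizationErrorMethod.NOCLIPPING,  # lowest power-of-two threshold with no clipping
    mct.core.QuantizationErrorMethod.MSE,         # power-of-two threshold minimizing the MSE
]

quantized_models = {}
for error_method in error_methods:
    q_config = mct.core.QuantizationConfig(activation_error_method=error_method)
    core_config = mct.core.CoreConfig(quantization_config=q_config)
    # float_model and representative_dataset_gen are assumed from earlier notebook cells.
    quantized_models[error_method.name], _ = mct.ptq.keras_post_training_quantization(
        float_model, representative_dataset_gen, core_config=core_config)
```

Swapping in the commented-out `MAE`, `KL`, or `LP` options works the same way.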
], "metadata": { "collapsed": false diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb index 47ded5bc1..0ea59fa7d 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb @@ -24,12 +24,12 @@ "## Overview\n", "This tutorial demonstrates how to export a Keras model to `.keras` and TFLite formats using the Model Compression Toolkit (MCT). It covers the steps of creating a simple Keras model, applying post-training quantization (PTQ) using MCT, and then exporting the quantized model to `.keras` and TFLite. The tutorial also shows how to use the exported model for inference.\n", "\n", - "## Summary:\n", + "## Summary\n", "In this tutorial, we will cover:\n", "\n", "1. Constructing a simple Keras model for demonstration purposes.\n", "2. Applying post-training quantization to the model using the Model Compression Toolkit.\n", - "3. Exporting the quantized model to the `.keras` and TFLite formats.\n", + "3. Exporting the quantized model to the `.keras` and `TFLite` formats.\n", "4. Using the exported model for inference.\n", "\n", "## Setup\n", @@ -197,7 +197,7 @@ "Note that the fakely-quantized model has the same size as the quantized exportable model, as the weights are still represented as floats.\n", "\n", "### TFLite\n", - "There are two optional tflite serializations available for export: INT8 and FAKELY_QUANT.\n", + "There are two optional tflite serializations available for export: `INT8` and `FAKELY_QUANT`.\n", "\n", "#### INT8 TFLite\n", "\n", @@ -254,10 +254,9 @@ { "cell_type": "markdown", "source": [ - "\n", "#### Fakely-Quantized TFLite\n", "\n", - "The model will be exported as a tflite model where weights and activations are quantized but represented with a " + "The model will be exported as a tflite model where weights and activations are quantized but represented with a float data type." ], "metadata": { "id": "9eVDoIHiGX5-" diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_gptq.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_gptq.ipynb index 0aebad55b..dba3eb617 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_gptq.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_gptq.ipynb @@ -184,7 +184,7 @@ { "cell_type": "markdown", "source": [ - "This functions generates a `tf.data.Dataset` from image files in a directory." + "These functions generate a `tf.data.Dataset` from image files in a directory." ], "metadata": { "collapsed": false diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_mixed_precision.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_mixed_precision.ipynb index 333719c59..6eeec058d 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_mixed_precision.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_mixed_precision.ipynb @@ -176,7 +176,7 @@ { "cell_type": "markdown", "source": [ - "This functions generates a `tf.data.Dataset` from image files in a directory." + "These functions generate a `tf.data.Dataset` from image files in a directory." 
], "metadata": { "collapsed": false diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_post-training_quantization.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_post-training_quantization.ipynb index 3b7ef643b..f576d7ee2 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_post-training_quantization.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_post-training_quantization.ipynb @@ -171,7 +171,7 @@ { "cell_type": "markdown", "source": [ - "This functions generates a `tf.data.Dataset` from image files in a directory." + "These functions generate a `tf.data.Dataset` from image files in a directory." ], "metadata": { "collapsed": false @@ -405,7 +405,7 @@ "\n", "The key advantage of hardware-friendly quantization is that the model can run more efficiently in terms of runtime, power consumption, and memory usage on designated hardware.\n", "\n", - "MCT can deliver competitive results across a wide range of tasks and network architectures. For more details, [check out the paper:](https://arxiv.org/abs/2109.09113).\n", + "MCT can deliver competitive results across a wide range of tasks and network architectures. For more details, [check out the paper](https://arxiv.org/abs/2109.09113).\n", "\n", "## Copyrights\n", "\n", diff --git a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb index 5f0dc1100..71e2d84dc 100644 --- a/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb @@ -23,14 +23,14 @@ "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_pruning_mnist.ipynb)\n", "\n", "## Overview\n", - "This tutorial provides a step-by-step guide to training, pruning, and retraining a fully connected neural network model using Keras. We will start by building and training the model from scratch on the MNIST dataset, followed by applying structured pruning to reduce the model size.\n", + "This tutorial provides a step-by-step guide to training, pruning, and finetuning a Keras fully connected neural network model using the Model Compression Toolkit (MCT). We will start by building and training the model from scratch on the MNIST dataset, followed by applying structured pruning to reduce the model size.\n", "\n", "## Summary\n", "In this tutorial, we will cover:\n", "\n", "1. **Training a Keras model on MNIST:** We'll begin by constructing a basic fully connected neural network and training it on the MNIST dataset. \n", "2. **Applying structured pruning:** We'll introduce a pruning technique to reduce model size while maintaining performance. \n", - "3. **Retraining the pruned model:** After pruning, we'll retrain the model to recover any lost accuracy. \n", + "3. **Finetuning the pruned model:** After pruning, we'll finetune the model to recover any lost accuracy. \n", "4. **Evaluating the pruned model:** We'll evaluate the pruned model’s performance and compare it to the original model.\n", "\n", "## Setup\n", @@ -313,7 +313,7 @@ "### Model Pruning\n", "We are now ready to perform the actual pruning using MCT’s `keras_pruning_experimental` function. 
The model will be pruned based on the defined resource utilization constraints and the previously generated representative dataset.\n", "\n", - "Each channel’s importance is measured using the LFH (Label-Free-Hessian) method, which approximates the Hessian of the loss function with respect to the model’s weights.\n", + "Each channel’s importance is measured using the [LFH (Label-Free-Hessian) method](https://arxiv.org/abs/2309.11531), which approximates the Hessian of the loss function with respect to the model’s weights.\n", "\n", "For efficiency, we use a single score approximation. Although less precise, it significantly reduces processing time compared to multiple approximations, which offer better accuracy but at the cost of longer runtimes.\n", "\n", @@ -369,8 +369,8 @@ { "cell_type": "markdown", "source": [ - "## Retraining the Pruned Model\n", - "After pruning, it’s common to see a temporary drop in model accuracy due to the reduction in model complexity. Let’s demonstrate this by evaluating the pruned model and observing its initial performance before retraining." + "## Finetuning the Pruned Model\n", + "After pruning, it’s common to see a temporary drop in model accuracy due to the reduction in model complexity. Let’s demonstrate this by evaluating the pruned model and observing its initial performance before finetuning." ], "metadata": { "id": "pAheQ9SGxB13" @@ -395,7 +395,7 @@ { "cell_type": "markdown", "source": [ - "However, to restore the model's performance, we retrain the pruned model, allowing it to adapt to its new, compressed architecture. Through this retraining process, the model can often recover its original accuracy, and in some cases, even surpass it." + "To restore the model's performance, we finetune the pruned model, allowing it to adapt to its new, compressed architecture. Through this finetuning process, the model can often recover its original accuracy, and in some cases, even surpass it." ], "metadata": { "id": "IHORL34t17bA" @@ -417,7 +417,7 @@ "cell_type": "markdown", "source": [ "## Conclusion\n", - "In this tutorial, we explored the process of structured model pruning using MCT to optimize a dense neural network. We demonstrated how to define resource constraints, apply pruning based on channel importance, and evaluate the impact on model architecture and performance. Finally, we showed how retraining can recover the pruned model’s accuracy. This approach highlights the effectiveness of structured pruning for reducing model size while maintaining performance, making it a powerful tool for model optimization" + "In this tutorial, we explored the process of structured model pruning using MCT to optimize a dense neural network. We demonstrated how to define resource constraints, apply pruning based on channel importance, and evaluate the impact on model architecture and performance. Finally, we showed how finetuning can recover the pruned model’s accuracy. This approach highlights the effectiveness of structured pruning for reducing model size while maintaining performance, making it a powerful tool for model optimization." 
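For reference, the Keras pruning step described above reduces to a call along the following lines. This is a sketch only, assuming `model` and `representative_dataset_gen` from the earlier cells and the `mct.core.ResourceUtilization` / `mct.pruning.PruningConfig` names used by recent MCT releases:

```python
import model_compression_toolkit as mct

# Target roughly 50% of the original float32 weights memory (4 bytes per parameter).
target_ru = mct.core.ResourceUtilization(weights_memory=model.count_params() * 4 * 0.5)

# A single Hessian-score approximation per channel keeps the runtime short.
pruning_config = mct.pruning.PruningConfig(num_score_approximations=1)

pruned_model, pruning_info = mct.pruning.keras_pruning_experimental(
    model=model,
    target_resource_utilization=target_ru,
    representative_data_gen=representative_dataset_gen,
    pruning_config=pruning_config)
```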
], "metadata": { "collapsed": false diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb index 3ed4c4e80..82525cd05 100644 --- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb +++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb @@ -8,14 +8,14 @@ "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb)\n", "\n", "## Overview\n", - "This tutorial provides a step-by-step guide to training, pruning, and retraining a fully connected neural network model using PyTorch. We will start by building and training the model from scratch on the MNIST dataset, followed by applying structured pruning to reduce the model size.\n", + "This tutorial provides a step-by-step guide to training, pruning, and finetuning a PyTorch fully connected neural network model using the Model Compression Toolkit (MCT). We will start by building and training the model from scratch on the MNIST dataset, followed by applying structured pruning to reduce the model size.\n", "\n", "## Summary\n", "In this tutorial, we will cover:\n", "\n", "1. **Training a PyTorch model on MNIST:** We'll begin by constructing a basic fully connected neural network and training it on the MNIST dataset. \n", "2. **Applying structured pruning:** We'll introduce a pruning technique to reduce model size while maintaining performance. \n", - "3. **Retraining the pruned model:** After pruning, we'll retrain the model to recover any lost accuracy. \n", + "3. **Finetuning the pruned model:** After pruning, we'll finetune the model to recover any lost accuracy. \n", "4. **Evaluating the pruned model:** We'll evaluate the pruned model’s performance and compare it to the original model.\n", "\n", "## Setup\n", @@ -304,8 +304,16 @@ { "cell_type": "markdown", "source": [ - "## Pruning the Model\n", - "Next,we'll proceed with pruning our trained model to decrease its size, targeting a 50% reduction in the memory footprint of the model's weights. Given that the model's weights utilize the float32 data type, where each parameter occupies 4 bytes, we calculate the memory requirement by multiplying the total number of parameters by 4." + "## Model Pruning\n", + "We are now ready to perform the actual pruning using MCT’s `pytorch_pruning_experimental` function. The model will be pruned based on the defined resource utilization constraints and the previously generated representative dataset.\n", + "\n", + "Each channel’s importance is measured using the [LFH (Label-Free-Hessian) method](https://arxiv.org/abs/2309.11531), which approximates the Hessian of the loss function with respect to the model’s weights.\n", + "\n", + "For efficiency, we use a single score approximation. Although less precise, it significantly reduces processing time compared to multiple approximations, which offer better accuracy but at the cost of longer runtimes.\n", + "\n", + "MCT’s structured pruning will target the first two dense layers, where output channel reduction can be propagated to subsequent layers by adjusting their input channels accordingly.\n", + "\n", + "The output is a pruned model along with pruning information, including layer-specific pruning masks and scores." 
], "metadata": { "collapsed": false @@ -355,8 +363,8 @@ "outputs": [], "source": [ "pruned_model_nparams = display_model_params(pruned_model)\n", - "acc_before_retrain = test(pruned_model, device, test_loader)\n", - "print(f'Pruned model accuracy before retraining {acc_before_retrain}%')" + "acc_before_finetuning = test(pruned_model, device, test_loader)\n", + "print(f'Pruned model accuracy before finetuning {acc_before_finetuning}%')" ], "metadata": { "collapsed": false @@ -366,8 +374,8 @@ { "cell_type": "markdown", "source": [ - "## Retraining the Pruned Model\n", - "After pruning, we often need to retrain the model to recover any lost performance." + "## Finetuning the Pruned Model\n", + "After pruning, we often need to finetune the model to recover any lost performance." ], "metadata": { "collapsed": false @@ -415,7 +423,7 @@ "cell_type": "markdown", "source": [ "## Conclusions\n", - "In this tutorial, we demonstrated the process of training, pruning, and retraining a neural network model using the Model Compression Toolkit (MCT). We began by setting up our environment and loading the dataset, followed by building and training a fully connected neural network. We then introduced the concept of model pruning, specifically targeting the first two dense layers to efficiently reduce the model's memory footprint by 50%. After applying structured pruning, we evaluated the pruned model's performance and concluded the tutorial by fine-tuning the pruned model to recover any lost accuracy due to the pruning process. This tutorial provided a hands-on approach to model optimization through pruning, showcasing the balance between model size, performance, and efficiency.\n", + "In this tutorial, we demonstrated the process of training, pruning, and finetuning a neural network model using MCT. We began by setting up our environment and loading the dataset, followed by building and training a fully connected neural network. We then introduced the concept of model pruning, specifically targeting the first two dense layers to efficiently reduce the model's memory footprint by 50%. After applying structured pruning, we evaluated the pruned model's performance and concluded the tutorial by fine-tuning the pruned model to recover any lost accuracy due to the pruning process. This tutorial provided a hands-on approach to model optimization through pruning, showcasing the balance between model size, performance, and efficiency.\n", "\n", "## Copyrights\n", "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n",