diff --git a/tutorials/notebooks/mct_features_notebooks/README.md b/tutorials/notebooks/mct_features_notebooks/README.md
index 519ed8b1b..062af4eef 100644
--- a/tutorials/notebooks/mct_features_notebooks/README.md
+++ b/tutorials/notebooks/mct_features_notebooks/README.md
@@ -72,15 +72,11 @@ These techniques are essential for further optimizing models and achieving super
Post-Training Quantization (PTQ)
- | Tutorial | Included Features |
- |---------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|
- | [Training & Quantizing Model on MNIST](pytorch/example_pytorch_ptq_mnist.ipynb) | ✅ PTQ |
- | [Mixed-Precision MobileNetV2 on Cifar100](pytorch/example_pytorch_mobilenetv2_cifar100_mixed_precision.ipynb) | ✅ PTQ
✅ Mixed-Precision |
- | [SSDLite MobileNetV3 Quantization](pytorch/example_pytorch_ssdlite_mobilenetv3_object_detection.ipynb) | ✅ PTQ |
-
-
-
-
+ | Tutorial | Included Features |
+ |-----------------------------------------------------------------------------------------------------------|---------------------------------------------|
+ | [Basic Post-Training Quantization (PTQ)](pytorch/example_pytorch_post_training_quantization.ipynb) | ✅ PTQ |
+ | [Mixed-Precision Post-Training Quantization](pytorch/example_pytorch_mixed_precision_ptq.ipynb)           | ✅ PTQ <br/> ✅ Mixed-Precision             |
+ | [Advanced Gradient-Based Post-Training Quantization (GPTQ)](pytorch/example_pytorch_mobilenet_gptq.ipynb) | ✅ GPTQ |
@@ -97,9 +93,9 @@ These techniques are essential for further optimizing models and achieving super
Data Generation
- | Tutorial | Included Features |
- |-----------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|
- | [Data-Free Quantization using Data Generation](pytorch/example_pytorch_data_generation.ipynb) | ✅ PTQ
✅ Data-Free Quantization
✅ Data Generation |
+ | Tutorial | Included Features |
+ |-----------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
+ | [Zero-Shot Quantization (ZSQ) using Data Generation](pytorch/example_pytorch_data_generation.ipynb) | ✅ PTQ <br/> ✅ ZSQ <br/> ✅ Data-Free Quantization <br/> ✅ Data Generation |
@@ -112,3 +108,11 @@ These techniques are essential for further optimizing models and achieving super
| [Exporter Usage](pytorch/example_pytorch_export.ipynb) | ✅ Export |
+
+ Quantization Troubleshooting
+
+ | Tutorial | Included Features |
+ |------------------------------------------------------------------------------------------------|-------------------|
+ | [Quantization Troubleshooting using the Xquant Feature](pytorch/example_pytorch_xquant.ipynb) | ✅ Debug |
+
+
diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_data_generation.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_data_generation.ipynb
index 0cae470e0..6ffc2cc6d 100644
--- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_data_generation.ipynb
+++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_data_generation.ipynb
@@ -3,58 +3,34 @@
{
"cell_type": "markdown",
"source": [
- "# Data Generation Tutorial: Data-Free Quantization with the Model Compression Toolkit"
- ],
- "metadata": {
- "collapsed": false
- },
- "id": "74a56f1fe3c17fcf"
- },
- {
- "cell_type": "markdown",
- "source": [
- "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_data_generation.ipynb)"
- ],
- "metadata": {
- "collapsed": false
- },
- "id": "547eb47b9afe4dc0"
- },
- {
- "cell_type": "markdown",
- "source": [
+ "# Data Generation Tutorial: Data-Free (Zero-Shot) Quantization in Pytorch with the Model Compression Toolkit (MCT)\n",
+ "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_data_generation.ipynb)\n",
+ "\n",
+ "## Overview\n",
"In this tutorial, we will explore how to generate synthetic images using the Model Compression Toolkit (MCT) and the Data Generation Library. These generated images are based on the statistics stored in the model's batch normalization layers and can be usefull for various compression tasks, such as quantization and pruning. We will use the generated images as a representative dataset to quantize our model to 8-bit using MCT's Post Training Quantization (PTQ).\n",
"\n",
+ "## Summary\n",
"We will cover the following steps:\n",
"1. **Setup** Install and import necessary libraries and load a pre-trained model.\n",
"2. **Configuration**: Define the data generation configuration.\n",
"3. **Data Generation**: Generate synthetic images.\n",
"4. **Visualization**: Visualize the generated images.\n",
- "5. **Quantization**: Quantize our model to 8-bit using PTQ with the generated images as a representative dataset. This is called **\"Data-Free Quantization\"** since no real data is used in the quantization process."
- ],
- "metadata": {
- "collapsed": false
- },
- "id": "25cb505c44118f02"
- },
- {
- "cell_type": "markdown",
- "source": [
+ "5. **Quantization**: Quantize our model to 8-bit using PTQ with the generated images as a representative dataset. This is called **\"Data-Free Quantization\"** since no real data is used in the quantization process.\n",
+ "\n",
"## Step 1: Setup\n",
"Install the necessary packages:"
],
"metadata": {
"collapsed": false
},
- "id": "ce2d053e4b52db07"
+ "id": "74a56f1fe3c17fcf"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
- "!pip install -q torch torchvision\n",
- "!pip install -q model-compression-toolkit"
+ "!pip install -q torch torchvision"
],
"metadata": {
"collapsed": false
@@ -62,14 +38,18 @@
"id": "941089a3a8cbdf3b"
},
{
- "cell_type": "markdown",
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
"source": [
- "Imports:"
+ "import importlib\n",
+ "if not importlib.util.find_spec('model_compression_toolkit'):\n",
+ " !pip install model_compression_toolkit"
],
"metadata": {
"collapsed": false
},
- "id": "58d031b0b282dd59"
+ "id": "a0d8806b6aa0630a"
},
{
"cell_type": "code",
@@ -78,7 +58,8 @@
"source": [
"import torch\n",
"from torchvision.models import resnet18, ResNet18_Weights\n",
- "import model_compression_toolkit as mct\n",
+ "from torchvision.datasets import ImageNet\n",
+ "from torch.utils.data import DataLoader\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np"
],
@@ -103,7 +84,8 @@
"outputs": [],
"source": [
"# Load a pre-trained model (e.g., ResNet18)\n",
- "model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)"
+ "weights = ResNet18_Weights.DEFAULT\n",
+ "float_model = resnet18(weights=weights)"
],
"metadata": {
"collapsed": false
@@ -114,7 +96,7 @@
"cell_type": "markdown",
"source": [
"## Step 2: Define a Data Generation Configuration\n",
- "Next, we need to specify the configuration for data generation using 'get_pytorch_data_generation_config'. This configuration includes parameters such as the number of iterations, optimizer, batch size, and more. Customize these parameters according to your needs."
+ "Next, we need to specify the configuration for data generation using `get_pytorch_data_generation_config`. This configuration includes parameters such as the number of iterations, optimizer, batch size, and more. Customize these parameters according to your needs."
],
"metadata": {
"collapsed": false
@@ -126,10 +108,12 @@
"execution_count": null,
"outputs": [],
"source": [
+ "import model_compression_toolkit as mct\n",
+ "\n",
"data_gen_config = mct.data_generation.get_pytorch_data_generation_config(\n",
" n_iter=500, # Number of iterations\n",
" optimizer=torch.optim.RAdam, # Optimizer\n",
- " data_gen_batch_size=32, # Batch size for data generation\n",
+ " data_gen_batch_size=128, # Batch size for data generation\n",
" initial_lr=16, # Initial learning rate\n",
" output_loss_multiplier=1e-6, # Multiplier for output loss\n",
" extra_pixels=32, \n",
@@ -146,7 +130,7 @@
"source": [
"## Step 3: Generate Synthetic Images\n",
"\n",
- "Now, let's generate synthetic images using the 'pytorch_data_generation_experimental' function. Specify the number of images you want to generate and the output image size."
+ "Now, let's generate synthetic images using the `pytorch_data_generation_experimental` function. Specify the number of images you want to generate and the output image size."
],
"metadata": {
"collapsed": false
@@ -162,7 +146,7 @@
"output_image_size = 224 # Size of output images\n",
"\n",
"generated_images = mct.data_generation.pytorch_data_generation_experimental(\n",
- " model=model,\n",
+ " model=float_model,\n",
" n_images=n_images,\n",
" output_image_size=output_image_size,\n",
" data_generation_config=data_gen_config\n",
@@ -177,7 +161,7 @@
"cell_type": "markdown",
"source": [
"## Step 4: Visualization\n",
- "Lets begin by defining some functions to display the generated images in a grid:"
+ "Lets define a function to display the generated images:"
],
"metadata": {
"collapsed": false
@@ -189,41 +173,13 @@
"execution_count": null,
"outputs": [],
"source": [
- "def plot_image_grid(images, reverse_preprocess=False, titles=[], ncols=None, cmap='gray', mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):\n",
- " images = [plot_image(img, reverse_preprocess, mean, std, plot_img=False) for img in images]\n",
- " if len(titles) < len(images):\n",
- " titles += ['_' for _ in range(len(images) - len(titles))]\n",
- " '''Plot a grid of images'''\n",
- " if not ncols:\n",
- " factors = [i for i in range(1, len(images)+1) if len(images) % i == 0]\n",
- " ncols = factors[len(factors) // 2] if len(factors) else len(images) // 4 + 1\n",
- " nrows = int(len(images) / ncols) + int(len(images) % ncols)\n",
- " imgs = [images[i] if len(images) > i else None for i in range(nrows * ncols)]\n",
- " f, axes = plt.subplots(nrows, ncols, figsize=(3*ncols, 2*nrows))\n",
- " axes = axes.flatten()[:len(imgs)]\n",
- " for img, ax, t in zip(imgs, axes.flatten(), titles):\n",
- " if np.any(img):\n",
- " if len(img.shape) > 2 and img.shape[2] == 1:\n",
- " img = img.squeeze()\n",
- " ax.imshow(img, cmap=cmap)\n",
- " ax.title.set_text(t)\n",
- " plt.show()\n",
- "\n",
- "\n",
- "def plot_image(image, reverse_preprocess=False, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], plot_img=True):\n",
- " image = image.detach().cpu().numpy()\n",
- " if len(image.shape) == 4:\n",
- " image = image[0, :, :, :]\n",
- " if image.shape[0] == 3:\n",
- " image = image.transpose(1, 2, 0)\n",
+ "def plot_image(image, reverse_preprocess=False, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):\n",
+ " image = image.detach().cpu().numpy()[0]\n",
+ " image = image.transpose(1, 2, 0)\n",
" if reverse_preprocess:\n",
" new_image = np.round(((image.astype(np.float32) * std) + mean) * 255).astype(np.uint8)\n",
- " else:\n",
- " new_image = image\n",
- " if plot_img:\n",
- " plt.imshow(new_image)\n",
- " plt.show()\n",
- " return new_image"
+ " plt.imshow(new_image)\n",
+ " plt.show()"
],
"metadata": {
"collapsed": false
@@ -233,7 +189,7 @@
{
"cell_type": "markdown",
"source": [
- "Now, lets visualize our generated images:"
+ "Now, let's visualize the generated images by selecting an image index to plot. You can modify the index values to experiment with different images."
],
"metadata": {
"collapsed": false
@@ -245,206 +201,181 @@
"execution_count": null,
"outputs": [],
"source": [
- "plot_image_grid(generated_images[69:71], True)"
+ "img_index_to_plot = 0\n",
+ "plot_image(generated_images[img_index_to_plot],True)"
],
"metadata": {
"collapsed": false
},
- "id": "6a7faa41b7481573"
+ "id": "e7da0f42acc69e20"
},
{
"cell_type": "markdown",
"source": [
"## Step 5: Post Training Quantization\n",
- "In order to evaulate our generated images, we will use them to quantize the model using MCT's PTQ.This is called **\"Data-Free Quantization\"** because no real data is used in the quantization process. "
+ "In order to evaulate our generated images, we will use them to quantize the model using MCT's PTQ.This is referred to as **\"Zero-Shot Quantization (ZSQ)\"** or **\"Data-Free Quantization\"** because no real data is used in the quantization process. Next we will define configurations for MCT's PTQ.\n",
+ "\n",
+ "### Target Platform Capabilities (TPC)\n",
+ "MCT optimizes the model for dedicated hardware platforms. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
],
"metadata": {
"collapsed": false
},
"id": "7b40f70b4132c5fb"
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "target_platform_cap = mct.get_target_platform_capabilities(\"pytorch\", \"default\")"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "672ffbf357234def"
+ },
{
"cell_type": "markdown",
"source": [
- "### Setup for evaluation on the ImageNet dataset\n",
- "Here we define functions for evaluation on ImageNet:\n"
+ "### Representative Dataset\n",
+ "For quantization with MCT, we need to define a representative dataset required by the PTQ algorithm. This dataset is a generator that returns a list of images. We wil use our generated images for the representative dataset."
],
"metadata": {
"collapsed": false
},
- "id": "c4ccf92648d8bc20"
+ "id": "97073eeea51b4dee"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
- "from torchvision import datasets, transforms\n",
- "from tqdm import tqdm\n",
- "\n",
- "# If GPU available, move the model to GPU\n",
- "DEVICE = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
- "# Load a pre-trained model (e.g., ResNet18)\n",
- "model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)\n",
- "\n",
- "def get_validation_loader(imagenet_validation_folder, batch_size=50):\n",
- " preprocess = transforms.Compose([\n",
- " transforms.Resize(256),\n",
- " transforms.CenterCrop(224),\n",
- " transforms.ToTensor(),\n",
- " transforms.Normalize(mean=[0.485, 0.456, 0.406],\n",
- " std=[0.229, 0.224, 0.225]),\n",
- " ])\n",
- " data_loader = torch.utils.data.DataLoader(\n",
- " datasets.ImageFolder(imagenet_validation_folder, preprocess),\n",
- " batch_size=batch_size, shuffle=False,\n",
- " num_workers=8, pin_memory=True)\n",
- " return data_loader\n",
- "\n",
- "def eval(outputs, labels, topk=(1,)):\n",
- " maxk = max(topk)\n",
- "\n",
- " _, pred = outputs.topk(maxk, 1, True, True)\n",
- " pred = pred.t()\n",
- " correct = pred.eq(labels.view(1, -1).expand_as(pred))\n",
- " return correct\n",
- "\n",
- "\n",
- "def accuracy(outputs, labels, topk=(1,)):\n",
- " \"\"\"Computes the accuracy over the k top predictions for the specified values of k\"\"\"\n",
- "\n",
- " correct = eval(outputs, labels)\n",
- "\n",
- " batch_size = labels.size(0)\n",
- "\n",
- " res = []\n",
- " for k in topk:\n",
- " correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)\n",
- " res.append(correct_k.mul_(100.0 / batch_size))\n",
+ "batch_size = 64\n",
+ "n_iter = 10\n",
"\n",
- " return res, correct\n",
- "\n",
- "\n",
- "def pytorch_model_accuracy_evaluation(model, val_data_loader) -> float:\n",
- " model = model.to(DEVICE)\n",
- " model.eval()\n",
- " acc_top1 = 0\n",
- "\n",
- " batch_cntr = 1\n",
- " iterations = len(val_data_loader)\n",
- " with torch.no_grad():\n",
- " for input_data, target_data in tqdm(val_data_loader):\n",
- " inputs_batch = input_data.to(DEVICE)\n",
- " target_batch = target_data.to(DEVICE)\n",
- "\n",
- "\n",
- " predicted_batch = model(inputs_batch)\n",
- "\n",
- " batch_avg_top_1, correct_inds = accuracy(outputs=predicted_batch, labels=target_batch)\n",
- " acc_top1 += batch_avg_top_1[0].item()\n",
- " \n",
- " \n",
- " batch_cntr += 1\n",
- " if batch_cntr > iterations:\n",
- " break\n",
- " acc_top1 /= iterations\n",
- " return acc_top1 "
+ "generated_images = np.concatenate(generated_images, axis=0).reshape(*(-1, batch_size, *list(generated_images[0].shape[1:])))\n",
+ " \n",
+ "def representative_data_gen():\n",
+ " for nn in range(n_iter):\n",
+ " nn_mod = nn % generated_images.shape[0]\n",
+ " yield [generated_images[nn_mod]]"
],
"metadata": {
"collapsed": false
},
- "id": "ad895ccd05275eb9"
+ "id": "d6a3d88a51883757"
},
{
"cell_type": "markdown",
"source": [
- "Here we define configurations for MCT's PTQ:"
+ "### Quantization with our generated images\n",
+ "Now, we are ready to use MCT to quantize the model."
],
"metadata": {
"collapsed": false
},
- "id": "e1a9b2df31324281"
+ "id": "8cbc59406d217273"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
- "num_calibration_iter = 10\n",
- "batch_size=50\n",
- "target_platform_cap = mct.get_target_platform_capabilities(\"pytorch\", \"default\")\n",
- "core_config = mct.core.CoreConfig(quantization_config=mct.core.QuantizationConfig())"
+ "# run post training quantization on the model to get the quantized model output\n",
+ "quantized_model_generated_data, quantization_info = mct.ptq.pytorch_post_training_quantization(\n",
+ " in_module=float_model,\n",
+ " representative_data_gen=representative_data_gen,\n",
+ " target_platform_capabilities=target_platform_cap\n",
+ ")"
],
"metadata": {
"collapsed": false
},
- "id": "672ffbf357234def"
+ "id": "c7f57ae27466992e"
},
{
"cell_type": "markdown",
"source": [
- "Specify the path to the imagenet validation folder:"
+ "## Setup for evaluation on the ImageNet dataset\n",
+ "### Download ImageNet validation set\n",
+ "Download ImageNet dataset with only the validation split. This step may take several minutes..."
],
"metadata": {
"collapsed": false
},
- "id": "5aa6547351df38bd"
+ "id": "1de293d52f60801"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
- "imagenet_validation_folder = '/path/to/imagenet/validation/folder'\n",
- "val_loader = get_validation_loader(imagenet_validation_folder)"
+ "import os\n",
+ "\n",
+ "if not os.path.isdir('imagenet'):\n",
+ " !mkdir imagenet\n",
+ " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz\n",
+ " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar\n",
+ "\n",
+ "# Extract ImageNet validation dataset using torchvision \"datasets\" module.\n",
+ "dataset = ImageNet(root='./imagenet', split='val', transform=weights.transforms())\n",
+ "val_dataloader = DataLoader(dataset, batch_size=50, shuffle=False, num_workers=16, pin_memory=True)"
],
"metadata": {
"collapsed": false
},
- "id": "5392d6e44eebb864"
+ "id": "5febfa57873fa2f3"
},
{
"cell_type": "markdown",
"source": [
- "### Quantization with our generated images\n",
- "In this section we use our generated images as a representative dataset for PTQ:"
+ "Here we define functions for evaluation:"
],
"metadata": {
"collapsed": false
},
- "id": "97073eeea51b4dee"
+ "id": "874d5d61f876bc82"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
- "batches_inds = np.random.choice(len(generated_images),\n",
- " size=(int(len(generated_images) / batch_size),batch_size),\n",
- " replace=False)\n",
- "def representative_data_gen():\n",
- " for nn in range(num_calibration_iter):\n",
- " nn_mod = nn % len(batches_inds)\n",
- " yield [np.concatenate([generated_images[b].detach().cpu().numpy() for b in batches_inds[nn_mod]], axis=0)]\n",
- " \n",
- "# run post training quantization on the model to get the quantized model output\n",
- "quantized_model_generated_data, quantization_info = mct.ptq.pytorch_post_training_quantization(\n",
- " in_module=model,\n",
- " representative_data_gen=representative_data_gen,\n",
- " core_config=core_config,\n",
- " target_platform_capabilities=target_platform_cap\n",
- ")"
+ "from tqdm import tqdm\n",
+ "\n",
+ "\n",
+ "def evaluate(model, testloader):\n",
+ " device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
+ " model.to(device)\n",
+ " model.eval() # Set the model to evaluation mode\n",
+ " correct = 0\n",
+ " total = 0\n",
+ " with torch.no_grad():\n",
+ " for data in tqdm(testloader):\n",
+ " images, labels = data\n",
+ " images, labels = images.to(device), labels.to(device)\n",
+ " outputs = model(images)\n",
+ " _, predicted = outputs.max(1)\n",
+ " total += labels.size(0)\n",
+ " correct += predicted.eq(labels).sum().item()\n",
+ "\n",
+ " # correct += (predicted == labels).sum().item()\n",
+ " val_acc = (100 * correct / total)\n",
+ " print('Accuracy: %.2f%%' % val_acc)\n",
+ " return val_acc"
],
"metadata": {
"collapsed": false
},
- "id": "c7f57ae27466992e"
+ "id": "de8cdf0ada297905"
},
{
"cell_type": "markdown",
"source": [
"### Evaluation of the quantized model's performance\n",
- "Here we evaluate our model's top 1 classification performance after PTQ on the ImageNet validation dataset."
+ "Here we evaluate our model's top 1 classification performance after PTQ on the ImageNet validation dataset.\n",
+ "Let's start with the floating-point model evaluation."
],
"metadata": {
"collapsed": false
@@ -456,20 +387,40 @@
"execution_count": null,
"outputs": [],
"source": [
- "accuracy_values = pytorch_model_accuracy_evaluation(quantized_model_generated_data, val_loader)\n",
- "print('Float model\\'s reported top 1 performance on ImageNet: 69.86')\n",
- "print(f'Data-Free quantized model\\'s top 1 performance on ImageNet: {accuracy_values}')"
+ "evaluate(float_model, val_dataloader)"
],
"metadata": {
"collapsed": false
},
"id": "857b5d4111a42071"
},
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Finally, let's evaluate the quantized model:"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "7451953d684a8497"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "evaluate(quantized_model_generated_data, val_dataloader)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "e77e131927a14217"
+ },
{
"cell_type": "markdown",
"source": [
"## Conclusion:\n",
- "In this tutorial we demonstrated how to generate synthetic images from a trained model and how to use these images for quantizing the model. The quantized model's size is x4 compressed compared to the original float model, however, its performance is similar to the repored float result. No real data was needed in this process. "
+ "In this tutorial, we demonstrated how to generate synthetic images from a trained model and use them for model quantization. The quantized model achieved a 4x reduction in size compared to the original float model, while maintaining performance similar to the reported float results. Notably, no real data was required in this process."
],
"metadata": {
"collapsed": false
@@ -488,16 +439,6 @@
"collapsed": false
},
"id": "b2a030eb3ee565ef"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [],
- "metadata": {
- "collapsed": false
- },
- "id": "44a80837535b9c08"
}
],
"metadata": {
diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_export.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_export.ipynb
index 53602d01b..04937ae05 100644
--- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_export.ipynb
+++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_export.ipynb
@@ -17,16 +17,24 @@
{
"cell_type": "markdown",
"source": [
- "# Export Quantized Pytorch Model\n",
+ "# Export a Quantized Pytorch Model With the Model Compression Toolkit (MCT)\n",
"\n",
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_export.ipynb)\n",
"\n",
+ "## Overview\n",
+ "This tutorial demonstrates how to export a PyTorch model to ONNX and TorchSript formats using the Model Compression Toolkit (MCT). It covers the steps of creating a simple PyTorch model, applying post-training quantization (PTQ) using MCT, and then exporting the quantized model to ONNX and TorchSript. The tutorial also shows how to use the exported model for inference.\n",
"\n",
- "To export a Pytorch model as a quantized model, it is necessary to first apply quantization\n",
- "to the model using MCT:\n",
+ "## Summary:\n",
+ "In this tutorial, we will cover:\n",
"\n",
+ "1. Constructing a simple PyTorch model for demonstration purposes.\n",
+ "2. Applying post-training quantization to the model using the Model Compression Toolkit.\n",
+ "3. Exporting the quantized model to the ONNX and TorchScript formats.\n",
+ "4. Ensuring compatibility between PyTorch and ONNX during the export process.\n",
+ "5. Using the exported model for inference.\n",
"\n",
- "\n"
+ "## Setup\n",
+ "To export your quantized model to ONNX format and use it for inference, you will need to install some additional packages. Note that these packages are only required if you plan to export the model to ONNX. If ONNX export is not needed, you can skip this step."
],
"metadata": {
"id": "UJDzewEYfSN5"
@@ -35,7 +43,7 @@
{
"cell_type": "code",
"source": [
- "! pip install -q mct-nightly"
+ "! pip install -q onnx onnxruntime onnxruntime-extensions"
],
"metadata": {
"id": "qNddNV6TEsX0"
@@ -46,16 +54,18 @@
{
"cell_type": "markdown",
"source": [
- "In order to export your quantized model to ONNX format, and use it for inference, some additional packages are needed. Notice, this is needed only for models exported to ONNX format, so this part can be skipped if this is not planned:"
+ "Install the Model Compression Toolkit:"
],
"metadata": {
- "id": "_w7xvHbcj1aV"
+ "collapsed": false
}
},
{
"cell_type": "code",
"source": [
- "! pip install -q onnx onnxruntime onnxruntime-extensions"
+ "import importlib\n",
+ "if not importlib.util.find_spec('model_compression_toolkit'):\n",
+ " !pip install model_compression_toolkit"
],
"metadata": {
"id": "g10bFms8jzln"
@@ -63,10 +73,24 @@
"execution_count": null,
"outputs": []
},
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "import numpy as np\n",
+ "from torchvision.models.mobilenetv2 import mobilenet_v2\n",
+ "import model_compression_toolkit as mct"
+ ],
+ "metadata": {
+ "collapsed": false
+ }
+ },
{
"cell_type": "markdown",
"source": [
- "Now, let's start the export demonstration by quantizing the model using MCT:"
+ "## Quantize the Model with the Model Compression Toolkit (MCT)\n",
+ "Let's begin the export demonstration by loading a model and applying quantization using MCT. This process will allow us to prepare the model for ONNX export."
],
"metadata": {
"id": "Q36T6YpZkeTC"
@@ -75,15 +99,9 @@
{
"cell_type": "code",
"source": [
- "import model_compression_toolkit as mct\n",
- "import numpy as np\n",
- "import torch\n",
- "from torchvision.models.mobilenetv2 import mobilenet_v2\n",
- "\n",
"# Create a model\n",
"float_model = mobilenet_v2()\n",
"\n",
- "\n",
"# Notice that here the representative dataset is random for demonstration only.\n",
"def representative_data_gen():\n",
" yield [np.random.random((1, 3, 224, 224))]\n",
@@ -103,16 +121,12 @@
"\n",
"\n",
"### ONNX\n",
+ "The model will be exported in ONNX format, where both weights and activations are represented as floats. Make sure that `onnx` is installed to enable exporting.\n",
"\n",
- "The model will be exported in ONNX format where weights and activations are represented as float. Notice that `onnx` should be installed in order to export the model to an ONNX model.\n",
- "\n",
- "There are two optional formats to choose: MCTQ or FAKELY_QUANT.\n",
+ "There are two optional formats available for export: MCTQ or FAKELY_QUANT.\n",
"\n",
"#### MCTQ Quantization Format\n",
- "\n",
- "By default, `mct.exporter.pytorch_export_model` will export the quantized pytorch model to\n",
- "an ONNX model with custom quantizers from mct_quantizers module. \n",
- "\n"
+ "By default, `mct.exporter.pytorch_export_model` exports the quantized PyTorch model to ONNX using custom quantizers from the `mct_quantizers` module. "
],
"metadata": {
"id": "-n70LVe6DQPw"
@@ -125,9 +139,10 @@
"onnx_file_path = 'model_format_onnx_mctq.onnx'\n",
"\n",
"# Export ONNX model with mctq quantizers.\n",
- "mct.exporter.pytorch_export_model(model=quantized_exportable_model,\n",
- " save_model_path=onnx_file_path,\n",
- " repr_dataset=representative_data_gen)"
+ "mct.exporter.pytorch_export_model(\n",
+ " model=quantized_exportable_model,\n",
+ " save_model_path=onnx_file_path,\n",
+ " repr_dataset=representative_data_gen)"
],
"metadata": {
"id": "PO-Hh0bzD1VJ"
@@ -138,11 +153,10 @@
{
"cell_type": "markdown",
"source": [
- "Notice that the model has the same size as the quantized exportable model as weights data types are float.\n",
- "\n",
- "#### ONNX opset version\n",
+ "Note that the model's size remains unchanged compared to the quantized exportable model, as the weight data types are still represented as floats.\n",
"\n",
- "By default, the used ONNX opset version is 15, but this can be changed using `onnx_opset_version`:"
+ "#### ONNX Opset Version\n",
+ "By default, the ONNX opset version used is 15. However, this can be adjusted by specifying the `onnx_opset_version` parameter during export."
],
"metadata": {
"id": "Bwx5rxXDF_gb"
@@ -152,10 +166,11 @@
"cell_type": "code",
"source": [
"# Export ONNX model with mctq quantizers.\n",
- "mct.exporter.pytorch_export_model(model=quantized_exportable_model,\n",
- " save_model_path=onnx_file_path,\n",
- " repr_dataset=representative_data_gen,\n",
- " onnx_opset_version=16)"
+ "mct.exporter.pytorch_export_model(\n",
+ " model=quantized_exportable_model,\n",
+ " save_model_path=onnx_file_path,\n",
+ " repr_dataset=representative_data_gen,\n",
+ " onnx_opset_version=16)"
],
"metadata": {
"id": "S9XtcX8s3dU9"
@@ -166,9 +181,9 @@
{
"cell_type": "markdown",
"source": [
- "### Use exported model for inference\n",
- "\n",
- "To load and infer using the exported model, which was exported to an ONNX file in MCTQ format, we will use `mct_quantizers` method `get_ort_session_options` during onnxruntime session creation. **Notice**, inference on models that are exported in this format are slowly and suffers from longer latency. However, inference of these models on IMX500 will not suffer from this issue."
+ "### Using the Exported Model for Inference\n",
+ "To load and perform inference with the ONNX model exported in MCTQ format, use the `mct_quantizers` method `get_ort_session_options` during the creation of an ONNX Runtime session. \n",
+ "**Note:** Inference on models exported in this format tends to be slower and experiences higher latency. However, inference on hardware such as the IMX500 will not suffer from this issue."
],
"metadata": {
"id": "OygCt_iHQQiz"
@@ -200,9 +215,8 @@
{
"cell_type": "markdown",
"source": [
- "#### Fakely-Quantized\n",
- "\n",
- "To export a fakely-quantized model, use QuantizationFormat.FAKELY_QUANT:"
+ "#### Fakely-Quantized Format\n",
+ "To export a fakely-quantized model, use the `QuantizationFormat.FAKELY_QUANT` option. This format ensures that quantization is simulated but does not alter the data types of the weights and activations during export."
],
"metadata": {
"id": "Uf4SbpNC28GA"
@@ -231,14 +245,11 @@
{
"cell_type": "markdown",
"source": [
+ "Note that the fakely-quantized model has the same size as the quantized exportable model, as the weights are still represented as floats.\n",
"\n",
- "Notice that the fakely-quantized model has the same size as the quantized\n",
- "exportable model as weights data types are float.\n",
- "\n",
- "### TorchScript\n",
+ "### TorchScript Format\n",
"\n",
- "The model will be exported in TorchScript format where weights and activations are\n",
- "quantized but represented as float (fakely quant)."
+ "The model can also be exported in TorchScript format, where weights and activations are quantized but represented as floats (fakely quantized)."
],
"metadata": {
"id": "-L1aRxFGGFeF"
@@ -268,8 +279,7 @@
{
"cell_type": "markdown",
"source": [
- "Notice that the fakely-quantized model has the same size as the quantized exportable model as weights data types are\n",
- "float."
+ "Note that the fakely-quantized model retains the same size as the quantized exportable model, as the weight data types remain in float format."
],
"metadata": {
"id": "SBqtJV9AGRzN"
@@ -281,6 +291,7 @@
"id": "bb7e1572"
},
"source": [
+ "## Copyrights:\n",
"Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n",
"\n",
"Licensed under the Apache License, Version 2.0 (the \"License\");\n",
diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mixed_precision_ptq.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mixed_precision_ptq.ipynb
new file mode 100644
index 000000000..31569618b
--- /dev/null
+++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mixed_precision_ptq.ipynb
@@ -0,0 +1,476 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "7cf96fb4",
+ "metadata": {
+ "id": "7cf96fb4"
+ },
+ "source": [
+ "# Mixed-Precision Post-Training Quantization in PyTorch using the Model Compression Toolkit (MCT)\n",
+ "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mixed_precision_ptq.ipynb)\n",
+ "\n",
+ "## Overview\n",
+ "This quick-start guide explains how to use the **Model Compression Toolkit (MCT)** to quantize a PyTorch model with post-training mixed-precision quantization. This quantization assigns different precision levels to various layers based on their impact on the model's output. We will load a pre-trained model and quantize it using the MCT. Finally, we will evaluate the quantized model and export it to an ONNX file.\n",
+ "\n",
+ "## Summary\n",
+ "In this tutorial, we will cover:\n",
+ "\n",
+ "1. Loading and preprocessing ImageNet’s validation dataset.\n",
+ "2. Constructing an unlabeled representative dataset.\n",
+ "3. Applying mixed-precision post-training quantization to the model's weights using MCT.\n",
+ "3. Accuracy evaluation of the floating-point and the quantized models.\n",
+ "\n",
+ "## Setup\n",
+ "Install the relevant packages:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "89e0bb04",
+ "metadata": {
+ "id": "89e0bb04"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install -q torch torchvision"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5441efd2978cea5a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import importlib\n",
+ "if not importlib.util.find_spec('model_compression_toolkit'):\n",
+ " !pip install model_compression_toolkit"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a82928d0",
+ "metadata": {
+ "id": "a82928d0"
+ },
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "from torch.utils.data import DataLoader\n",
+ "from torchvision.models import mobilenet_v2, MobileNet_V2_Weights\n",
+ "from torchvision.datasets import ImageNet\n",
+ "import numpy as np\n",
+ "import random"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Load a pre-trained MobileNetV2 model from torchvision, in 32-bits floating-point precision format."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "3c2556ce8144e1d3"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "weights = MobileNet_V2_Weights.IMAGENET1K_V2\n",
+ "\n",
+ "float_model = mobilenet_v2(weights=weights)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "7a302610146f1ec3"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Dataset preparation\n",
+ "### Download ImageNet validation set\n",
+ "Download ImageNet dataset with only the validation split.\n",
+ "\n",
+ "**Note** that for demonstration purposes we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.\n",
+ "\n",
+ "This step may take several minutes..."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "4df074784266e12e"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "if not os.path.isdir('imagenet'):\n",
+ " !mkdir imagenet\n",
+ " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz\n",
+ " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "a8a3327f28c20caf"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Extract ImageNet validation dataset using torchvision \"datasets\" module."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "8ff2ea33659f0c1a"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "dataset = ImageNet(root='./imagenet', split='val', transform=weights.transforms())"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "18f57edc3b87cad3"
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c0321aad",
+ "metadata": {
+ "id": "c0321aad"
+ },
+ "source": [
+ "## Representative Dataset\n",
+ "For quantization with MCT, we need to define a representative dataset required by the Post-Training Quantization (PTQ) algorithm. This dataset is a generator that returns a list of images:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "618975be",
+ "metadata": {
+ "id": "618975be"
+ },
+ "outputs": [],
+ "source": [
+ "batch_size = 50\n",
+ "n_iter = 10\n",
+ "\n",
+ "dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)\n",
+ "\n",
+ "def representative_dataset_gen():\n",
+ " dataloader_iter = iter(dataloader)\n",
+ " for _ in range(n_iter):\n",
+ " yield [next(dataloader_iter)[0]]\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Target Platform Capabilities (TPC)\n",
+ "In addition, MCT optimizes models for dedicated hardware platforms using Target Platform Capabilities (TPC). \n",
+ "**Note:** To apply mixed-precision quantization to specific layers, the TPC must define different bit-width options for those layers. For more details, please refer to our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html). In this example, we use the default PyTorch TPC, which supports 2, 4, and 8-bit options for convolution and linear layers."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "caa87e1b976c9767"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "import model_compression_toolkit as mct\n",
+ "\n",
+ "# Get a TargetPlatformCapabilities object that models the hardware platform for the quantized model inference. Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
+ "target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default')"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "794b23bfe4a3f41d"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Mixed Precision Configurations\n",
+ "We will create a `MixedPrecisionQuantizationConfig` that defines the search options for mixed-precision:\n",
+ "1. **Number of images** - Determines how many images from the representative dataset are used to find an optimal bit-width configuration. More images result in higher accuracy but increase search time.\n",
+ "2. **Gradient weighting** - Improves bit-width configuration accuracy at the cost of longer search time. This method will not be used in this example.\n",
+ "\n",
+ "MCT will determine a bit-width for each layer and quantize the model based on this configuration. The candidate bit-widths for quantization should be defined in the target platform model."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "ad960242931c2d86"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "configuration = mct.core.CoreConfig(\n",
+ " mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(\n",
+ " num_of_images=32,\n",
+ " use_hessian_based_scores=False))"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "b26381aea3a94eb8"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "To enable mixed-precision quantization, we define the desired compression ratio. In this example, we will configure the model to compress the weights to 75% of the size of the 8-bit model's weights. To achieve this, we will retrieve the model's resource utilization information, `resource_utilization_data`, specifically focusing on the weights' memory. Then, we will create a `ResourceUtilization` object to enforce the size constraint on the weight's memory, which applies only to the quantized layers and attributes (e.g., Conv2D kernels, but not biases)."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "9f48885ac931bae5"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "# Get Resource Utilization information to constraint your model's memory size.\n",
+ "resource_utilization_data = mct.core.pytorch_resource_utilization_data(\n",
+ " float_model,\n",
+ " representative_dataset_gen,\n",
+ " configuration,\n",
+ " target_platform_capabilities=target_platform_cap)\n",
+ "\n",
+ "# Create a ResourceUtilization object \n",
+ "resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * 0.75)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "edee094198b3a558"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now, we are ready to use MCT to quantize the model."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "a64b5ce7a3f861e"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(\n",
+ " in_module=float_model,\n",
+ " representative_data_gen=representative_dataset_gen,\n",
+ " target_resource_utilization=resource_utilization,\n",
+ " core_config=configuration,\n",
+ " target_platform_capabilities=target_platform_cap)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "d769042646dca720"
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c677bd61c3ab4649",
+ "metadata": {},
+ "source": [
+ "## Model Evaluation\n",
+ "In order to evaluate our models, we first need to load the validation dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "val_dataloader = DataLoader(dataset, batch_size=50, shuffle=False, num_workers=16, pin_memory=True)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "57ee11ff6934aa9f"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now, we will create a function for evaluating a model."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "31ced59d1514509e"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "from tqdm import tqdm\n",
+ "\n",
+ "\n",
+ "def evaluate(model, testloader):\n",
+ " \"\"\"\n",
+ " Evaluate a model using a test loader.\n",
+ " \"\"\"\n",
+ " device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
+ " model.to(device)\n",
+ " model.eval() # Set the model to evaluation mode\n",
+ " correct = 0\n",
+ " total = 0\n",
+ " with torch.no_grad():\n",
+ " for data in tqdm(testloader):\n",
+ " images, labels = data\n",
+ " images, labels = images.to(device), labels.to(device)\n",
+ " outputs = model(images)\n",
+ " _, predicted = outputs.max(1)\n",
+ " total += labels.size(0)\n",
+ " correct += predicted.eq(labels).sum().item()\n",
+ "\n",
+ " # correct += (predicted == labels).sum().item()\n",
+ " val_acc = (100 * correct / total)\n",
+ " print('Accuracy: %.2f%%' % val_acc)\n",
+ " return val_acc"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "5f120e924b5d8cf4"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Let's start with the floating-point model evaluation."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "a10499a2b79b19da"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fdd038f7aff8cde7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "evaluate(float_model, val_dataloader)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f4f564f31e253f5c",
+ "metadata": {},
+ "source": [
+ "Finally, let's evaluate the quantized model:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c9da2134f0bde415",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "evaluate(quantized_model, val_dataloader)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fd09fa27",
+ "metadata": {
+ "id": "fd09fa27"
+ },
+ "source": [
+ "Now, we can export the quantized model to ONNX:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "oXMn6bFjbQad",
+ "metadata": {
+ "id": "oXMn6bFjbQad"
+ },
+ "outputs": [],
+ "source": [
+ "mct.exporter.pytorch_export_model(quantized_model, save_model_path='qmodel.onnx', repr_dataset=representative_dataset_gen)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bb7e1572",
+ "metadata": {
+ "id": "bb7e1572"
+ },
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "In this tutorial, we demonstrated how to quantize a classification model for MNIST in a hardware-friendly manner using MCT. We observed that a 4x compression ratio was achieved with minimal performance degradation.\n",
+ "\n",
+ "The key advantage of hardware-friendly quantization is that the model can run more efficiently in terms of runtime, power consumption, and memory usage on designated hardware.\n",
+ "\n",
+ "While this was a simple model and task, MCT can deliver competitive results across a wide range of tasks and network architectures. For more details, [check out the paper:](https://arxiv.org/abs/2109.09113).\n",
+ "\n",
+ "## Copyrights:\n",
+ "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n",
+ "\n",
+ "Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "you may not use this file except in compliance with the License.\n",
+ "You may obtain a copy of the License at\n",
+ "\n",
+ " http://www.apache.org/licenses/LICENSE-2.0\n",
+ "\n",
+ "Unless required by applicable law or agreed to in writing, software\n",
+ "distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "See the License for the specific language governing permissions and\n",
+ "limitations under the License.\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mobilenet_gptq.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mobilenet_gptq.ipynb
index feea90840..2913206a4 100644
--- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mobilenet_gptq.ipynb
+++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mobilenet_gptq.ipynb
@@ -380,14 +380,6 @@
"evaluate(quantized_model, val_dataloader)"
]
},
- {
- "cell_type": "markdown",
- "id": "e316c34cadd054e7",
- "metadata": {
- "collapsed": false
- },
- "source": []
- },
{
"cell_type": "markdown",
"id": "ebfbb4de-5b6e-4732-83d3-a21e96cdd866",
@@ -395,16 +387,7 @@
"id": "ebfbb4de-5b6e-4732-83d3-a21e96cdd866"
},
"source": [
- "You can see that we got a very small degradation with a compression rate of x4 !"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "6YjIdiRRjgkL",
- "metadata": {
- "id": "6YjIdiRRjgkL"
- },
- "source": [
+ "You can see that we got a very small degradation with a compression rate of x4 !\n",
"Now, we can export the model to ONNX:"
]
},
@@ -427,28 +410,10 @@
"id": "14877777"
},
"source": [
- "## Conclusion"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "bb7e1572",
- "metadata": {
- "id": "bb7e1572"
- },
- "source": [
+ "## Conclusion\n",
"In this tutorial, we demonstrated how to quantize a pre-trained model using MCT with gradient-based optimization with a few lines of code. We saw that we can achieve an x4 compression ratio with minimal performance degradation.\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "01c1645e-205c-4d9a-8af3-e497b3addec1",
- "metadata": {
- "id": "01c1645e-205c-4d9a-8af3-e497b3addec1"
- },
- "source": [
"\n",
+ "## Copyrights\n",
"\n",
"Copyright 2023 Sony Semiconductor Israel, Inc. All rights reserved.\n",
"\n",
diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mobilenetv2_cifar100_mixed_precision.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mobilenetv2_cifar100_mixed_precision.ipynb
deleted file mode 100644
index 42efd0a19..000000000
--- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mobilenetv2_cifar100_mixed_precision.ipynb
+++ /dev/null
@@ -1,662 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "7cf96fb4",
- "metadata": {
- "id": "7cf96fb4"
- },
- "source": [
- "# Mixed-Precision PTQ - Pytorch MobileNetV2 on CIFAR100"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "59ed8f02",
- "metadata": {
- "id": "59ed8f02"
- },
- "source": [
- "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mobilenetv2_cifar100_mixed_precision.ipynb)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "822944a1",
- "metadata": {
- "id": "822944a1"
- },
- "source": [
- "## Overview"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "743dbc3d",
- "metadata": {
- "id": "743dbc3d"
- },
- "source": [
- "This tutorial demonstrates the process of retraining and quantizing a MobileNetV2 on CIFAR100 dataset. It starts by fine-tuning a pretrained MobileNetV2 model on the CIFAR100 dataset. After retraining, the model is quantized using MCT. This tutorial specifically uses mixed-precision quantization, which assigns different precision levels to different layers in the model based on their impact on the output. The quantized model is then evaluated and exported to an ONNX file."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "59e2eeae",
- "metadata": {
- "id": "59e2eeae"
- },
- "source": [
- "## Summary"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1daf577a",
- "metadata": {
- "id": "1daf577a"
- },
- "source": [
- "In this tutorial we will cover:\n",
- "1. Retraining Pytorch MobileNetV2 on CIFAR100.\n",
- "2. Quantizing the model using post-training quantization in mixes-precision for the weights.\n",
- "3. Evaluating and exporting the model to ONNX."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "8b3396bf",
- "metadata": {
- "id": "8b3396bf"
- },
- "source": [
- "## Setup"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "5e7690ef",
- "metadata": {
- "id": "5e7690ef"
- },
- "source": [
- "First install the relevant packages and import them:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "89e0bb04",
- "metadata": {
- "id": "89e0bb04"
- },
- "outputs": [],
- "source": [
- "! pip install -q model-compression-toolkit\n",
- "! pip install -q torch\n",
- "! pip install -q torchvision"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "a82928d0",
- "metadata": {
- "id": "a82928d0"
- },
- "outputs": [],
- "source": [
- "import copy\n",
- "import tempfile\n",
- "\n",
- "import torch\n",
- "import torchvision\n",
- "from torch import nn, optim\n",
- "from torchvision import transforms\n",
- "from tqdm import tqdm\n",
- "import numpy as np\n",
- "import random\n",
- "\n",
- "import model_compression_toolkit as mct"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "bafa05a4-b897-4d58-afa9-6416221266b5",
- "metadata": {},
- "source": [
- "In addition, let's set a seed for reproduction results purposes:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "4d9c893c-b3b5-41fc-858b-af886e818f1c",
- "metadata": {},
- "outputs": [],
- "source": [
- "def seed_everything(seed_value):\n",
- " random.seed(seed_value)\n",
- " np.random.seed(seed_value)\n",
- " torch.manual_seed(seed_value)\n",
- " torch.cuda.manual_seed_all(seed_value)\n",
- " torch.backends.cudnn.deterministic = True\n",
- " torch.backends.cudnn.benchmark = False\n",
- "\n",
- "seed_everything(0)\n"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "1653425b",
- "metadata": {
- "id": "1653425b"
- },
- "source": [
- "## Define functions for creating dataset loaders\n",
- "\n",
- "We use two functions to create data loaders for the CIFAR100 dataset:\n",
- "\n",
- "get_cifar100_trainloader - This function creates a data loader for the CIFAR100 training dataset, applying the specified transformations and using the provided batch size.\n",
- "\n",
- "get_cifar100_testloader - Similarly, this function creates a data loader for the CIFAR100 testing dataset with the given transformations and batch size."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "ac38177b-6ba4-4fc0-b1f5-a72d63631e40",
- "metadata": {},
- "outputs": [],
- "source": [
- "\n",
- "def get_cifar100_trainloader(dataset_folder, transform, train_batch_size):\n",
- " \"\"\"\n",
- " Get CIFAR100 train loader.\n",
- " \"\"\"\n",
- " trainset = torchvision.datasets.CIFAR100(root=dataset_folder, train=True, download=True, transform=transform)\n",
- " trainloader = torch.utils.data.DataLoader(trainset, batch_size=train_batch_size, shuffle=True)\n",
- " return trainloader\n",
- "\n",
- "\n",
- "def get_cifar100_testloader(dataset_folder, transform, eval_batch_size):\n",
- " \"\"\"\n",
- " Get CIFAR100 test loader.\n",
- " \"\"\"\n",
- " testset = torchvision.datasets.CIFAR100(root=dataset_folder, train=False, download=True, transform=transform)\n",
- " testloader = torch.utils.data.DataLoader(testset, batch_size=eval_batch_size, shuffle=False)\n",
- " return testloader\n"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "02312089",
- "metadata": {
- "id": "02312089"
- },
- "source": [
- "## Evaluation helper function\n",
- "Now, we will create a function for evaluating a model (we will use it later on)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "16f9bcc0",
- "metadata": {
- "id": "16f9bcc0"
- },
- "outputs": [],
- "source": [
- "\n",
- "def evaluate(model, testloader, device):\n",
- " \"\"\"\n",
- " Evaluate a model using a test loader.\n",
- " \"\"\"\n",
- " model.to(device)\n",
- " model.eval() # Set the model to evaluation mode\n",
- " correct = 0\n",
- " total = 0\n",
- " with torch.no_grad():\n",
- " for data in testloader:\n",
- " images, labels = data\n",
- " images, labels = images.to(device), labels.to(device)\n",
- " outputs = model(images)\n",
- " _, predicted = torch.max(outputs.data, 1)\n",
- " total += labels.size(0)\n",
- " correct += (predicted == labels).sum().item()\n",
- " val_acc = (100 * correct / total)\n",
- " print('Accuracy: %.2f%%' % val_acc)\n",
- " return val_acc"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c24d3c5a",
- "metadata": {
- "id": "c24d3c5a"
- },
- "source": [
- "## Fine-tuning MobileNetV2 to CIFAR100\n",
- "\n",
- "We now create a function for the retraining phase of our model. This is a simple training schema for 20 wpochs. The trained model is evaluated after each epoch and the returned model is the model with the best observed accuracy."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "c615a27e",
- "metadata": {
- "id": "c615a27e"
- },
- "outputs": [],
- "source": [
- "def retrain(model, transform, device, args):\n",
- " trainloader = get_cifar100_trainloader(args.representative_dataset_dir,\n",
- " transform,\n",
- " args.retrain_batch_size)\n",
- "\n",
- " testloader = get_cifar100_testloader(args.representative_dataset_dir,\n",
- " transform,\n",
- " args.eval_batch_size)\n",
- "\n",
- " model.to(device)\n",
- "\n",
- " # Define loss function and optimizer\n",
- " criterion = nn.CrossEntropyLoss()\n",
- " optimizer = optim.SGD(model.parameters(),\n",
- " lr=args.retrain_lr,\n",
- " momentum=args.retrain_momentum)\n",
- "\n",
- " best_acc = 0.0\n",
- " # Training loop\n",
- " for epoch in range(args.retrain_num_epochs):\n",
- " prog_bar = tqdm(enumerate(trainloader),\n",
- " total=len(trainloader),\n",
- " leave=True)\n",
- "\n",
- " print(f'Retrain epoch: {epoch}')\n",
- " for i, data in prog_bar:\n",
- " inputs, labels = data\n",
- " inputs, labels = inputs.to(device), labels.to(device)\n",
- "\n",
- " # Zero the parameter gradients\n",
- " optimizer.zero_grad()\n",
- "\n",
- " # Forward, backward, and update parameters\n",
- " outputs = model(inputs)\n",
- " loss = criterion(outputs, labels)\n",
- " loss.backward()\n",
- " optimizer.step()\n",
- "\n",
- " val_acc = evaluate(model, testloader, device)\n",
- "\n",
- " # Check if this model has the best accuracy, and if so, save it\n",
- " if val_acc > best_acc:\n",
- " print(f'Best accuracy so far {val_acc}')\n",
- " best_acc = val_acc\n",
- " best_state_dict = copy.deepcopy(model.state_dict())\n",
- "\n",
- " model.load_state_dict(best_state_dict)\n",
- " return model"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b64a5f1f-9583-4982-ab0b-fb3ecba80ecb",
- "metadata": {},
- "source": [
- "Let's create an object for the retraining parameters:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "35cac44f-5813-4559-9753-c88660d2229c",
- "metadata": {},
- "outputs": [],
- "source": [
- "class RetrainArguments:\n",
- " def __init__(self):\n",
- " self.retrain_num_epochs = 20 # Number of epochs to retrain the model\n",
- " self.eval_batch_size = 32 # Batch size of test loader\n",
- " self.retrain_batch_size = 32 # Batch size of train loader\n",
- " self.retrain_lr = 0.001 # Learning rate to use during retraining\n",
- " self.retrain_momentum = 0.9 # SGD momentum to use during retraining\n",
- " self.representative_dataset_dir = './data' # Path to save the dataset (CIFAR100)\n",
- "\n",
- "retrain_args = RetrainArguments()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "69366614",
- "metadata": {
- "id": "69366614"
- },
- "source": [
- "In order to retrain MobileNetV2 we first load the ImageNet weights and then fine-tune it using the above-mentioned retraining function:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "9de970bf-f50e-45a8-a59c-57367b2a4559",
- "metadata": {},
- "outputs": [],
- "source": [
- "# Load pretrained MobileNetV2 model on ImageNet\n",
- "model = torchvision.models.mobilenet_v2(pretrained=True)\n",
- "\n",
- "# Modify last layer to match CIFAR-100 classes\n",
- "model.classifier[1] = nn.Linear(model.last_channel, 100)\n",
- "\n",
- "# Create preprocessing pipeline for training and evaluation\n",
- "transform = transforms.Compose([\n",
- " transforms.Resize((224, 224)), # Resize images to fit MobileNetV2 input\n",
- " transforms.ToTensor(),\n",
- " transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]) # Normalize inputs to range [-1, 1]\n",
- "\n",
- "# If GPU available, move the model to GPU\n",
- "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
- "\n",
- "# Fine-tune the model to adapt to CIFAR100\n",
- "model = retrain(model,\n",
- " transform,\n",
- " device,\n",
- " retrain_args)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b9d099ed-9b44-4e4c-b89d-0a7a2da3eb03",
- "metadata": {},
- "source": [
- "Finally, let's evaluate our new model:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "aa5fbd70-8f0f-47c6-ad78-dd3a41d0e7bd",
- "metadata": {},
- "outputs": [],
- "source": [
- "# Evaluate the retrained model\n",
- "testloader = get_cifar100_testloader(retrain_args.representative_dataset_dir,\n",
- " transform,\n",
- " retrain_args.eval_batch_size)\n",
- "evaluate(model, testloader, device)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "e9cd25a7",
- "metadata": {
- "id": "e9cd25a7"
- },
- "source": [
- "## Mixed-Precision Quantization Using MCT"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c0321aad",
- "metadata": {
- "id": "c0321aad"
- },
- "source": [
- "Now we would like to quantize this model using MCT.\n",
- "To do so, we need to define a representative dataset, which is a generator that returns a list of images for 10 times (in this example):"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "618975be",
- "metadata": {
- "id": "618975be"
- },
- "outputs": [],
- "source": [
- "# Create representative_data_gen function from the train dataset\n",
- "trainloader = get_cifar100_trainloader(retrain_args.representative_dataset_dir,\n",
- " transform,\n",
- " retrain_args.retrain_batch_size)\n",
- "\n",
- "num_calibration_iterations = 10\n",
- "def representative_data_gen() -> list:\n",
- " for _ in range(num_calibration_iterations):\n",
- " yield [next(iter(trainloader))[0]]"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d0a92bee",
- "metadata": {
- "id": "d0a92bee"
- },
- "source": [
- "In addition, MCT optimizes the model for dedicated hardware. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "63f695dd",
- "metadata": {
- "id": "63f695dd"
- },
- "outputs": [],
- "source": [
- "# Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference.\n",
- "# Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
- "target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default')"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d3521637",
- "metadata": {
- "id": "d3521637"
- },
- "source": [
- "In order to use mixed-precision quantization we need to set some parameters in the CoreConfig that MCT uses:\n",
- "1. Number of images - MCT uses images from the representative dataset to search for a suitable bit-width configuration. This parameter determine the number of images MCT will use. The more images, the bit-width configuration is expected to be more accurate (however this affects the search time, so there is a trade-off between runtime and expected accuracy).\n",
- "2. Gradient weighting - A method to improve the bit-width configuration search (in exchange for longer search time). In this example, we will not use it."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "3ce7789b-aa3d-4a44-8dc5-dc052ece9cad",
- "metadata": {
- "id": "4f5fa4a2"
- },
- "outputs": [],
- "source": [
- "# Create a mixed-precision quantization configuration with possible mixed-precision search options.\n",
- "# MCT will search a mixed-precision configuration (namely, bit-width for each layer)\n",
- "# and quantize the model according to this configuration.\n",
- "# The candidates bit-width for quantization should be defined in the target platform model:\n",
- "configuration = mct.core.CoreConfig(mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(\n",
- " num_of_images=32,\n",
- " use_hessian_based_scores=False))"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "534eeb45-dba7-45cc-b8c7-75cc60e6e002",
- "metadata": {},
- "source": [
- "In addition, when using mixed-precision we define the desired compression ratio. Here, we will search for a mixed-precision configuration that will compress the weights to 0.75% of the 8bits model weights:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "ec6f2ea5-9a79-4961-b746-aac45a95aecb",
- "metadata": {
- "id": "4f5fa4a2"
- },
- "outputs": [],
- "source": [
- "# Get Resource Utilization information to constraint your model's memory size.\n",
- "# Retrieve a ResourceUtilization object with helpful information of each resource utilization metric,\n",
- "# to constraint the quantized model to the desired memory size.\n",
- "resource_utilization_data = mct.core.pytorch_resource_utilization_data(model,\n",
- " representative_data_gen,\n",
- " configuration,\n",
- " target_platform_capabilities=target_platform_cap)\n",
- "\n",
- "# Set a constraint for each of the resource utilization metrics.\n",
- "# Create a ResourceUtilization object to limit our returned model's size. Note that this values affect only layers and attributes\n",
- "# that should be quantized (for example, the kernel of Conv2D in Pytorch will be affected by this value,\n",
- "# while the bias will not)\n",
- "# examples:\n",
- "# weights_compression_ratio = 0.75 - About 0.75 of the model's weights memory size when quantized with 8 bits.\n",
- "resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * 0.75)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "fd09fa27",
- "metadata": {
- "id": "fd09fa27"
- },
- "source": [
- "Now, we are ready to use MCT to quantize the model:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "13263c5e-aac0-4f54-a0d4-705fd97451a5",
- "metadata": {},
- "outputs": [],
- "source": [
- "quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(model,\n",
- " representative_data_gen,\n",
- " target_resource_utilization=resource_utilization,\n",
- " core_config=configuration,\n",
- " target_platform_capabilities=target_platform_cap)\n",
- " "
- ]
- },
- {
- "cell_type": "markdown",
- "id": "7697b885-ed23-411c-903f-72542593b6e0",
- "metadata": {},
- "source": [
- "Finally, we evaluate the quantized model:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "886f063c-bb61-4e2e-bff1-c0c7333613f9",
- "metadata": {},
- "outputs": [],
- "source": [
- "evaluate(quantized_model,\n",
- " testloader,\n",
- " device)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "9nQBVWFhbKXV",
- "metadata": {
- "id": "9nQBVWFhbKXV"
- },
- "source": [
- "Now, we can export the quantized model to ONNX. Notice that onnx is not in MCT requierments, so first it should be installed:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "52313ce3-c735-40aa-8ec7-7d32c7359326",
- "metadata": {},
- "outputs": [],
- "source": [
- "! pip install -q onnx"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "oXMn6bFjbQad",
- "metadata": {
- "id": "oXMn6bFjbQad"
- },
- "outputs": [],
- "source": [
- "# Export quantized model to ONNX\n",
- "import tempfile\n",
- "_, onnx_file_path = tempfile.mkstemp('.onnx') # Path of exported model\n",
- "mct.exporter.pytorch_export_model(model=quantized_model, \n",
- " save_model_path=onnx_file_path,\n",
- " repr_dataset=representative_data_gen)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "14877777",
- "metadata": {
- "id": "14877777"
- },
- "source": [
- "## Conclusion"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "bb7e1572",
- "metadata": {
- "id": "bb7e1572"
- },
- "source": [
- "\n",
- "\n",
- "Copyright 2023 Sony Semiconductor Israel, Inc. All rights reserved.\n",
- "\n",
- "Licensed under the Apache License, Version 2.0 (the \"License\");\n",
- "you may not use this file except in compliance with the License.\n",
- "You may obtain a copy of the License at\n",
- "\n",
- " http://www.apache.org/licenses/LICENSE-2.0\n",
- "\n",
- "Unless required by applicable law or agreed to in writing, software\n",
- "distributed under the License is distributed on an \"AS IS\" BASIS,\n",
- "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
- "See the License for the specific language governing permissions and\n",
- "limitations under the License.\n"
- ]
- }
- ],
- "metadata": {
- "colab": {
- "provenance": []
- },
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.9.12"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_post_training_quantization.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_post_training_quantization.ipynb
new file mode 100644
index 000000000..65b9a61a6
--- /dev/null
+++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_post_training_quantization.ipynb
@@ -0,0 +1,427 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "7cf96fb4",
+ "metadata": {
+ "id": "7cf96fb4"
+ },
+ "source": [
+ "# Post-Training Quantization in PyTorch using the Model Compression Toolkit (MCT)\n",
+ "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_post_training_quantization.ipynb)\n",
+ "\n",
+ "## Overview\n",
+ "This quick-start guide explains how to use the **Model Compression Toolkit (MCT)** to quantize a PyTorch model. We will load a pre-trained model and quantize it using the MCT with **Post-Training Quatntization (PTQ)**. Finally, we will evaluate the quantized model and export it to an ONNX file.\n",
+ "\n",
+ "## Summary\n",
+ "In this tutorial, we will cover:\n",
+ "\n",
+ "1. Loading and preprocessing ImageNet’s validation dataset.\n",
+ "2. Constructing an unlabeled representative dataset.\n",
+ "3. Post-Training Quantization using MCT.\n",
+ "4. Accuracy evaluation of the floating-point and the quantized models.\n",
+ "\n",
+ "## Setup\n",
+ "Install the relevant packages:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "89e0bb04",
+ "metadata": {
+ "id": "89e0bb04"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install -q torch torchvision"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5441efd2978cea5a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import importlib\n",
+ "if not importlib.util.find_spec('model_compression_toolkit'):\n",
+ " !pip install model_compression_toolkit"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a82928d0",
+ "metadata": {
+ "id": "a82928d0"
+ },
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "from torch.utils.data import DataLoader\n",
+ "from torchvision.models import mobilenet_v2, MobileNet_V2_Weights\n",
+ "from torchvision.datasets import ImageNet\n",
+ "import numpy as np\n",
+ "import random"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Load a pre-trained MobileNetV2 model from torchvision, in 32-bits floating-point precision format."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "3c2556ce8144e1d3"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "weights = MobileNet_V2_Weights.IMAGENET1K_V2\n",
+ "\n",
+ "float_model = mobilenet_v2(weights=weights)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "7a302610146f1ec3"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Dataset preparation\n",
+ "### Download ImageNet validation set\n",
+ "Download ImageNet dataset with only the validation split.\n",
+ "\n",
+ "**Note** that for demonstration purposes we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.\n",
+ "\n",
+ "This step may take several minutes..."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "4df074784266e12e"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "if not os.path.isdir('imagenet'):\n",
+ " !mkdir imagenet\n",
+ " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz\n",
+ " !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "a8a3327f28c20caf"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Extract ImageNet validation dataset using torchvision \"datasets\" module."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "8ff2ea33659f0c1a"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "dataset = ImageNet(root='./imagenet', split='val', transform=weights.transforms())"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "18f57edc3b87cad3"
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c0321aad",
+ "metadata": {
+ "id": "c0321aad"
+ },
+ "source": [
+ "## Representative Dataset\n",
+ "For quantization with MCT, we need to define a representative dataset required by the PTQ algorithm. This dataset is a generator that returns a list of images:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "618975be",
+ "metadata": {
+ "id": "618975be"
+ },
+ "outputs": [],
+ "source": [
+ "batch_size = 16\n",
+ "n_iter = 10\n",
+ "\n",
+ "dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)\n",
+ "\n",
+ "def representative_dataset_gen():\n",
+ " dataloader_iter = iter(dataloader)\n",
+ " for _ in range(n_iter):\n",
+ " yield [next(dataloader_iter)[0]]\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Target Platform Capabilities (TPC)\n",
+ "In addition, MCT optimizes the model for dedicated hardware platforms. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "33271e23c3eff3b5"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "import model_compression_toolkit as mct\n",
+ "\n",
+ "# Get a TargetPlatformCapabilities object that models the hardware platform for the quantized model inference. Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
+ "target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default')"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "ae04779a863facd7"
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d0a92bee",
+ "metadata": {
+ "id": "d0a92bee"
+ },
+ "source": [
+ "## Post-Training Quantization using MCT\n",
+ "Now for the exciting part! Let’s run PTQ on the model. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "63f695dd",
+ "metadata": {
+ "id": "63f695dd"
+ },
+ "outputs": [],
+ "source": [
+ "quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(\n",
+ " in_module=float_model,\n",
+ " representative_data_gen=representative_dataset_gen,\n",
+ " target_platform_capabilities=target_platform_cap\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d3521637",
+ "metadata": {
+ "id": "d3521637"
+ },
+ "source": [
+ "Our model is now quantized. MCT has created a simulated quantized model within the original PyTorch framework by inserting [quantization representation modules](https://github.com/sony/mct_quantizers). These modules, such as `PytorchQuantizationWrapper` and `PytorchActivationQuantizationHolder`, wrap PyTorch layers to simulate the quantization of weights and activations, respectively. While the size of the saved model remains unchanged, all the quantization parameters are stored within these modules and are ready for deployment on the target hardware. In this example, we used the default MCT settings, which compressed the model from 32 bits to 8 bits, resulting in a compression ratio of 4x. Let's print the quantized model and examine the quantization modules:"
+ ]
+ },
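+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "print-quantized-model",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional inspection step: printing the quantized model shows the PytorchQuantizationWrapper\n",
+ "# and PytorchActivationQuantizationHolder modules that MCT inserted around the original layers.\n",
+ "print(quantized_model)"
+ ]
+ },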
+ {
+ "cell_type": "markdown",
+ "id": "c677bd61c3ab4649",
+ "metadata": {},
+ "source": [
+ "## Model Evaluation\n",
+ "In order to evaluate our models, we first need to load the validation dataset. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "val_dataloader = DataLoader(dataset, batch_size=50, shuffle=False, num_workers=16, pin_memory=True)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "57ee11ff6934aa9f"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now, we will create a function for evaluating a model."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "31ced59d1514509e"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "from tqdm import tqdm\n",
+ "\n",
+ "\n",
+ "def evaluate(model, testloader):\n",
+ " \"\"\"\n",
+ " Evaluate a model using a test loader.\n",
+ " \"\"\"\n",
+ " device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
+ " model.to(device)\n",
+ " model.eval() # Set the model to evaluation mode\n",
+ " correct = 0\n",
+ " total = 0\n",
+ " with torch.no_grad():\n",
+ " for data in tqdm(testloader):\n",
+ " images, labels = data\n",
+ " images, labels = images.to(device), labels.to(device)\n",
+ " outputs = model(images)\n",
+ " _, predicted = outputs.max(1)\n",
+ " total += labels.size(0)\n",
+ " correct += predicted.eq(labels).sum().item()\n",
+ "\n",
+ " # correct += (predicted == labels).sum().item()\n",
+ " val_acc = (100 * correct / total)\n",
+ " print('Accuracy: %.2f%%' % val_acc)\n",
+ " return val_acc"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "5f120e924b5d8cf4"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Let's start with the floating-point model evaluation."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "a10499a2b79b19da"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fdd038f7aff8cde7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "evaluate(float_model, val_dataloader)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f4f564f31e253f5c",
+ "metadata": {},
+ "source": [
+ "Finally, let's evaluate the quantized model:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c9da2134f0bde415",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "evaluate(quantized_model, val_dataloader)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fd09fa27",
+ "metadata": {
+ "id": "fd09fa27"
+ },
+ "source": [
+ "You can see that we got a very small degradation with a compression rate of x4 !\n",
+ "Now, we can export the quantized model to ONNX:"
+ ]
+ },
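+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "install-onnx-ptq",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# onnx is not bundled with MCT's requirements, so install it before exporting the model\n",
+ "!pip install -q onnx"
+ ]
+ },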
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "oXMn6bFjbQad",
+ "metadata": {
+ "id": "oXMn6bFjbQad"
+ },
+ "outputs": [],
+ "source": [
+ "mct.exporter.pytorch_export_model(quantized_model, save_model_path='qmodel.onnx', repr_dataset=representative_dataset_gen)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bb7e1572",
+ "metadata": {
+ "id": "bb7e1572"
+ },
+ "source": [
+ "## Conclusion\n",
+ "\n",
+ "In this tutorial, we demonstrated how to quantize a classification model for MNIST in a hardware-friendly manner using MCT. We observed that a 4x compression ratio was achieved with minimal performance degradation.\n",
+ "\n",
+ "The key advantage of hardware-friendly quantization is that the model can run more efficiently in terms of runtime, power consumption, and memory usage on designated hardware.\n",
+ "\n",
+ "While this was a simple model and task, MCT can deliver competitive results across a wide range of tasks and network architectures. For more details, [check out the paper:](https://arxiv.org/abs/2109.09113).\n",
+ "\n",
+ "## Copyrights\n",
+ "\n",
+ "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n",
+ "\n",
+ "Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "you may not use this file except in compliance with the License.\n",
+ "You may obtain a copy of the License at\n",
+ "\n",
+ " http://www.apache.org/licenses/LICENSE-2.0\n",
+ "\n",
+ "Unless required by applicable law or agreed to in writing, software\n",
+ "distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "See the License for the specific language governing permissions and\n",
+ "limitations under the License.\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.4"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb
index dfafa6c18..3ed4c4e80 100644
--- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb
+++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb
@@ -1,430 +1,465 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "source": [
- "# Structured Pruning of a Fully-Connected PyTorch Model\n",
- "\n",
- "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb)\n",
- "\n",
- "Welcome to this tutorial, where we will guide you through the process of training, pruning, and retraining a fully connected neural network model using the PyTorch framework. The tutorial is organized in the following sections:\n",
- "1. We'll start by installing and importing the nessecry packages.\n",
- "2. Next, we will construct and train a simple neural network on the MNIST dataset.\n",
- "2. Following that, we'll introduce model pruning to reduce the model's size while maintaining accuracy.\n",
- "3. Finally, we'll retrain our pruned model to recover any performance lost due to pruning."
- ],
- "metadata": {
- "collapsed": false,
- "id": "81d379c3e030ceb3"
- },
- "id": "81d379c3e030ceb3"
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Installing Pytorch and the Model Compression Toolkit\n",
- "We begin by setting up our environment by installing PyTorch and the Model Compression Toolkit, then importing them. These installations will allow us to define, train, prune, and retrain our neural network models within this notebook."
- ],
- "metadata": {
- "collapsed": false,
- "id": "5551cab7da5eb204"
- },
- "id": "5551cab7da5eb204"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "!pip install -q torch torchvision\n",
- "!pip install -q mct-nightly"
- ],
- "metadata": {
- "id": "6b36f0086537151b"
- },
- "id": "6b36f0086537151b"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "# Import necessary libraries\n",
- "import torch\n",
- "import torch.nn as nn\n",
- "import torch.optim as optim\n",
- "from torchvision import datasets, transforms\n",
- "from torch.utils.data import DataLoader\n",
- "import model_compression_toolkit as mct"
- ],
- "metadata": {
- "id": "c5ca27b4acf15197"
- },
- "id": "c5ca27b4acf15197"
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Loading and Preprocessing MNIST Dataset\n",
- "Let's create a function to retrieve the train and test parts of the MNIST dataset, including preprocessing:"
- ],
- "metadata": {
- "collapsed": false,
- "id": "c509bd917dbde9ef"
- },
- "id": "c509bd917dbde9ef"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "# MNIST Data Loading and Preprocessing\n",
- "def load_and_preprocess_mnist(batch_size=128, root_path='./data'):\n",
- " transform = transforms.Compose([\n",
- " transforms.ToTensor(),\n",
- " transforms.Normalize((0.5,), (0.5,))\n",
- " ])\n",
- "\n",
- " train_dataset = datasets.MNIST(root=root_path, train=True, download=True, transform=transform)\n",
- " test_dataset = datasets.MNIST(root=root_path, train=False, download=True, transform=transform)\n",
- "\n",
- " train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)\n",
- " test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)\n",
- "\n",
- " return train_loader, test_loader"
- ],
- "metadata": {
- "id": "e2ebe94efb864812"
- },
- "id": "e2ebe94efb864812"
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Creating a Fully-Connected Model\n",
- "In this section, we create a simple example of a fully connected model to demonstrate the pruning process. It consists of three linear layers with 128, 64, and 10 neurons."
- ],
- "metadata": {
- "collapsed": false,
- "id": "4c246b8487a151db"
- },
- "id": "4c246b8487a151db"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "# Define the Fully-Connected Model\n",
- "class FCModel(nn.Module):\n",
- " def __init__(self):\n",
- " super(FCModel, self).__init__()\n",
- " self.flatten = nn.Flatten()\n",
- " self.fc_layers = nn.Sequential(\n",
- " nn.Linear(28*28, 128),\n",
- " nn.ReLU(),\n",
- " nn.Linear(128, 64),\n",
- " nn.ReLU(),\n",
- " nn.Linear(64, 10)\n",
- " )\n",
- "\n",
- " def forward(self, x):\n",
- " x = self.flatten(x)\n",
- " logits = self.fc_layers(x)\n",
- " return logits"
- ],
- "metadata": {
- "id": "9060e0fac2ae244"
- },
- "id": "9060e0fac2ae244"
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Defining the Training Function\n",
- "\n",
- "Next, we'll define a function to train our neural network model. This function will handle the training loop, including forward propagation, loss calculation, backpropagation, and updating the model parameters. Additionally, we'll evaluate the model's performance on the validation dataset at the end of each epoch to monitor its accuracy."
- ],
- "metadata": {
- "collapsed": false,
- "id": "6acbc81a98082f80"
- },
- "id": "6acbc81a98082f80"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "def test_model(model, test_loader):\n",
- "# Evaluate the model\n",
- " model.eval()\n",
- " total, correct = 0, 0\n",
- " with torch.no_grad():\n",
- " for images, labels in test_loader:\n",
- " images, labels = images.to(device), labels.to(device)\n",
- " outputs = model(images)\n",
- " _, predicted = torch.max(outputs.data, 1)\n",
- " total += labels.size(0)\n",
- " correct += (predicted == labels).sum().item()\n",
- " accuracy = 100 * correct / total\n",
- " return accuracy\n",
- "\n",
- "# Training the Dense Model\n",
- "def train_model(model, train_loader, test_loader, device, epochs=6):\n",
- " model = model.to(device)\n",
- " criterion = nn.CrossEntropyLoss()\n",
- " optimizer = optim.Adam(model.parameters(), lr=0.001)\n",
- "\n",
- " for epoch in range(epochs):\n",
- " model.train()\n",
- " for images, labels in train_loader:\n",
- " images, labels = images.to(device), labels.to(device)\n",
- "\n",
- " optimizer.zero_grad()\n",
- " outputs = model(images)\n",
- " loss = criterion(outputs, labels)\n",
- " loss.backward()\n",
- " optimizer.step()\n",
- "\n",
- " accuracy = test_model(model, test_loader)\n",
- " print(f'Epoch [{epoch+1}/{epochs}], Test Accuracy: {accuracy:.2f}%')\n",
- " return model"
- ],
- "metadata": {
- "id": "86859a5f8b54f0c3"
- },
- "id": "86859a5f8b54f0c3"
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Training the Dense Model\n",
- "We will now train the dense model using the MNIST dataset."
- ],
- "metadata": {
- "collapsed": false,
- "id": "1caf2d8a10673d90"
- },
- "id": "1caf2d8a10673d90"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "train_loader, test_loader = load_and_preprocess_mnist()\n",
- "dense_model = FCModel()\n",
- "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
- "dense_model = train_model(dense_model, train_loader, test_loader, device, epochs=6)"
- ],
- "metadata": {
- "id": "2d4660d484d4341b"
- },
- "id": "2d4660d484d4341b"
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Dense Model Properties\n",
- "We will display our model's architecture, including layers, their types, and the number of parameters.\n",
- "Notably, MCT's structured pruning will target the first two dense layers for pruning, as these layers have a higher number of channels compared to later layers, offering more opportunities for pruning without affecting accuracy significantly. This reduction can be effectively propagated by adjusting the input channels of subsequent layers."
- ],
- "metadata": {
- "collapsed": false,
- "id": "8af6f660db438605"
- },
- "id": "8af6f660db438605"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "def display_model_params(model):\n",
- " model_params = sum(p.numel() for p in model.state_dict().values())\n",
- " for name, module in model.named_modules():\n",
- " module_params = sum(p.numel() for p in module.state_dict().values())\n",
- " if module_params > 0:\n",
- " print(f'{name} number of parameters {module_params}')\n",
- " print(f'{model}\\nTotal number of parameters {model_params}')\n",
- " return model_params\n",
- "\n",
- "dense_model_params = display_model_params(dense_model)"
- ],
- "metadata": {
- "id": "b0741833e5af5c4f"
- },
- "id": "b0741833e5af5c4f"
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Create a Representative Dataset\n",
- "We are creating a representative dataset to guide our model pruning process for computing importance score for each channel:"
- ],
- "metadata": {
- "collapsed": false,
- "id": "9efc6fd59b15662"
- },
- "id": "9efc6fd59b15662"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "# Create a representative dataset\n",
- "ds_train_as_iter = iter(train_loader)\n",
- "\n",
- "def representative_data_gen() -> list:\n",
- " yield [next(ds_train_as_iter)[0]]"
- ],
- "metadata": {
- "id": "f0e2bbdb3df563d3"
- },
- "id": "f0e2bbdb3df563d3"
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Pruning the Model\n",
- "Next,we'll proceed with pruning our trained model to decrease its size, targeting a 50% reduction in the memory footprint of the model's weights. Given that the model's weights utilize the float32 data type, where each parameter occupies 4 bytes, we calculate the memory requirement by multiplying the total number of parameters by 4."
- ],
- "metadata": {
- "collapsed": false,
- "id": "ac6c6db5635f8950"
- },
- "id": "ac6c6db5635f8950"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "compression_ratio = 0.5\n",
- "# Define Resource Utilization constraint for pruning. Each float32 parameter requires 4 bytes,\n",
- "# hence we multiply the total parameter count by 4 to calculate the memory footprint.\n",
- "target_resource_utilization = mct.core.ResourceUtilization(weights_memory=dense_model_params * 4 * compression_ratio)\n",
- "# Define a pruning configuration\n",
- "pruning_config=mct.pruning.PruningConfig(num_score_approximations=1)\n",
- "# Prune the model\n",
- "pruned_model, pruning_info = mct.pruning.pytorch_pruning_experimental(model=dense_model, target_resource_utilization=target_resource_utilization, representative_data_gen=representative_data_gen, pruning_config=pruning_config)"
- ],
- "metadata": {
- "id": "b524e66fbb96abd"
- },
- "id": "b524e66fbb96abd"
- },
- {
- "cell_type": "markdown",
- "source": [
- "### Model after pruning\n",
- "Let us view the model after the pruning operation and check the accuracy. We can see that pruning process caused a degradation in accuracy."
- ],
- "metadata": {
- "collapsed": false,
- "id": "9bf54933cd496543"
- },
- "id": "9bf54933cd496543"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "pruned_model_nparams = display_model_params(pruned_model)\n",
- "acc_before_retrain = test_model(pruned_model, test_loader)\n",
- "print(f'Pruned model accuracy before retraining {acc_before_retrain}%')"
- ],
- "metadata": {
- "id": "8c5cfe6b555acf63"
- },
- "id": "8c5cfe6b555acf63"
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Retraining the Pruned Model\n",
- "After pruning, we often need to retrain the model to recover any lost performance."
- ],
- "metadata": {
- "collapsed": false,
- "id": "a3eaaa9bb34ebf71"
- },
- "id": "a3eaaa9bb34ebf71"
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "outputs": [],
- "source": [
- "pruned_model_retrained = train_model(pruned_model, train_loader, test_loader, device, epochs=6)"
- ],
- "metadata": {
- "id": "9909464707e538a4"
- },
- "id": "9909464707e538a4"
- },
- {
- "cell_type": "markdown",
- "source": [
- "## Summary\n",
- "In this tutorial, we demonstrated the process of training, pruning, and retraining a neural network model using the Model Compression Toolkit. We began by setting up our environment and loading the dataset, followed by building and training a fully connected neural network. We then introduced the concept of model pruning, specifically targeting the first two dense layers to efficiently reduce the model's memory footprint by 50%. After applying structured pruning, we evaluated the pruned model's performance and concluded the tutorial by fine-tuning the pruned model to recover any lost accuracy due to the pruning process. This tutorial provided a hands-on approach to model optimization through pruning, showcasing the balance between model size, performance, and efficiency."
- ],
- "metadata": {
- "collapsed": false,
- "id": "b5d01318e1c2c02d"
- },
- "id": "b5d01318e1c2c02d"
- },
- {
- "cell_type": "markdown",
- "source": [
- "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n",
- "\n",
- "Licensed under the Apache License, Version 2.0 (the \"License\");\n",
- "you may not use this file except in compliance with the License.\n",
- "You may obtain a copy of the License at\n",
- "\n",
- " http://www.apache.org/licenses/LICENSE-2.0\n",
- "\n",
- "Unless required by applicable law or agreed to in writing, software\n",
- "distributed under the License is distributed on an \"AS IS\" BASIS,\n",
- "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
- "See the License for the specific language governing permissions and\n",
- "limitations under the License.\n"
- ],
- "metadata": {
- "collapsed": false,
- "id": "955d184da72b36c1"
- },
- "id": "955d184da72b36c1"
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 2
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython2",
- "version": "2.7.6"
- },
- "colab": {
- "provenance": []
- }
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Structured Pruning of a Fully-Connected PyTorch Model using the Model Compression Toolkit (MCT)\n",
+ "\n",
+ "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_pruning_mnist.ipynb)\n",
+ "\n",
+ "## Overview\n",
+ "This tutorial provides a step-by-step guide to training, pruning, and retraining a fully connected neural network model using PyTorch. We will start by building and training the model from scratch on the MNIST dataset, followed by applying structured pruning to reduce the model size.\n",
+ "\n",
+ "## Summary\n",
+ "In this tutorial, we will cover:\n",
+ "\n",
+ "1. **Training a PyTorch model on MNIST:** We'll begin by constructing a basic fully connected neural network and training it on the MNIST dataset. \n",
+ "2. **Applying structured pruning:** We'll introduce a pruning technique to reduce model size while maintaining performance. \n",
+ "3. **Retraining the pruned model:** After pruning, we'll retrain the model to recover any lost accuracy. \n",
+ "4. **Evaluating the pruned model:** We'll evaluate the pruned model’s performance and compare it to the original model.\n",
+ "\n",
+ "## Setup\n",
+ "Install the relevant packages:"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "4f2fe8612d323dd7"
},
- "nbformat": 4,
- "nbformat_minor": 5
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "!pip install -q torch torchvision"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "45e5057240e9db2d"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "import importlib\n",
+ "if not importlib.util.find_spec('model_compression_toolkit'):\n",
+ " !pip install model_compression_toolkit"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "fac1bac87df87eb4"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "import torch.optim as optim\n",
+ "from torch.utils.data import DataLoader\n",
+ "import torch.nn.functional as F\n",
+ "from torchvision import datasets, transforms"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "eea35a06ae612b5b"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Train a Pytorch classifier model on MNIST\n",
+ "Next, we'll define a function to train our neural network model. This function will handle the training loop, including forward propagation, loss calculation, backpropagation, and updating the model parameters. Additionally, we'll evaluate the model's performance on the validation dataset at the end of each epoch to monitor its accuracy. The following code snippets are adapted from the official [PyTorch examples](https://github.com/pytorch/examples/blob/main/mnist/main.py)."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "9e159019685961bc"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "def train(model, device, train_loader, optimizer, epoch):\n",
+ " model.train()\n",
+ " model.to(device)\n",
+ " for batch_idx, (data, target) in enumerate(train_loader):\n",
+ " data, target = data.to(device), target.to(device)\n",
+ " optimizer.zero_grad()\n",
+ " output = model(data)\n",
+ " loss = F.nll_loss(output, target)\n",
+ "\n",
+ " loss.backward()\n",
+ " optimizer.step()\n",
+ " if batch_idx % 100 == 0:\n",
+ " print('Train Epoch: {} [{}/{} ({:.0f}%)]\\tLoss: {:.6f}'.format(\n",
+ " epoch, batch_idx * len(data), len(train_loader.dataset),\n",
+ " 100. * batch_idx / len(train_loader), loss.item()))\n",
+ "\n",
+ "\n",
+ "def test(model, device, test_loader):\n",
+ " model.eval()\n",
+ " test_loss = 0\n",
+ " correct = 0\n",
+ " with torch.no_grad():\n",
+ " for data, target in test_loader:\n",
+ " data, target = data.to(device), target.to(device)\n",
+ " output = model(data)\n",
+ " test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss\n",
+ " pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability\n",
+ " correct += pred.eq(target.view_as(pred)).sum().item()\n",
+ "\n",
+ " test_loss /= len(test_loader.dataset)\n",
+ " accuracy = 100. * correct / len(test_loader.dataset)\n",
+ " \n",
+ " print('\\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\\n'.format(\n",
+ " test_loss, correct, len(test_loader.dataset),\n",
+ " accuracy))\n",
+ " \n",
+ " return accuracy \n",
+ "\n",
+ "random_seed = 1\n",
+ "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+ "torch.backends.cudnn.enabled = False\n",
+ "torch.manual_seed(random_seed)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "fc1cd6067ea0edb0"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Creating a Fully-Connected Model\n",
+ "In this section, we create a simple example of a fully connected model to demonstrate the pruning process. It consists of three linear layers with 128, 64, and 10 neurons."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "88e035b343d63af"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "# Define the Fully-Connected Model\n",
+ "class FCModel(nn.Module):\n",
+ " def __init__(self):\n",
+ " super(FCModel, self).__init__()\n",
+ " self.flatten = nn.Flatten()\n",
+ " self.fc_layers = nn.Sequential(\n",
+ " nn.Linear(28*28, 128),\n",
+ " nn.ReLU(),\n",
+ " nn.Linear(128, 64),\n",
+ " nn.ReLU(),\n",
+ " nn.Linear(64, 10)\n",
+ " )\n",
+ "\n",
+ " def forward(self, x):\n",
+ " x = self.flatten(x)\n",
+ " logits = self.fc_layers(x)\n",
+ " output = F.log_softmax(logits, dim=1)\n",
+ " return output"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "b77ae732359978b2"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Loading and Preprocessing MNIST Dataset\n",
+ "Let's define the dataset loaders to retrieve the train and test parts of the MNIST dataset, including preprocessing:"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "567482fb76082cfe"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "batch_size = 128\n",
+ "test_batch_size = 1000\n",
+ "\n",
+ "transform=transforms.Compose([\n",
+ " transforms.ToTensor(),\n",
+ " transforms.Normalize((0.1307,), (0.3081,))\n",
+ " ])\n",
+ "dataset_folder = './mnist'\n",
+ "train_dataset = datasets.MNIST(dataset_folder, train=True, download=True,\n",
+ " transform=transform)\n",
+ "test_dataset = datasets.MNIST(dataset_folder, train=False,\n",
+ " transform=transform)\n",
+ "train_loader = torch.utils.data.DataLoader(train_dataset, pin_memory=True, batch_size=batch_size, shuffle=True)\n",
+ "test_loader = torch.utils.data.DataLoader(test_dataset, pin_memory=True, batch_size=test_batch_size, shuffle=False)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "7ba3424b2ac17a66"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Training the Dense Model\n",
+ "We will now train the dense model using the MNIST dataset."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "219e047aa790e812"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "epochs = 6\n",
+ "lr = 0.001\n",
+ "\n",
+ "dense_model = FCModel().to(device)\n",
+ "optimizer = optim.Adam(dense_model.parameters(), lr=lr)\n",
+ "for epoch in range(1, epochs + 1):\n",
+ " train(dense_model, device, train_loader, optimizer, epoch)\n",
+ " test(dense_model, device, test_loader)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "37ef306565024207"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Dense Model Properties\n",
+ "We will display our model's architecture, including layers, their types, and the number of parameters.\n",
+ "Notably, MCT's structured pruning will target the first two dense layers for pruning, as these layers have a higher number of channels compared to later layers, offering more opportunities for pruning without affecting accuracy significantly. This reduction can be effectively propagated by adjusting the input channels of subsequent layers."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "1566c1c7bbc7cf79"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "def display_model_params(model):\n",
+ " model_params = sum(p.numel() for p in model.parameters())\n",
+ " for name, module in model.named_modules():\n",
+ " module_params = sum(p.numel() for p in module.state_dict().values())\n",
+ " if module_params > 0:\n",
+ " print(f'{name} number of parameters {module_params}')\n",
+ " print(f'\\nTotal number of parameters {model_params}')\n",
+ " return model_params\n",
+ "\n",
+ "dense_model_params = display_model_params(dense_model)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "6e661b6cb0414e90"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Create a Representative Dataset\n",
+ "We are creating a representative dataset to guide the pruning process for computing importance score for each channel:"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "7297a549c27a4b8e"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "n_iter=10\n",
+ "\n",
+ "def representative_dataset_gen():\n",
+ " dataloader_iter = iter(train_loader)\n",
+ " for _ in range(n_iter):\n",
+ " yield [next(dataloader_iter)[0]]\n",
+ " "
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "f22aab1989c92e25"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Pruning the Model\n",
+ "Next,we'll proceed with pruning our trained model to decrease its size, targeting a 50% reduction in the memory footprint of the model's weights. Given that the model's weights utilize the float32 data type, where each parameter occupies 4 bytes, we calculate the memory requirement by multiplying the total number of parameters by 4."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "4ae781eb7d420ad"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "import model_compression_toolkit as mct\n",
+ "compression_ratio = 0.5\n",
+ "\n",
+ "# Define Resource Utilization constraint for pruning. Each float32 parameter requires 4 bytes, hence we multiply the total parameter count by 4 to calculate the memory footprint.\n",
+ "target_resource_utilization = mct.core.ResourceUtilization(weights_memory=dense_model_params * 4 * compression_ratio)\n",
+ "\n",
+ "# Define a pruning configuration\n",
+ "pruning_config=mct.pruning.PruningConfig(num_score_approximations=1)\n",
+ "\n",
+ "# Prune the model\n",
+ "pruned_model, pruning_info = mct.pruning.pytorch_pruning_experimental(\n",
+ " model=dense_model,\n",
+ " target_resource_utilization=target_resource_utilization, \n",
+ " representative_data_gen=representative_dataset_gen, \n",
+ " pruning_config=pruning_config)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "96f9ca0490343c18"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Model after pruning\n",
+ "Let us view the model after the pruning operation and check the accuracy. We can see that pruning process caused a degradation in accuracy."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "ad14328ce33ecb97"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "pruned_model_nparams = display_model_params(pruned_model)\n",
+ "acc_before_retrain = test(pruned_model, device, test_loader)\n",
+ "print(f'Pruned model accuracy before retraining {acc_before_retrain}%')"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "85ee17d3804a61bc"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Retraining the Pruned Model\n",
+ "After pruning, we often need to retrain the model to recover any lost performance."
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "6fd438bff45aded3"
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "optimizer = optim.Adam(pruned_model.parameters(), lr=lr)\n",
+ "for epoch in range(1, epochs + 1):\n",
+ " train(pruned_model, device, train_loader, optimizer, epoch)\n",
+ " test(pruned_model, device, test_loader)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "81250c2caca111a8"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now, we can export the quantized model to ONNX:"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "29c044b7180c42c"
+ },
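+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "# onnx is not bundled with MCT's requirements, so install it before exporting the model\n",
+ "!pip install -q onnx"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "install-onnx-pruning"
+ },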
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "mct.exporter.pytorch_export_model(pruned_model, save_model_path='qmodel.onnx', repr_dataset=representative_dataset_gen)"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "be1eec6652169d4e"
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Conclusions\n",
+ "In this tutorial, we demonstrated the process of training, pruning, and retraining a neural network model using the Model Compression Toolkit (MCT). We began by setting up our environment and loading the dataset, followed by building and training a fully connected neural network. We then introduced the concept of model pruning, specifically targeting the first two dense layers to efficiently reduce the model's memory footprint by 50%. After applying structured pruning, we evaluated the pruned model's performance and concluded the tutorial by fine-tuning the pruned model to recover any lost accuracy due to the pruning process. This tutorial provided a hands-on approach to model optimization through pruning, showcasing the balance between model size, performance, and efficiency.\n",
+ "\n",
+ "## Copyrights\n",
+ "Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.\n",
+ "\n",
+ "Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "you may not use this file except in compliance with the License.\n",
+ "You may obtain a copy of the License at\n",
+ "\n",
+ " http://www.apache.org/licenses/LICENSE-2.0\n",
+ "\n",
+ "Unless required by applicable law or agreed to in writing, software\n",
+ "distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "See the License for the specific language governing permissions and\n",
+ "limitations under the License.\n"
+ ],
+ "metadata": {
+ "collapsed": false
+ },
+ "id": "68a927746e0f8d2f"
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 2
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython2",
+ "version": "2.7.6"
+ },
+ "colab": {
+ "provenance": []
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
}
diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ptq_mnist.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ptq_mnist.ipynb
deleted file mode 100644
index 768f145c9..000000000
--- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ptq_mnist.ipynb
+++ /dev/null
@@ -1,401 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "7cf96fb4",
- "metadata": {
- "id": "7cf96fb4"
- },
- "source": [
- "# Quantization using the Model Compression Toolkit - example in Pytorch\n",
- "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ptq_mnist.ipynb)\n",
- "\n",
- "## Overview\n",
- "This quick-start guide explains how to use the Model Compression Toolkit (MCT) to quantize a PyTorch model. We'll provide an end-to-end example, starting with training a model from scratch on MNIST dataset and then applying MCT for quantization.\n",
- "\n",
- "## Summary\n",
- "In this tutorial, we will explore the following:\n",
- "\n",
- "**1. Training a PyTorch model from scratch on MNIST:** We'll start by building a basic PyTorch model and training it on the MNIST dataset.\n",
- "**2. Quantizing the model using 8-bit activations and weights:** We'll employ a hardware-friendly quantization technique, such as symmetric quantization with power-of-2 thresholds.\n",
- "**3. Evaluating the quantized model:** We'll compare the performance of the quantized model to the original model, focusing on accuracy.\n",
- "**4. Analyzing compression gains:** We'll estimate the compression achieved by quantization.\n",
- "\n",
- "## Setup\n",
- "Install the relevant packages:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "89e0bb04",
- "metadata": {
- "id": "89e0bb04"
- },
- "outputs": [],
- "source": [
- "! pip install -q torch\n",
- "! pip install -q torchvision\n",
- "! pip install -q onnx"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "5441efd2978cea5a",
- "metadata": {},
- "outputs": [],
- "source": [
- "import importlib\n",
- "if not importlib.util.find_spec('model_compression_toolkit'):\n",
- " !pip install model_compression_toolkit"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "a82928d0",
- "metadata": {
- "id": "a82928d0"
- },
- "outputs": [],
- "source": [
- "import torch\n",
- "import torch.nn as nn\n",
- "import torch.nn.functional as F\n",
- "import torch.optim as optim\n",
- "from torchvision import datasets, transforms\n",
- "from torch.optim.lr_scheduler import StepLR"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "02312089",
- "metadata": {
- "id": "02312089"
- },
- "source": [
- "## Train a Pytorch classifier model on MNIST\n",
- "Let's define the network and a few helper functions to train and evaluate the model. The following code snippets are adapted from the official PyTorch examples: https://github.com/pytorch/examples/blob/main/mnist/main.py"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "16f9bcc0",
- "metadata": {
- "id": "16f9bcc0"
- },
- "outputs": [],
- "source": [
- "class Net(nn.Module):\n",
- " def __init__(self):\n",
- " super(Net, self).__init__()\n",
- " self.conv1 = nn.Conv2d(1, 32, 3, 1)\n",
- " self.conv2 = nn.Conv2d(32, 64, 3, 1)\n",
- " self.dropout1 = nn.Dropout(0.25)\n",
- " self.dropout2 = nn.Dropout(0.5)\n",
- " self.fc1 = nn.Linear(9216, 128)\n",
- " self.fc2 = nn.Linear(128, 10)\n",
- "\n",
- " def forward(self, x):\n",
- " x = self.conv1(x)\n",
- " x = F.relu(x)\n",
- " x = self.conv2(x)\n",
- " x = F.relu(x)\n",
- " x = F.max_pool2d(x, 2)\n",
- " x = self.dropout1(x)\n",
- " x = torch.flatten(x, 1)\n",
- " x = self.fc1(x)\n",
- " x = F.relu(x)\n",
- " x = self.dropout2(x)\n",
- " x = self.fc2(x)\n",
- " output = F.log_softmax(x, dim=1)\n",
- " return output\n",
- "\n",
- "\n",
- "def train(model, device, train_loader, optimizer, epoch):\n",
- " model.train()\n",
- " for batch_idx, (data, target) in enumerate(train_loader):\n",
- " data, target = data.to(device), target.to(device)\n",
- " optimizer.zero_grad()\n",
- " output = model(data)\n",
- " loss = F.nll_loss(output, target)\n",
- " loss.backward()\n",
- " optimizer.step()\n",
- " if batch_idx % 100 == 0:\n",
- " print('Train Epoch: {} [{}/{} ({:.0f}%)]\\tLoss: {:.6f}'.format(\n",
- " epoch, batch_idx * len(data), len(train_loader.dataset),\n",
- " 100. * batch_idx / len(train_loader), loss.item()))\n",
- "\n",
- "\n",
- "def test(model, device, test_loader):\n",
- " model.eval()\n",
- " test_loss = 0\n",
- " correct = 0\n",
- " with torch.no_grad():\n",
- " for data, target in test_loader:\n",
- " data, target = data.to(device), target.to(device)\n",
- " output = model(data)\n",
- " test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss\n",
- " pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability\n",
- " correct += pred.eq(target.view_as(pred)).sum().item()\n",
- "\n",
- " test_loss /= len(test_loader.dataset)\n",
- "\n",
- " print('\\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\\n'.format(\n",
- " test_loss, correct, len(test_loader.dataset),\n",
- " 100. * correct / len(test_loader.dataset)))\n",
- "\n",
- "batch_size = 64\n",
- "test_batch_size = 1000\n",
- "random_seed = 1\n",
- "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
- "torch.backends.cudnn.enabled = False\n",
- "torch.manual_seed(random_seed)\n",
- "epochs = 2\n",
- "gamma = 0.7\n",
- "lr = 1.0"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c24d3c5a",
- "metadata": {
- "id": "c24d3c5a"
- },
- "source": [
- "Let's define the dataset loaders and optimizer, then train the model for 2 epochs."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "c615a27e",
- "metadata": {
- "id": "c615a27e"
- },
- "outputs": [],
- "source": [
- "transform=transforms.Compose([\n",
- " transforms.ToTensor(),\n",
- " transforms.Normalize((0.1307,), (0.3081,))\n",
- " ])\n",
- "dataset_folder = './mnist'\n",
- "train_dataset = datasets.MNIST(dataset_folder, train=True, download=True,\n",
- " transform=transform)\n",
- "test_dataset = datasets.MNIST(dataset_folder, train=False,\n",
- " transform=transform)\n",
- "train_loader = torch.utils.data.DataLoader(train_dataset, num_workers=1, pin_memory=True, batch_size=batch_size, shuffle=True)\n",
- "test_loader = torch.utils.data.DataLoader(test_dataset, num_workers=1, pin_memory=True, batch_size=test_batch_size, shuffle=False)\n",
- "\n",
- "model = Net().to(device)\n",
- "optimizer = optim.Adadelta(model.parameters(), lr=lr)\n",
- "\n",
- "scheduler = StepLR(optimizer, step_size=1, gamma=gamma)\n",
- "for epoch in range(1, epochs + 1):\n",
- " train(model, device, train_loader, optimizer, epoch)\n",
- " test(model, device, test_loader)\n",
- " scheduler.step()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c0321aad",
- "metadata": {
- "id": "c0321aad"
- },
- "source": [
- "## Representative Dataset\n",
- "For quantization with MCT, we need to define a representative dataset required by the Post-Training Quantization (PTQ) algorithm. This dataset is a generator that returns a list of images:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "618975be",
- "metadata": {
- "id": "618975be"
- },
- "outputs": [],
- "source": [
- "n_iter=10\n",
- "\n",
- "def representative_dataset_gen():\n",
- " dataloader_iter = iter(train_loader)\n",
- " for _ in range(n_iter):\n",
- " yield [next(dataloader_iter)[0]]"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d0a92bee",
- "metadata": {
- "id": "d0a92bee"
- },
- "source": [
- "## Hardware-friendly quantization using MCT\n",
- "Now for the exciting part! Let’s run hardware-friendly post-training quantization on the model. "
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "63f695dd",
- "metadata": {
- "id": "63f695dd"
- },
- "outputs": [],
- "source": [
- "import model_compression_toolkit as mct\n",
- "\n",
- "# Define a `TargetPlatformCapability` object, representing the HW specifications on which we wish to eventually deploy our quantized model.\n",
- "target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default')\n",
- "\n",
- "quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(\n",
- " in_module=model,\n",
- " representative_data_gen=representative_dataset_gen,\n",
- " target_platform_capabilities=target_platform_cap\n",
- ")"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d3521637",
- "metadata": {
- "id": "d3521637"
- },
- "source": [
- "Our model is now quantized. MCT has created a simulated quantized model within the original PyTorch framework by inserting [quantization representation modules](https://github.com/sony/mct_quantizers). These modules, such as `PytorchQuantizationWrapper` and `PytorchActivationQuantizationHolder`, wrap PyTorch layers to simulate the quantization of weights and activations, respectively. While the size of the saved model remains unchanged, all the quantization parameters are stored within these modules and are ready for deployment on the target hardware. In this example, we used the default MCT settings, which compressed the model from 32 bits to 8 bits, resulting in a compression ratio of 4x. Let's print the quantized model and examine the quantization modules:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "4f5fa4a2",
- "metadata": {
- "id": "4f5fa4a2"
- },
- "outputs": [],
- "source": [
- "print(quantized_model)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c677bd61c3ab4649",
- "metadata": {},
- "source": [
- "## Models evaluation\n",
- "Using the simulated quantized model, we can evaluate its performance and compare the results to the floating-point model.\n",
- "\n",
- "Let's start with the floating-point model evaluation."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "fdd038f7aff8cde7",
- "metadata": {},
- "outputs": [],
- "source": [
- "test(model, device, test_loader)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f4f564f31e253f5c",
- "metadata": {},
- "source": [
- "Finally, let's evaluate the quantized model:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "c9da2134f0bde415",
- "metadata": {},
- "outputs": [],
- "source": [
- "test(quantized_model, device, test_loader)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "fd09fa27",
- "metadata": {
- "id": "fd09fa27"
- },
- "source": [
- "Now, we can export the quantized model to ONNX:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "oXMn6bFjbQad",
- "metadata": {
- "id": "oXMn6bFjbQad"
- },
- "outputs": [],
- "source": [
- "mct.exporter.pytorch_export_model(quantized_model, save_model_path='qmodel.onnx', repr_dataset=representative_dataset_gen)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "bb7e1572",
- "metadata": {
- "id": "bb7e1572"
- },
- "source": [
- "## Conclusion\n",
- "\n",
- "In this tutorial, we demonstrated how to quantize a classification model for MNIST in a hardware-friendly manner using MCT. We observed that a 4x compression ratio was achieved with minimal performance degradation.\n",
- "\n",
- "The key advantage of hardware-friendly quantization is that the model can run more efficiently in terms of runtime, power consumption, and memory usage on designated hardware.\n",
- "\n",
- "While this was a simple model and task, MCT can deliver competitive results across a wide range of tasks and network architectures. For more details, [check out the paper:](https://arxiv.org/abs/2109.09113).\n",
- "\n",
- "\n",
- "Copyright 2022 Sony Semiconductor Israel, Inc. All rights reserved.\n",
- "\n",
- "Licensed under the Apache License, Version 2.0 (the \"License\");\n",
- "you may not use this file except in compliance with the License.\n",
- "You may obtain a copy of the License at\n",
- "\n",
- " http://www.apache.org/licenses/LICENSE-2.0\n",
- "\n",
- "Unless required by applicable law or agreed to in writing, software\n",
- "distributed under the License is distributed on an \"AS IS\" BASIS,\n",
- "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
- "See the License for the specific language governing permissions and\n",
- "limitations under the License.\n"
- ]
- }
- ],
- "metadata": {
- "colab": {
- "provenance": []
- },
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.10.4"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ssdlite_mobilenetv3_object_detection.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ssdlite_mobilenetv3_object_detection.ipynb
deleted file mode 100644
index bca484564..000000000
--- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ssdlite_mobilenetv3_object_detection.ipynb
+++ /dev/null
@@ -1,476 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "4c261298-309f-41e8-9338-a5e205f09b05",
- "metadata": {},
- "source": [
- "# Post Training Quantization a Pytorch Object Detection Model - A Quick-Start Guide\n",
- "\n",
- "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_ssdlite_mobilenetv3_object_detection.ipynb)\n",
- "\n",
- "## Overview\n",
- "\n",
- "This tutorial shows how to quantize a pre-trained object detection model from the torchvision package using the Model-Compression-Toolkit (MCT). We will do so by giving an example of MCT's post-training quantization. As we will see, post-training quantization is a low complexity yet effective quantization method. In this example, we will quantize the model and evaluate the accuracy before and after quantization.\n",
- "\n",
- "As the pretrained object detection model contains a preprocessing and postprocessing layers that their quantization with MCT is out of this notebook's scope, we'll separate these layers from the model-to-quantize. These layers will be included in the evaluation code.\n",
- "\n",
- "## Summary\n",
- "\n",
- "In this tutorial we will cover:\n",
- "\n",
- "1. Post-Training Quantization using MCT.\n",
- "2. Loading and preprocessing COCO's validation dataset.\n",
- "3. Loading and preprocessing an unlabeled representative dataset from the COCO trainset.\n",
- "4. Accuracy evaluation of the floating-point and the quantized models."
- ]
- },
- {
- "cell_type": "markdown",
- "id": "865ce67a-ce08-4f5a-bf70-e54c63774163",
- "metadata": {},
- "source": [
- "## Setup\n",
- "\n",
- "Install and import the relevant packages:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "4a1bc130-3ed1-4815-8fd9-520fa66db8e1",
- "metadata": {},
- "outputs": [],
- "source": [
- "!pip install -q torch torchvision torchaudio\n",
- "!pip install -q pycocotools\n",
- "!pip install -q model-compression-toolkit"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "6ed80e16-1579-4274-9f3b-3939da8dd8a2",
- "metadata": {
- "is_executing": true
- },
- "outputs": [],
- "source": [
- "import torch\n",
- "import torchvision\n",
- "from torchvision.models.detection.ssdlite import SSDLite320_MobileNet_V3_Large_Weights\n",
- "from torchvision.models.detection.anchor_utils import ImageList\n",
- "import model_compression_toolkit as mct\n",
- "from pycocotools.coco import COCO\n",
- "from pycocotools.cocoeval import COCOeval"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "084c2b8b-3175-4d46-a18a-7c4d8b6fcb38",
- "metadata": {},
- "source": [
- "## Float Model\n",
- "\n",
- "### Load float model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "e8395b28-4732-4d18-b081-5d3bdf508691",
- "metadata": {},
- "outputs": [],
- "source": [
- "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
- "\n",
- "image_size = (320, 320)\n",
- "model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(weights=SSDLite320_MobileNet_V3_Large_Weights.DEFAULT)\n",
- "# mAP=0.2131 (float)\n",
- "# mAP=0.2007 (quantized)\n",
- "\n",
- "model.eval()\n",
- "model = model.to(device)\n",
- "print('model loaded')"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "8a0f6df2-f812-4fd5-91e6-db5d12d96713",
- "metadata": {},
- "source": [
- "### Evaluate float model"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c69a1499-5b24-4737-969d-0c27dca97ea5",
- "metadata": {},
- "source": [
- "#### Create the COCO evaluation metric"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "12a0fc0e-ec2f-465a-987d-c7a4d632296f",
- "metadata": {},
- "outputs": [],
- "source": [
- "def format_results(outputs, img_ids):\n",
- " detections = []\n",
- "\n",
- " # Process model outputs and convert to detection format\n",
- " for idx, output in enumerate(outputs):\n",
- " image_id = img_ids[idx] # Adjust according to your batch size and indexing\n",
- " scores = output['scores'].cpu().numpy()\n",
- " labels = output['labels'].cpu().numpy()\n",
- " boxes = output['boxes'].cpu().numpy()\n",
- "\n",
- " for score, label, box in zip(scores, labels, boxes):\n",
- " detection = {\n",
- " \"image_id\": image_id,\n",
- " \"category_id\": label,\n",
- " \"bbox\": [box[0], box[1], box[2] - box[0], box[3] - box[1]],\n",
- " \"score\": score\n",
- " }\n",
- " detections.append(detection)\n",
- "\n",
- " return detections\n",
- "\n",
- "\n",
- "class CocoEval:\n",
- " def __init__(self, path2json):\n",
- "\n",
- " # Load ground truth annotations\n",
- " self.coco_gt = COCO(path2json)\n",
- "\n",
- " # A list of reformatted model outputs\n",
- " self.all_detections = []\n",
- "\n",
- " def add_batch_detections(self, outputs, targets):\n",
- "\n",
- " # Collect and format results from the batch\n",
- " img_ids, _outs = [], []\n",
- " for t, o in zip(targets, outputs):\n",
- " if len(t) > 0:\n",
- " img_ids.append(t[0]['image_id'])\n",
- " _outs.append(o)\n",
- "\n",
- " batch_detections = format_results(_outs, img_ids) # Implement this function\n",
- "\n",
- " self.all_detections.extend(batch_detections)\n",
- "\n",
- " def result(self):\n",
- " # Initialize COCO evaluation object\n",
- " self.coco_dt = self.coco_gt.loadRes(self.all_detections)\n",
- " coco_eval = COCOeval(self.coco_gt, self.coco_dt, 'bbox')\n",
- "\n",
- " # Run evaluation\n",
- " coco_eval.evaluate()\n",
- " coco_eval.accumulate()\n",
- " coco_eval.summarize()\n",
- "\n",
- " # Print mAP results\n",
- " print(\"mAP: {:.4f}\".format(coco_eval.stats[0]))\n",
- "\n",
- " return coco_eval.stats\n",
- "\n",
- " def reset(self):\n",
- " self.all_detections = []"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "3cde2f8e-0642-4374-a1f4-df2775fe7767",
- "metadata": {},
- "source": [
- "#### Evaluate float model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "56393342-cecf-4f64-b9ca-2f515c765942",
- "metadata": {},
- "outputs": [],
- "source": [
- "EVAL_DATASET_FOLDER = '/path/to/coco/evaluation/images/val2017'\n",
- "EVAL_DATASET_ANNOTATION_FILE = '/path/to/coco/annotations/instances_val2017.json'\n",
- "\n",
- "\n",
- "# The float model accepts a list of images in their original shapes and preprocesses them inside, so collate the batch images as a list\n",
- "def collate_fn(batch_input):\n",
- " images = [b[0] for b in batch_input]\n",
- " targets = [b[1] for b in batch_input]\n",
- " return images, targets\n",
- "\n",
- "\n",
- "# Initialize the COCO evaluation DataLoader\n",
- "coco_eval = torchvision.datasets.CocoDetection(root=EVAL_DATASET_FOLDER,\n",
- " annFile=EVAL_DATASET_ANNOTATION_FILE,\n",
- " transform=torchvision.transforms.ToTensor())\n",
- "batch_size = 50\n",
- "data_loader = torch.utils.data.DataLoader(coco_eval, batch_size=batch_size, shuffle=False,\n",
- " num_workers=0, collate_fn=collate_fn)\n",
- "\n",
- "# Initialize the evaluation metric object\n",
- "coco_metric = CocoEval(EVAL_DATASET_ANNOTATION_FILE)\n",
- "\n",
- "# Iterate and evaluate the COCO evaluation set\n",
- "for batch_idx, (images, targets) in enumerate(data_loader):\n",
- " # Run inference on the batch\n",
- " images = list(image.to(device) for image in images)\n",
- " with torch.no_grad():\n",
- " outputs = model(images)\n",
- "\n",
- " # Add the model outputs to metric object (a dictionary of outputs after postprocess: boxes, scores & classes)\n",
- " coco_metric.add_batch_detections(outputs, targets)\n",
- " if (batch_idx+1) % 10 == 0:\n",
- " print(f'processed {(batch_idx+1)*data_loader.batch_size} images')\n",
- "\n",
- "# Print float model mAP results\n",
- "print(\"Float model mAP: {:.4f}\".format(coco_metric.result()[0]))\n"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "015e760b-6555-45b4-aaf9-500e974c1d86",
- "metadata": {},
- "source": [
- "## Quantize Model\n",
- "\n",
- "### Extract model to be quantized\n",
- "\n",
- "Extract the float model's backcone and head, and construct a torch model that only contains them"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "01e90967-594b-480f-b2e6-45e2c9ce9cee",
- "metadata": {},
- "outputs": [],
- "source": [
- "class SDD4Quant(torch.nn.Module):\n",
- " def __init__(self, in_sdd, *args, **kwargs):\n",
- " super().__init__(*args, **kwargs)\n",
- " # Save the float model under self.base as a module of the model. Later we'll only run \"backbone\" & \"head\"\n",
- " self.add_module(\"base\", in_sdd)\n",
- "\n",
- " # Forward pass of the model to be quantized. This code is copied from the float model forward function (removed the preprocess and postprocess code)\n",
- " def forward(self, x):\n",
- " features = self.base.backbone(x)\n",
- "\n",
- " features = list(features.values())\n",
- "\n",
- " # compute the ssd heads outputs using the features\n",
- " head_outputs = self.base.head(features)\n",
- " return head_outputs\n",
- "\n",
- "\n",
- "model4quant = SDD4Quant(model)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "08fb59fd-3877-45b4-8529-7f9edb687c69",
- "metadata": {
- "tags": []
- },
- "source": [
- "### Extract preproecss and postprocess\n",
- "\n",
- "Extract the preprocess and postprocess functions from the float model object, and construct separate preprocess and postprocess functions for the representative dataset and evaluation code\n",
- "\n",
- "\n",
- "Note: the MCT output model flattens the float model output data structure to a list, so the PostProcess manually rebuilds it as the original data structure (a dictionary)."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "ff336c30-56c9-4de8-9c42-6462ddb8d2c0",
- "metadata": {
- "tags": []
- },
- "outputs": [],
- "source": [
- "def preprocess(image, targets):\n",
- " # need to save the original image sizes before resize for the postprocess part\n",
- " targets = {'gt': targets, 'img_size': list(image.size[::-1])}\n",
- " image = model.transform([torchvision.transforms.ToTensor()(image)])[0].tensors[0, ...]\n",
- " return image, targets\n",
- "\n",
- "\n",
- "# Define the postprocess, which is the code copied from the float model forward code. These layers will not be quantized.\n",
- "class PostProcess:\n",
- " def __init__(self):\n",
- " self.features = [torch.zeros((1, 1, s, s)) for s in [20, 10, 5, 3, 2, 1]]\n",
- "\n",
- " def __call__(self, head_outputs, image_list, original_image_sizes):\n",
- " anchors = [a.to(device) for a in model.anchor_generator(image_list, self.features)]\n",
- "\n",
- " # The MCT flattens the outputs of the head to a list, so need to change it to a dictionary as the psotprocess functions expect.\n",
- " if not isinstance(head_outputs, dict):\n",
- " if head_outputs[0].shape[-1] == 4:\n",
- " head_outputs = {\"bbox_regression\": head_outputs[0],\n",
- " \"cls_logits\": head_outputs[1]}\n",
- " else:\n",
- " head_outputs = {\"bbox_regression\": head_outputs[1],\n",
- " \"cls_logits\": head_outputs[0]}\n",
- "\n",
- " # Float model postprocess functions that handle box regression and NMS\n",
- " detections = model.postprocess_detections(head_outputs, anchors, image_list.image_sizes)\n",
- " detections = model.transform.postprocess(detections, image_list.image_sizes, original_image_sizes)\n",
- " return detections\n",
- "\n",
- "\n",
- "postprocess = PostProcess()"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "25ef70dd-d513-48ae-b7d0-4e1cac164d06",
- "metadata": {},
- "source": [
- "### Dataset preparation\n",
- "\n",
- "Assuming we've downloaded the COCO dataset to a folder, let's set the folder path:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "8bd5d8af-7cbd-4b1c-9b20-aac316b7bbe9",
- "metadata": {},
- "outputs": [],
- "source": [
- "TRAIN_DATASET_FOLDER = '/path/to/coco/training/images/train2017'\n",
- "TRAIN_DATASET_ANNOTATION_FILE = '/path/to/coco/annotations/instances_train2017.json'"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "bf80ebaf-7ae6-4b34-ae48-8463fc47a40d",
- "metadata": {
- "tags": []
- },
- "source": [
- "Now, let's create two dataset loader objects:\n",
- "* Train DataLoader that we'll use to create the representative dataset for the quantization calibration.\n",
- "* Evaluation DataLoader that we'll use the evaluate the quantized model.\n",
- "\n",
- "Note that both objects include the \"preprocess\" function defined above."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "e92968e1-cd96-44a1-9ced-bcefad721de2",
- "metadata": {},
- "outputs": [],
- "source": [
- "def train_collate_fn(batch_input):\n",
- " # collating images for the quantized model should return a single tensor: [B, C, H, W]\n",
- " images = torch.stack([b[0] for b in batch_input])\n",
- " targets = [b[1] for b in batch_input]\n",
- " return images, targets\n",
- "\n",
- "\n",
- "coco_train = torchvision.datasets.CocoDetection(root=TRAIN_DATASET_FOLDER, annFile=TRAIN_DATASET_ANNOTATION_FILE,\n",
- " transforms=preprocess)\n",
- "train_loader = torch.utils.data.DataLoader(coco_train, batch_size=16, shuffle=False, num_workers=0,\n",
- " collate_fn=train_collate_fn)\n",
- "\n",
- "coco_eval = torchvision.datasets.CocoDetection(root=EVAL_DATASET_FOLDER, annFile=EVAL_DATASET_ANNOTATION_FILE,\n",
- " transforms=preprocess)\n",
- "eval_loader = torch.utils.data.DataLoader(coco_eval, batch_size=50, shuffle=False, num_workers=0,\n",
- " collate_fn=train_collate_fn)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "c1d769fc-0c8f-40ce-8a97-2f69a224d73f",
- "metadata": {},
- "source": [
- "### Quantize the model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "74f39855-c63b-4e0f-844f-317f9ec8a92f",
- "metadata": {},
- "outputs": [],
- "source": [
- "def get_representative_dataset(n_iter):\n",
- " \n",
- " def representative_dataset():\n",
- " ds_iter = iter(train_loader)\n",
- " for _ in range(n_iter):\n",
- " yield [next(ds_iter)[0]]\n",
- "\n",
- " return representative_dataset\n",
- "\n",
- "\n",
- "quant_model, _ = mct.ptq.pytorch_post_training_quantization(model4quant,\n",
- " get_representative_dataset(20))"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "4fb6bffc-23d1-4852-8ec5-9007361c8eeb",
- "metadata": {},
- "source": [
- "### Evaluate quantized model"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "id": "8dc7b87c-a9f4-4568-885a-fe009c8f4e8f",
- "metadata": {},
- "outputs": [],
- "source": [
- "coco_metric = CocoEval(EVAL_DATASET_ANNOTATION_FILE)\n",
- "for batch_idx, (images, targets) in enumerate(eval_loader):\n",
- " # Run inference on the batch\n",
- " with torch.no_grad():\n",
- " outputs = quant_model(images.to(device))\n",
- " \n",
- " image_hw = [t['img_size'] for t in targets]\n",
- " image_list = ImageList(images, [image_size] * images.shape[0])\n",
- " detections = postprocess(outputs, image_list, image_hw)\n",
- "\n",
- " coco_metric.add_batch_detections(detections, [t['gt'] for t in targets])\n",
- " if (batch_idx+1) % 10 == 0:\n",
- " print(f'processed {(batch_idx+1)*data_loader.batch_size} images')\n",
- "\n",
- "# Print mAP results\n",
- "print(\"Quantized model mAP: {:.4f}\".format(coco_metric.result()[0]))\n"
- ]
- }
- ],
- "metadata": {
- "kernelspec": {
- "display_name": "Python 3 (ipykernel)",
- "language": "python",
- "name": "python3"
- },
- "language_info": {
- "codemirror_mode": {
- "name": "ipython",
- "version": 3
- },
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.8.13"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
diff --git a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_xquant.ipynb b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_xquant.ipynb
index 78631e988..f5c7f5de4 100644
--- a/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_xquant.ipynb
+++ b/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_xquant.ipynb
@@ -1,214 +1,210 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "ag0MtvPUkc8i"
- },
- "source": [
- "# Quantization Troubleshooting with XQuant\n",
- "\n",
- "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_xquant.ipynb)\n",
- "\n",
- "This notebook demonstrates the process of generating an Xquant report. The report provides valuable insights regarding the quality and success of the quantization process of a Pytorch model. This includes histograms and similarity metrics between the original float model and the quantized model in key points of the model. The report can be visualized using TensorBoard.\n",
- "\n",
- "## Steps:\n",
- "1. Load a pre-trained MobileNetV2 model and perform post-training quantization.\n",
- "5. Define an Xquant configuration.\n",
- "6. Generate an Xquant report to compare the float and quantized models.\n",
- "7. Visualize the report using TensorBoard."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "EonIXpPQlR_6"
- },
- "source": [
- "## Install"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "background_save": true
- },
- "id": "kCLHJUhTlPDi"
- },
- "outputs": [],
- "source": [
- "!pip install model-compression-toolkit torch torchvision\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "UUrYYDITle3z"
- },
- "source": [
- "## Import necessary libraries"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "NKKHNppSllmU"
- },
- "outputs": [],
- "source": [
- "import model_compression_toolkit as mct\n",
- "import numpy as np\n",
- "from functools import partial\n",
- "from model_compression_toolkit.xquant import XQuantConfig\n",
- "import torch"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "-4kQtkZLlnJj"
- },
- "source": [
- "## Define random data generator\n",
- "For demonstration only, we will use a random dataset generator for the representative dataset and for the validation dataset:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "background_save": true
- },
- "id": "-xM1K6tVlna8"
- },
- "outputs": [],
- "source": [
- "# Function to generate random data. If use_labels is True, it yields data with labels;\n",
- "# otherwise, it yields only data.\n",
- "def random_data_gen(shape=(3, 224, 224), use_labels=False, batch_size=2, num_iter=2):\n",
- " if use_labels:\n",
- " for _ in range(num_iter):\n",
- " yield [[torch.randn(batch_size, *shape)], torch.randn(batch_size)]\n",
- " else:\n",
- " for _ in range(num_iter):\n",
- " yield [torch.randn(batch_size, *shape)]"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "naWFGx_vl6tX"
- },
- "source": [
- "## Quantize MobileNetV2\n",
- "\n",
- "\n",
- "We will start by quantizing MobilNetV2 using `pytorch_post_training_quantization`:"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "RlAuiXAzl7Ef"
- },
- "outputs": [],
- "source": [
- "# Load the pre-trained MobileNetV2 model and perform post-training quantization using\n",
- "# the representative dataset generated by random_data_gen.\n",
- "from torchvision.models.mobilenetv2 import MobileNetV2\n",
- "float_model = MobileNetV2()\n",
- "repr_dataset = random_data_gen\n",
- "quantized_model, _ = mct.ptq.pytorch_post_training_quantization(in_module=float_model,\n",
- " representative_data_gen=repr_dataset)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "6alpyrD8mEm2"
- },
- "source": [
- "## Generate report\n",
- "\n",
- "First, we will create an XQuantConfig object with the directory to use for logs and with custom similarity metrics to compute between key points of the model. Here, we use the `./logs` directory for saving the generated logs, and add MAE similarity metric to compute (in addition to the default similarity metrics that are implemented: MSE, CS and SQNR):"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "e8m0CNs6mE93"
- },
- "outputs": [],
- "source": [
- "# Define the validation dataset and Xquant configuration, including custom similarity metrics.\n",
- "validation_dataset = partial(random_data_gen, use_labels=True)\n",
- "xquant_config = XQuantConfig(report_dir='./logs', custom_similarity_metrics={'mae': lambda x,y: torch.nn.L1Loss()(x,y).item()})\n",
- "\n",
- "# Generate the Xquant report comparing the float model and the quantized model using the\n",
- "# representative and validation datasets.\n",
- "from model_compression_toolkit.xquant import xquant_report_pytorch_experimental\n",
- "result = xquant_report_pytorch_experimental(\n",
- " float_model,\n",
- " quantized_model,\n",
- " repr_dataset,\n",
- " validation_dataset,\n",
- " xquant_config\n",
- " )"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "q4qOygOYpMWz"
- },
- "source": [
- "## Visualize in TensorBoard\n",
- "\n",
- "In the TensorBoard, one can find useful information like statistics of the float layers' outputs and the graph of the quantized model with similarities that were measured comparing to the float model. Currently, the similarity is measured at linear layers like Conv2d, Linear, etc. (may be changed in the future). When observing such node in the graph, the similarities can be found in the node's properties as 'xquant_repr' and 'xquant_val' (the similarity that was computed using the representative dataset and the validation dataset, respectively).\n",
- "Make sure to choose 'xquant' from the 'Run' dropdown menu on the left side of TensorBoard.\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "source": [
- "Now we can run TensorBoard:"
- ],
- "metadata": {
- "id": "bMaSvGW1dAad"
- }
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "X6yk2kI6kSEf"
- },
- "outputs": [],
- "source": [
- "%load_ext tensorboard\n",
- "%tensorboard --logdir logs"
- ]
- }
- ],
- "metadata": {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ag0MtvPUkc8i"
+ },
+ "source": [
+ "# Quantization Troubleshooting with the Model Compression Toolkit (MCT) Using the XQuant Feature\n",
+ "\n",
+ "[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_xquant.ipynb)\n",
+ "\n",
+ "## Overview\n",
+ "This notebook demonstrates the process of generating an Xquant report. The report provides valuable insights regarding the quality and success of the quantization process of a Pytorch model. This includes histograms and similarity metrics between the original float model and the quantized model in key points of the model. The report can be visualized using TensorBoard.\n",
+ "\n",
+ "## Summary:\n",
+ "We will cover the following steps:\n",
+ "\n",
+ "1. Load a pre-trained MobileNetV2 model and perform post-training quantization.\n",
+ "5. Define an Xquant configuration.\n",
+ "6. Generate an Xquant report comparing the float and quantized models.\n",
+ "7. Visualize the report using TensorBoard.\n",
+ "\n",
+ "## Setup\n",
+ "Install the relevant packages:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
"colab": {
- "provenance": []
+ "background_save": true
},
- "kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
+ "id": "kCLHJUhTlPDi"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install torch torchvision"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "outputs": [],
+ "source": [
+ "import importlib\n",
+ "if not importlib.util.find_spec('model_compression_toolkit'):\n",
+ " !pip install model_compression_toolkit"
+ ],
+ "metadata": {
+ "collapsed": false
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "NKKHNppSllmU"
+ },
+ "outputs": [],
+ "source": [
+ "from functools import partial\n",
+ "from model_compression_toolkit.xquant import XQuantConfig\n",
+ "import torch"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-4kQtkZLlnJj"
+ },
+ "source": [
+ "## Define a Random Data Generator\n",
+ "For demonstration purposes, we will use a random dataset generator to create both the representative dataset and the validation dataset. This will allow us to simulate data for quantization and validation without using an actual dataset."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "background_save": true
},
- "language_info": {
- "name": "python"
- }
+ "id": "-xM1K6tVlna8"
+ },
+ "outputs": [],
+ "source": [
+ "# Function to generate random data. If use_labels is True, it yields data with labels;\n",
+ "# otherwise, it yields only data.\n",
+ "def random_data_gen(shape=(3, 224, 224), use_labels=False, batch_size=2, num_iter=2):\n",
+ " if use_labels:\n",
+ " for _ in range(num_iter):\n",
+ " yield [[torch.randn(batch_size, *shape)], torch.randn(batch_size)]\n",
+ " else:\n",
+ " for _ in range(num_iter):\n",
+ " yield [torch.randn(batch_size, *shape)]"
+ ]
+ },
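+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check (illustrative only, not required for the quantization flow), we can peek at one batch from the generator to confirm the shapes it yields match the model's expected input:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Draw one batch from the generator (use_labels=False) and inspect its shape.\n",
+    "# Each yielded item is a list containing a single input tensor of shape [batch, C, H, W].\n",
+    "sample_batch = next(iter(random_data_gen()))\n",
+    "print(type(sample_batch), sample_batch[0].shape)  # expected: torch.Size([2, 3, 224, 224])"
+   ]
+  },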
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "naWFGx_vl6tX"
+ },
+ "source": [
+ "## MobileNetV2 Quantization using MCT\n",
+ "We will begin by quantizing MobileNetV2 using the `pytorch_post_training_quantization` function from MCT.:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "RlAuiXAzl7Ef"
+ },
+ "outputs": [],
+ "source": [
+ "# Load the pre-trained MobileNetV2 model and perform post-training quantization using\n",
+ "# the representative dataset generated by random_data_gen.\n",
+ "from torchvision.models.mobilenetv2 import MobileNetV2\n",
+ "import model_compression_toolkit as mct\n",
+ "\n",
+ "float_model = MobileNetV2()\n",
+ "quantized_model, _ = mct.ptq.pytorch_post_training_quantization(\n",
+ " in_module=float_model, representative_data_gen=random_data_gen)"
+ ]
+ },
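+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Optionally, we can print the quantized model to see the quantization wrapper modules that MCT inserted around the original layers (this mirrors the inspection step used in other MCT tutorials and is not required for generating the report):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Inspect the simulated quantized model returned by MCT.\n",
+    "print(quantized_model)"
+   ]
+  },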
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6alpyrD8mEm2"
+ },
+ "source": [
+ "## Generating an XQuant Report\n",
+ "We will start by creating an XQuantConfig object, specifying the directory for logs and adding custom similarity metrics to be computed between key points of the model. In this example, we use the `./logs` directory for saving the generated logs and include the MAE (Mean Absolute Error) similarity metric, in addition to the default metrics: MSE (Mean Square Error), CS (Cosine Similarity), and SQNR (Signal-to-Quantization-Noise Ratio)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "e8m0CNs6mE93"
+ },
+ "outputs": [],
+ "source": [
+ "# Define the validation dataset and Xquant configuration, including custom similarity metrics.\n",
+ "validation_dataset = partial(random_data_gen, use_labels=True)\n",
+ "xquant_config = XQuantConfig(report_dir='./logs', custom_similarity_metrics={'mae': lambda x,y: torch.nn.L1Loss()(x,y).item()})\n",
+ "\n",
+ "# Generate the Xquant report comparing the float model and the quantized model using the\n",
+ "# representative and validation datasets.\n",
+ "from model_compression_toolkit.xquant import xquant_report_pytorch_experimental\n",
+ "result = xquant_report_pytorch_experimental(\n",
+ " float_model,\n",
+ " quantized_model,\n",
+ " repr_dataset,\n",
+ " validation_dataset,\n",
+ " xquant_config\n",
+ " )"
+ ]
+ },
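+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Before moving to TensorBoard, the returned report object can be inspected directly. This is a minimal sketch that makes no assumptions about its exact structure (which may differ between MCT versions), so it simply prints the object:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Print the report object returned by xquant_report_pytorch_experimental.\n",
+    "# Its exact contents depend on the MCT version, so we avoid assuming a specific structure here.\n",
+    "print(type(result))\n",
+    "print(result)"
+   ]
+  },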
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "q4qOygOYpMWz"
+ },
+ "source": [
+ "## Visualization using TensorBoard\n",
+ "In the TensorBoard, one can find useful information like statistics of the float layers' outputs and the graph of the quantized model with similarities that were measured comparing to the float model. Currently, the similarity is measured at linear layers like Conv2d, Linear, etc. (may be changed in the future). When observing such node in the graph, the similarities can be found in the node's properties as 'xquant_repr' and 'xquant_val' (the similarity that was computed using the representative dataset and the validation dataset, respectively).\n",
+ "Make sure to choose 'xquant' from the 'Run' dropdown menu on the left side of TensorBoard.\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now we can run TensorBoard:"
+ ],
+ "metadata": {
+ "id": "bMaSvGW1dAad"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "X6yk2kI6kSEf"
+ },
+ "outputs": [],
+ "source": [
+ "%load_ext tensorboard\n",
+ "%tensorboard --logdir logs"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
},
- "nbformat": 4,
- "nbformat_minor": 0
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
}