Commit: PR changes
liord committed Sep 26, 2024
1 parent 9f00552 commit be547b0
Showing 4 changed files with 42 additions and 129 deletions.
@@ -84,7 +84,7 @@
"outputs": [],
"source": [
"# Load a pre-trained model (e.g., ResNet18)\n",
"weights = ResNet18_Weights.IMAGENET1K_V1\n",
"weights = ResNet18_Weights.DEFAULT\n",
"float_model = resnet18(weights=weights)"
],
"metadata": {
@@ -173,7 +173,7 @@
"execution_count": null,
"outputs": [],
"source": [
"def plot_image(image, reverse_preprocess=False, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], plot_img=True):\n",
"def plot_image(image, reverse_preprocess=False, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):\n",
" image = image.detach().cpu().numpy()[0]\n",
" image = image.transpose(1, 2, 0)\n",
" if reverse_preprocess:\n",
@@ -213,8 +213,10 @@
"cell_type": "markdown",
"source": [
"## Step 5: Post Training Quantization\n",
"In order to evaulate our generated images, we will use them to quantize the model using MCT's PTQ.This is referred to as **\"Zero-Shot Quantization (ZSQ)\"** or **\"Data-Free Quantization\"** because no real data is used in the quantization process.\n",
"Here we define configurations for MCT's PTQ:"
"In order to evaulate our generated images, we will use them to quantize the model using MCT's PTQ.This is referred to as **\"Zero-Shot Quantization (ZSQ)\"** or **\"Data-Free Quantization\"** because no real data is used in the quantization process. Next we will define configurations for MCT's PTQ.\n",
"\n",
"### Target Platform Capabilities (TPC)\n",
"MCT optimizes the model for dedicated hardware platforms. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
],
"metadata": {
"collapsed": false
@@ -226,8 +228,7 @@
"execution_count": null,
"outputs": [],
"source": [
"target_platform_cap = mct.get_target_platform_capabilities(\"pytorch\", \"default\")\n",
"core_config = mct.core.CoreConfig(quantization_config=mct.core.QuantizationConfig())"
"target_platform_cap = mct.get_target_platform_capabilities(\"pytorch\", \"default\")"
],
"metadata": {
"collapsed": false
@@ -250,23 +251,20 @@
"execution_count": null,
"outputs": [],
"source": [
"batch_size = 50\n",
"batch_size = 64\n",
"n_iter = 10\n",
"\n",
"batches_inds = np.random.choice(len(generated_images),\n",
" size=(int(len(generated_images) / batch_size),batch_size),\n",
" replace=False)\n",
"generated_images = np.concatenate(generated_images, axis=0).reshape(*(-1, batch_size, *list(generated_images[0].shape[1:])))\n",
" \n",
"def representative_data_gen():\n",
" for nn in range(n_iter):\n",
" nn_mod = nn % len(batches_inds)\n",
" yield [np.concatenate([generated_images[b].detach().cpu().numpy() for b in batches_inds[nn_mod]], axis=0)]\n",
" "
" nn_mod = nn % generated_images.shape[0]\n",
" yield [generated_images[nn_mod]]"
],
"metadata": {
"collapsed": false
},
"id": "41beccf1f2a4886f"
"id": "d6a3d88a51883757"
},
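As a quick sanity check on the generator above (not shown in the diff), each yield is expected to return a single batch of shape (batch_size, channels, height, width). A minimal sketch, assuming the variables defined in the cell are still in scope:

# Hypothetical check: pull one batch from the generator and confirm its shape.
first_batch = next(representative_data_gen())[0]
print(first_batch.shape)  # expected: (batch_size, channels, height, width), e.g. (64, 3, 224, 224)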
{
"cell_type": "markdown",
@@ -288,7 +286,6 @@
"quantized_model_generated_data, quantization_info = mct.ptq.pytorch_post_training_quantization(\n",
" in_module=float_model,\n",
" representative_data_gen=representative_data_gen,\n",
" core_config=core_config,\n",
" target_platform_capabilities=target_platform_cap\n",
")"
],
@@ -349,9 +346,6 @@
"\n",
"\n",
"def evaluate(model, testloader):\n",
" \"\"\"\n",
" Evaluate a model using a test loader.\n",
" \"\"\"\n",
" device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
" model.to(device)\n",
" model.eval() # Set the model to evaluation mode\n",
@@ -22,17 +22,16 @@
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_export.ipynb)\n",
"\n",
"## Overview\n",
"This tutorial demonstrates how to export a PyTorch model to ONNX format using the Model Compression Toolkit (MCT). It covers the steps of creating a simple PyTorch model, applying post-training quantization (PTQ) using MCT, and then exporting the quantized model to ONNX and TorchSript. The tutorial also shows how to use the exported model for inference.\n",
"This tutorial demonstrates how to export a PyTorch model to ONNX and TorchSript formats using the Model Compression Toolkit (MCT). It covers the steps of creating a simple PyTorch model, applying post-training quantization (PTQ) using MCT, and then exporting the quantized model to ONNX and TorchSript. The tutorial also shows how to use the exported model for inference.\n",
"\n",
"## Summary:\n",
"In this tutorial, we will cover:\n",
"\n",
"1. Constructing a simple PyTorch model for demonstration purposes.\n",
"2. Applying post-training quantization to the model using the Model Compression Toolkit.\n",
"3. Exporting the quantized model to the ONNX format.\n",
"3. Exporting the quantized model to the ONNX and TorchScript formats.\n",
"4. Ensuring compatibility between PyTorch and ONNX during the export process.\n",
"5. Using the exported model for inference.\n",
"6. Exporting to TorchScript.\n",
"\n",
"## Setup\n",
"To export your quantized model to ONNX format and use it for inference, you will need to install some additional packages. Note that these packages are only required if you plan to export the model to ONNX. If ONNX export is not needed, you can skip this step."
@@ -140,9 +139,10 @@
"onnx_file_path = 'model_format_onnx_mctq.onnx'\n",
"\n",
"# Export ONNX model with mctq quantizers.\n",
"mct.exporter.pytorch_export_model(model=quantized_exportable_model,\n",
" save_model_path=onnx_file_path,\n",
" repr_dataset=representative_data_gen)"
"mct.exporter.pytorch_export_model(\n",
" model=quantized_exportable_model,\n",
" save_model_path=onnx_file_path,\n",
" repr_dataset=representative_data_gen)"
],
"metadata": {
"id": "PO-Hh0bzD1VJ"
@@ -166,10 +166,11 @@
"cell_type": "code",
"source": [
"# Export ONNX model with mctq quantizers.\n",
"mct.exporter.pytorch_export_model(model=quantized_exportable_model,\n",
" save_model_path=onnx_file_path,\n",
" repr_dataset=representative_data_gen,\n",
" onnx_opset_version=16)"
"mct.exporter.pytorch_export_model(\n",
" model=quantized_exportable_model,\n",
" save_model_path=onnx_file_path,\n",
" repr_dataset=representative_data_gen,\n",
" onnx_opset_version=16)"
],
"metadata": {
"id": "S9XtcX8s3dU9"
@@ -66,36 +66,6 @@
"import random"
]
},
{
"cell_type": "markdown",
"source": [
"We will set a random seed to ensure reproducibility of results."
],
"metadata": {
"collapsed": false
},
"id": "87b8a00d13c0a4ef"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"def seed_everything(seed_value):\n",
" random.seed(seed_value)\n",
" np.random.seed(seed_value)\n",
" torch.manual_seed(seed_value)\n",
" torch.cuda.manual_seed_all(seed_value)\n",
" torch.backends.cudnn.deterministic = True\n",
" torch.backends.cudnn.benchmark = False\n",
"\n",
"seed_everything(0)"
],
"metadata": {
"collapsed": false
},
"id": "15053e21484ae217"
},
{
"cell_type": "markdown",
"source": [
@@ -209,8 +179,9 @@
{
"cell_type": "markdown",
"source": [
"## Target Platform Capabilities\n",
"In addition, MCT optimizes the model for dedicated hardware. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
"## Target Platform Capabilities (TPC)\n",
"In addition, MCT optimizes models for dedicated hardware platforms using Target Platform Capabilities (TPC). \n",
"**Note:** To apply mixed-precision quantization to specific layers, the TPC must define different bit-width options for those layers. For more details, please refer to our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html). In this example, we use the default PyTorch TPC, which supports 2, 4, and 8-bit options for convolution and linear layers."
],
"metadata": {
"collapsed": false
@@ -224,7 +195,7 @@
"source": [
"import model_compression_toolkit as mct\n",
"\n",
"# Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
"# Get a TargetPlatformCapabilities object that models the hardware platform for the quantized model inference. Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
"target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default')"
],
"metadata": {
@@ -236,7 +207,7 @@
"cell_type": "markdown",
"source": [
"## Mixed Precision Configurations\n",
"To enable mixed-precision quantization, specific parameters must be set in the `CoreConfig` used by MCT. We will create a `MixedPrecisionQuantizationConfig` that defines the search options for mixed-precision:\n",
"We will create a `MixedPrecisionQuantizationConfig` that defines the search options for mixed-precision:\n",
"1. **Number of images** - Determines how many images from the representative dataset are used to find an optimal bit-width configuration. More images result in higher accuracy but increase search time.\n",
"2. **Gradient weighting** - Improves bit-width configuration accuracy at the cost of longer search time. This method will not be used in this example.\n",
"\n",
@@ -252,7 +223,8 @@
"execution_count": null,
"outputs": [],
"source": [
"configuration = mct.core.CoreConfig(mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(\n",
"configuration = mct.core.CoreConfig(\n",
" mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(\n",
" num_of_images=32,\n",
" use_hessian_based_scores=False))"
],
@@ -264,7 +236,7 @@
{
"cell_type": "markdown",
"source": [
"Additionally, when using mixed-precision, we define the desired compression ratio. In this example, we will configure the model to compress the weights to 75% of the size of the 8-bit model's weights. To achieve this, we will retrieve the model's resource utilization information, `resource_utilization_data`, specifically focusing on the weights' memory. Then, we will create a `ResourceUtilization` object to enforce the size constraint on the weight's memory, which applies only to the quantized layers and attributes (e.g., Conv2D kernels, but not biases)."
"To enable mixed-precision quantization, we define the desired compression ratio. In this example, we will configure the model to compress the weights to 75% of the size of the 8-bit model's weights. To achieve this, we will retrieve the model's resource utilization information, `resource_utilization_data`, specifically focusing on the weights' memory. Then, we will create a `ResourceUtilization` object to enforce the size constraint on the weight's memory, which applies only to the quantized layers and attributes (e.g., Conv2D kernels, but not biases)."
],
"metadata": {
"collapsed": false
@@ -277,10 +249,11 @@
"outputs": [],
"source": [
"# Get Resource Utilization information to constraint your model's memory size.\n",
"resource_utilization_data = mct.core.pytorch_resource_utilization_data(float_model,\n",
" representative_dataset_gen,\n",
" configuration,\n",
" target_platform_capabilities=target_platform_cap)\n",
"resource_utilization_data = mct.core.pytorch_resource_utilization_data(\n",
" float_model,\n",
" representative_dataset_gen,\n",
" configuration,\n",
" target_platform_capabilities=target_platform_cap)\n",
"\n",
"# Create a ResourceUtilization object \n",
"resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * 0.75)"
@@ -317,18 +290,6 @@
},
"id": "d769042646dca720"
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f5fa4a2",
"metadata": {
"id": "4f5fa4a2"
},
"outputs": [],
"source": [
"print(quantized_model)"
]
},
{
"cell_type": "markdown",
"id": "c677bd61c3ab4649",
@@ -18,7 +18,7 @@
"\n",
"1. Loading and preprocessing ImageNet’s validation dataset.\n",
"2. Constructing an unlabeled representative dataset.\n",
"3. Hardware-Friendly Post-Training Quantization using MCT.\n",
"3. Post-Training Quantization using MCT.\n",
"4. Accuracy evaluation of the floating-point and the quantized models.\n",
"\n",
"## Setup\n",
@@ -66,36 +66,6 @@
"import random"
]
},
{
"cell_type": "markdown",
"source": [
"We will set a random seed to ensure reproducibility of results."
],
"metadata": {
"collapsed": false
},
"id": "ebd5ee7377e90c74"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"def seed_everything(seed_value):\n",
" random.seed(seed_value)\n",
" np.random.seed(seed_value)\n",
" torch.manual_seed(seed_value)\n",
" torch.cuda.manual_seed_all(seed_value)\n",
" torch.backends.cudnn.deterministic = True\n",
" torch.backends.cudnn.benchmark = False\n",
"\n",
"seed_everything(0)"
],
"metadata": {
"collapsed": false
},
"id": "204f0a87438cab35"
},
{
"cell_type": "markdown",
"source": [
@@ -209,8 +179,8 @@
{
"cell_type": "markdown",
"source": [
"## Target Platform Capabilities\n",
"In addition, MCT optimizes the model for dedicated hardware. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
"## Target Platform Capabilities (TPC)\n",
"In addition, MCT optimizes the model for dedicated hardware platforms. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
],
"metadata": {
"collapsed": false
@@ -224,7 +194,7 @@
"source": [
"import model_compression_toolkit as mct\n",
"\n",
"# Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
"# Get a TargetPlatformCapabilities object that models the hardware platform for the quantized model inference. Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
"target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default')"
],
"metadata": {
@@ -239,9 +209,8 @@
"id": "d0a92bee"
},
"source": [
"## Hardware-Friendly Post-Training Quantization using MCT\n",
"Now for the exciting part! Let’s run hardware-friendly PTQ on the model. \n",
"**Hardware-friendly** means symmetric quantization with power-of-2 thresholds."
"## Post-Training Quantization using MCT\n",
"Now for the exciting part! Let’s run PTQ on the model. "
]
},
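The PTQ cell itself is outside this hunk; a minimal sketch of the default flow follows, reusing the signature shown earlier in this commit and assuming the representative dataset generator is named representative_dataset_gen as in the mixed-precision notebook:

# Hedged sketch: default 8-bit post-training quantization with MCT.
quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(
    in_module=float_model,
    representative_data_gen=representative_dataset_gen,  # assumed generator name
    target_platform_capabilities=target_platform_cap)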
{
@@ -270,18 +239,6 @@
"Our model is now quantized. MCT has created a simulated quantized model within the original PyTorch framework by inserting [quantization representation modules](https://github.com/sony/mct_quantizers). These modules, such as `PytorchQuantizationWrapper` and `PytorchActivationQuantizationHolder`, wrap PyTorch layers to simulate the quantization of weights and activations, respectively. While the size of the saved model remains unchanged, all the quantization parameters are stored within these modules and are ready for deployment on the target hardware. In this example, we used the default MCT settings, which compressed the model from 32 bits to 8 bits, resulting in a compression ratio of 4x. Let's print the quantized model and examine the quantization modules:"
]
},
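Since the print(quantized_model) cell is dropped by this commit, another way to examine the inserted modules is to filter them out of named_modules(); a sketch, assuming PytorchQuantizationWrapper and PytorchActivationQuantizationHolder are importable from the top level of mct_quantizers:

from mct_quantizers import PytorchActivationQuantizationHolder, PytorchQuantizationWrapper

# Hedged sketch: list only the layers MCT wrapped with quantization modules.
for name, module in quantized_model.named_modules():
    if isinstance(module, (PytorchQuantizationWrapper, PytorchActivationQuantizationHolder)):
        print(name, type(module).__name__)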
{
"cell_type": "code",
"execution_count": null,
"id": "4f5fa4a2",
"metadata": {
"id": "4f5fa4a2"
},
"outputs": [],
"source": [
"print(quantized_model)"
]
},
{
"cell_type": "markdown",
"id": "c677bd61c3ab4649",
