Commit: PR changes
liord committed Sep 26, 2024
1 parent 9f00552 commit be547b0
Showing 4 changed files with 42 additions and 129 deletions.
@@ -84,7 +84,7 @@
"outputs": [],
"source": [
"# Load a pre-trained model (e.g., ResNet18)\n",
"weights = ResNet18_Weights.IMAGENET1K_V1\n",
"weights = ResNet18_Weights.DEFAULT\n",
"float_model = resnet18(weights=weights)"
],
"metadata": {
@@ -173,7 +173,7 @@
"execution_count": null,
"outputs": [],
"source": [
"def plot_image(image, reverse_preprocess=False, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], plot_img=True):\n",
"def plot_image(image, reverse_preprocess=False, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]):\n",
" image = image.detach().cpu().numpy()[0]\n",
" image = image.transpose(1, 2, 0)\n",
" if reverse_preprocess:\n",
@@ -213,8 +213,10 @@
"cell_type": "markdown",
"source": [
"## Step 5: Post Training Quantization\n",
"In order to evaulate our generated images, we will use them to quantize the model using MCT's PTQ.This is referred to as **\"Zero-Shot Quantization (ZSQ)\"** or **\"Data-Free Quantization\"** because no real data is used in the quantization process.\n",
"Here we define configurations for MCT's PTQ:"
"In order to evaulate our generated images, we will use them to quantize the model using MCT's PTQ.This is referred to as **\"Zero-Shot Quantization (ZSQ)\"** or **\"Data-Free Quantization\"** because no real data is used in the quantization process. Next we will define configurations for MCT's PTQ.\n",
"\n",
"### Target Platform Capabilities (TPC)\n",
"MCT optimizes the model for dedicated hardware platforms. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
],
"metadata": {
"collapsed": false
@@ -226,8 +228,7 @@
"execution_count": null,
"outputs": [],
"source": [
"target_platform_cap = mct.get_target_platform_capabilities(\"pytorch\", \"default\")\n",
"core_config = mct.core.CoreConfig(quantization_config=mct.core.QuantizationConfig())"
"target_platform_cap = mct.get_target_platform_capabilities(\"pytorch\", \"default\")"
],
"metadata": {
"collapsed": false
@@ -250,23 +251,20 @@
"execution_count": null,
"outputs": [],
"source": [
"batch_size = 50\n",
"batch_size = 64\n",
"n_iter = 10\n",
"\n",
"batches_inds = np.random.choice(len(generated_images),\n",
" size=(int(len(generated_images) / batch_size),batch_size),\n",
" replace=False)\n",
"generated_images = np.concatenate(generated_images, axis=0).reshape(*(-1, batch_size, *list(generated_images[0].shape[1:])))\n",
" \n",
"def representative_data_gen():\n",
" for nn in range(n_iter):\n",
" nn_mod = nn % len(batches_inds)\n",
" yield [np.concatenate([generated_images[b].detach().cpu().numpy() for b in batches_inds[nn_mod]], axis=0)]\n",
" "
" nn_mod = nn % generated_images.shape[0]\n",
" yield [generated_images[nn_mod]]"
],
"metadata": {
"collapsed": false
},
"id": "41beccf1f2a4886f"
"id": "d6a3d88a51883757"
},
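As a quick sanity check on the generator above (not shown in the diff), each yield is expected to return a single batch of shape (batch_size, channels, height, width). A minimal sketch, assuming the variables defined in the cell are still in scope:

# Hypothetical check: pull one batch from the generator and confirm its shape.
first_batch = next(representative_data_gen())[0]
print(first_batch.shape)  # expected: (batch_size, channels, height, width), e.g. (64, 3, 224, 224)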
{
"cell_type": "markdown",
@@ -288,7 +286,6 @@
"quantized_model_generated_data, quantization_info = mct.ptq.pytorch_post_training_quantization(\n",
" in_module=float_model,\n",
" representative_data_gen=representative_data_gen,\n",
" core_config=core_config,\n",
" target_platform_capabilities=target_platform_cap\n",
")"
],
@@ -349,9 +346,6 @@
"\n",
"\n",
"def evaluate(model, testloader):\n",
" \"\"\"\n",
" Evaluate a model using a test loader.\n",
" \"\"\"\n",
" device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
" model.to(device)\n",
" model.eval() # Set the model to evaluation mode\n",
@@ -22,17 +22,16 @@
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_export.ipynb)\n",
"\n",
"## Overview\n",
"This tutorial demonstrates how to export a PyTorch model to ONNX format using the Model Compression Toolkit (MCT). It covers the steps of creating a simple PyTorch model, applying post-training quantization (PTQ) using MCT, and then exporting the quantized model to ONNX and TorchSript. The tutorial also shows how to use the exported model for inference.\n",
"This tutorial demonstrates how to export a PyTorch model to ONNX and TorchSript formats using the Model Compression Toolkit (MCT). It covers the steps of creating a simple PyTorch model, applying post-training quantization (PTQ) using MCT, and then exporting the quantized model to ONNX and TorchSript. The tutorial also shows how to use the exported model for inference.\n",
"\n",
"## Summary:\n",
"In this tutorial, we will cover:\n",
"\n",
"1. Constructing a simple PyTorch model for demonstration purposes.\n",
"2. Applying post-training quantization to the model using the Model Compression Toolkit.\n",
"3. Exporting the quantized model to the ONNX format.\n",
"3. Exporting the quantized model to the ONNX and TorchScript formats.\n",
"4. Ensuring compatibility between PyTorch and ONNX during the export process.\n",
"5. Using the exported model for inference.\n",
"6. Exporting to TorchScript.\n",
"\n",
"## Setup\n",
"To export your quantized model to ONNX format and use it for inference, you will need to install some additional packages. Note that these packages are only required if you plan to export the model to ONNX. If ONNX export is not needed, you can skip this step."
@@ -140,9 +139,10 @@
"onnx_file_path = 'model_format_onnx_mctq.onnx'\n",
"\n",
"# Export ONNX model with mctq quantizers.\n",
"mct.exporter.pytorch_export_model(model=quantized_exportable_model,\n",
" save_model_path=onnx_file_path,\n",
" repr_dataset=representative_data_gen)"
"mct.exporter.pytorch_export_model(\n",
" model=quantized_exportable_model,\n",
" save_model_path=onnx_file_path,\n",
" repr_dataset=representative_data_gen)"
],
"metadata": {
"id": "PO-Hh0bzD1VJ"
@@ -166,10 +166,11 @@
"cell_type": "code",
"source": [
"# Export ONNX model with mctq quantizers.\n",
"mct.exporter.pytorch_export_model(model=quantized_exportable_model,\n",
" save_model_path=onnx_file_path,\n",
" repr_dataset=representative_data_gen,\n",
" onnx_opset_version=16)"
"mct.exporter.pytorch_export_model(\n",
" model=quantized_exportable_model,\n",
" save_model_path=onnx_file_path,\n",
" repr_dataset=representative_data_gen,\n",
" onnx_opset_version=16)"
],
"metadata": {
"id": "S9XtcX8s3dU9"
@@ -66,36 +66,6 @@
"import random"
]
},
{
"cell_type": "markdown",
"source": [
"We will set a random seed to ensure reproducibility of results."
],
"metadata": {
"collapsed": false
},
"id": "87b8a00d13c0a4ef"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"def seed_everything(seed_value):\n",
" random.seed(seed_value)\n",
" np.random.seed(seed_value)\n",
" torch.manual_seed(seed_value)\n",
" torch.cuda.manual_seed_all(seed_value)\n",
" torch.backends.cudnn.deterministic = True\n",
" torch.backends.cudnn.benchmark = False\n",
"\n",
"seed_everything(0)"
],
"metadata": {
"collapsed": false
},
"id": "15053e21484ae217"
},
{
"cell_type": "markdown",
"source": [
@@ -209,8 +179,9 @@
{
"cell_type": "markdown",
"source": [
"## Target Platform Capabilities\n",
"In addition, MCT optimizes the model for dedicated hardware. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
"## Target Platform Capabilities (TPC)\n",
"In addition, MCT optimizes models for dedicated hardware platforms using Target Platform Capabilities (TPC). \n",
"**Note:** To apply mixed-precision quantization to specific layers, the TPC must define different bit-width options for those layers. For more details, please refer to our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html). In this example, we use the default PyTorch TPC, which supports 2, 4, and 8-bit options for convolution and linear layers."
],
"metadata": {
"collapsed": false
@@ -224,7 +195,7 @@
"source": [
"import model_compression_toolkit as mct\n",
"\n",
"# Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
"# Get a TargetPlatformCapabilities object that models the hardware platform for the quantized model inference. Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
"target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default')"
],
"metadata": {
@@ -236,7 +207,7 @@
"cell_type": "markdown",
"source": [
"## Mixed Precision Configurations\n",
"To enable mixed-precision quantization, specific parameters must be set in the `CoreConfig` used by MCT. We will create a `MixedPrecisionQuantizationConfig` that defines the search options for mixed-precision:\n",
"We will create a `MixedPrecisionQuantizationConfig` that defines the search options for mixed-precision:\n",
"1. **Number of images** - Determines how many images from the representative dataset are used to find an optimal bit-width configuration. More images result in higher accuracy but increase search time.\n",
"2. **Gradient weighting** - Improves bit-width configuration accuracy at the cost of longer search time. This method will not be used in this example.\n",
"\n",
@@ -252,7 +223,8 @@
"execution_count": null,
"outputs": [],
"source": [
"configuration = mct.core.CoreConfig(mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(\n",
"configuration = mct.core.CoreConfig(\n",
" mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(\n",
" num_of_images=32,\n",
" use_hessian_based_scores=False))"
],
@@ -264,7 +236,7 @@
{
"cell_type": "markdown",
"source": [
"Additionally, when using mixed-precision, we define the desired compression ratio. In this example, we will configure the model to compress the weights to 75% of the size of the 8-bit model's weights. To achieve this, we will retrieve the model's resource utilization information, `resource_utilization_data`, specifically focusing on the weights' memory. Then, we will create a `ResourceUtilization` object to enforce the size constraint on the weight's memory, which applies only to the quantized layers and attributes (e.g., Conv2D kernels, but not biases)."
"To enable mixed-precision quantization, we define the desired compression ratio. In this example, we will configure the model to compress the weights to 75% of the size of the 8-bit model's weights. To achieve this, we will retrieve the model's resource utilization information, `resource_utilization_data`, specifically focusing on the weights' memory. Then, we will create a `ResourceUtilization` object to enforce the size constraint on the weight's memory, which applies only to the quantized layers and attributes (e.g., Conv2D kernels, but not biases)."
],
"metadata": {
"collapsed": false
@@ -277,10 +249,11 @@
"outputs": [],
"source": [
"# Get Resource Utilization information to constraint your model's memory size.\n",
"resource_utilization_data = mct.core.pytorch_resource_utilization_data(float_model,\n",
" representative_dataset_gen,\n",
" configuration,\n",
" target_platform_capabilities=target_platform_cap)\n",
"resource_utilization_data = mct.core.pytorch_resource_utilization_data(\n",
" float_model,\n",
" representative_dataset_gen,\n",
" configuration,\n",
" target_platform_capabilities=target_platform_cap)\n",
"\n",
"# Create a ResourceUtilization object \n",
"resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * 0.75)"
@@ -317,18 +290,6 @@
},
"id": "d769042646dca720"
},
{
"cell_type": "code",
"execution_count": null,
"id": "4f5fa4a2",
"metadata": {
"id": "4f5fa4a2"
},
"outputs": [],
"source": [
"print(quantized_model)"
]
},
{
"cell_type": "markdown",
"id": "c677bd61c3ab4649",
@@ -18,7 +18,7 @@
"\n",
"1. Loading and preprocessing ImageNet’s validation dataset.\n",
"2. Constructing an unlabeled representative dataset.\n",
"3. Hardware-Friendly Post-Training Quantization using MCT.\n",
"3. Post-Training Quantization using MCT.\n",
"4. Accuracy evaluation of the floating-point and the quantized models.\n",
"\n",
"## Setup\n",
@@ -66,36 +66,6 @@
"import random"
]
},
{
"cell_type": "markdown",
"source": [
"We will set a random seed to ensure reproducibility of results."
],
"metadata": {
"collapsed": false
},
"id": "ebd5ee7377e90c74"
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"def seed_everything(seed_value):\n",
" random.seed(seed_value)\n",
" np.random.seed(seed_value)\n",
" torch.manual_seed(seed_value)\n",
" torch.cuda.manual_seed_all(seed_value)\n",
" torch.backends.cudnn.deterministic = True\n",
" torch.backends.cudnn.benchmark = False\n",
"\n",
"seed_everything(0)"
],
"metadata": {
"collapsed": false
},
"id": "204f0a87438cab35"
},
{
"cell_type": "markdown",
"source": [
@@ -209,8 +179,8 @@
{
"cell_type": "markdown",
"source": [
"## Target Platform Capabilities\n",
"In addition, MCT optimizes the model for dedicated hardware. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
"## Target Platform Capabilities (TPC)\n",
"In addition, MCT optimizes the model for dedicated hardware platforms. This is done using TPC (for more details, please visit our [documentation](https://sony.github.io/model_optimization/docs/api/api_docs/modules/target_platform.html)). Here, we use the default Pytorch TPC:"
],
"metadata": {
"collapsed": false
@@ -224,7 +194,7 @@
"source": [
"import model_compression_toolkit as mct\n",
"\n",
"# Get a TargetPlatformCapabilities object that models the hardware for the quantized model inference. Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
"# Get a TargetPlatformCapabilities object that models the hardware platform for the quantized model inference. Here, for example, we use the default platform that is attached to a Pytorch layers representation.\n",
"target_platform_cap = mct.get_target_platform_capabilities('pytorch', 'default')"
],
"metadata": {
@@ -239,9 +209,8 @@
"id": "d0a92bee"
},
"source": [
"## Hardware-Friendly Post-Training Quantization using MCT\n",
"Now for the exciting part! Let’s run hardware-friendly PTQ on the model. \n",
"**Hardware-friendly** means symmetric quantization with power-of-2 thresholds."
"## Post-Training Quantization using MCT\n",
"Now for the exciting part! Let’s run PTQ on the model. "
]
},
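The PTQ cell itself is outside this hunk; a minimal sketch of the default flow follows, reusing the signature shown earlier in this commit and assuming the representative dataset generator is named representative_dataset_gen as in the mixed-precision notebook:

# Hedged sketch: default 8-bit post-training quantization with MCT.
quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(
    in_module=float_model,
    representative_data_gen=representative_dataset_gen,  # assumed generator name
    target_platform_capabilities=target_platform_cap)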
{
@@ -270,18 +239,6 @@
"Our model is now quantized. MCT has created a simulated quantized model within the original PyTorch framework by inserting [quantization representation modules](https://github.com/sony/mct_quantizers). These modules, such as `PytorchQuantizationWrapper` and `PytorchActivationQuantizationHolder`, wrap PyTorch layers to simulate the quantization of weights and activations, respectively. While the size of the saved model remains unchanged, all the quantization parameters are stored within these modules and are ready for deployment on the target hardware. In this example, we used the default MCT settings, which compressed the model from 32 bits to 8 bits, resulting in a compression ratio of 4x. Let's print the quantized model and examine the quantization modules:"
]
},
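Since the print(quantized_model) cell is dropped by this commit, another way to examine the inserted modules is to filter them out of named_modules(); a sketch, assuming PytorchQuantizationWrapper and PytorchActivationQuantizationHolder are importable from the top level of mct_quantizers:

from mct_quantizers import PytorchActivationQuantizationHolder, PytorchQuantizationWrapper

# Hedged sketch: list only the layers MCT wrapped with quantization modules.
for name, module in quantized_model.named_modules():
    if isinstance(module, (PytorchQuantizationWrapper, PytorchActivationQuantizationHolder)):
        print(name, type(module).__name__)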
{
"cell_type": "code",
"execution_count": null,
"id": "4f5fa4a2",
"metadata": {
"id": "4f5fa4a2"
},
"outputs": [],
"source": [
"print(quantized_model)"
]
},
{
"cell_type": "markdown",
"id": "c677bd61c3ab4649",
