remove const quantization from TPCv2 (#1065)
Remove const quantization from TPCv2
Fix broken links for notebooks
elad-c authored May 9, 2024
1 parent 2376ed4 commit 647d4a7
Showing 31 changed files with 1,761 additions and 1,783 deletions.
4 changes: 2 additions & 2 deletions FAQ.md
@@ -16,7 +16,7 @@ Exporting your model to INT8 format (currently, this is supported only for Keras
but this exporting method is limited to uniform 8-bit quantization only.
Note that the IMX500 converter accepts the "fake quantization" model and supports all the features of MCT (e.g. less than 8 bits for weights bit-width and non-uniform quantization).

- For more information and an implementation example, check out the [INT8 TFLite export tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/keras/export/example_keras_export.ipynb)
+ For more information and an implementation example, check out the [INT8 TFLite export tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb)


### 2. Why does loading a quantized exported model from a file fail?
@@ -36,7 +36,7 @@ quantized_model = mct.keras_load_quantized_model('my_model.keras')

#### PyTorch

- PyTorch models can be exported as onnx models. An example of loading a saved onnx model can be found [here](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/pytorch/export/example_pytorch_export.ipynb).
+ PyTorch models can be exported as onnx models. An example of loading a saved onnx model can be found [here](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_export.ipynb).

*Note:* Running inference on an ONNX model in the `onnxruntime` package has a high latency.
Inference on the target platform (e.g. the IMX500) is not affected by this latency.
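
For a quick functional check of both export paths, here is a minimal sketch (file names are placeholders; an ONNX model exported with custom quantizer ops may additionally need session options from the mct-quantizers package):

import numpy as np
import onnxruntime as ort
import model_compression_toolkit as mct

# Keras: reload the quantized model, as shown in the FAQ above.
quantized_model = mct.keras_load_quantized_model('my_model.keras')  # placeholder path

# PyTorch: run the exported ONNX file through onnxruntime. Note the latency
# caveat above; this is suitable for functional checks, not for benchmarking.
sess = ort.InferenceSession('my_model.onnx')  # placeholder path
input_name = sess.get_inputs()[0].name
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)  # assumed input shape
outputs = sess.run(None, {input_name: dummy_input})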
@@ -152,19 +152,6 @@ def generate_tp_model(default_config: OpQuantizationConfig,
# this configuration will be used for the operation quantization:
default_configuration_options = tp.QuantizationConfigOptions([default_config])

- # Create a QuantizationConfigOptions for quantizing constants in functional ops.
- # Constant configuration is similar to the default eight bit configuration except for PoT
- # quantization method for the constant.
- # Since the constants are not named attributes of the layer, we use the default_weight_attr_config to
- # define the desired quantization properties for them.
- const_config = default_config.clone_and_edit(
-     default_weight_attr_config=default_config.default_weight_attr_config.clone_and_edit(
-         enable_weights_quantization=True))
- if not (const_config.default_weight_attr_config.weights_quantization_method == tp.QuantizationMethod.POWER_OF_TWO and
-         const_config.default_weight_attr_config.weights_per_channel_threshold is False):
-     mct.logger.Logger.error('Constant quantization config should be per-tensor PoT.')
- const_configuration_options = tp.QuantizationConfigOptions([const_config])

# Create a TargetPlatformModel and set its default quantization config.
# This default configuration will be used for all operations
# unless specified otherwise (see OperatorsSet, for example):
@@ -198,10 +185,10 @@ def generate_tp_model(default_config: OpQuantizationConfig,
# Define operations sets without quantization configuration
# options (useful for creating fusing patterns, for example):
any_relu = tp.OperatorsSet("AnyReLU")
add = tp.OperatorsSet("Add", const_configuration_options)
sub = tp.OperatorsSet("Sub", const_configuration_options)
mul = tp.OperatorsSet("Mul", const_configuration_options)
div = tp.OperatorsSet("Div", const_configuration_options)
add = tp.OperatorsSet("Add")
sub = tp.OperatorsSet("Sub")
mul = tp.OperatorsSet("Mul")
div = tp.OperatorsSet("Div")
prelu = tp.OperatorsSet("PReLU")
swish = tp.OperatorsSet("Swish")
sigmoid = tp.OperatorsSet("Sigmoid")
@@ -151,19 +151,6 @@ def generate_tp_model(default_config: OpQuantizationConfig,
# this configuration will be used for the operation quantization:
default_configuration_options = tp.QuantizationConfigOptions([default_config])

- # Create a QuantizationConfigOptions for quantizing constants in functional ops.
- # Constant configuration is similar to the default eight bit configuration except for PoT
- # quantization method for the constant.
- # Since the constants are not named attributes of the layer, we use the default_weight_attr_config to
- # define the desired quantization properties for them.
- const_config = default_config.clone_and_edit(
-     default_weight_attr_config=default_config.default_weight_attr_config.clone_and_edit(
-         enable_weights_quantization=True))
- if not (const_config.default_weight_attr_config.weights_quantization_method == tp.QuantizationMethod.POWER_OF_TWO and
-         const_config.default_weight_attr_config.weights_per_channel_threshold is False):
-     mct.logger.Logger.error('Constant quantization config should be per-tensor PoT.')
- const_configuration_options = tp.QuantizationConfigOptions([const_config])

# Create a TargetPlatformModel and set its default quantization config.
# This default configuration will be used for all operations
# unless specified otherwise (see OperatorsSet, for example):
@@ -195,10 +182,10 @@ def generate_tp_model(default_config: OpQuantizationConfig,
# Define operations sets without quantization configuration
# options (useful for creating fusing patterns, for example):
any_relu = tp.OperatorsSet("AnyReLU")
add = tp.OperatorsSet("Add", const_configuration_options)
sub = tp.OperatorsSet("Sub", const_configuration_options)
mul = tp.OperatorsSet("Mul", const_configuration_options)
div = tp.OperatorsSet("Div", const_configuration_options)
add = tp.OperatorsSet("Add")
sub = tp.OperatorsSet("Sub")
mul = tp.OperatorsSet("Mul")
div = tp.OperatorsSet("Div")
prelu = tp.OperatorsSet("PReLU")
swish = tp.OperatorsSet("Swish")
sigmoid = tp.OperatorsSet("Sigmoid")
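
With these options gone, the Add/Sub/Mul/Div operator sets in v2 fall back to the default configuration, and const quantization remains available only from TPC v3, which the tests in this commit switch to. A hedged sketch of selecting a TPC version (the 'tensorflow' and 'imx500' string values are assumptions; the tests use the TENSORFLOW and IMX500_TP_MODEL constants for the same arguments):

import model_compression_toolkit as mct

# v2: Add/Sub/Mul/Div now use the default config (const quantization removed).
tpc_v2 = mct.get_target_platform_capabilities('tensorflow', 'imx500', 'v2')

# v3: the version the const-quantization tests below move to.
tpc_v3 = mct.get_target_platform_capabilities('tensorflow', 'imx500', 'v3')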
6 changes: 3 additions & 3 deletions quantization_troubleshooting.md
@@ -142,7 +142,7 @@ MCT offers advanced features for mitigating these accuracy degradations, such as
In mixed precision quantization, MCT will assign a different bit width to each weight in the model, depending on the weight's layer sensitivity
and a resource constraint defined by the user, such as target model size.

- Check out the [mixed precision tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/keras/ptq/example_keras_mobilenet_mixed_precision.ipynb)
+ Check out the [mixed precision tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_mixed_precision.ipynb)
for more information and an implementation example. Following are a few tips for improving the mixed precision quantization.

#### 1. Using more samples in mixed precision quantization
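
A hedged sketch of this tip (class and parameter names such as MixedPrecisionQuantizationConfig, num_of_images, and ResourceUtilization are assumptions based on recent MCT releases; the model and data are placeholders):

import numpy as np
from tensorflow.keras.applications import MobileNetV2
import model_compression_toolkit as mct

float_model = MobileNetV2()

def representative_data_gen():
    # Placeholder calibration data; use real preprocessed samples in practice.
    for _ in range(64):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# More images than the default steadies the per-layer sensitivity estimates.
mp_config = mct.core.MixedPrecisionQuantizationConfig(num_of_images=64)
core_config = mct.core.CoreConfig(mixed_precision_config=mp_config)

# Constrain weights memory (bytes) below the full 8-bit footprint to force a
# mixed-precision bit-width search.
target_ru = mct.core.ResourceUtilization(weights_memory=float_model.count_params() * 0.75)

quantized_model, quantization_info = mct.keras_post_training_quantization(
    float_model, representative_data_gen,
    target_resource_utilization=target_ru,
    core_config=core_config)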
@@ -204,7 +204,7 @@ When PTQ (either with or without mixed precision) fails to deliver the required
In GPTQ, MCT will finetune the model's weights and quantization parameters for improved accuracy. The finetuning process
will only use the label-less representative dataset.

- Check out the [GPTQ tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/keras/gptq/example_keras_mobilenet_gptq.ipynb) for more information and an implementation example.
+ Check out the [GPTQ tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mobilenet_gptq.ipynb) for more information and an implementation example.

*Note #1*: The finetuning process will take **much** longer to finish than PTQ. As in any finetuning, some hyperparameters optimization may be required.

@@ -226,7 +226,7 @@ To address this issue and manipulate individual layers within the network, you c

Using the network editor API (mct.core.network_editor) you can define a set of "rules" to apply on the network, using provided filters.

- Please refer to our [tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/keras/debug_tools/example_keras_network_editor.ipynb) for more details.
+ Please refer to our [tutorial](https://github.com/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_network_editor.ipynb) for more details.

A use-case example to understand if a certain layer causes accuracy degradation - set the output quantization bits of the layer's type to 16 bits (instead of the default 8 bits defined in the TPC).
If the accuracy improves, then it is pointing to that layer for causing the issue.
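
A hedged sketch of that use-case through mct.core.network_editor (only the module path comes from the text above; the rule, filter, and action class names are assumptions):

from tensorflow.keras import layers
import model_compression_toolkit as mct

# Rule: for every node of the suspect layer type, raise output (activation)
# quantization to 16 bits instead of the TPC default of 8.
edit_rules = [
    mct.core.network_editor.EditRule(
        filter=mct.core.network_editor.NodeTypeFilter(layers.Conv2D),
        action=mct.core.network_editor.ChangeCandidatesActivationQuantConfigAttr(
            activation_n_bits=16)),
]

# The rules ride along with the core config; if accuracy recovers, the
# filtered layer type is the likely culprit.
core_config = mct.core.CoreConfig(
    debug_config=mct.core.DebugConfig(network_editor=edit_rules))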
34 changes: 22 additions & 12 deletions tests/doc_tests/test_docs_links.py
@@ -40,6 +40,7 @@ def check_link(_url):
def test_readme_and_rst_files(self):
mct_folder = getcwd()
print("MCT folder:", mct_folder)
+ are_links_ok = True
for filepath, _, filenames in walk(mct_folder):
for filename in filenames:

@@ -59,14 +60,18 @@ def test_readme_and_rst_files(self):
# A link starting with '#' is a local reference to a headline in the current file --> ignore
pass
elif 'http://' in _link or 'https://' in _link:
- self.assertTrue(self.check_link(_link),
-                 msg=f'Broken link: {_link} in {join(filepath, filename)}')
- print("Link ok:", _link)
+ if self.check_link(_link):
+     print("Link ok:", _link)
+ else:
+     are_links_ok = False
+     print(f'Broken link: {_link} in {join(filepath, filename)}')
else:
_link = _link.split('#')[0]
- self.assertTrue(isdir(join(filepath, _link)) or isfile(join(filepath, _link)),
-                 msg=f'Broken link: {_link} in {join(filepath, filename)}')
- print("Link ok:", _link)
+ if isdir(join(filepath, _link)) or isfile(join(filepath, _link)):
+     print("Link ok:", _link)
+ else:
+     are_links_ok = False
+     print(f'Broken link: {_link} in {join(filepath, filename)}')

elif filename.endswith(".rst"):
# doc source file detected. go over lines in search of links.
@@ -82,10 +87,15 @@
# This link is checked when generating the docs
pass
elif 'http://' in _link or 'https://' in _link:
- self.assertTrue(self.check_link(_link),
-                 msg=f'Broken link: {_link} in {join(filepath, filename)}')
- print("Link ok:", _link)
+ if self.check_link(_link):
+     print("Link ok:", _link)
+ else:
+     are_links_ok = False
+     print(f'Broken link: {_link} in {join(filepath, filename)}')
else:
- self.assertTrue(isfile(join(filepath, _link.replace('../', '') + '.rst')),
-                 msg=f'Broken link: {_link} in {join(filepath, filename)}')
- print("Link ok:", _link)
+ if isfile(join(filepath, _link.replace('../', '') + '.rst')):
+     print("Link ok:", _link)
+ else:
+     are_links_ok = False
+     print(f'Broken link: {_link} in {join(filepath, filename)}')
+ self.assertTrue(are_links_ok, msg='Found broken links!')
@@ -45,7 +45,7 @@ def generate_inputs(self):
return [1 + np.random.random(in_shape) for in_shape in self.get_input_shapes()]

def get_tpc(self):
- return mct.get_target_platform_capabilities(TENSORFLOW, IMX500_TP_MODEL, "v2")
+ return mct.get_target_platform_capabilities(TENSORFLOW, IMX500_TP_MODEL, "v3")

def create_networks(self):
inputs = layers.Input(shape=self.get_input_shapes()[0][1:])
@@ -100,7 +100,7 @@ def generate_inputs(self):
return [1 + np.random.random(in_shape) for in_shape in self.get_input_shapes()]

def get_tpc(self):
- return mct.get_target_platform_capabilities(TENSORFLOW, IMX500_TP_MODEL, "v2")
+ return mct.get_target_platform_capabilities(TENSORFLOW, IMX500_TP_MODEL, "v3")

def create_networks(self):
inputs = layers.Input(shape=self.get_input_shapes()[0][1:])
22 changes: 11 additions & 11 deletions tests/keras_tests/feature_networks_tests/test_features_runner.py
@@ -548,17 +548,17 @@ def test_linear_collapsing(self):
SixConv2DCollapsingTest(self).run_test()
Op2DAddConstCollapsingTest(self).run_test()

- def test_const_quantization(self):
-     c = (np.ones((16,)) + np.random.random((16,))).astype(np.float32)
-     for func in [tf.add, tf.multiply, tf.subtract, tf.divide, tf.truediv]:
-         ConstQuantizationTest(self, func, c).run_test()
-         ConstQuantizationTest(self, func, c, input_reverse_order=True).run_test()
-         ConstQuantizationTest(self, func, c, input_reverse_order=True, use_kwargs=True).run_test()
-         ConstQuantizationTest(self, func, c, use_kwargs=True).run_test()
-         ConstQuantizationTest(self, func, 2.45).run_test()
-         ConstQuantizationTest(self, func, 5.1, input_reverse_order=True).run_test()
-
-     AdvancedConstQuantizationTest(self).run_test()
+ # def test_const_quantization(self):
+ #     c = (np.ones((16,)) + np.random.random((16,))).astype(np.float32)
+ #     for func in [tf.add, tf.multiply, tf.subtract, tf.divide, tf.truediv]:
+ #         ConstQuantizationTest(self, func, c).run_test()
+ #         ConstQuantizationTest(self, func, c, input_reverse_order=True).run_test()
+ #         ConstQuantizationTest(self, func, c, input_reverse_order=True, use_kwargs=True).run_test()
+ #         ConstQuantizationTest(self, func, c, use_kwargs=True).run_test()
+ #         ConstQuantizationTest(self, func, 2.45).run_test()
+ #         ConstQuantizationTest(self, func, 5.1, input_reverse_order=True).run_test()
+ #
+ #     AdvancedConstQuantizationTest(self).run_test()

def test_const_representation(self):
c = (np.ones((16,)) + np.random.random((16,))).astype(np.float32)
@@ -59,7 +59,7 @@ def generate_inputs(self):
return [np.random.random(in_shape)+1 for in_shape in self.get_input_shapes()]

def get_tpc(self):
- return mct.get_target_platform_capabilities(PYTORCH, IMX500_TP_MODEL, "v2")
+ return mct.get_target_platform_capabilities(PYTORCH, IMX500_TP_MODEL, "v3")

def create_networks(self):
if self.input_reverse_order:
@@ -116,7 +116,7 @@ def generate_inputs(self):
return [np.random.random(in_shape)+1 for in_shape in self.get_input_shapes()]

def get_tpc(self):
- return mct.get_target_platform_capabilities(PYTORCH, IMX500_TP_MODEL, "v2")
+ return mct.get_target_platform_capabilities(PYTORCH, IMX500_TP_MODEL, "v3")

def create_networks(self):
return AdvancedConstQuantizationNet(self.const)
18 changes: 9 additions & 9 deletions tests/pytorch_tests/model_tests/test_feature_models_runner.py
@@ -232,15 +232,15 @@ def test_residual_collapsing(self):
ResidualCollapsingTest1(self).run_test()
ResidualCollapsingTest2(self).run_test()

- def test_const_quantization(self):
-     c = (np.ones((32,)) + np.random.random((32,))).astype(np.float32)
-     for func in [torch.add, torch.sub, torch.mul, torch.div]:
-         ConstQuantizationTest(self, func, c).run_test()
-         ConstQuantizationTest(self, func, c, input_reverse_order=True).run_test()
-         ConstQuantizationTest(self, func, 2.45).run_test()
-         ConstQuantizationTest(self, func, 5, input_reverse_order=True).run_test()
-
-     AdvancedConstQuantizationTest(self).run_test()
+ # def test_const_quantization(self):
+ #     c = (np.ones((32,)) + np.random.random((32,))).astype(np.float32)
+ #     for func in [torch.add, torch.sub, torch.mul, torch.div]:
+ #         ConstQuantizationTest(self, func, c).run_test()
+ #         ConstQuantizationTest(self, func, c, input_reverse_order=True).run_test()
+ #         ConstQuantizationTest(self, func, 2.45).run_test()
+ #         ConstQuantizationTest(self, func, 5, input_reverse_order=True).run_test()
+ #
+ #     AdvancedConstQuantizationTest(self).run_test()

def test_const_representation(self):
c = (np.ones((32,)) + np.random.random((32,))).astype(np.float32)
@@ -5,7 +5,7 @@
"source": [
"# Post Training Quantization an EfficientDet Object Detection Model\n",
"\n",
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/keras/ptq/example_keras_effdet_lite0.ipynb)\n",
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/keras/example_keras_effdet_lite0_for_imx500.ipynb)\n",
"\n",
"## Overview\n",
"\n",
@@ -17,7 +17,7 @@
"id": "9be59ea8-e208-4b64-aede-1dd6270b3540"
},
"source": [
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/keras/ptq/example_keras_imagenet.ipynb)"
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/keras/example_keras_mobilenetv2_for_imx500.ipynb)"
]
},
{
@@ -7,7 +7,7 @@
"source": [
"# Post Training Quantization a Nanodet-Plus Object Detection Model\n",
"\n",
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/keras/ptq/example_keras_nanodet_plus.ipynb)\n",
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/keras/example_keras_nanodet_plus_for_imx500.ipynb)\n",
"\n",
"## Overview\n",
"\n",
@@ -7,7 +7,7 @@
"source": [
"# DeeplabV3+ Semantic Segmentation Keras Model - Quantization for IMX500\n",
"\n",
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/keras/ptq/keras_deeplabv3plus_for_imx500.ipynb)\n",
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/keras/keras_deeplabv3plus_for_imx500.ipynb)\n",
"\n",
"## Overview\n",
"\n",
@@ -7,7 +7,7 @@
"source": [
"# YOLOv8n Object Detection Keras Model - Quantization for IMX500\n",
"\n",
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/keras/ptq/keras_yolov8n_for_imx500.ipynb)\n",
"[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/imx500_notebooks/keras/keras_yolov8n_for_imx500.ipynb)\n",
"\n",
"## Overview\n",
"\n",