# Introduction
Intel® Low Precision Optimization Tool is an open-source Python library that helps users quickly deploy low-precision inference solutions on popular DL frameworks, including TensorFlow, PyTorch, and MXNet. It automatically optimizes low-precision recipes for deep learning models to achieve optimal product objectives, such as inference performance and memory usage, while meeting expected accuracy criteria.
# Infrastructure

<div align="left">
  <img src="imgs/infrastructure.jpg" width="700px" />
</div>
# User facing API
The API is intended to unify the low-precision quantization interfaces across multiple DL frameworks for the best out-of-the-box experience.

### Three-level APIs

It consists of three components:

1. User API

The user API is intended to provide the best out-of-the-box experience and to unify the low-precision quantization workflow across multiple DL frameworks. The tuning configuration and model-specific information are controlled by a user config yaml file. For the format of the yaml file, please refer to [ptq.yaml](../ilit/template/ptq.yaml), [qat.yaml](../ilit/template/qat.yaml), or [pruning.yaml](../ilit/template/pruning.yaml).
### Pruning-related APIs

```
class Pruning(object):
    def __init__(self, conf_fname):
        ...

    def on_epoch_begin(self, epoch):
        ...

    def on_batch_begin(self, batch_id):
        ...

    def on_batch_end(self):
        ...
```
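The snippet below is a rough sketch of how these callbacks might be driven from a user's training loop. It assumes the Pruning class is importable from the ilit package and is constructed from a pruning yaml as described above; `model`, `train_dataloader`, and `train_one_batch` are user-side placeholders, not part of the tool's API.

```
# Sketch only: the callback names mirror the class definition above;
# model, train_dataloader and train_one_batch are placeholders for user code.
from ilit import Pruning   # assumed import location

pruner = Pruning('./pruning.yaml')
num_epochs = 3

for epoch in range(num_epochs):
    pruner.on_epoch_begin(epoch)
    for batch_id, batch in enumerate(train_dataloader):
        pruner.on_batch_begin(batch_id)
        train_one_batch(model, batch)   # user's normal training step
        pruner.on_batch_end()
```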
2. Framework Adaptation API
The framework adaptation layer abstracts out the API differences of various DL frameworks needed to support the low-precision quantization workflow and provides a unified API for the auto-tuning engine to use. The abstracted functionalities include quantization configurations, quantization capabilities, calibration, quantization-aware training, graph transformation for quantization, data loader and metric evaluation, and tensor inspection.

Intel® Low Precision Optimization Tool is designed to be highly extensible. New tuning strategies can be added by inheriting the "Strategy" class. New frameworks can be added by inheriting the "Adaptor" class. New metrics can be added by inheriting the "Metric" class. New tuning objectives can be added by inheriting the "Objective" class.

The conf_fname parameter used in the class initializations above is a path to the Intel® Low Precision Optimization Tool configuration file, a yaml file that controls the whole tuning behavior.
# Strategies

### Basic Strategy

This strategy is the default tuning strategy of Intel® Low Precision Optimization Tool. It does model-wise tuning by adjusting global tuning parameters, such as calibration-related parameters (the KL or minmax algorithm) and quantization-related parameters (symmetric or asymmetric, per_channel or per_tensor). If the model-wise tuning result does not meet the accuracy goal, this strategy attempts op-wise fallback from bottom to top to identify which fallback op has the biggest impact on the final accuracy, and then does incremental fallback until the accuracy goal is achieved.

### Bayesian Strategy

Bayesian optimization is a sequential design strategy for global optimization of black-box functions. This strategy is based on the Bayesian optimization package [bayesian-optimization](https://github.com/fmfn/BayesianOptimization) and changes it to a discrete version that complies with the strategy standard of Intel® Low Precision Optimization Tool. It uses Gaussian processes to define the prior/posterior distribution over the black-box function, and then finds the tuning configuration that maximizes the expected improvement.

### MSE Strategy

This strategy is very similar to the basic strategy. It needs to get the tensors of each operator for both the raw FP32 model and the quantized model based on the best model-wise tuning configuration. It then calculates the MSE (Mean Squared Error) for each operator, sorts the operators by MSE value, and finally does op-wise fallback in this order.

### Random Strategy

This strategy randomly chooses tuning configurations from the tuning space.

### Exhaustive Strategy

This strategy sequentially traverses all possible tuning configurations in the tuning space.
# YAML Syntax

Intel® Low Precision Optimization Tool provides three template yaml files for the [PTQ](../ilit/template/ptq.yaml), [QAT](../ilit/template/qat.yaml), and [Pruning](../ilit/template/pruning.yaml) scenarios. Users can refer to these complete templates to understand the meaning of each field.

> Most fields in the yaml templates are optional, so a typical yaml file can be very concise; see, for example, the [HelloWorld Yaml](../examples/helloworld/tf2.x/conf.yaml).
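For illustration only, a minimal configuration in the spirit of these templates might look like the sketch below. The field names and values here are assumptions; the template yaml files linked above are the authoritative reference for the actual schema.

```
# Hypothetical minimal yaml sketch -- consult ptq.yaml / qat.yaml / pruning.yaml
# for the real field names and defaults.
framework:
  name: tensorflow          # which DL framework the model comes from
  inputs: input             # model input tensor name(s)
  outputs: output           # model output tensor name(s)

tuning:
  accuracy_criterion:
    relative: 0.01          # tolerate at most 1% relative accuracy loss
  timeout: 0                # assumed: stop as soon as the criterion is met
  random_seed: 9527
```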
# How to use the quantization API

Intel® Low Precision Optimization Tool supports three different usages, depending on how the user code is organized:

The first usage is designed for minimal code changes when integrating with Intel® Low Precision Optimization Tool. The whole calibration and evaluation process is constructed from yaml, including the dataloaders used in the calibration and evaluation phases and the quantization tuning settings. For this usage, only the model parameter is mandatory.

Examples of this usage are at [TensorFlow Classification Models](../examples/tensorflow/image_recognition/README.md).
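As a rough Python sketch of this first usage (the exact entry-point name and call form are assumptions here and should be checked against the linked examples):

```
# Sketch only: assumes Quantization is constructed with the yaml path and
# called with just the FP32 model; dataloaders and metrics come from conf.yaml.
import ilit

fp32_model = ...                              # user's TensorFlow/PyTorch/MXNet model
quantizer = ilit.Quantization('./conf.yaml')  # yaml fully describes the tuning process
q_model = quantizer(fp32_model)               # only the model parameter is mandatory
```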
### *Concise template-based yaml + few lines of code changes*

The second usage is designed for a concise yaml configuration, moving the calibration and evaluation dataloader construction from yaml into code. If the user model uses evaluation metrics supported by ilit, this usage is a good choice.

The user needs to provide a *dataloader* that implements the __iter__ or __getitem__ method and a batch_size attribute, which usually already exists or is easy to develop in user code. ilit also provides built-in dataloaders to support dynamic batching; the user can implement a *dataset* with the __iter__ or __getitem__ method that yields one single sample, and Quantization().dataloader() will take this dataset as an input parameter to construct an ilit dataloader.

After that, the user specifies the fp32 "model", the calibration dataset "q_dataloader", and the evaluation dataset "eval_dataloader". The calibrated and quantized model is evaluated with "eval_dataloader" using the evaluation metrics specified in the yaml configuration file. The evaluation tells the tuner whether the quantized model meets the accuracy criteria. If not, the tuner starts a new calibration and tuning flow. For this usage, the model, q_dataloader, and eval_dataloader parameters are mandatory.
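A hedged sketch of this second usage is below. The dataset class follows the description above, but batch_size and the keyword-argument names in the final call are assumptions to be verified against the examples.

```
# Sketch only: CalibDataset, calib_samples, eval_samples and fp32_model are
# user-side placeholders, not part of the tool's API.
import ilit

class CalibDataset(object):
    """Map-style dataset: returns one (input, label) sample per index."""
    def __init__(self, samples):
        self.samples = samples

    def __getitem__(self, index):
        return self.samples[index]

    def __len__(self):
        return len(self.samples)

calib_samples = []   # fill with (input, label) pairs for calibration
eval_samples = []    # fill with (input, label) pairs for accuracy evaluation
fp32_model = ...     # user's FP32 model

quantizer = ilit.Quantization('./conf.yaml')
calib_dataloader = quantizer.dataloader(CalibDataset(calib_samples), batch_size=32)
eval_dataloader = quantizer.dataloader(CalibDataset(eval_samples), batch_size=32)

# model, q_dataloader and eval_dataloader are the mandatory parameters in this usage.
q_model = quantizer(fp32_model,
                    q_dataloader=calib_dataloader,
                    eval_dataloader=eval_dataloader)
```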
### *Most concise template-based yaml + few lines of code changes*

The third usage is designed for ease of tuning enablement for models with custom metric evaluation, or with metrics not yet supported by Intel® Low Precision Optimization Tool. Currently this usage model works for object detection and NLP networks.

The user constructs the calibration dataloader by code and passes it to the "q_dataloader" parameter. This usage is quite similar to the second usage, except that the user specifies a custom "eval_func" that encapsulates the evaluation dataset and the evaluation process by itself. The FP32 and quantized INT8 models are evaluated with "eval_func". The "eval_func" yields a higher-is-better accuracy value to the tuner, which checks whether the quantized model meets the accuracy criteria. If not, the tuner starts a new calibration and tuning flow. For this usage, the model, q_dataloader, and eval_func parameters are mandatory.
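And a sketch of the third usage, again with an assumed call shape; my_custom_evaluation, calib_dataloader, and fp32_model stand in for user code:

```
# Sketch only: eval_func receives a candidate model and returns a single
# higher-is-better accuracy value, as described above.
import ilit

def eval_func(model):
    accuracy = my_custom_evaluation(model)   # placeholder: user's own evaluation loop / metric
    return accuracy                          # a single higher-is-better value

quantizer = ilit.Quantization('./conf.yaml')
q_model = quantizer(fp32_model,
                    q_dataloader=calib_dataloader,   # built in user code as in the second usage
                    eval_func=eval_func)
```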
# Objectives

Intel® Low Precision Optimization Tool supports the three built-in objectives below. All objectives are optimized while being driven by accuracy metrics.

### 1. Performance

This objective targets the best performance of the quantized model. It is the default objective.

### 2. Memory Footprint

This objective targets minimal memory usage of the quantized model.

### 3. Model Size

This objective targets minimal weight size of the quantized model.
# Metrics
Intel® Low Precision Optimization Tool supports three built-in metrics: Topk, F1, and CocoMAP. The metrics are easily extensible by inheriting the Metric class.
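A hypothetical sketch of such a custom metric is below; the import path and the update()/result() method names are assumptions about the Metric interface, not the library's confirmed API.

```
# Hypothetical sketch: method names and import path are assumptions.
from ilit.metric import Metric   # assumed location of the Metric base class

class MyTop1(Metric):
    """Toy top-1 accuracy metric accumulated over batches."""
    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, labels):
        for pred, label in zip(preds, labels):
            self.correct += int(pred == label)
            self.total += 1

    def result(self):
        return self.correct / max(self.total, 1)
```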