Adaptor
=================
1. query fw capability
2. parse tune config (LPOT config -> framework capability)
3. (optional) pre-optimize
4. do the quantization
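As a minimal sketch, an adaptor for a new backend could expose these steps roughly as follows. Only `query_fw_capability` and `quantize` echo names used later in this document; the class and the other method names are illustrative, not LPOT's exact API:

```python
# Minimal, illustrative adaptor skeleton for the four steps above; only
# query_fw_capability and quantize echo names used in this document,
# the rest are hypothetical placeholders.
class MyBackendAdaptor:
    def __init__(self, framework_specific_info):
        self.pre_optimized_model = None

    def query_fw_capability(self, model):
        """Step 1: report the ops/dtypes/granularities the backend supports."""
        raise NotImplementedError

    def parse_tune_config(self, tune_cfg):
        """Step 2: translate LPOT's tune config into backend-native settings."""
        raise NotImplementedError

    def pre_optimize(self, model):
        """Step 3 (optional): run fp32 graph optimizations before quantizing."""
        self.pre_optimized_model = model
        return model

    def quantize(self, tune_cfg, model, dataloader):
        """Step 4: produce the quantized model with the backend's own tooling."""
        raise NotImplementedError
```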
Extension
=================
Let us take onnxruntime as an example. Onnxruntime is a backend proposed by Microsoft, and it is based on the MLAS kernel by default.
Onnxruntime already has [quantization tools](https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/quantization), so the question becomes how to integrate the onnxruntime quantization tools into LPOT.
1. capability

   We should explore the quantization capability first. According to [onnx_quantizer](https://github.com/microsoft/onnxruntime/blob/503b61d897074a494f5798069308ee67d8fb9ace/onnxruntime/python/tools/quantization/onnx_quantizer.py#L77), the quantization tools support the following attributes:
   1.1 whether per_channel
   1.2 whether reduce_range
   1.3 QLinear mode or Integer mode (which is only seen in onnxruntime)
   1.4 whether static (static quantization or dynamic quantization)
   1.5 weight_qtype (choices are float32, int8 and uint8)
   1.6 input_qtype (choices are float32, int8 and uint8)
   1.7 quantization_params (None if dynamic quantization)
   1.8 & 1.9 nodes_to_quantize, nodes_to_exclude
   1.10 op_types_to_quantize

   so we can pass a tuning capability to LPOT like:

   ```python
   {'optypewise': {'conv':
                   {
                    'activation': {'dtype': ['uint8', 'fp32']},
                    'weight': {'dtype': ['int8', 'fp32']},
                    'algorithm': ['minmax'],
                    'granularity': ['per_channel']
                   },
                   'matmul':
                   {
                    'activation': {'dtype': ['uint8', 'fp32']},
                    'weight': {'dtype': ['int8', 'fp32']},
                    'algorithm': ['minmax'],
                    'granularity': ['per_channel']
                   }
                  },
    'opwise': {('conv1', 'conv'):
               {
                'activation': {'dtype': ['uint8', 'fp32']},
                'weight': {'dtype': ['int8', 'fp32']}
               }
              }
   }
   ```
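   For illustration only (the traversal and helper below are assumptions, not LPOT's actual code), **query_fw_capability** could assemble this dict by walking the ONNX graph:

   ```python
   import onnx

   # Illustrative only: build the capability dict above from an ONNX model.
   def query_fw_capability(model: onnx.ModelProto):
       int8_capability = {
           'activation': {'dtype': ['uint8', 'fp32']},
           'weight': {'dtype': ['int8', 'fp32']},
           'algorithm': ['minmax'],
           'granularity': ['per_channel'],
       }
       optypewise = {'conv': int8_capability, 'matmul': int8_capability}
       opwise = {}
       # List every concrete quantizable node so LPOT can tune op-wise.
       for node in model.graph.node:
           op_type = node.op_type.lower()
           if op_type in optypewise:
               opwise[(node.name, op_type)] = {
                   'activation': {'dtype': ['uint8', 'fp32']},
                   'weight': {'dtype': ['int8', 'fp32']},
               }
       return {'optypewise': optypewise, 'opwise': opwise}
   ```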
2. parse tune config

   LPOT will generate a tune config from your tuning capability like:

   ```python
   {
    'fuse': {'int8': [['CONV2D', 'RELU', 'BN'], ['CONV2D', 'RELU']],
             'fp32': [['CONV2D', 'RELU', 'BN']]},
    'calib_iteration': 10,
    'op': {
       ('op1', 'CONV2D'): {
         'activation': {'dtype': 'uint8',
                        'algorithm': 'minmax',
                        'scheme': 'sym',
                        'granularity': 'per_tensor'},
         'weight': {'dtype': 'int8',
                    'algorithm': 'kl',
                    'scheme': 'asym',
                    'granularity': 'per_channel'}
       },
       ('op2', 'RELU'): {
         'activation': {'dtype': 'int8',
                        'scheme': 'asym',
                        'granularity': 'per_tensor',
                        'algorithm': 'minmax'}
       },
       ('op3', 'CONV2D'): {
         'activation': {'dtype': 'fp32'},
         'weight': {'dtype': 'fp32'}
       },
       ...
    }
   }
   ```

   Then you can parse this config into a format that ONNXQuantizer can accept.
   Please make sure whether your quantization API supports model-wise or op-wise quantization: for example, can node "conv1" use the "minmax" algorithm while node "conv2" uses the "KL" algorithm, or must the whole model use either "minmax" or "KL"?
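   As a rough, hypothetical sketch (the returned attribute names are assumptions based on the list in section 1, not a confirmed ONNXQuantizer signature), the parsing step could look like:

   ```python
   # Hypothetical sketch: translate LPOT's tune config into the
   # ONNXQuantizer-style attributes listed in section 1. The attribute
   # names here are assumptions, not a confirmed onnxruntime API.
   def parse_tune_config(tune_cfg):
       nodes_to_quantize, nodes_to_exclude = [], []
       per_channel = False
       for (name, op_type), op_cfg in tune_cfg['op'].items():
           if op_cfg['activation']['dtype'] == 'fp32':
               nodes_to_exclude.append(name)  # keep this node in fp32
               continue
           nodes_to_quantize.append(name)
           weight_cfg = op_cfg.get('weight', {})
           # per_channel in section 1 appears to be model-wide, so op-wise
           # granularity collapses to a single flag here.
           per_channel |= weight_cfg.get('granularity') == 'per_channel'
       return {
           'per_channel': per_channel,
           'nodes_to_quantize': nodes_to_quantize,
           'nodes_to_exclude': nodes_to_exclude,
       }
   ```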
3. pre-optimize

   if your backend supports FP32 graph optimization, you can apply it in **query_fw_capability** and quantize the optimized fp32 model instead of the original model:

   > model = self.pre_optimized_model if self.pre_optimized_model else model
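   For onnxruntime in particular, one way to get an optimized fp32 graph is to let a session serialize its optimized model to disk (a sketch; which optimization level to pick is up to your adaptor):

   ```python
   import onnx
   import onnxruntime as ort

   # Sketch: have onnxruntime write its graph-optimized fp32 model to disk,
   # then quantize that file instead of the original one.
   def pre_optimize(model_path: str, optimized_path: str) -> onnx.ModelProto:
       opts = ort.SessionOptions()
       opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_BASIC
       opts.optimized_model_filepath = optimized_path
       ort.InferenceSession(model_path, opts)  # creating the session writes the file
       return onnx.load(optimized_path)
   ```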
4. do quantization

   This part depends on your backend implementation; you may refer to [onnxruntime](../lpot/adaptor/onnxrt.py) as an example.
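   For instance, a dynamic-quantization path could hand the parsed settings to onnxruntime's own tooling; treat this as a sketch, since keyword availability varies across onnxruntime versions:

   ```python
   from onnxruntime.quantization import QuantType, quantize_dynamic

   # Sketch: feed the parsed tune config into onnxruntime's quantization
   # tool. Check your installed onnxruntime version before relying on
   # these keyword arguments.
   quantize_dynamic(
       "model_fp32_optimized.onnx",   # (pre-optimized) fp32 input model
       "model_int8.onnx",             # quantized output model
       per_channel=True,              # from the tune config's granularity
       weight_type=QuantType.QInt8,   # from the tune config's weight dtype
   )
   ```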