🐛 Describe the bug
Hi ExecuTorch Arm backend team,

I'm validating the lowering pipeline and have collected the comparison below across several models. Quantization is 8-bit only. In the table, "Decompose with aten" means the aten-dialect graph after decomposition by the Arm backend's quantizer.transform_for_annotation, and "FQ" is the fake-quantized graph. A sketch of how these stages are produced follows the table.
| Model | FP vs Decompose with aten | FP vs Decompose with quant aten (FQ) | FP vs TOSA | FQ vs TOSA |
|---|---|---|---|---|
| MobileNetV2 | max error: 0.0, mean error: 0.0, cosine: 1.0 | max error: 8.0, mean error: 1.187, cosine: 0.9997984690887474 | max error: 13.0, mean error: 1.731, cosine: 0.9995700057524048 | max error: 8.0, mean error: 1.226, cosine: 0.9997850644158716 |
| ResNet18 | max error: 0.0, mean error: 0.0, cosine: 0.9999999999999999 | max error: 6.0, mean error: 1.525, cosine: 0.9990599672781051 | max error: 9.0, mean error: 1.815, cosine: 0.9985450487548194 | max error: 7.0, mean error: 1.49, cosine: 0.9990350537962956 |
| DeiT-tiny | max error: 0.0, mean error: 0.0, cosine: 1.0 | max error: 96.0, mean error: 21.23, cosine: 0.8994378800187951 | max error: 83.0, mean error: 18.682, cosine: 0.9264412351657058 | max error: 96.0, mean error: 21.434, cosine: 0.898735675973962 |
| LLaMA | max error: 89.0, mean error: 20.6328, cosine: 0.7638448052472464 | max error: 82.0, mean error: 19.37109375, cosine: 0.7869657464752766 | max error: 82.0, mean error: 19.37109375, cosine: 0.7869657464752766 | max error: 0.0, mean error: 0.0, cosine: 1.0 |
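For clarity, here is a minimal sketch of how the first three comparison points can be produced. It is a sketch under assumptions, not the exact script: the Arm quantizer class and helper names (TOSAQuantizer, get_symmetric_quantization_config) may differ at this commit (older revisions use ArmQuantizer), `model`, `example_inputs`, and `compile_spec` are placeholders, and the TOSA column additionally goes through the Arm partitioner and tosa_reference_model, which is not shown.

```python
import copy

import torch
from torch.export import export
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

# Assumed import path and names; adjust to ArmQuantizer etc. for older commits.
from executorch.backends.arm.quantizer.arm_quantizer import (
    TOSAQuantizer,
    get_symmetric_quantization_config,
)


def comparison_points(model, example_inputs, compile_spec):
    fp_out = model(*example_inputs)  # FP reference output

    # aten-dialect graph; depending on the version, export_for_training may be needed here.
    aten_gm = export(model, example_inputs).module()

    quantizer = TOSAQuantizer(compile_spec)  # assumed constructor argument
    quantizer.set_global(get_symmetric_quantization_config())

    # "Decompose with aten": only the Arm quantizer's decompositions are applied.
    decomposed_gm = quantizer.transform_for_annotation(copy.deepcopy(aten_gm))
    decomposed_out = decomposed_gm(*example_inputs)

    # "FQ": fake-quantized graph; prepare_pt2e runs transform_for_annotation internally.
    prepared = prepare_pt2e(copy.deepcopy(aten_gm), quantizer)
    prepared(*example_inputs)  # calibration data == test input, as described below
    fq_out = convert_pt2e(prepared)(*example_inputs)

    return fp_out, decomposed_out, fq_out
```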
The experimental setup is as follows:
- ExecuTorch commit: 913436a
- TOSA results are produced by running the lowered graph with tosa_reference_model
- For all models, the test input is torch.rand(input_shape) with batch size 1
- For verification only, the calibration data is identical to the test input (the comparison metric is sketched after this list)
- The models come from executorch/examples/models
- LLaMA weights are randomly initialized as follows:

```python
torch.manual_seed(0)
# Re-initialize all parameters with small random values (reproducible via the seed above)
for p in self.model_.parameters():
    # p.data.fill_(0)
    torch.nn.init.normal_(p, mean=0.0, std=0.02)
# Zero out all buffers
for b in self.model_.buffers():
    b.data.fill_(0)
```
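For reference, each table cell compares two flattened output tensors roughly like the sketch below (the actual script differs in details; `ref` and `test` are placeholders for the two outputs being compared):

```python
import torch
import torch.nn.functional as F


def compare(ref: torch.Tensor, test: torch.Tensor) -> dict:
    """Max/mean absolute error plus cosine similarity over flattened outputs."""
    ref = ref.detach().flatten().double()
    test = test.detach().flatten().double()
    diff = (ref - test).abs()
    return {
        "max error": diff.max().item(),
        "mean error": diff.mean().item(),
        "cosine": F.cosine_similarity(ref, test, dim=0).item(),
    }
```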
Based on the conditions above, I have some questions about the comparison results:
- I can roughly understand why DeiT-tiny shows a large difference between FP and its quantized output; this may come from limited quantization granularity or the nature of the model itself. In that sense it is reasonable that the TOSA output also deviates significantly from FP. What I am not fully sure about is FQ vs TOSA: in theory these two should be very close to each other, as they are for the other models.
- For LLaMA, there is already a very large deviation from the FP model immediately after applying quantizer.transform_for_annotation, even before FakeQuant is applied or the model is lowered to TOSA (see the op-diff sketch below).

Is it possible that these models are currently lowerable, but numerical correctness is not yet guaranteed for them?
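One way to narrow down the LLaMA deviation, offered only as a suggestion and reusing `aten_gm` and `decomposed_gm` from the first sketch above, is to diff the op histogram of the graph before and after transform_for_annotation to see which decompositions were introduced:

```python
from collections import Counter

import torch.fx


def op_histogram(gm: torch.fx.GraphModule) -> Counter:
    """Count call_function targets so decompositions added by
    transform_for_annotation become visible."""
    return Counter(str(n.target) for n in gm.graph.nodes if n.op == "call_function")


before = op_histogram(aten_gm)       # graph before transform_for_annotation
after = op_histogram(decomposed_gm)  # graph after transform_for_annotation
for op in sorted(before.keys() | after.keys()):
    if before.get(op, 0) != after.get(op, 0):
        print(f"{op}: {before.get(op, 0)} -> {after.get(op, 0)}")
```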