Commit 8c26898

CoreML: Disable 1D ML Program matmul due to bug in coreml (microsoft#21186)
### Description

Disable using CoreML ML Program for a matmul where one of the inputs is 1D, as the CoreML implementation appears to be broken. See apple/coremltools#2263.

Add some debugging notes.

### Motivation and Context

Fix failing test on macos-14.
1 parent 56b36a5 commit 8c26898

5 files changed: +114 -17 lines changed

.github/workflows/mac.yml

Lines changed: 1 addition & 2 deletions
@@ -54,11 +54,10 @@ jobs:
             --test \
             --build_shared_lib \
             --build_objc \
+            --use_coreml \
             --use_xnnpack \
             --use_binskim_compliant_compile_flags

-    # TODO add --use_coreml once unit test failures are addressed
-
   Objective-C-StaticAnalysis:
     runs-on: macos-14


Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# Steps to debug an ML Program operator implementation

Basic debugging of everything except model execution (e.g. partitioning, checking if an operator is supported,
adding CoreML operator inputs/outputs) can be done anywhere, as the code is set up to build and to create the
protobuf-based CoreML Model on all platforms.

To debug model execution issues you will need a macOS machine.

## Debugging invalid output

If there is a crash during execution or unexpected output, the best approach is to see what using coremltools directly
produces.

NOTE: that doesn't guarantee coremltools is correct, as there could be a bug in their implementation. It does however
provide a data point on whether we are generating the same CoreML model as the coremltools python.

### Comparing to coremltools output

Create a small test script that replicates the inputs/outputs of the operator you are debugging.
This script should use the coremltools library to run the operator and print the output.
This can be used to compare the CoreML EP's output with the coremltools output.

https://apple.github.io/coremltools/docs-guides/source/model-intermediate-language.html#create-a-mil-program

Usage is reasonably intuitive. The below example defines a model with 2 inputs and a matmul operator.
The model is printed, and run with randomly generated inputs. The output from doing so is printed.

```python
import numpy as np
import coremltools as ct
from coremltools.converters.mil import Builder as mb

target = ct.target.iOS15

x_shape = (1, 4)
y_shape = (10, 4, 3)

@mb.program(input_specs=[mb.TensorSpec(shape=x_shape), mb.TensorSpec(shape=y_shape)],
            opset_version=target)
def prog(x, y):
    # For reference, a constant can be added using `mb.const` and specifying the data in the `val` parameter.
    # c_shape = (3, )
    # c_data = np.random.random_sample(c_shape)
    # c = mb.const(val=c_data)

    # call the operator you are debugging with the inputs/constants.
    # See the spec for the operator names, input/outputs and supported data types.
    # https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html
    z = mb.matmul(x=x, y=y)

    # can have additional function calls here if there are multiple operators involved.
    # Contrived example that uses a constant and the output from a previous operator:
    # z = mb.add(x=z, y=c)

    return z

# Prints the MIL program in a reasonably concise manner.
print(prog)

# Convert to ML Program model
m = ct.convert(prog, minimum_deployment_target=target)

# If you want to dump the full protobuf of the model uncomment this.
# You can compare the values to what is being set by the ORT CoreML EP code if you suspect any issues there.
# spec = m.get_spec()
# print(spec)

# run the model to generate output for comparison with the CoreML EP output
x = np.random.rand(*x_shape)
y = np.random.rand(*y_shape)

print(m.predict({'x': x, 'y': y}))
```
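
The notes describe comparing the CoreML EP output with the coremltools output but do not show the comparison itself. The following is a minimal sketch of that step, not part of the commit. The helper name `compare_outputs`, the tolerances, and the stand-in `ep_output` array are assumptions made for illustration; `np.matmul` is used as a third data point because ONNX MatMul follows numpy semantics.

```python
import numpy as np


def compare_outputs(reference, actual, rtol=1e-3, atol=1e-4):
    """Return (matches, max_abs_diff) for two outputs.

    The tolerances are illustrative assumptions, not values used by the ORT tests.
    """
    reference = np.asarray(reference, dtype=np.float32)
    actual = np.asarray(actual, dtype=np.float32)
    if reference.shape != actual.shape:
        # A shape mismatch is reported as a failed comparison.
        return False, float("inf")
    max_abs_diff = float(np.max(np.abs(reference - actual)))
    matches = bool(np.allclose(actual, reference, rtol=rtol, atol=atol))
    return matches, max_abs_diff


# Same input shapes as the coremltools example.
x = np.random.rand(1, 4).astype(np.float32)
y = np.random.rand(10, 4, 3).astype(np.float32)

# numpy reference result; x is broadcast across the batch dimension of y.
expected = np.matmul(x, y)

# Stand-in for the array returned by the CoreML EP or by m.predict(...).
ep_output = expected.copy()

print(expected.shape)
print(compare_outputs(expected, ep_output))
```

In practice `ep_output` would be replaced with the array produced by the CoreML EP, and a second call would compare against the coremltools `predict` result.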

## Dumping the ORT generated mlmodel

You can also dump the mlmodel generated by the ORT CoreML EP. This can be handy with larger models.

In a debug build, set the ORT_COREML_EP_MODEL_DIR environment variable to a directory where you want the ML Package
containing the mlmodel to be saved. The model will remain after the CoreML EP exits, unlike the default behavior
where we write it to a temporary directory that is automatically removed on application exit.

Script to dump: [dump_mlprogram_model.py](dump_mlprogram_model.py)

See [here](https://github.com/microsoft/onnxruntime/blob/3c0b407709fd3c71755ed046edd688b30a786d94/onnxruntime/core/providers/coreml/model/host_utils.h#L70-L75) for environment variable setup and [usage](https://github.com/search?q=repo%3Amicrosoft%2Fonnxruntime%20kOverrideModelOutputDirectoryEnvVar%20&type=code).
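
The dump workflow above could be scripted roughly as follows. This is a sketch under assumptions, not code from the commit: the helper names `set_model_output_dir` and `find_dumped_models` are hypothetical, and it assumes the variable must be set in the process environment before the session that uses the CoreML EP is created. Only the variable name and the `model.mlmodel` file name come from the notes and the dump script's usage text.

```python
import os
from pathlib import Path

# Environment variable name taken from the notes above.
ENV_VAR = "ORT_COREML_EP_MODEL_DIR"


def set_model_output_dir(directory):
    """Create `directory` if needed and export it for a debug build of the CoreML EP."""
    path = Path(directory).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    os.environ[ENV_VAR] = str(path)
    return path


def find_dumped_models(directory):
    """Return every model.mlmodel found anywhere under `directory`, sorted by path."""
    return sorted(Path(directory).rglob("model.mlmodel"))
```

A typical use would be to call `set_model_output_dir(...)` first, run the model through the CoreML EP, and then pass each path returned by `find_dumped_models(...)` to `dump_mlprogram_model.py`.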

onnxruntime/core/providers/coreml/builders/impl/gemm_op_builder.cc

Lines changed: 19 additions & 14 deletions
@@ -109,19 +109,11 @@ Status GemmOpBuilder::AddToModelBuilderImpl(ModelBuilder& model_builder, const N
   ORT_IGNORE_RETURN_VALUE(GetShape(b, b_shape, logger));
   int64_t b0 = -1, b1 = -1;

-  // ML Program MatMul supports N-D input
   if (model_builder.CreateMLProgram() && is_matmul) {
-    if (b_shape.size() == 1) {
-      // B is treated as {b_shape[0], 1} according to the numpy rules.
-      b0 = b_shape[0];
-      b1 = 1;
-    } else {
-      // last 2 dims are used
-      b0 = b_shape[b_shape.size() - 2];
-      b1 = b_shape[b_shape.size() - 1];
-    }
+    // ML Program MatMul supports N-D input, however we don't use the 'K' or 'N' values calculated below for it
+    // so we don't need to update b0 or b1.
   } else {
-    // we only support 2D input
+    // we only support 2D input for all other combinations
     b0 = b_shape[0];
     b1 = b_shape[1];
   }
@@ -182,7 +174,6 @@ Status GemmOpBuilder::AddToModelBuilderImpl(ModelBuilder& model_builder, const N
       model_builder.AddOperation(std::move(gemm_op));
     } else {
       // CoreML implementation is the same as ONNX MatMul.
-      // https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS15.linear.matmul
       auto matmul_op = model_builder.CreateOperation(node, "matmul");
       AddOperationInput(*matmul_op, "x", a.Name());
       AddOperationInput(*matmul_op, "y", b.Name());
@@ -268,14 +259,28 @@ bool GemmOpBuilder::IsOpSupportedImpl(const Node& node, const OpBuilderInputPara
   }

   if (is_matmul) {
+    const auto a_rank = a_shape.size();
+    const auto b_rank = b_shape.size();
+
     if (input_params.create_mlprogram) {
-      // ML Program matmul op has numpy semantics the same as the ONNX spec so we can use directly
+      // ML Program matmul op has numpy semantics the same as the ONNX spec, so we can use directly.
+      // https://apple.github.io/coremltools/source/coremltools.converters.mil.mil.ops.defs.html#coremltools.converters.mil.mil.ops.defs.iOS15.linear.matmul
+      //
+      // There does appear to be a bug in handling one of the inputs being 1D, so for now skip these.
+      // See https://github.com/apple/coremltools/issues/2263
+      //
+      // If required for perf we could manually do the shape alterations the spec documents (convert input to 2D,
+      // and remove extra dimension from output), as the 2D input is correctly handled by CoreML matmul.
+      if ((a_rank == 1 && b_rank > 1) || (a_rank > 1 && b_rank == 1)) {
+        LOGS(logger, VERBOSE) << "Skipping due to bug in CoreML ML Program when one of the inputs is 1D.";
+        return false;
+      }
     } else {
       // we could potentially support 1D and 3D if required. beyond 3D the dims that merge diverge.
       // https://github.com/apple/coremltools/blob/1931758aae383c83daddfc56f11a24a9d2bf4b87/coremltools/converters/onnx/_operators.py#L1607
       // https://github.com/apple/coremltools/blob/1931758aae383c83daddfc56f11a24a9d2bf4b87/coremltools/converters/mil/backend/nn/op_mapping.py#L1374
       // https://apple.github.io/coremltools/mlmodel/Format/NeuralNetwork.html#innerproductlayerparams
-      if (a_rank != 2 || b_rank != 2) {
+      if (a_rank != 2 || b_rank != 2) {
         LOGS(logger, VERBOSE) << "a and b inputs must be 2D. ";
         return false;
       }
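
The new comment in `IsOpSupportedImpl` mentions a possible follow-up: promote the 1D input to 2D, run a normal matmul, and remove the extra dimension from the output. The commit does not implement this; it only skips such nodes. The sketch below illustrates both the rank check and that shape alteration using numpy, which has the same matmul semantics the ONNX spec refers to. The function names `skipped_by_rank_check` and `matmul_via_2d` are hypothetical.

```python
import numpy as np


def skipped_by_rank_check(a_rank, b_rank):
    """Mirror of the rank condition added in the diff: exactly one input is 1D."""
    return (a_rank == 1 and b_rank > 1) or (a_rank > 1 and b_rank == 1)


def matmul_via_2d(a, b):
    """Compute matmul(a, b) without ever passing a 1D array to the matmul itself."""
    a_is_1d = a.ndim == 1
    b_is_1d = b.ndim == 1

    # numpy/ONNX rule: a 1D `a` gets a leading 1, a 1D `b` gets a trailing 1.
    a2 = a[np.newaxis, :] if a_is_1d else a  # (K,) -> (1, K)
    b2 = b[:, np.newaxis] if b_is_1d else b  # (K,) -> (K, 1)

    out = np.matmul(a2, b2)

    # Remove the dimensions that were added purely for the promotion.
    if a_is_1d:
        out = np.squeeze(out, axis=-2)
    if b_is_1d:
        out = np.squeeze(out, axis=-1)
    return out


a = np.arange(4.0)                       # 1D input
b = np.arange(24.0).reshape(2, 4, 3)     # 3D input
print(skipped_by_rank_check(a.ndim, b.ndim))
print(np.array_equal(matmul_via_2d(a, b), np.matmul(a, b)))
```

Note that the rank check does not reject the case where both inputs are 1D; only the mixed 1D and N-D combinations are skipped.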

onnxruntime/core/providers/coreml/builders/model_builder.cc

Lines changed: 1 addition & 0 deletions
@@ -906,6 +906,7 @@ Status ModelBuilder::SaveModel() {

 #if defined(COREML_ENABLE_MLPROGRAM)
   if (create_ml_program_) {
+    // we need to jump through some hoops to get the model path the ML Program load wants.
     std::string tmp_model_path = model_output_path_ + "/tmp/model.mlmodel";
     CreateEmptyFile(tmp_model_path);


onnxruntime/core/providers/coreml/dump_mlprogram_model.py

Lines changed: 8 additions & 1 deletion
@@ -5,6 +5,11 @@
 if len(sys.argv) < 2:
     print(f"Usage: {sys.argv[0]} <path to model.mlmodel in ML Package>")
     print("If generated by onnxruntime this will be <ML Package root>/Data/com.microsoft.onnxruntime/model.mlmodel")
+    print(
+        "The ML Package created by the CoreML EP can be saved to a specific directory in a debug build of onnxruntime "
+        "by setting the environment variable ORT_COREML_EP_MODEL_DIR to the desired directory."
+    )
+
     sys.exit(-1)

 model_path = sys.argv[1]
@@ -13,7 +18,9 @@
 spec = m.get_spec()
 print(spec)

-# Example code if you want to filter output or do more advanced things
+# Example code if you want to filter output or do more advanced things.
+# In the below example we print out the value of an attribute of one specific node from a larger model.
+#
 # main = spec.mlProgram.functions["main"]
 # block = main.block_specializations[main.opset]
 # print(f"{len(block.operations)} operators")
