A model-agnostic method to decompose predictions into linear, nonlinear and pairwise interaction effects. It can be helpful in feature selection and model interpretation.
The algorithm is based on the paper Beyond the Black Box: An Intuitive Approach to Investment Prediction with Machine Learning (Li, Turkington and Yazdani, 2020). The model fingerprint algorithm can break down the predictions of any machine learning model into comprehensible parts.
There are three major benefits of the model fingerprint algorithm:
- Model fingerprint is a model-agnostic method: it can be applied on top of any machine learning model.
- The resulting decomposition of effects is highly intuitive. We can easily see which features have a greater influence than others, and in what form (linear, nonlinear, or interaction).
- All three effects share a common unit, the unit of the predicted response variable, which makes the decomposition even more intuitive.
The model fingerprint algorithm extends the partial dependence function.
The partial dependence function assesses the marginal effect of a feature by following these steps:
- The value of the selected feature is varied across its full range (or a representative set of values).
- For each value of the selected feature, the model predicts outcomes for all instances in the dataset; the varied feature is set to the same value for every instance, while all other features keep their original values.
- The average of these predictions gives the partial dependence at that value of the chosen feature.
This partial dependence can be understood as the expected prediction of the model as a function of the feature of interest. The same process will be performed for all features.
- The partial dependence function will vary little if the selected feature has little influence on the prediction.
- If the influence of a certain feature on the prediction is purely linear, then its partial dependence plot will be a straight line. For example, for an ordinary least squares regression model, the partial dependence plot of each feature is a straight line whose slope equals that feature's coefficient.
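To make the procedure concrete, here is a minimal sketch of the partial dependence computation described above. It assumes any fitted estimator with a scikit-learn style `predict` method and a NumPy feature matrix; the function name `partial_dependence` is illustrative, not the paper's reference implementation:

```python
import numpy as np

def partial_dependence(model, X, feature, grid_size=50):
    """Average prediction of `model` as a function of one feature.

    For each grid value of the selected feature, set that column to the
    same value for every instance, predict, and average the predictions.
    """
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
    pd_values = np.empty(grid_size)
    for i, value in enumerate(grid):
        X_modified = X.copy()
        X_modified[:, feature] = value  # vary only the chosen feature
        pd_values[i] = model.predict(X_modified).mean()
    return grid, pd_values
```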
The model fingerprint algorithm decomposes the partial dependence function of a certain feature into a linear part and a nonlinear part. It fits a linear regression model to the partial dependence function. Denoting the partial dependence function of feature $k$ as $\hat{f}_k(x_k)$, we denote the fitted regression line as

$$\hat{l}_k(x_k) = \hat{\beta}_{k,0} + \hat{\beta}_{k,1} x_k$$
The linear prediction effect of a feature is defined as the mean absolute deviation of the linear predictions around their average value:

$$\text{Linear}_k = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{l}_k(x_{k,i}) - \frac{1}{N} \sum_{j=1}^{N} \hat{l}_k(x_{k,j}) \right|$$

where $x_{k,i}$ is the value of feature $k$ for instance $i$ and $N$ is the number of instances.
The nonlinear prediction effect is defined as the mean absolute deviation of the total marginal effect around its corresponding linear effect:

$$\text{Nonlinear}_k = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{f}_k(x_{k,i}) - \hat{l}_k(x_{k,i}) \right|$$
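Both effects follow directly from the definitions above. Here is a minimal sketch that consumes the `grid` and `pd_values` returned by the earlier `partial_dependence` sketch; for simplicity, the deviations are averaged over the grid points rather than the observed feature values:

```python
import numpy as np

def linear_nonlinear_effects(grid, pd_values):
    """Decompose a partial dependence curve into linear and nonlinear effects.

    Fits an ordinary least-squares line to the partial dependence values,
    then computes:
      - linear effect: mean absolute deviation of the fitted line around
        its average value,
      - nonlinear effect: mean absolute deviation of the partial dependence
        values around the fitted line.
    """
    slope, intercept = np.polyfit(grid, pd_values, deg=1)
    fitted_line = slope * grid + intercept
    linear_effect = np.mean(np.abs(fitted_line - fitted_line.mean()))
    nonlinear_effect = np.mean(np.abs(pd_values - fitted_line))
    return linear_effect, nonlinear_effect
```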
Here is a plot demonstrating the partial dependence function, fitted regression line, linear component, and nonlinear component.
(Li, Y., Turkington, D. and Yazdani, A., 2020)
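Such a plot can be reproduced with matplotlib; the snippet below is an illustrative sketch that reuses the hypothetical `partial_dependence` helper, `model`, and `X` from the earlier example:

```python
import numpy as np
import matplotlib.pyplot as plt

# `model`, `X`, and `partial_dependence` are assumed from the earlier sketch.
grid, pd_values = partial_dependence(model, X, feature=0)
slope, intercept = np.polyfit(grid, pd_values, deg=1)  # fitted regression line
fitted_line = slope * grid + intercept

plt.plot(grid, pd_values, label="partial dependence")
plt.plot(grid, fitted_line, "--", label="fitted regression line")
plt.plot(grid, fitted_line - fitted_line.mean(), ":", label="linear component (demeaned)")
plt.plot(grid, pd_values - fitted_line, "-.", label="nonlinear component")
plt.xlabel("feature value")
plt.ylabel("prediction")
plt.legend()
plt.show()
```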
The calculation of the pairwise interaction prediction effect is similar, but this time a joint partial dependence function of both features is computed, denoted as $\hat{f}_{k,l}(x_k, x_l)$. In the paper, the interaction effect is defined as the mean absolute deviation of the demeaned joint partial dependence from the sum of the demeaned single-feature partial dependence functions:

$$\text{Pairwise}_{k,l} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left| \tilde{f}_{k,l}(x_{k,i}, x_{l,j}) - \tilde{f}_k(x_{k,i}) - \tilde{f}_l(x_{l,j}) \right|$$

where the tildes denote the demeaned functions.
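A sketch of this computation, under the same assumptions as the earlier snippets (scikit-learn style `predict`, NumPy feature matrix, hypothetical helper names, and grid-point averaging for simplicity):

```python
import numpy as np

def single_pd(model, X, feature, grid):
    """Single-feature partial dependence evaluated on a given grid."""
    values = np.empty(len(grid))
    for i, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = v
        values[i] = model.predict(X_mod).mean()
    return values

def pairwise_interaction_effect(model, X, feat_a, feat_b, grid_size=20):
    """Mean absolute deviation of the demeaned joint partial dependence
    from the sum of the demeaned single-feature partial dependencies."""
    grid_a = np.linspace(X[:, feat_a].min(), X[:, feat_a].max(), grid_size)
    grid_b = np.linspace(X[:, feat_b].min(), X[:, feat_b].max(), grid_size)

    # Joint partial dependence on the 2-D grid: both features are varied
    # together while all other features keep their original values.
    joint = np.empty((grid_size, grid_size))
    for i, a in enumerate(grid_a):
        for j, b in enumerate(grid_b):
            X_mod = X.copy()
            X_mod[:, feat_a] = a
            X_mod[:, feat_b] = b
            joint[i, j] = model.predict(X_mod).mean()

    # Demean everything so that only the deviation from additivity remains.
    joint_dm = joint - joint.mean()
    pd_a = single_pd(model, X, feat_a, grid_a)
    pd_b = single_pd(model, X, feat_b, grid_b)
    residual = joint_dm - (pd_a - pd_a.mean())[:, None] - (pd_b - pd_b.mean())[None, :]
    return np.mean(np.abs(residual))
```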
The algorithm has a few limitations:
- The pairwise combinations of features need to be manually specified, and too many combinations can lead to slow computation.
- Higher-order interactions between features are not captured.
- The sign (direction) of feature influence is not shown.
- Compatibility with classifiers is limited: the effects are defined on a numeric model output, so for classification models the decomposition has to be applied to predicted probabilities (or another numeric score) rather than discrete class labels.