Multicollinearity is the presence of high correlations among predictor variables.
Multicollinearity-Solver systematically reduces redundancy in the feature space, improving the interpretability of the resulting machine learning model.
The method operates as follows:
- Construct an undirected graph where each node corresponds to a feature and edges are drawn between pairs of features whose absolute correlation exceeds a user-specified threshold.
- Decompose the graph into connected components, representing clusters of mutually correlated variables.
- From each component, select a subset of features according to one of two criteria:
  - Feature importance derived from a supervised model (e.g., Random Forest, Gradient Boosted Trees).
  - Variance, to retain the features with the greatest informational content.
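The steps above can be expressed as a minimal, self-contained sketch (this is not the package's actual implementation; the rule of keeping a single highest-importance feature per component is an illustrative assumption):

```python
import pandas as pd

def sketch_solver(data: pd.DataFrame, importance: dict, threshold: float = 0.8):
    """Illustrative sketch: return the features to remove."""
    corr = data.corr().abs()
    features = list(data.columns)

    # Build the correlation graph: an edge joins two features whose
    # absolute correlation exceeds the threshold.
    adj = {f: set() for f in features}
    for i, a in enumerate(features):
        for b in features[i + 1:]:
            if corr.loc[a, b] > threshold:
                adj[a].add(b)
                adj[b].add(a)

    # Decompose into connected components via depth-first search.
    seen, to_remove = set(), []
    for f in features:
        if f in seen:
            continue
        stack, component = [f], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(adj[node] - seen)
        # Assumed selection rule: keep only the most important feature
        # in each component; flag the rest for removal.
        keep = max(component, key=lambda x: importance.get(x, 0.0))
        to_remove.extend(x for x in component if x != keep)
    return to_remove
```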
Figure 1: feature correlations before removal
Figure 2: feature correlations after removal
Figure 3: clusters of highly correlated features
```shell
git clone https://github.com/Teerat-CH/multicollinearity-solver.git
cd multicollinearity-solver
pip install -r requirements.txt
```

Suppose we have a feature dataframe where A, B, C, and D are features:
| A | B | C | D |
|---|---|---|---|
| 1 | 1 | 4 | 1 |
| 2 | 2 | 5 | 1 |
| 3 | 1 | 6 | 1 |
| 4 | 2 | 8 | 1 |
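With pandas we can build this toy dataset and inspect its pairwise correlations. On this data only |corr(A, C)| (about 0.98) exceeds a 0.8 threshold, while D is constant, so its correlations are undefined:

```python
import pandas as pd

# The toy dataset from the table above
data = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [1, 2, 1, 2],
    "C": [4, 5, 6, 8],
    "D": [1, 1, 1, 1],
})

# Absolute pairwise Pearson correlations; D has zero variance,
# so its entries come out as NaN
corr = data.corr().abs()
print(corr.round(2))
```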
Suppose the model's feature importances are as follows:
| Feature | Importance |
|---|---|
| A | 0.5 |
| B | 0.3 |
| C | 0.7 |
| D | 0.2 |
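Combining the two tables, A and C form the only correlated cluster on this toy data, and the importance criterion keeps the more important of the two. A hypothetical selection step for that cluster:

```python
# Importances from the table above; the cluster {A, C} is assumed
# from the correlation analysis of the toy data (|corr| ≈ 0.98).
importance = {"A": 0.5, "B": 0.3, "C": 0.7, "D": 0.2}
cluster = ["A", "C"]

# Keep the highest-importance feature, flag the rest for removal
keep = max(cluster, key=importance.get)
removed = [f for f in cluster if f != keep]
```

Here C (importance 0.7) is kept and A (importance 0.5) is flagged for removal.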
We can construct such a feature-importance dataframe from a model like LightGBM:

```python
import pandas as pd

feature_importance = pd.DataFrame({
    "Feature": model.feature_name_,
    "Importance": model.feature_importances_
}).set_index("Feature")
```

Then we can get the reduced feature set:
```python
from solver import mcl_solver

# Run solver -> list of features to be removed
features_to_remove = mcl_solver(
    data,
    feature_importance=feature_importance,
    by="importance",
    threshold=0.8
)

# Keep only the features not flagged for removal
new_features = [feature for feature in data.columns if feature not in features_to_remove]
data = data[new_features]
```

