Changes from 1 commit
Commits
200 commits
45a75c1
add basic gmIC
tliu68 May 24, 2023
87cc2e8
update code
tliu68 Jun 21, 2023
1a731e7
Merge branch 'main' into gmIC
tingshanL Jun 29, 2023
91d8b3e
Merge branch 'scikit-learn:main' into gmIC
tingshanL Jun 29, 2023
e3b98f7
fix linting
tliu68 Jun 29, 2023
a37949e
fix linting
tliu68 Jun 30, 2023
a6ee201
fix tests
tliu68 Jun 30, 2023
ebb86fe
Update _gaussian_mixture_ic.py
tliu68 Jul 25, 2023
7b27ff2
Merge branch 'main' into gmIC
tingshanL Jul 25, 2023
6b92c5a
Merge branch 'main' into gmIC
tingshanL Sep 11, 2023
a2d20f0
Merge branch 'scikit-learn:main' into gmIC
tingshanL Jun 6, 2024
1c86814
Merge branch 'main' into gmIC
tingshanL Jul 8, 2024
4c962d5
fix docstring
tingshanL Jul 8, 2024
c6074f3
fix attributes
tingshanL Jul 9, 2024
e9044d3
fix attribute typo
tingshanL Jul 9, 2024
9eda4d2
fix docstring example mismatch
tingshanL Jul 9, 2024
1106558
update docstring example
tingshanL Jul 9, 2024
b2a3234
Merge branch 'main' into gmIC
tingshanL Jul 9, 2024
fc2b97d
Merge branch 'main' into gmIC
tingshanL Jul 12, 2024
f7c8773
fix clustering mismatch
tingshanL Jul 12, 2024
b31fc57
Update v1.6.rst
tingshanL Jul 12, 2024
24fc234
fix linting
tingshanL Jul 12, 2024
67378b0
fix linting
tingshanL Jul 12, 2024
a3e0966
fix docstring
tingshanL Jul 12, 2024
a393fa3
Update _parameter_constraints
tingshanL Jul 12, 2024
634aeb1
increase codecov
tingshanL Jul 12, 2024
e2f9a77
Merge branch 'main' into gmIC
tingshanL Jul 12, 2024
4206d14
MNT little refactor and doc improvement for metadata routing consumes…
StefanieSenger Jul 11, 2025
f93e7d4
MNT Update pre-commit ruff legacy alias (#31740)
DimitriPapadopoulos Jul 11, 2025
fc95dd2
DOC: Update a link to a research paper (#31739)
star1327p Jul 11, 2025
aed81ed
MNT Add more sample weight checks in regression metric common tests (…
lucyleeow Jul 11, 2025
f187311
Fix `PandasAdapter` causes crash or misattributed features (#31079)
nicolas-bolle Jul 11, 2025
9b7a86f
Fix spurious warning from type_of_target when called on estimator.cla…
saskra Jul 14, 2025
e4b0849
FIX Avoid fitting a pipeline without steps (#31723)
DeaMariaLeon Jul 14, 2025
6848353
Mention possibility of regression targets in warning about unique cla…
lucyleeow Jul 14, 2025
c47fbe3
:lock: :robot: CI Update lock files for main CI build(s) :lock: :robo…
scikit-learn-bot Jul 15, 2025
5dc24c0
:lock: :robot: CI Update lock files for array-api CI build(s) :lock: …
scikit-learn-bot Jul 15, 2025
2495f8e
:lock: :robot: CI Update lock files for free-threaded CI build(s) :lo…
scikit-learn-bot Jul 15, 2025
bab34a0
:lock: :robot: CI Update lock files for scipy-dev CI build(s) :lock: …
scikit-learn-bot Jul 15, 2025
fe6960b
FIX: Regression in DecisionBoundaryDisplay.from_estimator with colors…
jshn9515 Jul 15, 2025
f1229ff
CI Avoid miniconda CondaToSNonInteractiveError and stop using the def…
lesteve Jul 16, 2025
588f396
DOC Update plots in Categorical Feature Support in GBDT example (#31062)
ArturoAmorQ Jul 17, 2025
6cd690b
DOC update news for 1.7.1 (#31780)
jeremiedbb Jul 18, 2025
dfc2b8d
DOC Forward changelog 1.7.1 (#31779)
jeremiedbb Jul 18, 2025
f462edd
MNT Update SECURITY.md for 1.7.1 (#31782)
jeremiedbb Jul 18, 2025
298b03e
MNT Add tags to GaussianMixture array API and precise them for PCA (#…
lesteve Jul 18, 2025
919527e
DOC Fix release checklist formatting (#31783)
jeremiedbb Jul 18, 2025
57a6704
DOC improve linear model coefficient interpretation example (#31760)
MarieSacksick Jul 18, 2025
a048a40
MNT Remove unused utils._array_api functions (#31785)
lesteve Jul 18, 2025
a64b6b2
DOC Fix `pos_label` docstring in Display classes (#31696)
lucyleeow Jul 18, 2025
6d2c9f2
FIX Add validation for FeatureUnion transformer outputs (#31318) (#31…
gguiomar Jul 19, 2025
ed996fa
:lock: :robot: CI Update lock files for main CI build(s) :lock: :robo…
scikit-learn-bot Jul 21, 2025
7e6afd9
:lock: :robot: CI Update lock files for array-api CI build(s) :lock: …
scikit-learn-bot Jul 21, 2025
cdcdde0
:lock: :robot: CI Update lock files for free-threaded CI build(s) :lo…
scikit-learn-bot Jul 21, 2025
5b1eb74
CI Use miniforge for wheel building [cd build] (#31793)
thomasjpfan Jul 21, 2025
420deba
DOC Update two more reference links (#31765)
star1327p Jul 21, 2025
13c7ce8
Update multi_class deprecation to be removed in 1.8 (#31795)
pras529 Jul 21, 2025
30eb762
DOC fix metadata REQUESTER_DOC indentation (#31805)
MatthewSZhang Jul 22, 2025
3843f82
Fix empty column check in ColumnTransformer to be compatible with pan…
jorisvandenbossche Jul 22, 2025
1c1ec5b
DOC: Fix assume_centered parameter documentation in EmpiricalCovarian…
Krish0909 Jul 22, 2025
5464d9a
CI Fix Azure install.sh bash regex match (#31813)
lesteve Jul 22, 2025
6058580
CI Use venv rather than virtualenv (#31812)
lesteve Jul 22, 2025
a619e79
:lock: :robot: CI Update lock files for scipy-dev CI build(s) :lock: …
scikit-learn-bot Jul 22, 2025
2ef4853
MNT Add caller name to scale input validation (#31816)
nithish-74 Jul 22, 2025
a1f5952
DOC improve doc for `_check_n_features` and `_check_feature_names` (…
VirenPassi Jul 22, 2025
f6939e8
DOC Increase prominence of starting from existing issues (#31660)
betatim Jul 23, 2025
e15920c
Corrected broken link in documentation (#31818)
elenafillo Jul 23, 2025
0ca4ac2
MNT Use float64 epsilon when clipping initial probabilities in Gradie…
mohiuddin-khan-shiam Jul 23, 2025
d80b0c7
DOC Fix KernelPCA docstrings for transform functions to match PCA cla…
MarekPokropinski Jul 23, 2025
aa680bc
TST fix check_array_api_input device check (#31814)
StefanieSenger Jul 24, 2025
5c4adff
MNT Use context managers to safely close dataset files (#31836)
pushkar-hue Jul 25, 2025
4e5f636
MNT Improve _check_array_api_dispatch docstring (#31831)
lesteve Jul 25, 2025
6037c68
MNT Remove `ColumnTransformer.remainder` from `get_metadata_routing` …
StefanieSenger Jul 25, 2025
5833812
DOC Clarify 'ovr' as the default decision function shape strategy in …
Shashank1202 Jul 25, 2025
25aeaf3
ENH Add clip parameter to MaxAbsScaler (#31790)
glevv Jul 25, 2025
c84c33e
FIX Add input validation to _basePCA.inverse_transform (#29310)
icfaust Jul 25, 2025
91486d6
API Replace y_pred with y_score in DetCurveDisplay and PrecisionRecal…
luiser1401 Jul 25, 2025
ed5f530
FIX OneVsRestClassifier to ensure that predict == argmax(decision_fun…
lakrish Jul 25, 2025
27e5256
MNT Add `_check_sample_weights` to classification metrics (#31701)
lucyleeow Jul 27, 2025
49af3c9
:lock: :robot: CI Update lock files for scipy-dev CI build(s) :lock: …
scikit-learn-bot Jul 28, 2025
4622eff
:lock: :robot: CI Update lock files for free-threaded CI build(s) :lo…
scikit-learn-bot Jul 28, 2025
a0f6714
:lock: :robot: CI Update lock files for array-api CI build(s) :lock: …
scikit-learn-bot Jul 28, 2025
18e89a4
:lock: :robot: CI Update lock files for main CI build(s) :lock: :robo…
scikit-learn-bot Jul 28, 2025
4abf564
MNT Consistently use relative imports (#31817)
jeremiedbb Jul 28, 2025
29b379a
FIX Preserve y dimensions within TransformedTargetRegressor (#31563)
kryggird Jul 28, 2025
4b79fdf
MNT refactor _rescale_data in linear models into _preprocess_data (#3…
lorentzenchr Jul 29, 2025
da90c58
DOC add note for `**fit_params` in `fit_transform` if not expected by…
StefanieSenger Jul 29, 2025
1fe6595
MNT Switch to absolute imports enforced by `ruff` (#31847)
lesteve Jul 29, 2025
af4f330
MNT Remove redundant mkdir calls (#31833)
jeremiedbb Jul 30, 2025
8dc7ea9
TST use global_random_seed in `sklearn/linear_model/tests/test_logist…
DeaMariaLeon Jul 30, 2025
ae9d088
MNT Improve codespell support (and add CI) and make it fix few typos …
yarikoptic Jul 31, 2025
a589342
MNT Update .git-blame-ignore-revs with import change PRs (#31858)
lesteve Jul 31, 2025
810b920
FEA D2 Brier Score (#28971)
OmarManzoor Jul 31, 2025
6e2d44c
Merge commit from fork
lesteve Jul 31, 2025
d578de5
Merge commit from fork
lesteve Jul 31, 2025
3d35e02
TST better PassiveAggressive test against simple implementation (#31857)
lorentzenchr Jul 31, 2025
4d3497c
DOC d2 brier score updates (#31863)
OmarManzoor Aug 1, 2025
e8ab263
TST random seed global /svm/tests/test_svm.py (#25891)
Veghit Aug 1, 2025
7d1d968
FEA add temperature scaling to `CalibratedClassifierCV` (#31068)
virchan Aug 1, 2025
bf606a4
DOC add 2nd author to whatsnew of #31068 temperature scaling (#31868)
lorentzenchr Aug 4, 2025
1c1214b
:lock: :robot: CI Update lock files for scipy-dev CI build(s) :lock: …
scikit-learn-bot Aug 4, 2025
4787aa6
:lock: :robot: CI Update lock files for free-threaded CI build(s) :lo…
scikit-learn-bot Aug 4, 2025
e890e6b
:lock: :robot: CI Update lock files for main CI build(s) :lock: :robo…
scikit-learn-bot Aug 4, 2025
1ff785e
ENH Array API support for confusion_matrix (#30562)
StefanieSenger Aug 4, 2025
fe08016
ENH avoid double input validation in ElasticNet and Lasso (#31848)
lorentzenchr Aug 4, 2025
760edca
DOC Enhance DBSCAN docstrings with clearer parameter guidance and des…
sape94 Aug 4, 2025
52d93e1
Fix requires_fit tag for stateless FeatureHasher and HashingVectorize…
hqkqn32 Aug 4, 2025
4a4e5f5
Bump pypa/cibuildwheel from 3.0.0 to 3.1.2 in the actions group (#31865)
dependabot[bot] Aug 4, 2025
aa58933
Add FAQ entry about the spam label (#31822)
betatim Aug 5, 2025
adb1ae7
DOC Add vector quantization example to KBinsDiscretizer docs (#31613)
pw42020 Aug 5, 2025
b824c72
DOC Improve wording in Categorical Feature support example (#31864)
ArturoAmorQ Aug 6, 2025
1a6e34c
CI First step towards moving Azure CI to GHA (#31832)
lesteve Aug 7, 2025
52fb066
DOC: Fix typo in _HTMLDocumentationLinkMixin docstring (#31887)
sotagg Aug 7, 2025
8525ba5
ENH speedup enet_coordinate_descent_gram (#31880)
lorentzenchr Aug 8, 2025
6fd23fc
ENH/DOC clearer sample weight validation error msg (#31873)
kapslock123 Aug 8, 2025
a665c60
MNT instruct AI tools to not open pull requests in github PULL_REQUES…
StefanieSenger Aug 8, 2025
ba3e753
:lock: :robot: CI Update lock files for array-api CI build(s) :lock: …
scikit-learn-bot Aug 11, 2025
217fe94
FIX LogisticRegression warm start with newton-cholesky solver (#31866)
lorentzenchr Aug 11, 2025
a9a7b7d
CI Add ccache for GitHub Actions (#31895)
StefanieSenger Aug 11, 2025
24844a0
FIX make scorer.repr work with a partial score_func (#31891)
adrinjalali Aug 11, 2025
f1cbccb
:lock: :robot: CI Update lock files for free-threaded CI build(s) :lo…
scikit-learn-bot Aug 11, 2025
c786c69
:lock: :robot: CI Update lock files for scipy-dev CI build(s) :lock: …
scikit-learn-bot Aug 11, 2025
df3ae86
Move Nicolas from active maintainer to emeritus (#31921)
NicolasHug Aug 11, 2025
7a26152
CI Remove conda environment cache in CUDA CI (#31900)
lesteve Aug 11, 2025
ff0d6d1
DOC Minor updates to DBSCAN clustering documentation (#31914)
star1327p Aug 11, 2025
3adeabd
DOC better internal docstring for Cython enet_coordinate_descent (#31…
lorentzenchr Aug 11, 2025
e887291
DOC Improve wording in Getting Started page (#31926)
ArturoAmorQ Aug 12, 2025
3c74809
DEP PassiveAggressiveClassifier and PassiveAggressiveRegressor (#29097)
lorentzenchr Aug 12, 2025
33a733e
ENH/FIX stopping criterion for coordinate descent `gap <= tol` (#31906)
lorentzenchr Aug 13, 2025
e402663
DOC Clean up `Building from source` instructions on macOS (#31938)
DeaMariaLeon Aug 13, 2025
b265982
DOC relabel some PRs as efficiency (#31934)
lorentzenchr Aug 13, 2025
78301f5
TST Make test_dtype_preprocess_data pass for all global random seeds …
lorentzenchr Aug 13, 2025
42cbd9d
TST/MNT clean up some tests in coordinate descent (#31909)
lorentzenchr Aug 14, 2025
6f422d8
MNT reduce test duration (#31953)
lorentzenchr Aug 16, 2025
e099dba
DOC: Fix formatting issues with bold font and ` backquote` (#31950)
star1327p Aug 16, 2025
4e2063d
:lock: :robot: CI Update lock files for main CI build(s) :lock: :robo…
scikit-learn-bot Aug 18, 2025
50ee91d
:lock: :robot: CI Update lock files for array-api CI build(s) :lock: …
scikit-learn-bot Aug 18, 2025
a4e053e
:lock: :robot: CI Update lock files for free-threaded CI build(s) :lo…
scikit-learn-bot Aug 18, 2025
5ff34f7
:lock: :robot: CI Update lock files for scipy-dev CI build(s) :lock: …
scikit-learn-bot Aug 18, 2025
2831228
MNT update Cython 3.0.10 to 3.1.2 (#31905)
lorentzenchr Aug 18, 2025
e1021ba
ENH add sparse_matmul_to_dense (#31952)
lorentzenchr Aug 19, 2025
2883187
ENH avoid copies of X in `_alpha_grid` for coordinate descent (#31946)
lorentzenchr Aug 19, 2025
3883ba7
TST add test_multi_task_lasso_vs_skglm (#31957)
lorentzenchr Aug 19, 2025
092e577
CI Temporary work-around for Windows wheels on Python 3.13 (#31964)
lesteve Aug 19, 2025
18bc6db
DOC: Update a link to Cython-related code (#31967)
star1327p Aug 19, 2025
17bf627
DOC remove custom scorer from scratch from docs (#31890)
adrinjalali Aug 20, 2025
75cd236
docs: minor typos fixed (#31945)
maitreytalware Aug 20, 2025
6aa5a6f
DOC improved plot_semi_supervised_newsgroups.py example (#31104)
elhambbi Aug 21, 2025
faf69cb
TST Fix test_sparse_matmul_to_dense for all random seeds (#31983)
jeremiedbb Aug 21, 2025
b10b73a
Fix uncomparable values in SimpleImputer tie-breaking strategy (#31820)
AlexandreAbraham Aug 21, 2025
866fef1
MNT DNPY_NO_DEPRECATED_API=NPY_1_22_API_VERSION and security fixes (#…
lorentzenchr Aug 22, 2025
884e512
CI Work around loky windows 3.13.7 for free threaded wheel (#31982)
lesteve Aug 22, 2025
492e1ec
ENH add gap safe screening rules to enet_coordinate_descent (#31882)
lorentzenchr Aug 22, 2025
d5715fb
ENH use np.cumsum instead of stable_cumsum in kmeans++ (#31991)
otizonaizit Aug 22, 2025
f19ff9c
Make the test suite itself thread-safe to be able to detect thread-sa…
ogrisel Aug 22, 2025
450cb20
ENH use xp.cumulative_sum and xp.searchsorted directly instead of sta…
otizonaizit Aug 22, 2025
7cc4581
DOC: Correct punctuation typos in Model Evaluation Section (#32001)
star1327p Aug 23, 2025
f2cd677
MNT bump array-api-extra to v0.8.0 (#31993)
lucascolley Aug 25, 2025
d8ba1de
MNT Avoid DeprecationWarning in numpy-dev (#32010)
lesteve Aug 25, 2025
e2402d1
:lock: :robot: CI Update lock files for free-threaded CI build(s) :lo…
scikit-learn-bot Aug 25, 2025
cd82ba3
:lock: :robot: CI Update lock files for array-api CI build(s) :lock: …
scikit-learn-bot Aug 25, 2025
dea1c1b
:lock: :robot: CI Update lock files for scipy-dev CI build(s) :lock: …
scikit-learn-bot Aug 25, 2025
e6f5ac5
:lock: :robot: CI Update lock files for main CI build(s) :lock: :robo…
scikit-learn-bot Aug 25, 2025
74142b3
ENH use np.cumsum directly instead of stable_cumsum in AdaBoost (#31995)
otizonaizit Aug 25, 2025
a86b32d
ENH use np.cumsum directly instead of stable_cumsum for LLE (#31996)
otizonaizit Aug 25, 2025
969df01
Customized dir method to recognize available_if decorator (#31928)
j-hendricks Aug 26, 2025
872be3c
DOC Fix rst substitution casing in README.rst (#32015)
juni2003 Aug 26, 2025
48cba5a
FEA Make standard scaler compatible to Array API (#27113)
AlexanderFabisch Aug 27, 2025
726ed18
CI Add Python 3.14 nightly wheels (#32012)
lesteve Aug 27, 2025
56da56f
DOC Add reference links to Bayesian Regression (#32016)
star1327p Aug 28, 2025
5736956
CI add codecov to GitHub Action workflow (#31941)
StefanieSenger Aug 28, 2025
00acd12
ENH speedup coordinate descent by avoiding calls to axpy in innermost…
lorentzenchr Aug 28, 2025
ef4885f
MNT `np.nan_to_num` -> `xpx.nan_to_num` (#32033)
lucascolley Aug 28, 2025
2bcfd2e
DOC Add TargetEncoder to Categorical Feature Support example (#32019)
ArturoAmorQ Aug 28, 2025
0eba4d4
MNT fix typo and internal documentation in LinearModelLoss and Newton…
lorentzenchr Aug 28, 2025
2e4e40b
DOC Build website with a Scikit-learn logo that is complete - not cro…
DeaMariaLeon Aug 29, 2025
98f9eec
MNT Add changelog README and PR checklist to PR template (#32038)
lesteve Aug 29, 2025
1a783c9
DOC Use un-cropped image for thumbnails (#32037)
DeaMariaLeon Aug 29, 2025
59c4b7a
CI Use pytest-xdist in debian 32 build (#32031)
lesteve Aug 29, 2025
b5c5130
MNT remove PA_C from SGD and (re-) use eta0 (#31932)
lorentzenchr Aug 31, 2025
285883c
FIX make sure _PassthroughScorer works with meta-estimators (#31898)
adrinjalali Aug 31, 2025
db3e21b
:lock: :robot: CI Update lock files for scipy-dev CI build(s) :lock: …
scikit-learn-bot Sep 1, 2025
b7b8dd7
:lock: :robot: CI Update lock files for array-api CI build(s) :lock: …
scikit-learn-bot Sep 1, 2025
de0e21e
:lock: :robot: CI Update lock files for free-threaded CI build(s) :lo…
scikit-learn-bot Sep 1, 2025
6d233b9
:lock: :robot: CI Update lock files for main CI build(s) :lock: :robo…
scikit-learn-bot Sep 1, 2025
6c86237
TST Add option to use strict xfail mode in `parametrize_with_checks` …
betatim Sep 1, 2025
8a12e07
MAINT remove useless np.abs in test (#32069)
FrancoisPgm Sep 1, 2025
f2d793b
MNT Improve metadata routing warning message (#32070)
Flakes342 Sep 1, 2025
0c984ae
CI Revert Python 3.13.7 work arounds in wheels (#32068)
lesteve Sep 1, 2025
42b6fc8
MNT Remove xfail now that array-api-strict >2.3.1 (#32052)
lucyleeow Sep 1, 2025
e3b383a
MNT remove the `steps` attribute from _BaseComposition (#32040)
sotagg Sep 1, 2025
ed0a98a
CI Run free-threaded test suite with pytest-run-parallel (#32023)
lesteve Sep 2, 2025
96f48da
MRG Add Warning for NaNs in Yeo-Johnson Inverse Transform with Extrem…
maf-rnmourao Sep 2, 2025
c7866e6
TST fix platform sensitive test: test_float_precision (#32035)
ogrisel Sep 2, 2025
b138521
CI Add Python 3.14 free-threaded wheels (#32079)
lesteve Sep 2, 2025
30b98cd
DOC improve docstring of LogisticRegression and LogisticRegressionCV …
lorentzenchr Sep 2, 2025
90338a4
MNT Mark cython extensions as free-threaded compatible (#31342)
lesteve Sep 2, 2025
3edc4d6
ENH Add a link + tooltip to each parameter docstring in params table …
DeaMariaLeon Sep 2, 2025
835355a
DOC review comments for LogisticRegressionCV docstrings (#32082)
lorentzenchr Sep 2, 2025
a08d428
Merge branch 'main' into gmIC
tingshanL Sep 3, 2025
157 changes: 157 additions & 0 deletions examples/mixture/plot_gmIC_selection.py
@@ -0,0 +1,157 @@
"""
================================
Gaussian Mixture Model Selection
================================

This example shows that model selection can be performed with Gaussian Mixture
Models (GMM) using :ref:`information-theory criteria <aic_bic>`. Model selection
concerns both the covariance type and the number of components in the model.

In this case, both the Akaike Information Criterion (AIC) and the Bayesian
Information Criterion (BIC) provide the right result, but we only demonstrate
the latter as BIC is better suited to identify the true model among a set of
candidates. Unlike Bayesian procedures, such inferences are prior-free.

"""

# %%
# Data generation
# ---------------
#
# We generate two components (each one containing `n_samples`) by randomly
# sampling the standard normal distribution as returned by `numpy.random.randn`.
# One component is kept spherical yet shifted and re-scaled. The other one is
# deformed to have a more general covariance matrix.

import numpy as np

n_samples = 500
np.random.seed(0)
C = np.array([[0.0, -0.1], [1.7, 0.4]])
component_1 = np.dot(np.random.randn(n_samples, 2), C)  # general
component_2 = 0.7 * np.random.randn(n_samples, 2) + np.array([-4, 1])  # spherical

X = np.concatenate([component_1, component_2])

# %%
# We can visualize the different components:

import matplotlib.pyplot as plt

plt.scatter(component_1[:, 0], component_1[:, 1], s=0.8)
plt.scatter(component_2[:, 0], component_2[:, 1], s=0.8)
plt.title("Gaussian Mixture components")
plt.axis("equal")
plt.show()

# %%
# Model training and selection
# ----------------------------
#
# We vary the number of components from 1 to 6 and the type of covariance
# parameters to use:
#
# - `"full"`: each component has its own general covariance matrix.
# - `"tied"`: all components share the same general covariance matrix.
# - `"diag"`: each component has its own diagonal covariance matrix.
# - `"spherical"`: each component has its own single variance.
#
# We score the different models and keep the best model (the lowest BIC). This
# is done by using :class:`~sklearn.mixture.GaussianMixtureIC`, which fits a
# Gaussian mixture for each candidate number of components and covariance type
# and keeps the one with the lowest criterion value.
#
# The fitted candidates and the selected model are stored in `results_` and
# `best_model_`, respectively.

from sklearn.mixture import GaussianMixtureIC

gm_ic = GaussianMixtureIC(min_components=1, max_components=6, covariance_type="all")
gm_ic.fit(X)



# %%
# Plot the BIC scores
# -------------------
#
# To ease the plotting we can create a `pandas.DataFrame` from the candidate
# models stored in the `results_` attribute of the fitted estimator. Lower BIC
# scores are better.

import pandas as pd

df = pd.DataFrame(
    [
        (model.n_components, model.covariance_type, model.criterion)
        for model in gm_ic.results_
    ]
)
df.columns = ["Number of components", "Type of covariance", "BIC score"]
df.sort_values(by="BIC score").head()

# %%
import seaborn as sns

sns.catplot(
    data=df,
    kind="bar",
    x="Number of components",
    y="BIC score",
    hue="Type of covariance",
)
plt.show()

# %%
# In the present case, the model with 2 components and full covariance (which
# corresponds to the true generative model) has the lowest BIC score and is
# therefore selected by `GaussianMixtureIC`.
#
# Plot the best model
# -------------------
#
# We plot an ellipse to show each Gaussian component of the selected model. For
# such purpose, one needs to find the eigenvalues of the covariance matrices as
# returned by the `covariances_` attribute. The shape of such matrices depends
# on the `covariance_type`:
#
# - `"full"`: (`n_components`, `n_features`, `n_features`)
# - `"tied"`: (`n_features`, `n_features`)
# - `"diag"`: (`n_components`, `n_features`)
# - `"spherical"`: (`n_components`,)

from matplotlib.patches import Ellipse
from scipy import linalg

color_iter = sns.color_palette("tab10", 2)[::-1]
Y_ = gm_ic.best_model_.predict(X)

fig, ax = plt.subplots()

for i, (mean, cov, color) in enumerate(
    zip(
        gm_ic.best_model_.means_,
        gm_ic.best_model_.covariances_,
        color_iter,
    )
):
    v, w = linalg.eigh(cov)
    if not np.any(Y_ == i):
        continue
    plt.scatter(X[Y_ == i, 0], X[Y_ == i, 1], 0.8, color=color)

    angle = np.arctan2(w[0][1], w[0][0])
    angle = 180.0 * angle / np.pi  # convert to degrees
    v = 2.0 * np.sqrt(2.0) * np.sqrt(v)
    ellipse = Ellipse(mean, v[0], v[1], angle=180.0 + angle, color=color)
    ellipse.set_clip_box(fig.bbox)
    ellipse.set_alpha(0.5)
    ax.add_artist(ellipse)

plt.title(
f"Selected GMM: {gm_ic.covariance_type_} model, "
f"{gm_ic.n_components_} components"
)
plt.axis("equal")
plt.show()

# %%
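The example above ranks candidates by BIC across the four covariance types. As a rough illustration of why the covariance type matters for that ranking, the sketch below counts the free parameters of a Gaussian mixture for each type and plugs the count into the usual BIC formula, `-2 * log-likelihood + n_parameters * log(n_samples)`. The function names `gmm_n_parameters` and `bic` are hypothetical helpers written for this note; they are not part of the PR, and whether `GaussianMixtureIC` computes its criterion exactly this way is an assumption.

```python
import math


def gmm_n_parameters(n_components, n_features, covariance_type):
    """Count the free parameters of a Gaussian mixture (hypothetical helper)."""
    if covariance_type == "full":
        # One symmetric matrix per component.
        cov_params = n_components * n_features * (n_features + 1) // 2
    elif covariance_type == "tied":
        # A single symmetric matrix shared by all components.
        cov_params = n_features * (n_features + 1) // 2
    elif covariance_type == "diag":
        # One variance per feature and per component.
        cov_params = n_components * n_features
    elif covariance_type == "spherical":
        # One variance per component.
        cov_params = n_components
    else:
        raise ValueError(f"unknown covariance_type: {covariance_type!r}")
    mean_params = n_components * n_features
    weight_params = n_components - 1  # mixture weights sum to one
    return cov_params + mean_params + weight_params


def bic(total_log_likelihood, n_params, n_samples):
    """BIC from the total log-likelihood of the data; lower is better."""
    return -2.0 * total_log_likelihood + n_params * math.log(n_samples)


# Parameter counts for 2 components in 2 dimensions, as in the example data.
for cov_type in ("full", "tied", "diag", "spherical"):
    print(cov_type, gmm_n_parameters(2, 2, cov_type))
```

For two components in two dimensions this gives 11, 8, 9 and 7 parameters for `"full"`, `"tied"`, `"diag"` and `"spherical"`, so a full-covariance model only wins on BIC when its gain in log-likelihood outweighs the extra penalty.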
3 changes: 2 additions & 1 deletion sklearn/mixture/__init__.py
@@ -4,6 +4,7 @@

 from ._gaussian_mixture import GaussianMixture
 from ._bayesian_mixture import BayesianGaussianMixture
+from ._gaussian_mixture_ic import GaussianMixtureIC


-__all__ = ["GaussianMixture", "BayesianGaussianMixture"]
+__all__ = ["GaussianMixture", "BayesianGaussianMixture", "GaussianMixtureIC"]
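The ellipse step in `examples/mixture/plot_gmIC_selection.py` derives axis lengths and an orientation from each 2x2 covariance matrix via `scipy.linalg.eigh`. For readers who want to check that geometry by hand, here is a minimal closed-form sketch for the 2x2 symmetric case. The helper name `ellipse_from_covariance` is hypothetical, the default `scale` mirrors the `2 * sqrt(2)` factor used in the example, and this is an alternative derivation rather than the code the PR uses.

```python
import math


def ellipse_from_covariance(cov, scale=2.0 * math.sqrt(2.0)):
    """Return (width, height, angle_degrees) for a 2x2 symmetric covariance.

    `width` lies along the major axis, `height` along the minor axis, and the
    angle is measured counter-clockwise from the x-axis to the major axis.
    """
    (a, b), (b_sym, c) = cov
    if not math.isclose(b, b_sym):
        raise ValueError("covariance matrix must be symmetric")
    # Eigenvalues of [[a, b], [b, c]] are half_trace +/- radius.
    half_trace = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    major_var = half_trace + radius
    minor_var = max(half_trace - radius, 0.0)  # guard against round-off
    # Orientation of the major axis: tan(2 * theta) = 2b / (a - c).
    angle = 0.5 * math.atan2(2.0 * b, a - c)
    width = scale * math.sqrt(major_var)
    height = scale * math.sqrt(minor_var)
    return width, height, math.degrees(angle)


# A covariance with equal variances and positive correlation is tilted by
# 45 degrees.
print(ellipse_from_covariance([[2.0, 1.0], [1.0, 2.0]]))
```

The returned triple matches the argument order of `matplotlib.patches.Ellipse(xy, width, height, angle=...)`, which expects full axis lengths rather than semi-axes.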