feat: Change the default of aggregate parameter for cross-validation report #1440
Conversation
Nitpick: on form, and this is open to debate, I find it more explicit to use a dedicated sentinel |
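(Not part of the PR — a minimal Python sketch of the dedicated-sentinel pattern mentioned above; the `_Default` class and `resolve_aggregate` helper are hypothetical names for illustration, not the skore API.)

```python
# Hypothetical sketch of a dedicated sentinel default; names are illustrative.
from typing import Optional, Sequence, Union


class _Default:
    """Sentinel meaning 'the caller did not pass the argument'."""

    def __repr__(self) -> str:
        return "<default>"


_DEFAULT = _Default()


def resolve_aggregate(
    aggregate: Union[_Default, None, str, Sequence[str]] = _DEFAULT,
) -> Optional[tuple]:
    if isinstance(aggregate, _Default):  # argument not passed: aggregate by default
        return ("mean", "std")
    if aggregate is None:                # explicit None: keep one column per split
        return None
    if isinstance(aggregate, str):       # allow a single string for convenience
        return (aggregate,)
    return tuple(aggregate)


print(resolve_aggregate())                # ('mean', 'std')
print(resolve_aggregate(aggregate=None))  # None -> per-split metrics
```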
I wasn't sure how to write this, so of course it can be changed! |
I think that we can pass the default |
Updated, I did not think about this simpler possibility. A Sequence of str also makes more sense, as more aggregation options exist in pandas than mean and std |
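(For context, not part of the diff: pandas accepts many aggregation names beyond "mean" and "std", for example:)

```python
# Illustrative only: pandas understands many aggregation names, not just mean/std.
import pandas as pd

scores = pd.DataFrame({"accuracy": [0.92, 0.95, 0.93], "r2": [0.36, 0.39, 0.41]})
print(scores.aggregate(["mean", "std", "median", "min", "max"]))
```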
Now, we should amend a couple of things:
|
@thomasloux would you be able to make the latest requested changes? |
Hello, I've changed the docstring. I'm not sure what you mean by the "Examples" section if not the one in the docstring. You may want to check it briefly to verify the exact formatting (where to truncate results...). For all metrics, I've removed any argument linked with aggregate. Additionally, here is the new description of the argument:
Let me know if you have suggestions on these changes! |
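(A usage sketch of the behaviour under discussion, based on the skore API shown in the doctests of the diff below; the dataset choice is illustrative.)

```python
# Sketch assuming the skore API shown in the diff below: with the new default,
# report_metrics() aggregates across splits, while aggregate=None keeps one
# column per split.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from skore import CrossValidationReport

X, y = load_breast_cancer(return_X_y=True)
report = CrossValidationReport(
    LogisticRegression(max_iter=10_000), X=X, y=y, cv_splitter=2
)

aggregated = report.metrics.report_metrics(pos_label=1)                 # mean/std columns
per_split = report.metrics.report_metrics(aggregate=None, pos_label=1)  # Split #0, Split #1, ...
```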
@thomasloux I went through the file. I did not want to push to your branch directly, so here is the diff of what I would propose as changes:

diff --git a/examples/getting_started/plot_skore_getting_started.py b/examples/getting_started/plot_skore_getting_started.py
index 993555fd..1e45f320 100644
--- a/examples/getting_started/plot_skore_getting_started.py
+++ b/examples/getting_started/plot_skore_getting_started.py
@@ -129,16 +129,16 @@ cv_report = CrossValidationReport(rf, X, y, cv_splitter=5)
cv_report.help()
# %%
-# We display the metrics for each fold:
+# We display the mean and standard deviation for each metric:
# %%
cv_report.metrics.report_metrics(pos_label=1)
# %%
-# or the aggregated results:
+# or by individual fold:
# %%
-cv_report.metrics.report_metrics(aggregate=["mean", "std"], pos_label=1)
+cv_report.metrics.report_metrics(aggregate=None, pos_label=1)
# %%
# We display the ROC curves for each fold:
diff --git a/examples/technical_details/plot_cache_mechanism.py b/examples/technical_details/plot_cache_mechanism.py
index 65ba5ddb..04a0d12e 100644
--- a/examples/technical_details/plot_cache_mechanism.py
+++ b/examples/technical_details/plot_cache_mechanism.py
@@ -272,7 +272,7 @@ report.help()
# exposed.
# The first call will be slow because it computes the predictions for each fold.
start = time.time()
-result = report.metrics.report_metrics(aggregate=["mean", "std"])
+result = report.metrics.report_metrics()
end = time.time()
result
@@ -283,7 +283,7 @@ print(f"Time taken: {end - start:.2f} seconds")
#
# But the subsequent calls are fast because the predictions are cached.
start = time.time()
-result = report.metrics.report_metrics(aggregate=["mean", "std"])
+result = report.metrics.report_metrics()
end = time.time()
result
diff --git a/examples/use_cases/plot_employee_salaries.py b/examples/use_cases/plot_employee_salaries.py
index 60793db0..168576f6 100644
--- a/examples/use_cases/plot_employee_salaries.py
+++ b/examples/use_cases/plot_employee_salaries.py
@@ -198,7 +198,7 @@ my_project.put("Linear model report", report)
# %%
# We can now have a look at the performance of the model with some standard metrics.
-report.metrics.report_metrics(aggregate=["mean", "std"], indicator_favorability=True)
+report.metrics.report_metrics(indicator_favorability=True)
# %%
# Second model
@@ -241,7 +241,7 @@ my_project.put("HGBDT model report", report)
# %%
#
# We can now have a look at the performance of the model with some standard metrics.
-report.metrics.report_metrics(aggregate=["mean", "std"])
+report.metrics.report_metrics()
# %%
# Investigating the models
@@ -264,8 +264,8 @@ import pandas as pd
results = pd.concat(
[
- linear_model_report.metrics.report_metrics(aggregate=["mean", "std"]),
- hgbdt_model_report.metrics.report_metrics(aggregate=["mean", "std"]),
+ linear_model_report.metrics.report_metrics(),
+ hgbdt_model_report.metrics.report_metrics(),
],
axis=1,
)
@@ -289,13 +289,11 @@ results = pd.concat(
scoring=scoring,
scoring_kwargs=scoring_kwargs,
scoring_names=scoring_names,
- aggregate=["mean", "std"],
),
hgbdt_model_report.metrics.report_metrics(
scoring=scoring,
scoring_kwargs=scoring_kwargs,
scoring_names=scoring_names,
- aggregate=["mean", "std"],
),
],
axis=1,
diff --git a/skore/src/skore/sklearn/_cross_validation/metrics_accessor.py b/skore/src/skore/sklearn/_cross_validation/metrics_accessor.py
index c65ef2b8..abb4250e 100644
--- a/skore/src/skore/sklearn/_cross_validation/metrics_accessor.py
+++ b/skore/src/skore/sklearn/_cross_validation/metrics_accessor.py
@@ -60,7 +60,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
pos_label: Optional[Union[int, float, bool, str]] = None,
indicator_favorability: bool = False,
flat_index: bool = False,
- aggregate: Optional[Sequence[str]] = ("mean", "std"),
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
) -> pd.DataFrame:
"""Report a set of metrics for our estimator.
@@ -99,7 +101,7 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
Whether to flatten the `MultiIndex` columns. Flat index will always be lower
case, do not include spaces and remove the hash symbol to ease indexing.
- aggregate : {"mean", "std"} or list of such str, default=None
+ aggregate : {"mean", "std"} or list of such str, default=("mean", "std")
Function to aggregate the scores across the cross-validation splits.
Returns
@@ -118,7 +120,6 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
>>> report.metrics.report_metrics(
... scoring=["precision", "recall"],
... pos_label=1,
- ... aggregate=["mean", "std"],
... indicator_favorability=True,
... )
LogisticRegression Favorability
@@ -150,7 +151,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
report_metric_name: str,
*,
data_source: DataSource = "test",
- aggregate: Optional[Sequence[str]] = None,
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
**metric_kwargs: Any,
) -> pd.DataFrame:
# build the cache key components to finally create a tuple that will be used
@@ -237,7 +240,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
self,
*,
data_source: DataSource = "test",
- aggregate: Optional[Sequence[str]] = ("mean", "std"),
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
) -> pd.DataFrame:
"""Compute the accuracy score.
@@ -249,7 +254,7 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
- "test" : use the test set provided when creating the report.
- "train" : use the train set provided when creating the report.
- aggregate : {"mean", "std"} or list of such str, default=None
+ aggregate : {"mean", "std"} or list of such str, default=("mean", "std")
Function to aggregate the scores across the cross-validation splits.
Returns
@@ -266,10 +271,10 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
>>> classifier = LogisticRegression(max_iter=10_000)
>>> report = CrossValidationReport(classifier, X=X, y=y, cv_splitter=2)
>>> report.metrics.accuracy()
- LogisticRegression
- Split #0 Split #1
+ LogisticRegression
+ mean std
Metric
- Accuracy 0.94... 0.94...
+ Accuracy 0.94... 0.00...
"""
return self.report_metrics(
scoring=["accuracy"],
@@ -290,7 +295,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
Literal["binary", "macro", "micro", "weighted", "samples"]
] = None,
pos_label: Optional[Union[int, float, bool, str]] = None,
- aggregate: Optional[Sequence[str]] = ("mean", "std"),
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
) -> pd.DataFrame:
"""Compute the precision score.
@@ -330,7 +337,7 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
pos_label : int, float, bool or str, default=None
The positive class.
- aggregate : {"mean", "std"} or list of such str, default=None
+ aggregate : {"mean", "std"} or list of such str, default=("mean", "std")
Function to aggregate the scores across the cross-validation splits.
Returns
@@ -347,11 +354,11 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
>>> classifier = LogisticRegression(max_iter=10_000)
>>> report = CrossValidationReport(classifier, X=X, y=y, cv_splitter=2)
>>> report.metrics.precision()
- LogisticRegression
- Split #0 Split #1
- Metric Label / Average
- Precision 0 0.96... 0.90...
- 1 0.93... 0.96...
+ LogisticRegression
+ mean std
+ Metric Label / Average
+ Precision 0 0.93... 0.04...
+ 1 0.94... 0.02...
"""
return self.report_metrics(
scoring=["precision"],
@@ -374,7 +381,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
Literal["binary", "macro", "micro", "weighted", "samples"]
] = None,
pos_label: Optional[Union[int, float, bool, str]] = None,
- aggregate: Optional[Sequence[str]] = ("mean", "std"),
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
) -> pd.DataFrame:
"""Compute the recall score.
@@ -415,7 +424,7 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
pos_label : int, float, bool or str, default=None
The positive class.
- aggregate : {"mean", "std"} or list of such str, default=None
+ aggregate : {"mean", "std"} or list of such str, default=("mean", "std")
Function to aggregate the scores across the cross-validation splits.
Returns
@@ -432,11 +441,11 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
>>> classifier = LogisticRegression(max_iter=10_000)
>>> report = CrossValidationReport(classifier, X=X, y=y, cv_splitter=2)
>>> report.metrics.recall()
- LogisticRegression
- Split #0 Split #1
- Metric Label / Average
- Recall 0 0.87... 0.94...
- 1 0.98... 0.94...
+ LogisticRegression
+ mean std
+ Metric Label / Average
+ Recall 0 0.91... 0.046...
+ 1 0.96... 0.027...
"""
return self.report_metrics(
scoring=["recall"],
@@ -453,7 +462,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
self,
*,
data_source: DataSource = "test",
- aggregate: Optional[Sequence[str]] = ("mean", "std"),
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
) -> pd.DataFrame:
"""Compute the Brier score.
@@ -465,7 +476,7 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
- "test" : use the test set provided when creating the report.
- "train" : use the train set provided when creating the report.
- aggregate : {"mean", "std"} or list of such str, default=None
+ aggregate : {"mean", "std"} or list of such str, default=("mean", "std")
Function to aggregate the scores across the cross-validation splits.
Returns
@@ -482,10 +493,10 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
>>> classifier = LogisticRegression(max_iter=10_000)
>>> report = CrossValidationReport(classifier, X=X, y=y, cv_splitter=2)
>>> report.metrics.brier_score()
- LogisticRegression
- Split #0 Split #1
+ LogisticRegression
+ mean std
Metric
- Brier score 0.04... 0.04...
+ Brier score 0.04... 0.00...
"""
return self.report_metrics(
scoring=["brier_score"],
@@ -504,7 +515,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
data_source: DataSource = "test",
average: Optional[Literal["macro", "micro", "weighted", "samples"]] = None,
multi_class: Literal["raise", "ovr", "ovo"] = "ovr",
- aggregate: Optional[Sequence[str]] = ("mean", "std"),
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
) -> pd.DataFrame:
"""Compute the ROC AUC score.
@@ -549,7 +562,7 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
pairwise combinations of classes. Insensitive to class imbalance when
`average == "macro"`.
- aggregate : {"mean", "std"} or list of such str, default=None
+ aggregate : {"mean", "std"} or list of such str, default=("mean", "std")
Function to aggregate the scores across the cross-validation splits.
Returns
@@ -566,10 +579,10 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
>>> classifier = LogisticRegression(max_iter=10_000)
>>> report = CrossValidationReport(classifier, X=X, y=y, cv_splitter=2)
>>> report.metrics.roc_auc()
- LogisticRegression
- Split #0 Split #1
+ LogisticRegression
+ mean std
Metric
- ROC AUC 0.99... 0.98...
+ ROC AUC 0.98... 0.00...
"""
return self.report_metrics(
scoring=["roc_auc"],
@@ -587,7 +600,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
self,
*,
data_source: DataSource = "test",
- aggregate: Optional[Sequence[str]] = ("mean", "std"),
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
) -> pd.DataFrame:
"""Compute the log loss.
@@ -599,7 +614,7 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
- "test" : use the test set provided when creating the report.
- "train" : use the train set provided when creating the report.
- aggregate : {"mean", "std"} or list of such str, default=None
+ aggregate : {"mean", "std"} or list of such str, default=("mean", "std")
Function to aggregate the scores across the cross-validation splits.
Returns
@@ -616,10 +631,10 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
>>> classifier = LogisticRegression(max_iter=10_000)
>>> report = CrossValidationReport(classifier, X=X, y=y, cv_splitter=2)
>>> report.metrics.log_loss()
- LogisticRegression
- Split #0 Split #1
+ LogisticRegression
+ mean std
Metric
- Log loss 0.1... 0.1...
+ Log loss 0.14... 0.03...
"""
return self.report_metrics(
scoring=["log_loss"],
@@ -637,7 +652,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
*,
data_source: DataSource = "test",
multioutput: Literal["raw_values", "uniform_average"] = "raw_values",
- aggregate: Optional[Sequence[str]] = ("mean", "std"),
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
) -> pd.DataFrame:
"""Compute the R² score.
@@ -659,7 +676,7 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
By default, no averaging is done.
- aggregate : {"mean", "std"} or list of such str, default=None
+ aggregate : {"mean", "std"} or list of such str, default=("mean", "std")
Function to aggregate the scores across the cross-validation splits.
Returns
@@ -676,10 +693,10 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
>>> regressor = Ridge()
>>> report = CrossValidationReport(regressor, X=X, y=y, cv_splitter=2)
>>> report.metrics.r2()
- Ridge
- Split #0 Split #1
+ Ridge
+ mean std
Metric
- R² 0.36... 0.39...
+ R² 0.37... 0.02...
"""
return self.report_metrics(
scoring=["r2"],
@@ -698,7 +715,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
*,
data_source: DataSource = "test",
multioutput: Literal["raw_values", "uniform_average"] = "raw_values",
- aggregate: Optional[Sequence[str]] = ("mean", "std"),
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
) -> pd.DataFrame:
"""Compute the root mean squared error.
@@ -720,7 +739,7 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
By default, no averaging is done.
- aggregate : {"mean", "std"} or list of such str, default=None
+ aggregate : {"mean", "std"} or list of such str, default=("mean", "std")
Function to aggregate the scores across the cross-validation splits.
Returns
@@ -738,9 +757,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
>>> report = CrossValidationReport(regressor, X=X, y=y, cv_splitter=2)
>>> report.metrics.rmse()
Ridge
- Split #0 Split #1
+ mean std
Metric
- RMSE 59.9... 61.4...
+ RMSE 60.7... 1.0...
"""
return self.report_metrics(
scoring=["rmse"],
@@ -756,7 +775,9 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
*,
metric_name: Optional[str] = None,
data_source: DataSource = "test",
- aggregate: Optional[Sequence[str]] = ("mean", "std"),
+ aggregate: Optional[
+ Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
+ ] = ("mean", "std"),
**kwargs,
) -> pd.DataFrame:
"""Compute a custom metric.
@@ -791,7 +812,7 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
- "test" : use the test set provided when creating the report.
- "train" : use the train set provided when creating the report.
- aggregate : {"mean", "std"} or list of such str, default=None
+ aggregate : {"mean", "std"} or list of such str, default=("mean", "std")
Function to aggregate the scores across the cross-validation splits.
**kwargs : dict
@@ -816,10 +837,10 @@ class _MetricsAccessor(_BaseAccessor["CrossValidationReport"], DirNamesMixin):
... response_method="predict",
... metric_name="MAE",
... )
- Ridge
- Split #0 Split #1
+ Ridge
+ mean std
Metric
- MAE 50.1... 52.6...
+ MAE 51.4... 1.7...
"""
# create a scorer with `greater_is_better=True` to not alter the output of
# `metric_function`

They are mainly documentation changes. NB: I corrected the typing from |
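(For reference, the corrected annotation as it appears in the hunks above:)

```python
# Annotation fragment copied from the diff above (module-level for illustration).
from typing import Literal, Optional, Sequence, Union

aggregate: Optional[
    Union[Literal["mean", "std"], Sequence[Literal["mean", "std"]]]
] = ("mean", "std")
```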
Hey, it seems I forgot to push a previous commit where I also changed the docstrings, sorry. To be sure, don't you want to emphasize the
|
@thomasloux Indeed, it was the default (and thus there was no need to specify it) before, and I forgot to add it explicitly:
|
Otherwise LGTM. Thanks @thomasloux
I just pushed a small fix to your branch to resolve the conflict so that the tests can run. It looks good to me otherwise. @thomasloux Next time, you might want to make a branch instead of submitting from your |
Thanks @thomasloux. The CI is happy, so am I. Nice job. |
Following the corresponding issue, I changed the default behaviour of CrossValidationReport to aggregate fold metrics as mean and std, rather than report metrics for all folds.
I chose to use Ellipsis as the default value to distinguish it from
aggregate=None
which outputs metrics for all folds. I've also updated the tests: most changes are just column names, one touches actual values, and two concern the behaviour in case of interruption (you may want to check those, as I am less familiar with the expected behaviour of the object in this setting).
The docstring still needs to be changed.
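(A minimal sketch of the Ellipsis-as-default idea described above; the final implementation in the diff uses ("mean", "std") directly instead, and the helper name here is hypothetical.)

```python
# Hypothetical helper illustrating Ellipsis as a 'not passed' marker,
# distinct from an explicit aggregate=None.
def resolve_aggregate(aggregate=...):
    if aggregate is Ellipsis:  # argument not passed: aggregate with mean/std
        return ("mean", "std")
    if aggregate is None:      # explicit None: report metrics for every fold
        return None
    return tuple([aggregate] if isinstance(aggregate, str) else aggregate)


print(resolve_aggregate())      # ('mean', 'std')
print(resolve_aggregate(None))  # None
```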