Probmetrics: Classification metrics and post-hoc calibration

This PyTorch-based package currently contains

  • classification metrics, in particular metrics for assessing the quality of probabilistic predictions, and
  • post-hoc calibration methods, in particular a fast and accurate implementation of temperature scaling.

It accompanies our paper Rethinking Early Stopping: Refine, Then Calibrate. Please cite the paper if you use this repository for research purposes. The experiments from the paper are available separately: vision, tabular, theory.

Installation

Probmetrics is available via

pip install probmetrics

To obtain all functionality, install probmetrics[extra,dev,dirichletcal].

  • extra installs additional packages for smooth ECE, Venn-Abers calibration, centered isotonic regression, and the temperature scaling implementation from NetCal.
  • dev installs additional packages for development (especially documentation).
  • dirichletcal installs Dirichlet calibration, which requires Python 3.12 or newer.
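
For example, installing all extras (the quotes keep the shell from interpreting the brackets):

pip install "probmetrics[extra,dev,dirichletcal]"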

Using temperature scaling

We provide a highly efficient implementation of temperature scaling that, unlike some other implementations, does not suffer from optimization issues.
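
For background, temperature scaling divides the logits z by a single learned scalar T > 0 and predicts softmax(z / T); T is typically chosen to minimize the log loss on a validation set. Although this is only a one-dimensional optimization problem, solving it reliably is exactly where other implementations can run into trouble.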

NumPy interface

from probmetrics.calibrators import get_calibrator
import numpy as np

probas = np.asarray([[0.1, 0.9]])  # shape = (n_samples, n_classes)
labels = np.asarray([1])  # shape = (n_samples,)
# this is the version with Laplace smoothing, 
# use calibrate_with_mixture=False (the default) for no Laplace smoothing
calib = get_calibrator('temp-scaling', calibrate_with_mixture=True)
# other option: calib = MixtureCalibrator(TemperatureScalingCalibrator())
# there is also a fit_torch / predict_proba_torch interface
calib.fit(probas, labels)
calibrated_probas = calib.predict_proba(probas)
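
In a typical workflow, the calibrator is fitted on held-out validation predictions and then applied to test predictions. A minimal sketch using the same interface as above (the array values are placeholders, not real model outputs):

from probmetrics.calibrators import get_calibrator
import numpy as np

# hypothetical held-out validation predictions (used to fit the temperature)
val_probas = np.asarray([[0.7, 0.3], [0.2, 0.8], [0.6, 0.4], [0.1, 0.9]])
val_labels = np.asarray([0, 1, 1, 1])
# hypothetical test predictions (the fitted calibrator is applied to these)
test_probas = np.asarray([[0.55, 0.45], [0.3, 0.7]])

calib = get_calibrator('temp-scaling')
calib.fit(val_probas, val_labels)
calibrated_test_probas = calib.predict_proba(test_probas)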

PyTorch interface

The PyTorch version can be used directly on GPU tensors; however, for smaller validation sets (around 1K-10K samples), this can actually be slower than running on the CPU.

from probmetrics.distributions import CategoricalProbs
from probmetrics.calibrators import get_calibrator
import torch

probas = torch.as_tensor([[0.1, 0.9]])  # shape = (n_samples, n_classes)
labels = torch.as_tensor([1])  # shape = (n_samples,)

# temp-scaling with Laplace smoothing
calib = get_calibrator('ts-mix')

# if you have logits, you can use CategoricalLogits instead
calib.fit_torch(CategoricalProbs(probas), labels)
calibrated_probas = calib.predict_proba_torch(CategoricalProbs(probas))
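
If your model outputs logits rather than probabilities, the comment above points to CategoricalLogits. A sketch under the assumption that CategoricalLogits also lives in probmetrics.distributions and wraps a logits tensor the same way CategoricalProbs wraps probabilities:

from probmetrics.distributions import CategoricalLogits
from probmetrics.calibrators import get_calibrator
import torch

logits = torch.as_tensor([[-1.1, 1.1]])  # raw model outputs, shape (n_samples, n_classes)
labels = torch.as_tensor([1])

calib = get_calibrator('ts-mix')
# assumption: CategoricalLogits(logits) mirrors the CategoricalProbs interface
calib.fit_torch(CategoricalLogits(logits), labels)
calibrated_probas = calib.predict_proba_torch(CategoricalLogits(logits))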

Using our refinement and calibration metrics

We provide estimators for the refinement error (the loss after post-hoc calibration) and the calibration error (the loss improvement achieved through post-hoc calibration). By these definitions, the two add up to the original loss, e.g., logloss = refinement error + calibration error. They can be used as follows:

import torch
from probmetrics.metrics import Metrics

# compute multiple metrics at once 
# this is more efficient than computing them individually
metrics = Metrics.from_names(['logloss',
                              'refinement_logloss_ts-mix_all',
                              'calib-err_logloss_ts-mix_all'])
y_true = torch.tensor(...)  # class labels, shape (n_samples,)
y_logits = torch.tensor(...)  # model logits, shape (n_samples, n_classes)
results = metrics.compute_all_from_labels_logits(y_true, y_logits)
print(results['refinement_logloss_ts-mix_all'].item())
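
Assuming results behaves like a dictionary of scalar tensors (as the .item() call above suggests), all computed metrics can be printed in one loop. A self-contained sketch with placeholder data:

import torch
from probmetrics.metrics import Metrics

metrics = Metrics.from_names(['logloss', 'brier', 'accuracy'])
y_true = torch.tensor([0, 1, 1, 0])  # placeholder labels
y_logits = torch.tensor([[1.2, -0.3], [-0.5, 0.8],
                         [0.1, 0.4], [2.0, -1.0]])  # placeholder logits
results = metrics.compute_all_from_labels_logits(y_true, y_logits)
for name, value in results.items():  # assumes a dict-like return value
    print(f'{name}: {value.item():.4f}')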

Using more metrics

While some metrics can be configured flexibly through their corresponding classes, most metrics are also available simply by name. Here are some relevant classification metrics:

from probmetrics.metrics import Metrics

metrics = Metrics.from_names([
    'logloss',
    'brier',  # for binary, this is 2x the brier from sklearn (see the check below)
    'accuracy', 'class-error',
    'auroc-ovr',  # one-vs-rest
    'auroc-ovo-sklearn',  # one-vs-one (can be slow!)
    # calibration metrics
    'ece-15', 'rmsce-15', 'mce-15', 'smece',
    'refinement_logloss_ts-mix_all',
    'calib-err_logloss_ts-mix_all',
    'refinement_brier_ts-mix_all',
    'calib-err_brier_ts-mix_all'
])
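
As a side note on the 'brier' comment above: with two classes, summing squared errors over both classes gives exactly twice sklearn's positive-class Brier score, because (p_0 - (1 - y))^2 = (p_1 - y)^2. A quick standalone check of this identity (it uses sklearn directly and does not touch probmetrics):

import numpy as np
from sklearn.metrics import brier_score_loss

y = np.array([0, 1, 1])
p1 = np.array([0.2, 0.7, 0.9])  # predicted probability of class 1
p = np.stack([1 - p1, p1], axis=1)  # full two-class probability matrix

sk_brier = brier_score_loss(y, p1)  # sklearn: mean((p1 - y)^2)
two_class = np.mean(np.sum((p - np.eye(2)[y]) ** 2, axis=1))  # summed over classes
assert np.isclose(two_class, 2 * sk_brier)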

The following function returns a list of all metric names:

from probmetrics.metrics import Metrics, MetricType
Metrics.get_available_names(metric_type=MetricType.CLASS)
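
Assuming the returned names are plain strings, the list can be filtered like any Python list; for example, to look at ECE-style names only (the 'ece' substring filter is just an illustration):

from probmetrics.metrics import Metrics, MetricType

names = Metrics.get_available_names(metric_type=MetricType.CLASS)
print([name for name in names if 'ece' in name])  # e.g., 'ece-15', 'smece', ...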

There are also some classes for regression metrics, but these are not implemented yet.

Releases

  • v0.0.2:
    • removed the numpy<2.0 constraint
    • allowed 1D vectors in CategoricalLogits / CategoricalProbs
    • added TorchCal temperature scaling
    • minor fixes in AutoGluon temperature scaling that shouldn't affect performance in practice
  • v0.0.1: initial release
