Reproducibility bug/feature ? #31

w1ll1a9m · 2024-12-19T12:03:57Z

Describe the bug
Sampling predictions for an array of identical instances produces different samples for each instance, even though the point predictions are identical.

To Reproduce
Steps to reproduce the behavior:

Run this code, adapted from the basic example:

from pgbm.sklearn import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
import numpy as np

X, y = fetch_california_housing(return_X_y=True)
# Split the data, then duplicate a single test instance 10 times
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
x_test_single = X_test[0:1,:]
x_test_dup = np.tile(x_test_single, (10, 1))

# Train PGBM on the training set
model = HistGradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
# Point and probabilistic predictions. By default, 1 probabilistic estimate is created, so we request 1000
yhat_point, yhat_point_std = model.predict(x_test_dup, return_std=True)
yhat_dist = model.sample(yhat_point, yhat_point_std, n_estimates=1000, random_state=1)

Here x_test_dup contains the same instance duplicated 10 times, so yhat_point and yhat_point_std each contain the same value 10 times.
However, yhat_dist contains different samples for each of the 10 instances.

TL;DR: the samples differ across instances even though yhat_point and yhat_point_std are identical for all of them.

I believe this happens because the seed is offset by the loop index for each sample in:

/pgbm/sklearn/distributions

for j in prange(n_samples):
    np.random.seed(seed + j)

Why would we want this behavior?
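To make the suspected mechanism concrete, here is a minimal NumPy-only sketch. It is not PGBM's actual code: both function names are hypothetical, a normal distribution is assumed, and the (n_estimates, n_instances) output layout is an assumption made for illustration. The first function reseeds with seed + j per instance, as in the loop quoted above; the second reuses one set of standard-normal draws for every instance.

```python
import numpy as np


def sample_per_instance_seed(mean, std, n_estimates, seed):
    """Hypothetical: draw for instance j from a generator seeded with seed + j."""
    out = np.empty((n_estimates, mean.shape[0]))
    for j in range(mean.shape[0]):
        rng = np.random.RandomState(seed + j)  # different seed per instance
        out[:, j] = rng.normal(mean[j], std[j], size=n_estimates)
    return out


def sample_shared_noise(mean, std, n_estimates, seed):
    """Hypothetical alternative: reuse the same noise draws for every instance."""
    rng = np.random.RandomState(seed)
    z = rng.standard_normal(size=(n_estimates, 1))  # one noise column, broadcast
    return mean[None, :] + std[None, :] * z


# Ten identical instances: same mean and same standard deviation.
mean = np.full(10, 2.0)
std = np.full(10, 0.5)

a = sample_per_instance_seed(mean, std, 1000, seed=1)
b = sample_shared_noise(mean, std, 1000, seed=1)

# Compare the samples of the first two (identical) instances.
per_instance_identical = bool(np.allclose(a[:, 0], a[:, 1]))
shared_identical = bool(np.allclose(b[:, 0], b[:, 1]))
print(per_instance_identical, shared_identical)
```

Both variants are reproducible across calls with the same seed; they differ only in whether identical inputs receive identical draws. The shared-noise variant also makes the draws perfectly correlated across instances, which may or may not be desirable.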

Expected behavior
I would expect the same predictions for instances in an array that have identical input features.

Additional context
I am using the latest PGBM version on Python 3.11.
