Description
Describe the bug
Sampling from the predicted distribution for an array of identical instances produces different samples for each instance.
To Reproduce
Steps to reproduce the behavior:
Run the following, adapted from the basic example:
from pgbm.sklearn import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
import numpy as np
X, y = fetch_california_housing(return_X_y=True)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
x_test_single = X_test[0:1,:]
x_test_dup = np.tile(x_test_single, (10, 1))
# Train PGBM on the training set
model = HistGradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
# Point and probabilistic predictions. By default, 1 probabilistic estimate is created, so we request 1000
yhat_point, yhat_point_std = model.predict(x_test_dup, return_std=True)
yhat_dist = model.sample(yhat_point, yhat_point_std, n_estimates=1000, random_state=1)
Here, x_test_dup contains the same instance duplicated 10 times.
As expected, yhat_point and yhat_point_std each contain the same value repeated 10 times.
However, yhat_dist contains different samples for each of the 10 identical instances.
TL;DR: the samples differ even though every row has the same yhat_point and yhat_point_std.
I believe this happens because the seed is changed for every instance that is sampled, in:
/pgbm/sklearn/distributions
for j in prange(n_samples):
    np.random.seed(seed + j)
Why would we want this behavior?
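To illustrate the suspected mechanism without depending on PGBM internals, here is a minimal NumPy-only sketch. Both functions are hypothetical stand-ins, not the library's code: they assume Normal sampling, an output of shape (n_estimates, n_samples), and a plain `range` loop in place of `prange`. The first reseeds with `seed + j` per row, as in the snippet above; the second reuses one seed for every row.

```python
import numpy as np


def sample_per_row_seed(mean, std, n_estimates, seed):
    """Hypothetical stand-in: row j is sampled with seed + j."""
    n_samples = mean.shape[0]
    out = np.empty((n_estimates, n_samples))
    for j in range(n_samples):
        rng = np.random.RandomState(seed + j)  # a different seed for each row
        out[:, j] = rng.normal(mean[j], std[j], size=n_estimates)
    return out


def sample_shared_seed(mean, std, n_estimates, seed):
    """Hypothetical alternative: every row is sampled with the same seed."""
    n_samples = mean.shape[0]
    out = np.empty((n_estimates, n_samples))
    for j in range(n_samples):
        rng = np.random.RandomState(seed)  # the same seed for each row
        out[:, j] = rng.normal(mean[j], std[j], size=n_estimates)
    return out


# Ten identical rows, mimicking x_test_dup (the values are made up)
mean = np.full(10, 2.0)
std = np.full(10, 0.5)

per_row = sample_per_row_seed(mean, std, n_estimates=1000, seed=1)
shared = sample_shared_seed(mean, std, n_estimates=1000, seed=1)

# Identical inputs give different columns when the seed varies per row
print(np.allclose(per_row[:, 0], per_row[:, 1]))
# Identical inputs give identical columns when the seed is shared
print(np.allclose(shared[:, 0], shared[:, 1]))
```

One possible reason for the per-row offset is that a shared seed makes the draws for all rows perfectly correlated, which may be undesirable when the samples are later aggregated across instances. Whether that is the intent here is a guess on my part.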
Expected behavior
I would expect the same predictions for instances in an array that have the same input features.
Additional context
I am using the latest PGBM version on Python 3.11.