Describe the bug
Predicting an array of identical instances produces different predictions
To Reproduce
Steps to reproduce the behavior:
Run this snippet, adapted from the basic example:
from pgbm.sklearn import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
import numpy as np
X, y = fetch_california_housing(return_X_y=True)
# Train pgbm
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=0)
x_test_single = X_test[0:1,:]
x_test_dup = np.tile(x_test_single, (10, 1))
# Train on set
model = HistGradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
# Point and probabilistic predictions. By default, 1 probabilistic estimate is created; here we draw 1000.
yhat_point, yhat_point_std = model.predict(x_test_dup, return_std=True)
yhat_dist = model.sample(yhat_point, yhat_point_std, n_estimates=1000, random_state=1)
Here I create the x_test_dup array, which contains the same instance duplicated 10 times.
yhat_point and yhat_point_std are then arrays holding the same value 10 times.
However, yhat_dist contains different samples for each of the 10 identical instances.
TL;DR: the samples differ across rows that have the same yhat_point and yhat_point_std.
I believe this is happening because the seed is being altered every time a sample is taken in:
/pgbm/sklearn/distributions
    for j in prange(n_samples):
        np.random.seed(seed + j)
Why would we want this behavior?
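To illustrate the suspected mechanism, here is a minimal NumPy-only sketch. It is not the actual pgbm code: `sample_per_row` is a hypothetical helper, and a normal distribution is assumed. It only shows that reseeding with `seed + j` for each row gives different draws even when every row has the same mean and standard deviation.

```python
import numpy as np

def sample_per_row(mu, sigma, n_estimates, seed):
    """Hypothetical stand-in for the sampling loop: reseeds for every row j."""
    n_samples = mu.shape[0]
    out = np.empty((n_estimates, n_samples))
    for j in range(n_samples):
        np.random.seed(seed + j)  # seed changes with the row index
        out[:, j] = np.random.normal(mu[j], sigma[j], size=n_estimates)
    return out

# Ten identical rows: same point prediction and same standard deviation.
mu = np.full(10, 2.0)
sigma = np.full(10, 0.5)

samples = sample_per_row(mu, sigma, n_estimates=1000, seed=1)

# Compare the draws for row 0 and row 1, which have identical inputs.
rows_match = np.array_equal(samples[:, 0], samples[:, 1])
print("rows 0 and 1 have identical samples:", rows_match)
```

In this sketch the columns differ only because of the per-row seed, which matches what I see in yhat_dist.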
Expected behavior
I would expect to have the same predictions for instances with the same input feature set in an array.
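As a rough sketch of the behavior I have in mind (again NumPy-only, with a hypothetical helper `sample_shared_seed` and an assumed normal distribution, not the pgbm implementation), reusing the same seed for every row makes rows with identical mean and standard deviation produce identical samples:

```python
import numpy as np

def sample_shared_seed(mu, sigma, n_estimates, seed):
    """Hypothetical alternative: every row reuses the same seed."""
    n_samples = mu.shape[0]
    out = np.empty((n_estimates, n_samples))
    for j in range(n_samples):
        rng = np.random.default_rng(seed)  # same seed for every row
        noise = rng.standard_normal(n_estimates)
        out[:, j] = mu[j] + sigma[j] * noise
    return out

# Ten identical rows: same point prediction and same standard deviation.
mu = np.full(10, 2.0)
sigma = np.full(10, 0.5)

shared = sample_shared_seed(mu, sigma, n_estimates=1000, seed=1)

# Every column should equal the first one when the inputs are identical.
all_rows_match = bool(np.all(shared == shared[:, :1]))
print("all rows have identical samples:", all_rows_match)
```

One trade-off of this approach is that the noise becomes perfectly correlated across rows, which may be the reason the seed is varied per row in the first place.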
Additional context
I am using the latest PGBM version on Python 3.11.