perf_psi fails because of score binning and missing samples #117

Migalvao · 2025-01-21T15:09:40Z

When calling perf_psi() with show_plot=True, I got this error:

IndexError: single positional indexer is out-of-bounds
raised at distr_prob.distr.iloc[:,1] on line 532:

       530 # ax1
       531 p1 = ax1.bar(ind, distr_prob.distr.iloc[:,0], width, color=(24/254, 192/254, 196/254), alpha=0.6)
-->    532 p2 = ax1.bar(ind+width, distr_prob.distr.iloc[:,1], width, color=(246/254, 115/254, 109/254), alpha=0.6)
       533 # ax2
       534 p3 = ax2.plot(ind+width/2, distr_prob.badprob.iloc[:,0], color=(24/254, 192/254, 196/254))

Right before that, a division-by-zero warning was emitted (RuntimeWarning: divide by zero encountered in log).

Printing the return value with return_distr_dat=True shows the binning created by the function:

{'psi':   
    variable  PSI
0    score  inf, 
'pic': {}, 
'dat': {'score':          
         bin       N           badprob          
ae               test   train      test     train
0   [300,350)     NaN     1.0       NaN  0.000000
1   [350,400)  6257.0  6216.0  0.560332  0.563224
2   [400,500)  2733.0  2775.0  0.361873  0.358559
}}
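For reference, the Population Stability Index is usually defined as PSI = sum over bins of (a_i - e_i) * ln(a_i / e_i), where a_i and e_i are the shares of the two samples in bin i. The sketch below applies that textbook formula to the bin counts printed above, treating the missing (NaN) test count as 0. This is an assumption about the math, not a copy of perf_psi's internals. A zero share makes ln(a_i / e_i) equal to -inf, which is consistent with both the "divide by zero encountered in log" warning and the inf PSI:

```python
import numpy as np

def psi(expected_counts, actual_counts, eps=0.0):
    """Textbook PSI: sum((a_i - e_i) * ln(a_i / e_i)) over matching bins."""
    e = np.asarray(expected_counts, dtype=float)
    a = np.asarray(actual_counts, dtype=float)
    e = e / e.sum()
    a = a / a.sum()
    # Hypothetical floor on the shares (not a perf_psi parameter);
    # eps=0.0 leaves the shares untouched.
    e = np.maximum(e, eps)
    a = np.maximum(a, eps)
    # ln(0) -> -inf; silence the same warning seen in the report.
    with np.errstate(divide="ignore"):
        return float(np.sum((a - e) * np.log(a / e)))

# Counts per bin from the output above: train as expected, test as actual.
train_n = [1, 6216, 2775]
test_n = [0, 6257, 2733]   # NaN for [300,350) treated as 0
print(psi(train_n, test_n))             # inf
print(psi(train_n, test_n, eps=1e-4))   # finite
```

The eps floor is only an illustrative workaround. It keeps the result finite, at the cost of shares that no longer sum exactly to 1.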

The first bin contains only one training sample and no test samples, which is probably what causes the error. In the code below, a bin with no samples in one of the sets gets no row for that set, so when the table is pivoted on line 518 there is nothing to pivot for it.

511        distr_prob = dat.groupby(['ae', 'bin'])\
512          ['y'].agg([good, bad])\
513          .assign(N=lambda x: x.good+x.bad,
514            badprob=lambda x: x.bad/(x.good+x.bad)
515          ).reset_index()
516        distr_prob.loc[:,'distr'] = distr_prob.groupby('ae')['N'].transform(lambda x:x/sum(x))
517        # pivot table
518        distr_prob = distr_prob.pivot_table(values=['N','badprob', 'distr'], index='bin', columns='ae')
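The groupby on lines 511-515 only produces rows for (ae, bin) pairs that actually occur, so with the binning above there is no test row for [300,350). A minimal sketch with made-up toy data (not scorecardpy code) shows the missing row. It also shows one possible way to implement the "empty record" suggestion: reindex against every (sample, bin) pair so that missing combinations appear with zero counts. The column names mirror the snippet; `size`/`sum` stand in for the library's `good`/`bad` helpers.

```python
import pandas as pd

# Hypothetical toy data with the same layout as `dat`:
# 'ae' is the sample, 'bin' the score bin, 'y' the 0/1 label.
dat = pd.DataFrame({
    "ae":  ["train", "train", "train", "test", "test"],
    "bin": ["[300,350)", "[350,400)", "[400,500)", "[350,400)", "[400,500)"],
    "y":   [0, 1, 0, 1, 0],
})

# groupby only emits (ae, bin) pairs that occur in the data,
# so ('test', '[300,350)') has no row at all.
counts = dat.groupby(["ae", "bin"])["y"].agg(["size", "sum"])
counts.columns = ["N", "bad"]

# "Empty record" idea: reindex against every (sample, bin) pair,
# filling missing combinations with zero counts.
full_index = pd.MultiIndex.from_product(
    [["test", "train"], sorted(dat["bin"].unique())], names=["ae", "bin"]
)
counts_full = counts.reindex(full_index, fill_value=0)

# Per-sample distribution, now defined for every bin.
counts_full["distr"] = (
    counts_full["N"] / counts_full.groupby(level="ae")["N"].transform("sum")
)
print(counts_full)
```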

Therefore, distr_prob.distr ends up with only one column, and the positional indexer iloc[:,1] is out of bounds.
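The error itself is what pandas raises for any positional column index past the last column. A tiny sketch with a hypothetical one-column frame, standing in for distr_prob.distr, reproduces it:

```python
import pandas as pd

# Hypothetical one-column frame standing in for distr_prob.distr.
distr = pd.DataFrame({"train": [0.2, 0.5, 0.3]})

first = distr.iloc[:, 0]  # column 0 exists, so this works
try:
    second = distr.iloc[:, 1]  # there is no column 1
    message = ""
except IndexError as err:
    message = str(err)
print(message)  # single positional indexer is out-of-bounds
```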

If I understood this correctly, I suggest one of the following: inserting an empty record for any bin with no samples, adjusting the bins so that every bin always contains samples, or perhaps allowing custom bins to be passed in.
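As an illustration of the custom-bins idea, fixed breakpoints combined with pd.cut produce categorical bins. value_counts on a categorical then reports an empty bin as an explicit 0 instead of dropping it. This is a sketch with hypothetical scores and breakpoints; perf_psi does not necessarily accept breakpoints in this form.

```python
import pandas as pd

# Hypothetical custom breakpoints, left-closed like [300,350).
breaks = [300, 350, 400, 500]

# Hypothetical scores; no test score falls in [300,350).
train_score = pd.Series([320, 360, 380, 450, 470])
test_score = pd.Series([355, 390, 420, 480])

def bin_distribution(score, breaks):
    """Share of observations per bin, keeping empty bins as 0."""
    bins = pd.cut(score, bins=breaks, right=False)
    # Categorical value_counts includes categories with zero count.
    return bins.value_counts(normalize=True, sort=False)

train_distr = bin_distribution(train_score, breaks)
test_distr = bin_distribution(test_score, breaks)

# Both samples share the same three bins, so both columns always exist.
print(pd.DataFrame({"train": train_distr, "test": test_distr}))
```

An empty bin still yields an infinite PSI under the plain formula, so fixed bins help mainly by keeping the table shape stable and letting the user choose breakpoints that avoid empty bins.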

If I have misunderstood something, I would appreciate your support with this. Thank you!
