Frequently used techniques for performance evaluation
Gini
- Definition: Indicates how discriminative the model is (its predictive power).
- Possible values:
- 0 would indicate no discrimination at all: the features carry no information for making a choice.
- 1 would indicate the model relies entirely on the features to discriminate between classes (desirable for banks, for example).
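The Gini coefficient is commonly derived from the ROC AUC as Gini = 2 * AUC - 1. A minimal sketch, assuming scikit-learn is available; the labels and scores are made up:

```python
# Gini from ROC AUC: Gini = 2 * AUC - 1 (assumes scikit-learn is installed).
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0]                  # hypothetical binary labels
y_score = [0.1, 0.7, 0.8, 0.9, 0.6, 0.4]     # hypothetical model scores

auc = roc_auc_score(y_true, y_score)
gini = 2 * auc - 1
print(f"AUC={auc:.3f}, Gini={gini:.3f}")
```

A Gini near 0 means the scores are no better than random ranking; a Gini near 1 means the scores separate the classes almost perfectly.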
KS statistic
- Definition: Indicates the maximum distance between the empirical distribution functions of two samples (e.g. the score distributions of the two classes in supervised learning), or between one sample and a reference distribution.
- Possible values:
- 0 would indicate no distinction between the two samples (no difference between the label-A and label-B score distributions).
- 1 would indicate the maximum possible distance between the two samples.
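A minimal sketch of the two-sample KS statistic using SciPy's `ks_2samp`; the two score samples below are made up and chosen to be fully separated:

```python
# Two-sample KS test on model scores per class (assumes SciPy is installed).
from scipy.stats import ks_2samp

scores_label_a = [0.1, 0.2, 0.3, 0.4, 0.5]   # hypothetical scores for label A
scores_label_b = [0.6, 0.7, 0.8, 0.9, 1.0]   # hypothetical scores for label B

result = ks_2samp(scores_label_a, scores_label_b)
print(result.statistic)  # 1.0 here, since the two samples do not overlap
```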
Student's t-test
- Definition: Indicates how likely it is that two sets of samples came from the exact same distribution (via the p-value). The p-value is compared against a threshold called the significance level (e.g. 0.05).
- Possible values:
- p-value < 0.05 indicates we can reject the null hypothesis that the two samples come from the exact same distribution.
- Comment: Assumes the samples are approximately normally distributed.
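A quick sketch with SciPy's `ttest_ind` on two hypothetical samples with clearly different means:

```python
# Independent two-sample t-test (assumes SciPy is installed).
from scipy.stats import ttest_ind

sample_a = [5.1, 4.9, 5.0, 5.2, 4.8]   # hypothetical measurements, group A
sample_b = [6.1, 5.9, 6.0, 6.2, 5.8]   # hypothetical measurements, group B

t_stat, p_value = ttest_ind(sample_a, sample_b)
if p_value < 0.05:
    print("Reject the null hypothesis: the sample means differ.")
```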
K fold cross validation:
- Definition: Indicates the variation in performance by splitting the data into K equally sized folds, holding each fold out in turn for evaluation and training on the remaining K-1 folds.
- Possible values:
- K is typically between 5 and 10 folds.
- Comment: A stratified version is recommended, since it keeps the class proportions in each fold close to those of the whole dataset.
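A sketch of stratified 5-fold cross-validation with scikit-learn (assumed available); the dataset and model here are just illustrative choices:

```python
# Stratified K-fold cross-validation (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# StratifiedKFold preserves the class proportions of y in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# One accuracy per fold; the spread shows how stable the performance is.
print(f"mean={scores.mean():.3f}, std={scores.std():.3f}")
```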
Cross entropy vs sparse cross entropy
If your y_i labels are one-hot encoded, use categorical_crossentropy. Examples (for a 3-class classification): [1,0,0], [0,1,0], [0,0,1].
But if your y_i labels are integers, use sparse_categorical_crossentropy. Examples for the same 3-class problem: [0], [1], [2] (class indices start at 0 in Keras). This might be more memory efficient, since it avoids storing one-hot vectors.
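To see that the two losses compute the same quantity under different label encodings, here is a NumPy sketch (the probabilities and labels are made up; in Keras you would pass loss="categorical_crossentropy" or loss="sparse_categorical_crossentropy" instead):

```python
# Same loss, two label encodings (assumes NumPy is installed).
import numpy as np

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])   # hypothetical softmax outputs
one_hot = np.array([[1, 0, 0],
                    [0, 1, 0]])       # one-hot labels -> categorical CE
sparse = np.array([0, 1])             # same labels as indices -> sparse CE

cat_ce = -np.mean(np.sum(one_hot * np.log(probs), axis=1))
sparse_ce = -np.mean(np.log(probs[np.arange(len(sparse)), sparse]))
print(np.isclose(cat_ce, sparse_ce))  # True: identical loss, different encoding
```

The sparse variant only stores one integer per sample instead of a full one-hot vector, which is where the memory saving comes from.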