Debugging class imbalance #3142

Overload119 · 2023-02-25T05:23:07Z

Overload119
Feb 25, 2023

I've been investigating ways to improve a model that predicts a binary feature, submitted_proposal.

Currently my loss is ~0.07, which I think I want around <0.02 based on what I've read here.

Here is my dataset as well as my config, although I've tweak the config quite a bit throughout the course of this.

Config: https://drive.google.com/file/d/1irPEMR7HFILD3G2GhblCF_Rw5PGKEIsR/view?usp=sharing
Dataset: https://drive.google.com/file/d/1TFvxOdryjqNZ9I0l-7ehhABGT1BL0zrS/view?usp=sharing

Attempt #1: Decrease the threshold

I've modified my yml file as follows.

output_features:
  - name: submitted_proposal
    type: binary
    decoder:
      threshold: 0.01

After running train I'm not sure if this is actually doing anything based on the output of output_features

 'output_features': [   {   'calibration': False,
                               'column': 'submitted_proposal',
                               'decoder': {   'bias_initializer': 'zeros',
                                              'fc_activation': 'relu',
                                              'fc_bias_initializer': 'zeros',
                                              'fc_dropout': 0.0,
                                              'fc_layers': None,
                                              'fc_norm': None,
                                              'fc_norm_params': None,
                                              'fc_output_size': 256,
                                              'fc_use_bias': True,
                                              'fc_weights_initializer': 'xavier_uniform',
                                              'input_size': None,
                                              'num_fc_layers': 0,
                                              'type': 'regressor',
                                              'use_bias': True,
                                              'weights_initializer': 'xavier_uniform'},
                               'dependencies': [],
                               'input_size': None,
                               'loss': {   'confidence_penalty': 0.0,
                                           'positive_class_weight': 1,
                                           'robust_lambda': 0,
                                           'type': 'binary_weighted_cross_entropy',
                                           'weight': 1.0},
                               'name': 'submitted_proposal',
                               'num_classes': None,
                               'preprocessing': {   'computed_fill_value': None,
                                                    'fallback_true_label': None,
                                                    'fill_value': None,
                                                    'missing_value_strategy': 'drop_row'},
                               'proc_column': 'submitted_proposal_mZFLky',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'threshold': 0.5, <---- Threshold is 0.5 here, not in decoder dict and is the 0.5 despite being changed
                               'type': 'binary'}],

I've also tried changing the top level key. It's possible the threshold in the output from the train command is different than what it's using.

Attempt #2: Increase positive_class_weight

The default positive_class_weight is 1. I've tried changing it to 2 and 10.

output_features:
  - name: submitted_proposal
    type: binary
    loss:
      type: binary_weighted_cross_entropy
      positive_class_weight: 10

This increases loss to 0.11-0.33.

Attempt #3: Increase confidence_penalty

output_features:
  - name: submitted_proposal
    type: binary
    loss:
      type: binary_weighted_cross_entropy
      positive_class_weight: 1
      confidence_penalty: 0.01

I wasn't able to get this to work - I get the following error for any value I put in confidence_penalty

Training:   0% 0/2700 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/usr/local/bin/ludwig", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/ludwig/cli.py", line 172, in main
    CLI()
  File "/usr/local/lib/python3.8/dist-packages/ludwig/cli.py", line 67, in __init__
    getattr(self, args.command)()
  File "/usr/local/lib/python3.8/dist-packages/ludwig/cli.py", line 72, in train
    train.cli(sys.argv[2:])
  File "/usr/local/lib/python3.8/dist-packages/ludwig/train.py", line 395, in cli
    train_cli(**vars(args))
  File "/usr/local/lib/python3.8/dist-packages/ludwig/train.py", line 185, in train_cli
    model.train(
  File "/usr/local/lib/python3.8/dist-packages/ludwig/api.py", line 557, in train
    train_stats = trainer.train(
  File "/usr/local/lib/python3.8/dist-packages/ludwig/trainers/trainer.py", line 825, in train
    should_break = self._train_loop(
  File "/usr/local/lib/python3.8/dist-packages/ludwig/trainers/trainer.py", line 962, in _train_loop
    loss, all_losses = self.train_step(
  File "/usr/local/lib/python3.8/dist-packages/ludwig/trainers/trainer.py", line 220, in train_step
    loss, all_losses = self.model.train_loss(
  File "/usr/local/lib/python3.8/dist-packages/ludwig/models/base.py", line 204, in train_loss
    of_train_loss = of_obj.train_loss(targets[of_name], predictions, of_name)
  File "/usr/local/lib/python3.8/dist-packages/ludwig/features/base_feature.py", line 268, in train_loss
    return self.train_loss_function(predictions[prediction_key], targets)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ludwig/modules/loss_modules.py", line 174, in forward
    mean_penalty = utils.mean_confidence_penalty(probabilities, 2)
  File "/usr/local/lib/python3.8/dist-packages/ludwig/utils/loss_utils.py", line 20, in mean_confidence_penalty
    entropy_per_class = torch.maximum(-probabilities * torch.log(torch.clamp(probabilities, 1e-10, 1)), 0)
TypeError: maximum(): argument 'other' (position 2) must be Tensor, not int

Any other idea to improve this model that are beyond feature engineering (which is definitely something we're looking into).

Answered by arnavgarg1

Feb 27, 2023

Hi @Overload119! Thanks for explaining the steps you took so clearly - it was useful and made it easy to follow along. I had a chance to use the same dataset and config and run a variety of tests and wanted to share those results with you.

The snapshot of the dataset you added to Google Drive is balanced, with almost an exact 50-50 split between 1 and 0s. In case you balanced these datasets out manually by dropping rows from the majority class so that the majority and minority classes were equal, it may be an unfair representation of the true dataset (and what you will actually see in a production scenario when you're running inference against your trained model). Instead, I would sugge…

View full answer

arnavgarg1 · 2023-02-27T23:32:07Z

arnavgarg1
Feb 27, 2023
Collaborator

Hi @Overload119! Thanks for explaining the steps you took so clearly - it was useful and made it easy to follow along. I had a chance to use the same dataset and config and run a variety of tests and wanted to share those results with you.

The snapshot of the dataset you added to Google Drive is balanced, with almost an exact 50-50 split between 1 and 0s. In case you balanced these datasets out manually by dropping rows from the majority class so that the majority and minority classes were equal, it may be an unfair representation of the true dataset (and what you will actually see in a production scenario when you're running inference against your trained model). Instead, I would suggest using the full dataset and setting oversample_minority in the preprocessing section of your Ludwig config to correct the imbalance (see this). This will only oversample the training data, but leave the validation and test sets as is which is a better reflection of what true model performance might be in production.
Using the dataset and config you provided, I was able to get the following metrics on the test set:

    - Loss: 0.0834
    - Accuracy: 0.9777
    - Precision: 0.9953
    - Recall: 0.9588
    - ROC AUC: 0.9938
    - Specificity: 0.9957

These are really good performance metrics for a classification task.

To also support this claim, here are some plots I created:

Confusion Matrix

ROC Curve

Precision-Recall Curve

All of these plots show that the model is extremely good at discriminating between the two classes (and you can plot all three of them as well using Ludwig 0.7)

As a follow-up of point 2 above, I think the medium article you linked is a good frame of reference generally for any machine learning model, but this is not a rule of thumb. This is because it is a function of the features you have, the size of your dataset, and generally, the difficulty of the machine learning task you're about to perform. The reason I added all of these metrics is to say that it often helps to look at other metrics to form a more complete picture of how our model is performing, and in your case, it is doing really well - these are very good precision-recall curves and ROC Curves, as well as a very telling confusion matrix.
In either case, I decided to see if I could reduce the loss even further than what you were getting. To do this, I looked at some of the learning curves (can also be found in Ludwig) and plotted the loss over each epoch of training for each of the 3 sets (train, validation and test). I could see that your model had already overfit a few epochs in, but I did see that the validation and test set performance also got better with additional training beyond that point. This motivated me to try and increase the complexity of the model slightly, and also make the model train for longer. I changed up your config to look something like this:

input_features:
  - name: areas_of_expertise
    type: set
    column: areas_of_expertise
  - name: techniques
    type: set
    column: techniques
  - name: discipline
    type: set
    column: discipline
  - name: rfp_techniques
    type: set
    column: rfp_techniques
  - name: rfp_disciplines
    type: set
    column: rfp_disciplines
  - name: rfp_areas_of_expertise
    type: set
    column: rfp_areas_of_expertise
  - name: publications
    type: vector
    column: publications
  - name: proposals
    type: vector
    column: proposals
output_features:
  - name: submitted_proposal
    type: binary
    column: submitted_proposal
combiner:
  type: transformer
trainer:
  early_stop: 10
defaults:
  vector:
    preprocessing:
      vector_size: 1536
      fill_value: 
        "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"

The main things to note here are swapping out the default combiner (concat) with a transformer combiner, and increasing early_stop in the trainer from the default value of 5 to a 10. Both of these achieve the goals I laid out above. After training, these are the performance metrics I get on the test set:

- Loss: 0.0481
- Accuracy: 0.9829
- Precision: 0.9817
- Recall: 0.9832
- ROC AUC: 0.9972
- Specificity: 0.9826

This is marginally better than just using the defaults that come out of Ludwig, but I think any further improvements would be going down the road of diminishing returns because the model has already overfit and the performance at the point of overfitting is already very very good.

Let me know if this helps and if there are other questions you might have.

1 reply

tgaddair Feb 27, 2023
Maintainer

In addition to @arnavgarg1's very thorough answer, a couple comments on some of the things you tried:

The previous docs might have been incorrect about the placement of the threshold param. In Ludwig v0.7 (latest) this should be specified at the same level as encoder, not within the encoder section. See our docs updated earlier today (which are now generated directly from the schema, so less likely to have these kinds of errors): https://ludwig.ai/latest/configuration/features/binary_features/#output-features
Regarding the confidence penalty issue, this does indeed look like a bug, and should be addressed by Fixed confidence_penalty for newer versions of pytorch #3156. Thanks for reporting!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Debugging class imbalance #3142

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 1 reply

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

Select a reply

Debugging class imbalance #3142

Overload119 Feb 25, 2023

Attempt #1: Decrease the threshold

Attempt #2: Increase positive_class_weight

Attempt #3: Increase confidence_penalty

Replies: 1 comment · 1 reply

arnavgarg1 Feb 27, 2023 Collaborator

tgaddair Feb 27, 2023 Maintainer

Overload119
Feb 25, 2023

Replies: 1 comment 1 reply

arnavgarg1
Feb 27, 2023
Collaborator

tgaddair Feb 27, 2023
Maintainer