Handle undersampling due to lots of images without burned areas #12

@weiji14

The extra Sentinel-2 imagery dataset provided at https://huggingface.co/datasets/chabud-team/chabud-extra does not contain any burned areas, according to https://huggingface.co/datasets/chabud-team/chabud-extra/discussions/1. If we include this dataset in training, the ratio of burned to unburned pixels will become severely imbalanced.
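To put a number on the imbalance, here is a minimal sketch that computes the fraction of burned pixels across a set of binary masks. The helper name `burned_fraction` and the tiny 2x2 masks are made up for illustration; they are not taken from the actual datasets.

```python
def burned_fraction(masks):
    """Fraction of burned pixels (label 1) across a list of 2-D binary masks."""
    burned = sum(sum(row) for mask in masks for row in mask)
    total = sum(len(row) for mask in masks for row in mask)
    return burned / total if total else 0.0


# Toy, made-up masks (1 = burned, 0 = unburned); not real ChaBuD data.
labelled = [
    [[0, 1], [1, 1]],
    [[0, 0], [0, 1]],
]
# Six hypothetical "extra" images with no burned pixels at all.
extra = [[[0, 0], [0, 0]] for _ in range(6)]

print(burned_fraction(labelled))          # 0.5
print(burned_fraction(labelled + extra))  # 0.125
```

In this toy case, adding the all-unburned images drops the burned share from one half to one eighth, which is the kind of dilution a plain per-pixel loss would have to cope with.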

Some potential ways to handle the extra data to improve model performance:

  • Loss functions designed to cope with foreground/background class imbalance
    • Focal Loss
    • Dice Loss
  • Self-supervised pre-training
    • Develop pretext tasks that make use of the extra data, generate useful embeddings on all the given data, and then fine-tune on images with burned areas only
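As a rough illustration of the two loss functions listed above, here is a framework-free sketch of binary focal loss and soft Dice loss on flat lists of per-pixel probabilities and labels. The function names, the `smooth` term, and the sample values are assumptions for illustration; `gamma=2.0` and `alpha=0.25` are commonly used focal loss defaults, not values chosen for this project. In practice these would be vectorized inside the training framework rather than written as Python loops.

```python
import math


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    if not probs:
        return 0.0
    total = 0.0
    for p, y in zip(probs, targets):
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # Clamp to avoid log(0).
        p_t = min(max(p_t, eps), 1.0 - eps)
        # Well-classified pixels (p_t near 1) are down-weighted by (1 - p_t)**gamma.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2*|P∩T| + smooth) / (|P| + |T| + smooth)."""
    intersection = sum(p * y for p, y in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


# Hypothetical predictions for four pixels, only one of which is burned.
probs = [0.9, 0.1, 0.2, 0.05]
targets = [1, 0, 0, 0]
print(focal_loss(probs, targets))
print(dice_loss(probs, targets))
```

One design point worth noting for images with no burned pixels: the `smooth` term keeps the Dice loss defined when both the prediction and the mask are empty, so an all-unburned image that is predicted as all-unburned contributes a loss of zero instead of a division by zero.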

Metadata

Assignees: No one assigned
Labels: help wanted