Walkthrough of the Config Jungle

In this repository, Hydra is used to configure and manage experiments. Configuration files and their handling are therefore of major importance and are explained in more detail below. First, the basic functionality of Hydra is briefly introduced. At first glance, Hydra may make the configuration look more complicated and confusing, but this impression quickly disappears once you familiarize yourself with it a bit. The advantage Hydra provides is the ease of managing experiments and of adding new models, datasets and more without changing the base code. Since Hydra uses the OmegaConf package to handle .yaml files, OmegaConf and YAML are also briefly introduced.


Basics

Hydra automatically loads and composes different configuration files and allows values to be dynamically overridden at runtime via the command line. In Hydra, .yaml files are used to set configurations. In this repository, config/training.yaml can be seen as the main file from which the other configurations are composed (for training). Each subfolder in config/ is a config group, which contains a separate config file for each alternative. For example, the config group model is located in the config/model subfolder, with a separate .yaml file for each available model (hrnet.yaml, hrnet_ocr.yaml, ...). The individual config files contain model/dataset/etc. specific parameters, such as the number of channels in a layer of the model or the number of classes in a dataset. Having a separate config file for each model/dataset/etc. makes it easy to switch between them and to arbitrarily combine config files from different config groups. Additionally, this ensures that only the relevant parameters are loaded into the job configuration.

Hydra creates the job configuration by composing the configuration files from the different config groups. Exactly one config file from each config group is used in this process (as an exception, a config group can be declared as optional; it is then only used if it is explicitly defined). To tell Hydra how to compose the job configuration, a defaults list is used, which specifies which configuration file from which config group should be used and in which order they are composed. The defaults list in this repository is defined in config/training.yaml and looks like this:

training.yaml
  ─────────────────────────────
defaults:
  - _self_
  - trainer: SemSeg             # Which Trainer to use
  - metric: mean_IoU            # Metric configuration
  - model: hrnet                # Model
  - dataset: Cityscapes         # Dataset
  - data_augmentation: only_norm  # Data Augmentation
  - optimizer: SGD              # Optimizer
  - lr_scheduler: polynomial    # Learning rate scheduler
  - callbacks: default          # Callbacks
  - logger: tensorboard         # Logger
  - experiment/default          # Load Default Setting and Hyperparameters
  - optional experiment:        # (Optional) load another experiment configuration
  - environment: local          # Environment

The configs of each config group are merged from top to bottom, where later groups can overwrite the parameters of earlier groups. In addition to the order, the defaults list also sets default values for the config groups. This means that, if not changed, the parameters defined in experiment/default.yaml, ..., dataset/Cityscapes.yaml and model/hrnet.yaml are used in this case. To change the used config file of a config group, the corresponding entry in the defaults list can be changed in training.yaml, or the entry can be overwritten from the command line. Hydra's command-line syntax is straightforward: elements can be changed, added or removed in the following ways. The syntax is the same for single parameters like batch_size as well as for config files from config groups like model. All available options for parameters and config groups are shown below in the Configure the Configuration part.

python training.py  parameter_to_change=<new_value>  +parameter_to_add=<a_value>  ~parameter_to_delete
# Example for single parameters
python training.py  batch_size=3 +extra_lr=0.001 ~momentum
# Example for config groups
python training.py  model=hrnet_ocr +parameter_group=default ~environment

Another important concept of Hydra is the ability to instantiate objects. This makes it possible to fully define classes in the config files and then instantiate them in the code; an example of both is shown below. The benefit is that new optimizers, models, datasets etc. can be added from the config without changing the base code, which makes this repository easy to modify and flexible to extend without having to dig through the implementation. For example, to use or define another optimizer in the example below, only the optimizer entry in example.yaml has to be changed.

example.yaml
  ─────────────────────────────
# General syntax
name:
  _target_: path.to.class
  arg1:     some_argument
  arg2:     ...
# Example for defining a torch-optimizer
optimizer:
  _target_:     torch.optim.SGD
  lr:           0.01
  momentum:     0.9
  weight_decay: 0.005
# Another example for defining a custom class object
metric:
  _target_:    src.metric.ConfusionMatrix
  num_classes: 24
example.py
─────────────────────────────
my_optimizer = hydra.utils.instantiate(cfg.optimizer)
my_metric = hydra.utils.instantiate(cfg.metric)

This was only a short introduction on how to use Hydra to work with this repository. For more information on Hydra, check out the official docs or one of the following sources, which provide some nice insights into Hydra (source1, source2, source3 and source4).

OmegaConf in a Nutshell


Hydra uses the package OmegaConf to handle .yaml files. OmegaConf offers many possibilities for working with .yaml files, but since Hydra manages this for you in the background, you do not need much of it for basic use. If you need further functionality, for example if you want to manually load or save files, look at the official OmegaConf docs. Accessing and manipulating the cfg in Python is straightforward:

example.yaml
─────────────────────────────
Parameters:
  lr:     0.01
  epochs: 100
  whatever:
    - 42
    - ...
main.py
─────────────────────────────
from omegaconf import OmegaConf

...
# For the example load the cfg manually, which is normally done automatically by hydra
cfg = OmegaConf.load("example.yaml")

# Access over object and dictionary style
lr = cfg.Parameters.lr
lr = cfg["Parameters"]["lr"]

# Manipulation in the same way
cfg.Parameters.epochs = 300
cfg["Parameters"]["epochs"] = 300

# The same goes for accessing lists
x = cfg.Parameters.whatever[0]

Variable interpolation is another important concept of Hydra and OmegaConf. When defining config files, the situation will occur that variables from other config files are needed. For example, to define the last layer of a model, the number of classes may be needed, which is defined in the specific dataset configs. Variable interpolation solves this: it can be seen as a link to another position in the config that is resolved at runtime. The variable is therefore resolved from the dataset used in the current job, and no conflicts occur between different dataset configs and the model config. Variable interpolation uses the following syntax: ${path.to.another.node.in.the.config}, and the value will be the value of that node.

dataset/a_dataset.yaml
─────────────────────────────
  #@package _global_
...
dataset:
  num_classes: 24
model/a_model.yaml
─────────────────────────────
  #@package _global_
...
num_output_classes: ${dataset.num_classes} # num_output_classes will have the value 24 at runtime

YAML in a Nutshell


This is only a short introduction to YAML and only shows its basic syntax. This should be enough for defining your own .yaml files, but if you need more information, it can be found here for example. The following examples are for YAML in combination with OmegaConf and may not work for YAML alone.

Some Basic Assignments are shown here:

example.yaml
─────────────────────────────
# Comments in yaml
# Comments in yaml
number: 10                   # Simple value, works for int and float.
string: Text|"Text"          # Strings; quotation marks are not necessarily required.
empty: None| |Empty|Null
explicit_type: !!float 1     # Explicitly defined type. Works as well for other types like str etc.
missing_value: ???           # Missing required value. The value has to be given e.g. from the commandline.
optional opt_value:          # Optional value. Can be empty or ???, and will only be considered if it has a value.
value2: ${number}            # Value interpolation (takes the value of attribute number, in this
                             # case 10). $ indicates a reference and {} is required.
value3: "myvalue ${number}"  # String interpolation, same as value interpolation just with string output.
booleans: on|off|yes|no|true|false|True|False|TRUE|FALSE    # Multiple possibilities to define boolean values.

Lists are defined in the following way:

alist:
  - elem1                      # Elements need to be on the same indentation level
  - elem2                      # There needs to be a space between dash and element
  - ...
samelist: [ elem1, elem2, ... ]               # The same list can also be defined with this syntax

val_interpolation: ${alist[0]}                # Get the value of alist at position 0

Dictionaries are defined in the following way:

adict:
  key1: val1                    # Keys must be indented
  key2: val2                    # There has to be a space between colon and value
  ...                           # Each key may occur at most once
samedict: { key1: val1, key2: val2, ... }     # The same dict can also be defined with this syntax

val_interpolation: ${adict.key1}              # Get the value of adict at key1

For more complex files you will end up with lists of dictionaries, dictionaries of lists and mixtures of both. But basically that's it!

Configure the Configuration

In the following, each configuration group and some other features are explained in detail. First, the provided functionality is explained and afterwards it is described how this can be customized, for example to add a new model or a new dataset to the framework.

Model

Configure

Currently, the following models are supported; by default hrnet is used. How to select a model and the pretrained weights to use is explained here.

  • hrnet: High-Resolution Network (HRNet). Segmentation model with a single output.
  • hrnet_ocr: Object-Contextual Representations (OCR). A HRNet backbone with an OCR head. The model has two outputs, a primary and an auxiliary one.
  • hrnet_ocr_aspp: Additionally includes an ASPP module in the OCR model. Again the model has two outputs.
  • hrnet_ocr_ms: Hierarchical Multiscale Attention Network. Extends OCR with multiscale and attention. The model has 4 outputs: primary, auxiliary, high_scale_prediction, low_scale_prediction
    • MODEL.MSCALE_INFERENCE is used to enable/disable the use of multiple scales (only during inference and validation), which is False by default.
    • MODEL.N_SCALES defines the scales which are used during MSCALE_INFERENCE, by default = [0.5, 1.0, 2.0]
  • FCN: including torchvision's FCN (docs, paper). Besides the arguments described in the torchvision docs you can specify the following arguments:
    • model.backbone can be resnet50 or resnet101, to define which version of the model should be used. resnet101 by default.
  • DeepLab: including torchvision's DeepLabv3 (docs, paper). Besides the arguments described in the torchvision docs you can specify the following arguments:
    • model.backbone can be resnet50 or resnet101, to define which version of the model should be used. resnet101 by default.
  • UNet: Implementation of UNet (paper, source code). No pretrained weights are available.

Customize

Defining a custom model is done in two steps: first define your custom PyTorch model, then set up its config file.

  1. Defining your PyTorch model; the following things have to be considered:

    • Model Input: The input of the model will be a torch.Tensor of shape [batch_size, channels, height, width].
    • Model Output: It is recommended that your model returns a dict which contains all the model's outputs. The naming can be arbitrary, but the ordering matters. For example, if you have one output, return as follows: return {"out": model_prediction}. If you have multiple outputs, proceed analogously: return {"main": model_prediction, "aux": aux_out}. The output of the model can also be a single Tensor, a list or a tuple; in these cases the output is converted into a dict automatically. It should be noted that in each case the order of the outputs is relevant. Only the first output is used for updating the metric during validation or testing. Further, the order of the outputs should match the order of your losses in lossfunction and the weights in lossweight (see Lossfunction for more details).
  2. Setting up your model config

    • Create a custom_model.yaml file in config/model/. The name of the file defines how the model can be selected via Hydra's commandline syntax. For the content of the .yaml file adopt the following dummy. Note that MODEL.NAME is required.
#@package _global_
# model is used to initialize your custom model, 
# _target_: should point to your model class or a getter function which returns your model
# afterwards you can handle your custom input arguments which are used to initialize the model
model:
   _target_: models.my_model.get_model     # if you want to use a getter function to load weights
                                           # or initialize your model
   #_target_: models.my_model.Model        # if you want to load the Model directly
   num_classes:  ${DATASET.NUM_CLASSES}    # example arguments, for example the number of classes
   pretrained: ${MODEL.PRETRAINED}         # or if pretrained weights should be used
   arg1: ...
# MODEL is used to store information which is needed for your model
MODEL:
  # Required model arguments
  NAME: MyModel            # Name of the model, needed for logging
  # Your arguments, for example something like this
  PRETRAINED: True         # e.g. a parameter to indicate if pretrained weights should be used
  PRETRAINED_WEIGHTS: /pretrained/weights.pth  # give the path to the weights
  3. Train your model
     python training.py model=custom_model     # to select config/model/custom_model.yaml
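The expected model interface can be sketched as follows (a hypothetical minimal model, not one of the provided ones; the argument names mirror the config dummy above):

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    """Minimal segmentation model following the expected interface."""

    def __init__(self, num_classes: int, pretrained: bool = False):
        super().__init__()
        # a single 1x1 conv head, just enough to produce per-class logits
        self.head = nn.Conv2d(3, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> dict:
        # input:  [batch_size, channels, height, width]
        # output: a dict, primary prediction first
        return {"out": self.head(x)}

model = MyModel(num_classes=19)
out = model(torch.randn(2, 3, 64, 64))
print(out["out"].shape)  # -> torch.Size([2, 19, 64, 64])
```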

Dataset

Configure

Currently, the following datasets are supported, and they can be selected as shown here. By default, the Cityscapes dataset is used.

  • Cityscapes: Cityscapes dataset using the fine annotated images. Contains 19 classes and 2,975 training and 500 validation images.
  • Cityscapes_coarse: Cityscapes dataset using the coarse annotated training images. Contains 19 classes and ~20,000 training images. For validation, the 500 fine annotated images from Cityscapes are used.
  • Cityscapes_fine_coarse: Cityscapes dataset using the coarse and fine annotated training images. Contains 19 classes and ~23,000 training images. For validation, the 500 fine annotated images from Cityscapes are used.
  • VOC2010_Context: PASCAL Context dataset, an extension of the PASCAL VOC2010 dataset with additional segmentation masks. It contains 5,105 training and 4,998 validation images and 59 classes. For the 60 class setting see below.
  • VOC2010_Context_60: The VOC2010_Context dataset with an additional background class, resulting in a total of 60 classes.

Customize

Defining a custom dataset is done in two steps: first define your custom PyTorch dataset, then set up its config file.

  1. Defining your PyTorch dataset; consider that the following structure is required (mainly PyTorch basics) and see the dummy below:

    • __init__(self, custom_args, split, transforms):
      • custom_args: your custom input arguments (for example data_root etc.). They will be given to your dataset from the config file (see below).
      • split: one of the following strings: ["train","val","test"], defining whether the train, validation or test set should be returned.
      • transforms: Albumentations transformations
    • __getitem__(self, idx):
      • gets some index; the output should look similar to: return img, mask
      • with img.shape = [c, height, width] and mask.shape = [height, width], where c is the number of channels, e.g. c=3 for RGB data.
    • __len__(self):
      • returns the number of samples in your dataset, something like: return len(self.img_files)
    class Custom_dataset(torch.utils.data.Dataset):
     def __init__(self, root, split, transforms):
         # get your data for the corresponding split
         if split == "train":
              self.imgs = ...
              self.masks = ...
         if split == "val" or split == "test":   # if you don't have a test set, use the validation set
              self.imgs = ...
              self.masks = ...

         # save the transformations
         self.transforms = transforms

     def __getitem__(self, idx):
         # reading images and masks as numpy arrays
         img = cv2.imread(self.imgs[idx])
         img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # cv2 reads images in BGR order

         mask = cv2.imread(self.masks[idx], -1)

         # that's how you apply Albumentations transformations
         transformed = self.transforms(image=img, mask=mask)
         img = transformed['image']
         mask = transformed['mask']

         return img, mask.long()

     def __len__(self):
         return len(self.imgs)
  2. Setting up your dataset config

    • Create a custom_dataset.yaml file in config/dataset/. For the content of the .yaml file adopt the following dummy:
    #@package _global_
    # dataset is used to initialize your custom dataset, 
    # _target_: should point to your dataset class
    # afterwards you can handle your custom input arguments which are used to initialize the dataset
    dataset:
      _target_: datasets.MyDataset.Custom_dataset 
      root: /home/.../Datasets/my_dataset     #the root to the data as an example input
      #root: ${paths.my_dataset}               #the root if defined in config/environment/used_env.yaml
      input1: ...                    #All your other input arguments
      input2: ...
    # DATASET is used to store information about the dataset which is needed during training
    DATASET:
      # Required dataset arguments
      NAME:            # Used for the logging directory
      NUM_CLASSES:     # Needed for defining the model and the metrics
      # (Optional) but needed if an ignore index should be used
      IGNORE_INDEX:    # Needed for the loss function; if no ignore index exists, set it to 255 or
                       # another number which does not occur in your dataset
      # (Optional) needed if weighted loss functions are used
      CLASS_WEIGHTS: [ 0.9, 1.1, ...]
      # (Optional) can be used for nicer logging
      CLASS_LABELS:
         - class1
         - class2 ...
  3. Train on your Dataset

     python training.py dataset=custom_dataset     # to select config/dataset/custom_dataset.yaml

Experiments, Hyperparameters and Pytorch Lightning Trainer

Configure

Experiments

Individual datasets, models, etc. are defined in the corresponding parameter groups; the experiment files are used to define their combination. This avoids manual configuration via the commandline for frequently performed experiments. For this, the defaults list is overwritten as follows:

# @package _global_
#define the augmentation, dataset and model
defaults:
  - override /data_augmentation: scale_crop_hflip
  - override /dataset: Cityscapes
  - override /model: hrnet

Hyperparameters

The default hyperparameters are defined in config/experiment/default.yaml. For the specific datasets they are overwritten from config/experiment/<dataset.name>.yaml. The following hyperparameters are supported and can be changed in the .yaml files directly or overwritten from the command line as shown below.

  • epochs: number of epochs for training.
  • batch_size: defines the batch size during training (per GPU).
  • val_batch_size: defines the batch size during validation and testing (also per GPU). Is set to batch_size if not specified.
  • num_workers: number of workers for the dataloaders.
  • lr: initial learning rate for training.
python training.py epochs=100 batch_size=7 val_batch_size=3 num_workers=4 lr=0.001

Pytorch Lightning Trainer

Since Pytorch Lightning is used as the training framework, with the Trainer class as the central unit, some additional parameters can be defined by passing them to the Pytorch Lightning Trainer. The pl_trainer entry in the config is used for this purpose. By default, this looks like the following, and arguments can be overwritten/added/removed as shown below:

experiment/default.yaml
------------------
pl_trainer:                     # parameters for the pytorch lightning trainer
  max_epochs: ${epochs}         # parsing the number of epochs which is defined as a hyperparameter
  gpus: -1                      # defining the used GPUs - in this case using all available GPUs
  precision: 16                 # using Mixed Precision
  benchmark: True               # using benchmark for faster training
# Overwriting
python training.py pl_trainer.precision=32 pl_trainer.benchmark=false
# Adding
python training.py +pl_trainer.fast_dev_run=True +pl_trainer.reload_dataloaders_every_n_epochs=2
# Removing
python training.py ~pl_trainer.precision 

A full list of all available options of the Pytorch Lightning Trainer class can be seen in the Lightning docs.
Some arguments are defined inside the code and can't be overwritten from the config file. These parameters are not intended to be changed; if you still want to adapt them, you can do so in training.py in the training_loop function. The affected parameters are:

  • callbacks: callbacks are defined in config/callbacks, so add your callbacks there
  • logger: tb_logger is used by default
  • strategy: ddp if multiple gpus are used, else None
  • sync_batchnorm: sync_batchnorm is True if multiple gpus are used, else False

Customize

Experiment configurations and hyperparameters can be added or changed in .yaml files or from the commandline. For different experiments, a group of parameters may need to be adjusted at once. To avoid changing them manually each time, there is an optional experiment config group to easily switch between different experiment settings. Create experiment/my_config.yaml and insert all parameters or settings that differ from default.yaml into it. A dummy and how it can be used are shown below:

config/experiment/my_config.yaml
─────────────────────────────
# @package _global_
defaults:
  - override /data_augmentation: randaugment_hflip
    
batch_size: 6
val_batch_size: 4
epochs: 175
lossfunction: RMI
...
# Also for Pytorch Lightning Trainer arguments
pl_trainer:
  precision: 32
  ...               
python training.py experiment=my_config

Optimizer

Configure

Currently, Stochastic Gradient Descent (SGD) and MADGRAD are the only supported optimizers. Since the PyTorch implementation of SGD is used, other parameters of the SGD class, like nesterov, can be passed as well (similarly for MADGRAD):

  • weight_decay: default = 0.0005
  • momentum: default = 0.9
python training.py optimizer=SGD weight_decay=0.0001 momentum=0.8 +optimizer.nesterov=True
python training.py optimizer=MADGRAD

Customize

To add a custom optimizer, create a my_optimizer.yaml file in config/optimizer/. A dummy and how it can be used are shown below. Besides the arguments defined in the config, the optimizer is also initialized with the model parameters in the following way: optimizer=hydra.utils.instantiate(self.config.optimizer, self.parameters())

config/optimizer/my_optimizer.yaml
─────────────────────────────
_target_: path.to.my.optimizer.class      # for example torch.optim.SGD
lr: ${lr}
arg1: custom_args
arg2: ...
python training.py optimizer=my_optimizer

LR Scheduler

Configure

Currently the following schedulers are supported and can be used as shown below. By default, the polynomial scheduler is used (stepwise):

  • polynomial: Polynomial lr scheduler over the number of steps: (1-current_step/max_step)^0.9
  • polynomial_epoch: Polynomial lr scheduler over number of epochs: (1-current_epoch/max_epoch)^0.9
python training.py lr_scheduler=polynomial
python training.py lr_scheduler=polynomial_epoch

Customize

To add a custom lr scheduler, create a my_scheduler.yaml file in config/lr_scheduler/. A dummy and how it can be used are shown below. Besides the arguments defined in the config, the lr scheduler is also initialized with the optimizer in the following way: scheduler=hydra.utils.instantiate(self.config.lr_scheduler.scheduler, optimizer=self.optimizer, max_steps=max_steps). As you can see, the maximum number of steps is also given to the scheduler, since it can only be calculated at runtime. Even if you do not want to use this information, make sure to catch the input argument.

config/lr_scheduler/my_scheduler.yaml
─────────────────────────────
interval: step              # when the scheduler should be called: at each step or each epoch
frequency: 1                # how often it should be called; in most cases this should be 1
monitor: metric_to_track    # parameter for pytorch lightning to log the lr
scheduler:                  # defining the actual scheduler class
  _target_: path.to.my.scheduler.class    # path to your scheduler
  arg1: custom_args        # arguments for the scheduler
  arg2: ...           
python training.py lr_scheduler=my_scheduler

Loss Function

Configure

There are two parameters that define the behaviour of the loss function. The lossfunction parameter defines one or multiple loss functions. The lossweight parameter is used to weight the different loss functions. Both are explained in more detail in the following and can be overwritten from the commandline as shown below:

  • lossfunction: defines the loss function to be used and can be set by: lossfunction="CE" for using Cross Entropy Loss. If the model has multiple outputs a list of loss functions can be passed, where the order inside the list corresponds to the order of the model outputs. For example: lossfunction=["RMI","CE"] if the RMI loss should be used for the primary model output and Cross Entropy for the secondary output. The following losses are supported and can be selected by using the corresponding name/abbreviation:
  • lossweight: In the case of multiple losses, it may be useful to weight the losses differently. For this, pass a list of weights whose length corresponds to the number of losses/model outputs. For two outputs this can be done in the following way: lossweight=[1, 0.4] to weight the primary loss with 1, while the second output is weighted less with 0.4. If not specified, no weighting is done. By default, lossweight=[1, 0.4, 0.05, 0.05] is used.
python training.py lossfunction=wCE lossweight=1                    # For one output like for HRNet
python training.py lossfunction=[RMI, CE] lossweight=[1,0.4]        # Two outputs like OCR and OCR+ASPP
python training.py lossfunction=[wRMI, wCE, wCE, wCE] lossweight=[1, 0.5, 0.1, 0.05]  # Four outputs like OCR+MS

Consider the number of outputs of each model when defining the correct number of losses in the right order. If the number of given loss functions/lossweights is higher than the number of model outputs, that is no problem; only the first corresponding lossfunctions/lossweights are used. For the supported models, the number of outputs is listed here.

Customize

The loss function is defined using the get_loss_function_from_cfg function in utils/lossfunction. Inside the function you have access to everything that is defined in the cfg. To add a custom loss function, just add the following at the bottom of the function:

elif LOSSFUNCTION == "MYLOSS":
    ...                         # do whatever you need
    loss_function = MyLoss(...)

The loss function will be called in the following way: lossfunction(y_pred, y_gt), with y_pred.shape = [batch_size, num_classes, height, width] and y_gt.shape = [batch_size, height, width]. If you need the data in another format, you can use for example lambda functions (see the definition of the "DC_CE" loss in get_loss_function_from_cfg).
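For illustration, here is the expected call signature with those shapes, using torch.nn.CrossEntropyLoss, which already accepts this format; the lambda at the end shows the adapter pattern with a hypothetical reformatting step:

```python
import torch

# CrossEntropyLoss already matches the expected call format directly
loss_function = torch.nn.CrossEntropyLoss()

y_pred = torch.randn(2, 19, 8, 8)           # [batch_size, num_classes, h, w]
y_gt = torch.randint(0, 19, (2, 8, 8))      # [batch_size, h, w]

loss = loss_function(y_pred, y_gt)          # scalar tensor
print(loss.dim())  # -> 0

# if a loss needs the data in another format, wrap it, e.g. with a lambda
# (hypothetical example: a loss that expects integer targets)
wrapped = lambda pred, gt: loss_function(pred, gt.long())
```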

Data Augmentations

Configure

Some predefined data augmentation pipelines are provided (see the config/data_augmentation/ folder). For the provided datasets, the augmentation with the corresponding name is used by default. The data augmentations can be selected with the following commands:

python training.py data_augmentation=VOC2010_Context
python training.py data_augmentation=Custom_augmentation

Customize

For data augmentation the Albumentations package is used. A short introduction to using Albumentations for semantic segmentation is given here, and an overview of all transformations supported by Albumentations is given here. This repository provides a simple API for defining data augmentations. To define custom data augmentations, adopt the following example and put it into config/data_augmentations/custom_augmentation.yaml. Train and test transformations are defined separately using AUGMENTATIONS.TEST and AUGMENTATIONS.TRAIN (see example). The different Albumentations transformations are listed in list format, while their parameters are given as dicts. Some transformations like Compose() or OneOf() take other transformations as input. Therefore, recursively define these transformations in the transforms parameter of the outer transformation (Compose, OneOf, ...), as can be seen in the example. Consider that only Albumentations transformations are supported. Typically, an Albumentations transformation pipeline consists of an outer Compose containing the list of all operations, and the last operation is a ToTensorV2.

config/data_augmentations/custom_augmentation.yaml
─────────────────────────────
#@package _global_
AUGMENTATIONS:
  VALIDATION:
    - Compose:
        transforms:
          - Normalize:
              mean: [ 0.485, 0.456, 0.406 ]
              std: [ 0.229, 0.224, 0.225 ]
          - ToTensorV2:
  TEST: ${AUGMENTATIONS.VALIDATION}  # when same augmentations are used for testing and validation
                                     # otherwise define them like validation and train
  TRAIN:
    - Compose:
        transforms:
          # dummy structure
          - Albumentations_transformation:
              parameter1: ...
              parameter2: ...
              ...
          # some example transformations
          - RandomCrop:
              height: 512
              width:  1024
          # nested transformation
          - OneOf:
              transforms:
                - HorizontalFlip:
                    p: 0.5
          - ...    # put other transformations here
          - Normalize:
              mean: [ 0.485, 0.456, 0.406 ]
              std: [ 0.229, 0.224, 0.225 ]
          - ToTensorV2:
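To make the recursive structure concrete, here is a framework-free sketch of how such a nested list of transformations could be resolved. The registry stands in for the Albumentations namespace, and the entries are hypothetical, not the repository's actual parsing code:

```python
def build_transforms(entries, registry):
    """Resolve a list of {Name: params} entries into transform objects."""
    transforms = []
    for entry in entries:
        (name, params), = entry.items()        # each entry has exactly one key
        params = dict(params or {})            # entries like "ToTensorV2:" have no params
        if "transforms" in params:             # nested case: Compose, OneOf, ...
            params["transforms"] = build_transforms(params["transforms"], registry)
        transforms.append(registry[name](**params))
    return transforms

# toy registry standing in for the Albumentations classes
registry = {
    "Compose": lambda transforms: ("Compose", transforms),
    "RandomCrop": lambda height, width: ("RandomCrop", height, width),
}

cfg = [{"Compose": {"transforms": [{"RandomCrop": {"height": 512, "width": 1024}}]}}]
print(build_transforms(cfg, registry))  # -> [('Compose', [('RandomCrop', 512, 1024)])]
```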

However, for very complex data augmentation pipelines this API requires a lot of effort and is not suitable. In this case you can define your augmentation pipeline with Albumentations directly and output the pipeline as a dict or save it as .json. This dict (or the content of the .json file) can then be inserted under the argument FROM_DICT. An example can be seen below and in the data_augmentations/autoaugment.yaml file.

TRAIN:
  FROM_DICT: { "__version__": "1.1.0", "transform": { "__class_fullname__": "Compose", "p": 1.0, "transforms": [ { "__class_fullname__": "RandomCrop", "always_apply": false, "p": 1.0, ... ,{ "__class_fullname__": "ToTensorV2", "always_apply": true, "p": 1.0, "transpose_mask": true } ], "bbox_params": null, "keypoint_params": null, "additional_targets": { } }}

Metric

Configure

In this repository the Intersection over Union (mean_IoU) and the Dice score (mean_Dice) are provided. Both metrics update a confusion matrix in each step and compute a final score at the end of each epoch. This final score is obtained by first calculating the score for each class and then taking the class-wise mean. By default only this final score is returned and logged; if the score for each class is additionally required, use mean_IoU_Class or mean_Dice_Class. Some additional configuration options are provided; adapt them in the config files or override them from the command line:

  • METRIC.NAME: Name of the metric that should be optimized. The name has to be one of the metrics which is logged to TensorBoard. If a step-wise or image-wise computed metric should be optimized, the "_stepwise" or "_per_image" postfix has to be used (e.g. meanDice_stepwise). If the metric should be optimized for a single class, add the class name (for mean_Dice_Class and mean_IoU_Class), e.g. meanDice_class1.
  • METRIC.call_global: The metric is updated in each step and computed once at the end of each epoch. True by default.
  • METRIC.call_stepwise: The metric is computed in each step and averaged at the end of each epoch (avg. over all batches). False by default. Can be combined with METRIC.call_global but only one of METRIC.call_stepwise and METRIC.call_per_img can be True.
  • METRIC.call_per_img: The metric is computed for each image and averaged at the end of each epoch (avg. over all images). False by default. Can be combined with METRIC.call_global but only one of METRIC.call_stepwise and METRIC.call_per_img can be True.
  • METRIC.train_metric: True or False (False by default), provides the possibility to have a separate metric during training.
python training.py metric=mean_IoU         # mean Intersection over Union (IoU)
python training.py metric=mean_Dice        # mean Dice score
python training.py metric=mean_IoU_Class   # mean IoU with additionally logging scores for each class
python training.py metric=mean_Dice_Class  # mean Dice with additionally logging scores for each class
python training.py METRIC.call_stepwise=True METRIC.train_metric=True  # change metric settings

Customize

For defining a new metric, use the torchmetrics package. This makes the metric usable for multi-GPU training; a Python dummy for such a metric can be found below. More information on how to define a torchmetrics metric can be found here. As a restriction in this repository, the compute() method must return either a single tensor or a dict. A dict should be used when multiple metrics are returned, e.g. the IoU for each class separately.
If a dict is used, each metric is logged under its corresponding key (avoid duplicates); if a single tensor is returned, it is logged under the name of the metric defined in the config.

import torch
import pytorch_lightning as pl
from torchmetrics import Metric

class CustomMetric(Metric):
    def __init__(self):  # add your own arguments here, e.g. num_classes
        super().__init__()
        # define state variables like this to make your metric multi-GPU capable
        self.add_state("variable", default=..., dist_reduce_fx="sum")

    def update(self, pred: torch.Tensor, gt: torch.Tensor):
        # called with the batch-wise model prediction (pred) and ground truth (gt)
        ...     # pred.shape = [batch_size, num_classes, height, width]
                # gt.shape   = [batch_size, height, width]

    def compute(self):
        ...  # do your computations
        return metric  # single tensor: the metric which should be optimized
        # or, if additional metrics should be logged as well,
        # return them together in dict format instead:
        # return {"metric1": value1, "metric2": value2, ...}

    # (Optional) This function can be used to save metric states, e.g. a confusion matrix.
    # If the metric has a save_state function, it is called in validation_epoch_end.
    # If you don't need this functionality, simply omit this function.
    def save_state(self, trainer: pl.Trainer):
        ...  # save whatever you want to save
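For intuition, the arithmetic behind a confusion-matrix based compute(), as used by mean_IoU, can be sketched in pure Python (illustrative only; the repository computes this with torch tensors). Here cm[i][j] counts pixels of ground-truth class i predicted as class j:

```python
def mean_iou(cm):
    """Mean IoU from a confusion matrix given as a nested list.

    cm[i][j] = number of pixels with ground-truth class i predicted as class j.
    IoU per class = TP / (TP + FP + FN); the final score is the class-wise mean.
    """
    num_classes = len(cm)
    ious = []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # class c missed
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # wrongly predicted as c
        denom = tp + fp + fn
        ious.append(tp / denom if denom else 0.0)
    return sum(ious) / num_classes

cm = [[3, 1],
      [0, 4]]
score = mean_iou(cm)  # class IoUs: 3/4 and 4/5, mean 0.775
```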
        

After implementing the metric, you have to set up its config. To do so, create a my_metric.yaml in config/metric/ and use the following dummy to define the metric. METRIC.NAME should be the name of your target metric, which should be one of the metrics defined in METRIC.METRICS (if the metric returns a single tensor); if the metric returns a dict, METRIC.NAME should be a key of this dict. The remaining parameters should be set as described in the Configure section above.

config/metric/my_metric.yaml
─────────────────────────────
#@package _global_
METRIC:
  NAME: mymetric_name          # which metric to optimize - should be one of the names defined in METRIC.METRICS
  train_metric: False    # If also a train metric is wanted (in addition to a validation metric)
  call_global: True      # If True metric is updated in each step and computed once at the end of the epoch
  call_stepwise: False   # If True metric is computed in each step (usually one batch) and averaged over all steps - exclusively with call_per_img but can be combined with call_global.
  call_per_img: False    # If True metric is computed for each image and averaged over all images - exclusively with call_stepwise but can be combined with call_global.

  METRICS:
    mymetric_name: # define the name of the metric, needed for logging and to find the target metric
      _target_: src.metric.myMetricClass  # path to the metric Class
      ...
      #num_classes: ${DATASET.NUM_CLASSES}  # list of arguments for initialization, e.g. number of classes
python training.py metric=my_metric
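The _target_ entry follows Hydra's instantiation convention: the dotted path is imported and the class is called with the remaining keys as keyword arguments. A minimal pure-Python sketch of that mechanism (illustrative only; the real work is done by hydra.utils.instantiate, and collections.Counter merely stands in for src.metric.myMetricClass):

```python
import importlib

def instantiate(config):
    """Instantiate a class from a config dict with a '_target_' key."""
    config = dict(config)  # don't mutate the caller's dict
    module_path, _, class_name = config.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**config)   # remaining keys become constructor arguments

# A stdlib class in place of src.metric.myMetricClass:
counter = instantiate({"_target_": "collections.Counter", "a": 2})
```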

Environment

Configure

If you run code on different devices (e.g. on your local machine and a GPU cluster), it can make sense to group all environment-specific settings, e.g. paths or hyperparameters like the batch size, to enable easy switching between them. Different environments are stored in the config/environment/ folder and can be used in the following way. To add your own environment, look at the customization chapter. By default, environment=local.

python training.py environment=cluster
python training.py environment=local

Customize

An environment config contains everything that is specific to the environment, such as paths or particular parameters, but it can also be used to achieve environment-specific behaviour, for example enabling/disabling checkpoint saving or the progress bar. Since the environment config is merged into the training config last, you can override all parameters from there. To add a new environment config, create a custom_env.yaml file in config/environment/ and adapt the following dummy:

config/environment/custom_env.yaml
─────────────────────────────
#@package _global_

# Output directory for logs and checkpoints
LOGDIR: logs/
# Paths to datasets
paths:
  cityscapes: /home/.../Datasets/cityscapes
  VOC2010_Context: /home/.../Datasets/VOC2010_Context
  other_datasets: ...
# Whatever you need
CUSTOM_PATH: ...
Some_Parameter: ...
...
python training.py environment=custom_env
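Other parts of the config can then refer to these entries via OmegaConf interpolation, e.g. ${paths.cityscapes}, just like ${DATASET.NUM_CLASSES} in the metric config above. OmegaConf resolves such references automatically; the tiny resolver below is only a pure-Python sketch of what that resolution does (names and paths are illustrative):

```python
import re

def resolve(value, cfg):
    """Replace ${a.b} references in a string with values from a nested dict."""
    def lookup(match):
        node = cfg
        for key in match.group(1).split("."):  # walk the dotted path
            node = node[key]
        return str(node)
    return re.sub(r"\$\{([^}]+)\}", lookup, value)

cfg = {"paths": {"cityscapes": "/home/user/Datasets/cityscapes"}}
root = resolve("${paths.cityscapes}/leftImg8bit", cfg)
```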

Testing

Configure

You can test your model directly by using the testing.py file. In this case, the configuration used in the experiment is reconstructed and the model is evaluated under the same settings as during training. The function expects a ckpt_dir argument, which is the path to the experiment you want to test (the folder which contains checkpoints/, hydra/, hparams.yaml, etc.). To change parameters during testing, you can override them using Hydra's command-line syntax or by adding a TESTING.OVERRIDES entry to the config (see example below). If so, the TESTING entry should be added to the config of the model or dataset for which this setting is desired. To use multiscale testing, insert the desired scales into the TESTING.SCALES entry in the config or from the command line. Similarly, flipping can be enabled by setting TESTING.FLIP to True. Note that a test dataset is needed; if you don't have one, your dataset class should return the validation set instead (as done for Cityscapes in this repo). Also keep in mind that the environment parameters must be adapted if training and testing are performed on different devices.

Example of changing the config during testing. One way is to add TESTING somewhere in the config, another is to overwrite arguments from the command line. If neither is used, the same config used for training is also used for testing.

For the VOC dataset and the hrnet_ocr_ms model, some test settings are already defined. See the corresponding .yaml files for more details.

config/somewhere/xxx.yaml
─────────────────────────────
TESTING:
  SCALES: [ 0.5, 1.0, 1.5, 2.0 ] # scales for multiscale testing; if not wanted, delete the line or leave it empty
  FLIP: True # whether flipping should be used during testing; delete the line or set it to False if not wanted
  OVERRIDES: # List of arguments which should be overwritten in the config for testing
    - val_batch_size=1 # For example, set the batch size to 1 during testing (you have to use the syntax a=x here)
    - ...
# Run testing
python testing.py ckpt_dir=<somepath>
# When no TESTING entry in the config is used, the same behaviour as defined in the .yaml above can be achieved from the command line:
python testing.py ckpt_dir=<somepath> +TESTING.SCALES=[ 0.5, 1.0, 1.5, 2.0 ]  +TESTING.FLIP=True val_batch_size=1
# example for PASCALContext dataset
python testing.py ckpt_dir="/home/.../PASCALContext/hrnet/data_augmentations=PASCALContext_epochs=200/2022-01-18_16-05-09" environment=local TESTING.SCALES=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]  TESTING.FLIP=True
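Conceptually, each entry in TESTING.OVERRIDES is a key=value override that is applied on top of the reconstructed training config before testing. A pure-Python sketch of that merge step (a hypothetical helper, not the repository's code; Hydra/OmegaConf also handle proper type conversion):

```python
def apply_overrides(cfg, overrides):
    """Apply a list of 'dotted.key=value' strings to a nested config dict."""
    for override in overrides:
        key, _, raw = override.partition("=")
        try:
            value = int(raw)   # crude typing; Hydra/OmegaConf do this properly
        except ValueError:
            value = raw
        node = cfg
        *parents, last = key.split(".")
        for part in parents:              # walk/create intermediate dicts
            node = node.setdefault(part, {})
        node[last] = value
    return cfg

cfg = {"val_batch_size": 4, "TESTING": {"FLIP": False}}
cfg = apply_overrides(cfg, ["val_batch_size=1", "TESTING.FLIP=True"])
```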

Acknowledgements


This repository is developed and maintained by the Applied Computer Vision Lab (ACVL) of Helmholtz Imaging.