- implementation of YOLOv2 concepts, though it does not follow every detail exactly
- VGG16 as image encoder
- trained on VOC2012 for 2 epochs
- 7*7 cells in the detection layer, 5 box predictors in each cell; total detection layer size is 7*7*(5*5+20) = 2205
- `train.py`: training
- `evaluate.py`: prediction
- `config.py`: configuration
- detection layer logits are divided by a scalar to reduce their magnitude before activation; this might ease vanishing gradients
- linear activation instead of sigmoid for the IoU prediction, to ease vanishing gradients: early in training the xy and wh predictions are poor, so both the ground-truth IoU and the predicted IoU are often near zero. The drawback is that a linear output can sometimes stray far outside [0,1], making the loss explode.
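
The notes above on the detection layer size, logit scaling, and the linear IoU output can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in `train.py`: the names `LOGIT_SCALE`, `detection_layer_size`, and `decode_cell` are hypothetical, the divisor value `10.0` is made up (the real value is not given here), and the per-box field order (x, y, w, h, IoU) and the sigmoid/softmax choices for the other fields are assumed.

```python
import math

GRID = 7            # cells per side of the detection layer
BOXES = 5           # box predictors per cell
BOX_FIELDS = 5      # assumed order per box: x, y, w, h, iou
CLASSES = 20        # VOC2012 object classes
LOGIT_SCALE = 10.0  # hypothetical divisor; the real value is not stated


def detection_layer_size():
    """Total number of outputs: 7*7*(5*5+20)."""
    return GRID * GRID * (BOXES * BOX_FIELDS + CLASSES)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_cell(logits):
    """Decode the raw logits of one cell into boxes and class probabilities."""
    if len(logits) != BOXES * BOX_FIELDS + CLASSES:
        raise ValueError("unexpected number of logits for one cell")

    # Shrink the logits before any activation is applied.
    scaled = [v / LOGIT_SCALE for v in logits]

    boxes = []
    for b in range(BOXES):
        x, y, w, h, iou = scaled[b * BOX_FIELDS:(b + 1) * BOX_FIELDS]
        boxes.append({
            "x": sigmoid(x),  # assumed: offsets squashed into (0, 1)
            "y": sigmoid(y),
            "w": w,           # left as scaled logits in this sketch
            "h": h,
            "iou": iou,       # linear activation: no sigmoid, can leave [0, 1]
        })

    # Assumed: softmax over the class scores shared by the cell.
    class_logits = scaled[BOXES * BOX_FIELDS:]
    peak = max(class_logits)
    exps = [math.exp(v - peak) for v in class_logits]
    total = sum(exps)
    class_probs = [e / total for e in exps]
    return boxes, class_probs
```

Calling `decode_cell` on a cell whose first IoU logit is `25.0` gives an IoU prediction of `2.5` with this divisor, which shows how the linear output can fall outside [0,1] and inflate the loss.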