
# DeepLabV3+ with TensorFlow

## Motivation

Computer vision is one of the most active areas of AI, with many different model types such as Mask R-CNN, U-Net, and so on. This project introduces a state-of-the-art semantic segmentation architecture, DeepLabV3+. Atrous convolution was introduced in DeepLab as a tool to adjust and control the effective field-of-view of the convolution.

*(Figure: atrous convolution)*

The figure above shows how atrous convolution works. Compared with a regular convolution, the kernel is applied with gaps (a dilation rate) so that it skips pixels, which lets the encoder capture a wider context from the picture without adding parameters.

The figure below shows the entire model architecture; the atrous convolution method is used in the Atrous Spatial Pyramid Pooling (ASPP) module.

*(Figure: DeepLabV3+ architecture with ASPP)*
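As a rough illustration (not the exact code in this repository), an ASPP block can be built in TensorFlow/Keras from a 1×1 branch, several parallel atrous 3×3 convolutions with different dilation rates, and an image-level pooling branch. The function name, filter count, and rates below are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def aspp_block(x, filters=256, rates=(6, 12, 18)):
    """Minimal ASPP sketch: 1x1 branch, atrous 3x3 branches, and an
    image-level pooling branch, concatenated and projected back down."""
    branches = [layers.Conv2D(filters, 1, padding="same")(x)]
    for rate in rates:
        # dilation_rate controls how many pixels the kernel skips.
        branches.append(layers.Conv2D(filters, 3, padding="same",
                                      dilation_rate=rate)(x))
    # Image-level features: global average pool, project, upsample back.
    pool = layers.Lambda(
        lambda t: tf.reduce_mean(t, axis=[1, 2], keepdims=True))(x)
    pool = layers.Conv2D(filters, 1, padding="same")(pool)
    # Assumes a static spatial size (e.g. a fixed 128 x 128 input).
    pool = layers.UpSampling2D(size=(x.shape[1], x.shape[2]),
                               interpolation="bilinear")(pool)
    branches.append(pool)
    y = layers.Concatenate()(branches)
    return layers.Conv2D(filters, 1, padding="same")(y)
```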

## 1. Requirements

- Python 3.8
- TensorFlow 2+

Assume the following data folder layout (a minimal loading sketch follows after the layout):

    DeepLab
    ----> Train_Images (for training)
    ----> Train_Masks  (for training)
    ----> Val_Images   (for validation)
    ----> Val_Masks    (for validation)
    ----> Test_Images  (for testing)
    ----> Test_Masks   (for testing)
    ----> Output       (for saving the model)
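The sketch below shows one way the image/mask pairs could be loaded from this layout with `tf.data`. The PNG file format, 128 × 128 resolution, batch size, and pairing by sorted filename are assumptions for illustration, not necessarily what this repository does:

```python
import tensorflow as tf

IMG_SIZE = 128  # matches the 128 x 128 setting mentioned in the Results section

def load_pair(image_path, mask_path):
    """Read one image/mask pair and resize it to the training resolution."""
    image = tf.io.decode_png(tf.io.read_file(image_path), channels=3)
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0
    mask = tf.io.decode_png(tf.io.read_file(mask_path), channels=1)
    mask = tf.image.resize(mask, (IMG_SIZE, IMG_SIZE), method="nearest")
    return image, mask

def make_dataset(image_dir, mask_dir, batch_size=8):
    """Pair files by sorted name, then build a batched tf.data pipeline."""
    images = sorted(tf.io.gfile.glob(f"{image_dir}/*"))
    masks = sorted(tf.io.gfile.glob(f"{mask_dir}/*"))
    ds = tf.data.Dataset.from_tensor_slices((images, masks))
    return (ds.map(load_pair, num_parallel_calls=tf.data.AUTOTUNE)
              .batch(batch_size)
              .prefetch(tf.data.AUTOTUNE))

train_ds = make_dataset("DeepLab/Train_Images", "DeepLab/Train_Masks")
val_ds = make_dataset("DeepLab/Val_Images", "DeepLab/Val_Masks")
```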

## 2. Datasets

Image Resources: Link

Sample Images:
*(Figure: sample image and mask pair)*

## 3. Demo

deeplabv3.mp4

## 4. Results

After 20 epochs, DeepLabV3+ segments these images with high accuracy.

| Train Loss | Dice Coef | IoU | Val Loss | Val Dice Coef | Val IoU |
|-----------:|----------:|----:|---------:|--------------:|--------:|
| 0.0730 | 0.9270 | 0.8643 | 0.1173 | 0.8827 | 0.7906 |
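The table reports the Dice coefficient and IoU; the train loss (0.0730) and train Dice (0.9270) sum to 1, which is consistent with a Dice loss. A common formulation for binary masks looks like the sketch below, which may differ from the exact metric code used in this repository:

```python
import tensorflow as tf

def dice_coef(y_true, y_pred, smooth=1e-6):
    """Dice coefficient for binary masks: 2*|A n B| / (|A| + |B|)."""
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

def iou(y_true, y_pred, smooth=1e-6):
    """Intersection over Union: |A n B| / |A u B|."""
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) - intersection
    return (intersection + smooth) / (union + smooth)

def dice_loss(y_true, y_pred):
    """Dice loss = 1 - Dice coefficient."""
    return 1.0 - dice_coef(y_true, y_pred)
```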

*(Figure: prediction results)*

DeepLabV3+ finishes training in approximately 30 seconds with 128 × 128 images, and GPU memory consumption is around 8 GB. This is largely because I used ResNet50 as the backbone, so selecting a different backbone would change these results.
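For instance, a lighter backbone from `keras.applications` could be substituted for ResNet50 to trade accuracy for speed and memory. The helper below is only an illustrative sketch under that assumption, not the repository's actual model-building code:

```python
from tensorflow.keras import applications

def build_backbone(name="ResNet50", input_shape=(128, 128, 3)):
    """Return an ImageNet-pretrained feature extractor for the encoder.
    Swapping the backbone changes training time, GPU memory use, and accuracy."""
    backbones = {
        "ResNet50": applications.ResNet50,
        "MobileNetV2": applications.MobileNetV2,  # a much lighter alternative
    }
    return backbones[name](include_top=False, weights="imagenet",
                           input_shape=input_shape)

# Example: build a lighter encoder to reduce the ~8 GB GPU memory footprint.
backbone = build_backbone("MobileNetV2")
print(backbone.output_shape)
```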

## 5. Future Study

One issue I noticed with DeepLabV3+ is that the model struggles to differentiate between black and white regions. For example, it fails to segment a subject standing with arms akimbo. This might be related to the image size, so I plan to try larger images in the future.

*(Figure: a failure case)*