# Segment Anything

The **Segment Anything Model (SAM)** produces high-quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a [dataset](https://segment-anything.com/dataset/index.html) of 11 million images and 1.1 billion masks, and has strong zero-shot performance on a variety of segmentation tasks.

## Installation

The code requires `python>=3.7` and `mindspore>=2.0` and supports both GPU and Ascend platforms. Please follow the instructions [here](https://www.mindspore.cn/install) to install the MindSpore dependencies.

Clone the repository locally and install with

```shell
git clone https://github.com/mindspore-lab/models.git
cd models/research/segment-anything
pip install -r requirements.txt
```

## Inference

First download the weights ([sam_vit_b](sam_vit_b-35e4849c.ckpt), [sam_vit_l](sam_vit_l-1b460f38.ckpt), [sam_vit_h](sam_vit_h-c72f8ba1.ckpt)) and put them under the `${project_root}/models` directory.
There are two recommended ways to use SAM.

### Using SAM with prompts

SAM predicts object masks given prompts that indicate the desired object. If a point prompt is given, three plausible masks are generated.

```shell
python use_sam_with_promts.py --prompt-type point --model-type vit_h
```

<p float="left">
  <img src=images/truck_mask1.png width="400"/><img src=images/truck_mask2.png width="400"/><img src=images/truck_mask3.png width="400"/>
</p>

If a prompt with two points is given, one plausible mask is generated instead of three, because there is less ambiguity than with a one-point prompt.
The green and red stars denote a positive and a negative point, respectively.

<div align="center">
  <img alt="img.png" src="images/truck_two_point.png" width="600"/>
</div>
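The positive/negative point convention can be illustrated with plain arrays. The sketch below assumes the convention commonly used with SAM (coordinates in `(x, y)` pixel order; label `1` for a positive point, `0` for a negative one); the coordinate values are made up for illustration and are not tied to this repository's code:

```python
import numpy as np

# Hypothetical two-point prompt: one positive click on the object,
# one negative click on a region to exclude. Coordinates are (x, y).
point_coords = np.array(
    [[500, 375],    # positive point (green star)
     [1125, 625]],  # negative point (red star)
    dtype=np.float32,
)
point_labels = np.array([1, 0], dtype=np.int32)  # 1 = positive, 0 = negative

assert point_coords.shape == (2, 2) and point_labels.shape == (2,)
```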

If a box prompt is given, one plausible mask is generated.

```shell
python use_sam_with_promts.py --prompt-type box --model-type vit_h
```

<div align="center">
  <img alt="img.png" width="600" src="images/truck_box.png"/>
</div>

If a prompt with both a box and a point is given, one plausible mask is generated.

```shell
python use_sam_with_promts.py --prompt-type point_box --model-type vit_h
```

<div align="center">
  <img alt="img.png" width="600" src="images/truck_point_box.png"/>
</div>

See `python use_sam_with_promts.py --help` to explore more custom settings.

### Using SAM with Automatic Mask Generation (AMG)

Since SAM can efficiently process prompts, masks for the entire image can be generated by sampling a large number of prompts over the image. AMG works by sampling single-point input prompts in a grid over the image, from each of which SAM can predict multiple masks. The masks are then filtered for quality and deduplicated using non-maximum suppression. Additional options allow for further improvement of mask quality and quantity, such as running prediction on multiple crops of the image or postprocessing masks to remove small disconnected regions and holes.

```shell
python use_sam_with_amg.py --model-type vit_h
```

<div align="center">
<img src="images/dengta.jpg" height="350" />

<img src="images/dengta-amg-vith.png" height="350" />
</div>

See `python use_sam_with_amg.py --help` to explore more custom settings.

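The two core steps of AMG described above — sampling point prompts on a regular grid, then deduplicating overlapping masks — can be sketched in a few lines. This is a simplified illustration under stated assumptions (greedy mask-IoU suppression, scores sorted descending), not the repository's actual implementation:

```python
import numpy as np

def build_point_grid(n_per_side: int) -> np.ndarray:
    """Evenly spaced (x, y) point prompts in [0, 1]^2, one per grid cell."""
    offset = 1.0 / (2 * n_per_side)
    coords = np.linspace(offset, 1.0 - offset, n_per_side)
    xs, ys = np.meshgrid(coords, coords)
    return np.stack([xs.ravel(), ys.ravel()], axis=-1)  # shape (n^2, 2)

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two binary masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def dedup_masks(masks, scores, iou_thresh=0.9):
    """Greedy NMS over masks: keep high-score masks, drop near-duplicates."""
    order = np.argsort(scores)[::-1]
    kept = []
    for i in order:
        if all(mask_iou(masks[i], masks[j]) < iou_thresh for j in kept):
            kept.append(i)
    return [masks[i] for i in kept]
```

For example, a 32x32 grid yields 1024 single-point prompts, and masks predicted from neighboring points that land on the same object are collapsed into one by the IoU threshold.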
## Finetune

Fine-tuning is a popular method that adapts a large pretrained model to a specific downstream task. Currently, fine-tuning with box prompts is supported: the bounding boxes are used as prompt input to predict masks.
Besides fine-tuning our code on the COCO2017 dataset, which contains commonly seen objects and lies in a similar distribution to the original [training dataset](https://segment-anything.com/dataset/index.html) of SAM, we have run further experiments on a medical imaging segmentation dataset, [FLARE22](https://flare22.grand-challenge.org/Dataset/). The results show that the fine-tuning method in this repository is effective.

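Box-prompt fine-tuning needs a bounding box for each ground-truth mask. A minimal sketch of deriving a tight box from a binary mask — an assumption about the general recipe, not this repository's exact code:

```python
import numpy as np

def mask_to_box(mask: np.ndarray) -> np.ndarray:
    """Tight (x_min, y_min, x_max, y_max) box around a binary mask."""
    ys, xs = np.nonzero(mask)
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()])

mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 3:8] = True
print(mask_to_box(mask))  # [3 2 7 4]
```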
The following shows the mask quality before and after fine-tuning.


| Pretrained model | Dataset  |    Epochs     | mIoU |
|:----------------:|:--------:|:-------------:|:----:|
|    sam-vit-b     | COCO2017 | 0 (zero-shot) | 77.4 |
|    sam-vit-b     | COCO2017 |      20       | 83.5 |
|    sam-vit-b     | FLARE22  | 0 (zero-shot) | 79.5 |
|    sam-vit-b     | FLARE22  |      10       | 88.1 |
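The mIoU reported above is the intersection-over-union between predicted and ground-truth masks, averaged over the evaluation set. A minimal sketch of the metric (assuming per-mask IoU averaged over mask pairs; the repository's evaluation code may differ in details):

```python
import numpy as np

def miou(preds, gts):
    """Mean IoU over paired binary masks (prediction vs. ground truth)."""
    ious = []
    for p, g in zip(preds, gts):
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        # Two empty masks count as a perfect match.
        ious.append(inter / union if union else 1.0)
    return float(np.mean(ious))
```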

To fine-tune on the COCO dataset, please run:

```shell
mpirun --allow-run-as-root -n 8 python train.py -c configs/coco_box_finetune.yaml
```

The original FLARE22 dataset contains images in 3D format, with ground truth labeled as instance segmentation IDs. Run

```shell
python scripts/preprocess_CT_MR_dataset.py
```

to preprocess it into 2D RGB images and binary masks.
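The conversion that the preprocessing script performs can be sketched as follows: slice each 3D volume into 2D images, replicate the intensity channel to RGB, and split the instance-ID label map into one binary mask per instance. This is a simplified numpy illustration only; the actual script additionally handles CT/MR intensity windowing and file I/O:

```python
import numpy as np

def volume_to_slices(volume: np.ndarray, labels: np.ndarray):
    """Yield (rgb_slice, {instance_id: binary_mask}) for each axial slice."""
    for z in range(volume.shape[0]):
        sl = volume[z].astype(np.float32)
        # Min-max normalize to [0, 255], then replicate to 3 channels.
        sl = (sl - sl.min()) / (np.ptp(sl) + 1e-8) * 255.0
        rgb = np.repeat(sl[..., None], 3, axis=-1).astype(np.uint8)
        # One binary mask per nonzero instance ID in this slice.
        ids = np.unique(labels[z])
        masks = {int(i): labels[z] == i for i in ids if i != 0}
        yield rgb, masks
```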

To fine-tune on the FLARE22 dataset, please run:

```shell
mpirun --allow-run-as-root -n 8 python train.py -c configs/flare_box_finetune.yaml
```

Here are examples of segmentation results predicted by the fine-tuned SAM:

<div align="center">
<img src="images/coco_bear.jpg" height="350" />

<img src="images/flare_organ.jpg" height="350" />
</div>

<p align="center">
  <em> COCO2017 image example</em>


  <em> FLARE22 image example </em>
</p>
