OV-PARTS: Towards Open-Vocabulary Part Segmentation

Meng Wei, Xiaoyu Yue, Wenwei Zhang, Xihui Liu, Shu Kong, Jiangmiao Pang*
Shanghai AI Laboratory · The University of Hong Kong · The University of Sydney · University of Macau · Texas A&M University

🏠 About

OV-PARTS is a benchmark for Open-Vocabulary Part Segmentation that leverages the capabilities of large-scale Vision-Language Models (VLMs).

  • Benchmark Datasets: Two refined versions of publicly available datasets, Pascal-Part-116 and ADE20K-Part-234.

  • Benchmark Tasks: Three tasks that provide insights into the analogical reasoning, open-granularity, and few-shot adaptation abilities of models.

    • Generalized Zero-Shot Part Segmentation: assesses the model's ability to generalize part segmentation from seen objects to related unseen objects.
    • Cross-Dataset Part Segmentation: beyond zero-shot generalization, assesses the model's ability to generalize part segmentation across datasets with different granularity levels.
    • Few-Shot Part Segmentation: assesses the model's ability to adapt quickly from only a few annotated samples.
  • Benchmark Baselines: Baselines built on existing two-stage and one-stage object-level open-vocabulary segmentation methods, including ZSseg, CLIPSeg, and CATSeg.

🔥 News

We organize the Open Vocabulary Part Segmentation (OV-PARTS) Challenge at the Visual Perception via Learning in an Open World (VPLOW) Workshop. Please check our website!

🛠 Getting Started

Installation

  1. Clone this repository

    git clone https://github.com/OpenRobotLab/OV_PARTS.git
    cd OV_PARTS
  2. Create a conda environment with Python 3.8+ and install the Python requirements (a quick sanity check is sketched after these steps)

    conda create -n ovparts python=3.8
    conda activate ovparts
    pip install -r requirements.txt
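
Optionally, you can sanity-check the environment before training. This is only a sketch, assuming the baselines build on PyTorch and Detectron2 (as the Detectron2-style train_net.py, configs, and annotation folder names suggest); adjust it to whatever requirements.txt actually installs.

  # Verify that the core dependencies import correctly (hypothetical check, not part of the repo).
  python -c "import torch, detectron2; print(torch.__version__, detectron2.__version__)"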

Data Preparation

After downloading the two benchmark datasets, please extract the files by running the following commands and place the extracted folders under the "Datasets" directory.

tar -xzf PascalPart116.tar.gz
tar -xzf ADE20KPart234.tar.gz

The Datasets folder should follow this structure:

Datasets/
├─Pascal-Part-116/
│ ├─train_16shot.json
│ ├─images/
│ │ ├─train/
│ │ └─val/
│ ├─annotations_detectron2_obj/
│ │ ├─train/
│ │ └─val/
│ └─annotations_detectron2_part/
│   ├─train/
│   └─val/
└─ADE20K-Part-234/
  ├─images/
  │ ├─training/
  │ └─validation/
  ├─train_16shot.json
  ├─ade20k_instance_train.json
  ├─ade20k_instance_val.json
  └─annotations_detectron2_part/
    ├─training/
    └─validation/
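
A quick way to confirm the layout matches the tree above is to check that the expected folders exist. The snippet below is a convenience sketch, not part of the repository; the paths are taken directly from the structure shown.

  # Check a few of the expected dataset folders (sketch; extend with any paths you care about).
  for d in Datasets/Pascal-Part-116/images/train \
           Datasets/Pascal-Part-116/annotations_detectron2_obj/val \
           Datasets/Pascal-Part-116/annotations_detectron2_part/val \
           Datasets/ADE20K-Part-234/images/training \
           Datasets/ADE20K-Part-234/annotations_detectron2_part/validation; do
      [ -d "$d" ] && echo "ok:      $d" || echo "missing: $d"
  done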

Create the {train/val}_{obj/part}_label_count.json files for Pascal-Part-116 (the pattern expands to four split/level combinations; a loop covering all of them is sketched after the command).

python baselines/data/datasets/mask_cls_collect.py Datasets/Pascal-Part-116/annotations_detectron2_{obj/part}/{train/val} Datasets/Pascal-Part-116/annotations_detectron2_part/{train/val}_{obj/part}_label_count.json
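
One possible way to run all four combinations, following the exact paths in the command above, is the loop below (a sketch; the output JSONs go under annotations_detectron2_part/ as in the pattern above):

  # Generate all four label-count files for Pascal-Part-116 (sketch of the expanded pattern).
  for SPLIT in train val; do
      for LEVEL in obj part; do
          python baselines/data/datasets/mask_cls_collect.py \
              Datasets/Pascal-Part-116/annotations_detectron2_${LEVEL}/${SPLIT} \
              Datasets/Pascal-Part-116/annotations_detectron2_part/${SPLIT}_${LEVEL}_label_count.json
      done
  done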

Training

  1. Training the two-stage baseline ZSseg+.

    Please first download the CLIP model fine-tuned with CPTCoOp.

    Then run the training command:

    python train_net.py --num-gpus 8 --config-file configs/${SETTING}/zsseg+_R50_coop_${DATASET}.yaml
  2. Training the one-stage baselines CLIPSeg and CATSeg.

    Please first download the object-level pre-trained models of CLIPSeg and CATSeg and place them under the "pretrain_weights" directory.

    Model     Pre-trained checkpoint
    CLIPSeg   download
    CATSeg    download

    Then run the training commands (a combined sketch for both baselines follows this list):

    # For CATseg.
    python train_net.py --num-gpus 8 --config-file configs/${SETTING}/catseg_${DATASET}.yaml
    
    # For CLIPseg.
    python train_net.py --num-gpus 8 --config-file configs/${SETTING}/clipseg_${DATASET}.yaml
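
For convenience, both one-stage baselines can be trained back to back. The loop below is only a sketch: ${SETTING} and ${DATASET} are the same placeholders as above and must match an existing config file under configs/.

  # Train CATSeg and CLIPSeg sequentially with the same setting/dataset (sketch).
  for MODEL in catseg clipseg; do
      python train_net.py --num-gpus 8 --config-file configs/${SETTING}/${MODEL}_${DATASET}.yaml
  done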

Evaluation

We provide the trained weights for the three baseline models reported in the paper.

Model     Setting        Pascal-Part-116 checkpoint   ADE20K-Part-234 checkpoint
ZSseg+    Zero-shot      download                     download
CLIPSeg   Zero-shot      download                     download
CATSeg    Zero-shot      download                     download
CLIPSeg   Few-shot       download                     download
CLIPSeg   Cross-dataset  -                            download

To evaluate the trained models, add --eval-only to the training command.

For example:

  python train_net.py --num-gpus 8 --config-file configs/${SETTING}/catseg_${DATASET}.yaml --eval-only MODEL.WEIGHTS ${WEIGHT_PATH}

📝 Benchmark Results

  • Zero-shot performance of the two-stage and one-stage baselines on Pascal-Part-116

    Model       Backbone                Finetuning   Oracle-Obj (Seen / Unseen / Harmonic)   Pred-Obj (Seen / Unseen / Harmonic)
    Fully-Supervised
    MaskFormer  ResNet-50               -            55.28 / 52.14 / -                       53.07 / 47.82 / -
    Two-Stage Baselines
    ZSseg       ResNet-50               -            49.35 / 12.57 / 20.04                   40.80 / 12.07 / 18.63
    ZSseg+      ResNet-50               CPTCoOp      55.33 / 19.17 / 28.48                   54.23 / 17.10 / 26.00
    ZSseg+      ResNet-50               CPTCoCoOp    54.43 / 19.04 / 28.21                   53.31 / 16.08 / 24.71
    ZSseg+      ResNet-101c             CPTCoOp      57.88 / 21.93 / 31.81                   56.87 / 20.29 / 29.91
    One-Stage Baselines
    CATSeg      ResNet-101 & ViT-B/16   -            14.89 / 10.29 / 12.17                   13.65 /  7.73 /  9.87
    CATSeg      ResNet-101 & ViT-B/16   B+D          43.97 / 26.11 / 32.76                   41.65 / 26.08 / 32.07
    CLIPSeg     ViT-B/16                -            22.33 / 19.73 / 20.95                   14.32 / 10.52 / 12.13
    CLIPSeg     ViT-B/16                VA+L+F+D     48.68 / 27.37 / 35.04                   44.57 / 27.79 / 34.24
  • Zero-shot performance of the two-stage and one-stage baselines on ADE20K-Part-234

    Model       Backbone                Finetuning   Oracle-Obj (Seen / Unseen / Harmonic)   Pred-Obj (Seen / Unseen / Harmonic)
    Fully-Supervised
    MaskFormer  ResNet-50               -            46.25 / 47.86 / -                       35.52 / 16.56 / -
    Two-Stage Baselines
    ZSseg+      ResNet-50               CPTCoOp      43.19 / 27.84 / 33.85                   21.30 /  5.60 /  8.87
    ZSseg+      ResNet-50               CPTCoCoOp    39.67 / 25.15 / 30.78                   19.52 /  2.98 /  5.17
    ZSseg+      ResNet-101c             CPTCoOp      43.41 / 25.70 / 32.28                   21.42 /  3.33 /  5.76
    One-Stage Baselines
    CATSeg      ResNet-101 & ViT-B/16   -            11.49 /  8.56 /  9.81                    6.30 /  3.79 /  4.73
    CATSeg      ResNet-101 & ViT-B/16   B+D          31.40 / 25.77 / 28.31                   20.23 /  8.27 / 11.74
    CLIPSeg     ViT-B/16                -            15.27 / 18.01 / 16.53                    5.00 /  3.36 /  4.02
    CLIPSeg     ViT-B/16                VA+L+F+D     38.96 / 29.65 / 33.67                   24.80 /  6.24 /  9.98
  • Cross-Dataset performance of models trained on the source dataset ADE20K-Part-234 and tested on the target dataset Pascal-Part-116.

    Model     Finetuning   Source (Oracle-Obj / Pred-Obj)   Target (Oracle-Obj / Pred-Obj)
    CATSeg    -            27.95 / 17.22                    16.00 / 14.72
    CLIPSeg   VA+L+F       35.01 / 21.74                    16.18 / 11.70
    CLIPSeg   VA+L+F+D     37.76 / 21.87                    19.69 / 13.88

🔗 Citation

If you find our work helpful, please cite:

@inproceedings{wei2023ov,
  title={OV-PARTS: Towards Open-Vocabulary Part Segmentation},
  author={Wei, Meng and Yue, Xiaoyu and Zhang, Wenwei and Kong, Shu and Liu, Xihui and Pang, Jiangmiao},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2023}
}

👏 Acknowledgements

We would like to express our gratitude to the open-source projects and their contributors, including ZSSeg, CATSeg and CLIPSeg. Their valuable work has greatly contributed to the development of our codebase.
