
Commit 41c11cb

initial commit

1 parent: 0ff774b

60 files changed

Lines changed: 11153 additions & 2 deletions


README.md

Lines changed: 177 additions & 2 deletions

The previous two-line README ("# SOMA" / "[ICCV' 23] Novel Scenes & Classes: Towards Adaptive Open-set Object Detection") is replaced by the following.
# [Novel Scenes & Classes: Towards Adaptive Open-set Object Detection (ICCV-23 ORAL)](assets/paper.pdf)

By [Wuyang Li](https://wymancv.github.io/wuyang.github.io/)

The paper link will be updated after CVF open access.

<div align=center>
<img src="./assets/mot.png" width="400">
</div>

Domain Adaptive Object Detection (DAOD) strongly assumes a shared class space between the two domains.

This work breaks that assumption and formulates Adaptive Open-set Object Detection (AOOD), allowing the target domain to contain novel-class objects.

The object detector is trained with base-class labels in the source domain, and aims to detect base-class objects and identify novel-class objects as unknown in the target domain.

If you have any ideas or problems you would like to discuss, feel free to reach me via [E-mail](mailto:wuyangli2-c@my.cityu.edu.hk).
# 💡 Preparation

## Step 1: Clone and Install the Project

### Clone the repository

```bash
git clone https://github.com/CityU-AIM-Group/SOMA.git
```

### Install the project following [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR)

Note that the following matches our experimental environment, which is slightly different from the official one.

```bash
# Linux, CUDA>=9.2, GCC>=5.4
# (ours) CUDA=10.2, GCC=8.4, NVIDIA V100

# Establish the conda environment
conda create -n aood python=3.7 pip
conda activate aood
conda install pytorch=1.5.1 torchvision=0.6.1 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

# Compile the project
cd ./models/ops
sh ./make.sh

# Unit test (should see True for all checks)
python test.py

# NOTE: if you meet a permission-denied issue when starting the training
cd ../../
chmod -R 777 ./
```
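As an optional sanity check before moving on, you can confirm that the installed PyTorch build sees the GPU (a generic check, not specific to this repository):

```bash
# Optional: print the PyTorch version and whether CUDA is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```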
## Step 2: Download Necessary Resources

### Download pre-processed datasets (VOC format) from the following links

|                | (Foggy) Cityscapes | Pascal VOC | Clipart | BDD100K |
| :------------: | :----------------: | :--------: | :-----: | :-----: |
| Official Links | [Imgs](https://www.cityscapes-dataset.com/login/) | [Imgs+Labels](https://pjreddie.com/projects/pascal-voc-dataset-mirror/) | - | - |
| Our Links | [Labels](https://portland-my.sharepoint.com/:u:/g/personal/wuyangli2-c_my_cityu_edu_hk/EVNAjK2JkG9ChREzzqdqJkYBLoZ_VOqkMdhWasN_BETGWw?e=fP9Ae4) | - | [Imgs+Labels](https://portland-my.sharepoint.com/:u:/g/personal/wuyangli2-c_my_cityu_edu_hk/Edz2YcXHuStIqwM_NA7k8FMBGLeyAGQcSjdSR-vYaVx_vw?e=es6KDW) | [Imgs+Labels](https://portland-my.sharepoint.com/:u:/g/personal/wuyangli2-c_my_cityu_edu_hk/EeiO6O36QgZKnTcUZMInACIB0dfWEg4OFyoEZnZCkibKHA?e=6byqBX) |

### Download the DINO-pretrained ResNet-50 from this [link](https://portland-my.sharepoint.com/:u:/g/personal/wuyangli2-c_my_cityu_edu_hk/EVnK9IPi91ZPuNmwpeSWGHABqhSFQK52I7xGzroXKeuyzA?e=EnlwgO)
## Step 3: Change the Paths

### Organize the data as follows

```
[DATASET_PATH]
└─ Cityscapes
   └─ AOOD_Annotations
   └─ AOOD_Main
      └─ train_source.txt
      └─ train_target.txt
      └─ val_source.txt
      └─ val_target.txt
   └─ leftImg8bit
      └─ train
      └─ val
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ bdd_daytime
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ clipart
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ VOCdevkit
   └─ VOC2007
   └─ VOC2012
```
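If your datasets already live elsewhere on disk, one way to assemble this layout without copying data is via symlinks (a sketch; every path below is a placeholder):

```bash
# Assemble the expected layout with symlinks (all paths are placeholders)
mkdir -p /your/data/root
ln -s /path/to/Cityscapes  /your/data/root/Cityscapes
ln -s /path/to/bdd_daytime /your/data/root/bdd_daytime
ln -s /path/to/clipart     /your/data/root/clipart
ln -s /path/to/VOCdevkit   /your/data/root/VOCdevkit
```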
### Change the data root folder in the config files

Replace DATASET.COCO_PATH in all YAML files under [configs](configs) with your data root $DATASET_PATH, e.g., Line 22 of [soma_aood_city_to_foggy_r50.yaml](configs/soma_aood_city_to_foggy_r50.yaml).
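If you prefer to do this from the shell, something like the following can rewrite the key in every config at once (a sketch, assuming each YAML stores the path under a literal `COCO_PATH:` key):

```bash
# Point every config at your data root (assumes a literal "COCO_PATH:" key)
find configs -name '*.yaml' -exec sed -i 's|COCO_PATH:.*|COCO_PATH: /your/data/root|' {} +
```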
### Change the path of the DINO-pretrained backbone

Replace the backbone loading path at Line 107 of [backbone.py](models/backbone.py).
# 🔥 Start Training

We use two GPUs for training, with 2 source images and 2 target images per batch.

```bash
GPUS_PER_NODE=2
./tools/run_dist_launch.sh 2 python main.py --config_file {CONFIG_FILE} --opts DATASET.AOOD_SETTING 1
```

We provide the scripts used in our experiments in [run.sh](./run.sh). Settings listed after "--opts" overwrite the default config file, following the maskrcnn-benchmark convention; an example is shown below.
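For instance, pointing training at a different data root while selecting the open-set setting might look like this (a sketch; the config file and keys are the ones referenced elsewhere in this README):

```bash
# Override config keys from the command line (maskrcnn-benchmark style --opts)
GPUS_PER_NODE=2
./tools/run_dist_launch.sh 2 python main.py \
    --config_file configs/soma_aood_city_to_foggy_r50.yaml \
    --opts DATASET.AOOD_SETTING 1 DATASET.COCO_PATH /your/data/root
```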
# 📦 Well-trained models

Will be provided later.

<!-- | Source | Target | Task | mAP $_b$ | AR $_n$ | WI | AOSE | AP@75 | checkpoint |
| :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: | :-----: |
| City | Foggy | het-sem |
| City | Foggy | het-sem |
| City | Foggy | het-sem |
| City | Foggy | het-sem | -->
# 💬 Notification

- The core idea is to select informative motifs (which can be treated as a mix-up of object queries) for self-training.
- You can try the DA version of [OW-DETR](https://github.com/akshitac8/OW-DETR) in this repository by setting:
```
--opts AOOD.OW_DETR_ON True
```
- Adopting SAM to address AOOD may be a good direction.
- To visualize unknown boxes, post-processing is needed at Line 736 of [PostProcess](models/motif_detr.py).
# 📝 Citation

If you find this work helpful for your project, please consider giving it a star and a citation. We sincerely appreciate your acknowledgment.

```BibTeX
@InProceedings{li2023novel,
    title={Novel Scenes & Classes: Towards Adaptive Open-set Object Detection},
    author={Li, Wuyang and Guo, Xiaoqing and Yuan, Yixuan},
    booktitle={ICCV},
    year={2023}
}
```

Relevant project:

A related work exploring a similar issue for the classification task. [[link]](https://openaccess.thecvf.com/content/CVPR2023/html/Li_Adjustment_and_Alignment_for_Unbiased_Open_Set_Domain_Adaptation_CVPR_2023_paper.html)

```BibTeX
@InProceedings{Li_2023_CVPR,
    author = {Li, Wuyang and Liu, Jie and Han, Bo and Yuan, Yixuan},
    title = {Adjustment and Alignment for Unbiased Open Set Domain Adaptation},
    booktitle = {CVPR},
    year = {2023},
}
```
# 🤞 Acknowledgements

We greatly appreciate the tremendous effort behind the following works.

- This work is based on the DAOD framework [AQT](https://github.com/weii41392/AQT).
- Our work is highly inspired by [OW-DETR](https://github.com/akshitac8/OW-DETR) and [OpenDet](https://github.com/csuhan/opendet2).
- The implementation of the basic detector is based on [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR).
# 📒 Abstract

Domain Adaptive Object Detection (DAOD) transfers an object detector to a novel domain free of labels. However, in the real world, besides encountering novel scenes, novel domains always contain novel-class objects de facto, which are ignored in existing research. Thus, we formulate and study a more practical setting, Adaptive Open-set Object Detection (AOOD), considering both novel scenes and classes. Directly combining off-the-shelf cross-domain and open-set approaches is sub-optimal since their low-order dependence, such as the confidence score, is insufficient for AOOD, which involves two dimensions of novel information. To address this, we propose a novel Structured Motif Matching (SOMA) framework for AOOD, which models the high-order relation with motifs, i.e., statistically significant subgraphs, and formulates the AOOD solution as motif matching to learn with high-order patterns. In a nutshell, SOMA consists of Structure-aware Novel-class Learning (SNL) and Structure-aware Transfer Learning (STL). In SNL, we establish an instance-oriented graph to capture the class-independent object features hidden in different base classes. Then, a high-order metric is proposed to match the most significant motif as high-order patterns, supporting motif-guided novel-class learning. In STL, we set up a semantic-oriented graph to model the class-dependent relation across domains, and match unlabelled objects with high-order motifs to align the cross-domain distribution with structural awareness. Extensive experiments demonstrate that the proposed SOMA achieves state-of-the-art performance.

![image](./assets/overall.png)

assets/mot.png

2.49 MB (binary file)

assets/overall.png

1010 KB (binary file)

assets/paper.pdf

9.45 MB (binary file)

benchmark.py

Lines changed: 69 additions & 0 deletions
```python
# ------------------------------------------------------------------------
# Modified by Wei-Jie Huang
# ------------------------------------------------------------------------
# Deformable DETR
# Copyright (c) 2020 SenseTime. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 [see LICENSE for details]
# ------------------------------------------------------------------------

"""
Benchmark inference speed of Deformable DETR.
"""
import os
import time
import argparse

import torch

from main import get_args_parser as get_main_args_parser
from models import build_model
from datasets import build_dataset
from util.misc import nested_tensor_from_tensor_list


def get_benchmark_arg_parser():
    parser = argparse.ArgumentParser('Benchmark inference speed of Deformable DETR.')
    parser.add_argument('--num_iters', type=int, default=300, help='total iters to benchmark speed')
    parser.add_argument('--warm_iters', type=int, default=5, help='ignore first several iters that are very slow')
    parser.add_argument('--batch_size', type=int, default=1, help='batch size in inference')
    parser.add_argument('--resume', type=str, help='load the pre-trained checkpoint')
    return parser


@torch.no_grad()
def measure_average_inference_time(model, inputs, num_iters=100, warm_iters=5):
    ts = []
    for iter_ in range(num_iters):
        # synchronize before and after the forward pass so the timing
        # covers the full GPU execution, not just the kernel launch
        torch.cuda.synchronize()
        t_ = time.perf_counter()
        model(inputs)
        torch.cuda.synchronize()
        t = time.perf_counter() - t_
        # discard the first few warm-up iterations, which are very slow
        if iter_ >= warm_iters:
            ts.append(t)
    print(ts)
    return sum(ts) / len(ts)


def benchmark():
    # benchmark-specific flags; any remaining args go to the main parser
    args, _ = get_benchmark_arg_parser().parse_known_args()
    main_args = get_main_args_parser().parse_args(_)
    assert args.warm_iters < args.num_iters and args.num_iters > 0 and args.warm_iters >= 0
    assert args.batch_size > 0
    assert args.resume is None or os.path.exists(args.resume)
    dataset = build_dataset('val', main_args)
    model, _, _ = build_model(main_args)
    model.cuda()
    model.eval()
    if args.resume is not None:
        ckpt = torch.load(args.resume, map_location=lambda storage, loc: storage)
        model.load_state_dict(ckpt['model'])
    # build a fixed batch from the first validation image
    inputs = nested_tensor_from_tensor_list([dataset.__getitem__(0)[0].cuda() for _ in range(args.batch_size)])
    t = measure_average_inference_time(model, inputs, args.num_iters, args.warm_iters)
    return 1.0 / t * args.batch_size


if __name__ == '__main__':
    fps = benchmark()
    print(f'Inference Speed: {fps:.1f} FPS')
```
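Given the flags defined above, a typical invocation might look like this (the checkpoint path is a placeholder; unrecognized flags are forwarded to the main parser):

```bash
# Benchmark inference speed with a trained checkpoint (path is a placeholder)
python benchmark.py --num_iters 300 --warm_iters 5 --batch_size 1 --resume ./exps/checkpoint.pth
```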
