This repository contains the official implementation of PairDETR, a method for Joint Detection and Association of Human Bodies and Faces.
For setting up the environment, we highly recommend using Docker images to ensure reproducibility and avoid any dependency issues. For our experiments, we used the Docker image
You can also use the provided requirements file to set up your personal environment.
```shell
pip install -r requirements.txt
```
- Download the dataset from here.
- We used the annotations prepared by the authors of BFJDet (download).
- Preprocessed annotations, with boxes clipped to the image frame and ignored boxes removed, are available in the annotations folder.
- Download the dataset from here.
- We used the annotations prepared by the authors of BFJDet (download).
- We preprocessed the annotations to clip boxes located outside the image frame and to remove ignored boxes; you can use the preprocessed ones directly from the annotations folder.
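The preprocessing described above can be sketched as follows. This is a hypothetical illustration, not the repository's actual script: the field names (`"bbox"`, `"ignore"`) and the `[x, y, w, h]` box format are assumptions about the annotation schema.

```python
def clip_box(box, img_w, img_h):
    """Clip an [x, y, w, h] box to the image frame; return None if nothing remains."""
    x, y, w, h = box
    x1, y1 = max(0, x), max(0, y)
    x2, y2 = min(img_w, x + w), min(img_h, y + h)
    if x2 <= x1 or y2 <= y1:
        return None  # box lies entirely outside the image
    return [x1, y1, x2 - x1, y2 - y1]

def preprocess(annotations, img_w, img_h):
    """Drop entries flagged as ignore and clip the remaining boxes to the frame."""
    cleaned = []
    for ann in annotations:
        if ann.get("ignore", 0):  # remove ignored boxes
            continue
        box = clip_box(ann["bbox"], img_w, img_h)
        if box is not None:
            cleaned.append({**ann, "bbox": box})
    return cleaned
```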
After setting up the environment and preparing the datasets, update the paths in config.py.
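The path updates in config.py might look like the following. The variable names here are assumptions about the config layout, not the file's actual contents; adjust them to match what you find in config.py.

```python
# Hypothetical excerpt of config.py; the actual variable names may differ.
CROWDHUMAN_IMAGES = "/data/crowdhuman/images"
CROWDHUMAN_ANNOTATIONS = "/data/crowdhuman/annotations"
CITYPERSONS_IMAGES = "/data/citypersons/images"
CITYPERSONS_ANNOTATIONS = "/data/citypersons/annotations"
CHECKPOINT_DIR = "./checkpoints"
```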
Then, to start the training, run:
```shell
python run_lightning_d.py
```
After that, training should start. You can experiment with different hyperparameters and training setups. We used the Hugging Face model loader to make it easier to experiment with other backbones or models; see the Detr_light class initialization in train_lightning_d.py. For example, you can change the model to:
```python
DetaForObjectDetection.from_pretrained(
    "jozhang97/deta-swin-large",
    two_stage=False,
    with_box_refine=False,
)
```
Please keep in mind that we don't support two-stage or box-refinement training yet. You can also experiment with different feature extractors through the timm integration. Note that these backbones are pretrained on ImageNet, not COCO, so you may need to increase the number of training epochs and retune some hyperparameters:
```python
DeformableDetrForObjectDetection.from_pretrained(
    "SenseTime/deformable-detr",
    use_timm_backbone=True,
    backbone="mobilenetv3_small_050.lamb_in1k",
)
```
Refer to our test.py script for loading the model and running inference. A simple example of loading the model:
```python
import torch

from train_lightning_d import Detr_light

model = Detr_light(num_classes=3, num_queries=1500)
checkpoint = torch.load(<path to the chk>, map_location="cuda")
model.load_state_dict(checkpoint, strict=False)
```
Comparison between PairDETR and other methods on the miss-matching rate mMR-2 (lower is better):
Model | Reasonable | Bare | Partial | Heavy | Hard | Average | Checkpoints |
---|---|---|---|---|---|---|---|
POS | 55.49 | 48.20 | 62.00 | 80.98 | 84.58 | 66.4 | weights |
BFJ | 42.96 | 37.96 | 48.20 | 67.31 | 71.44 | 52.5 | weights |
BPJ | - | - | - | - | - | 50.1 | weights |
PBADET | - | - | - | - | - | 50.8 | none |
Ours | 35.25 | 30.38 | 38.12 | 52.47 | 55.75 | 42.9 | weights |
- End-to-End Object Detection with Transformers
- CrowdHuman: A Benchmark for Detecting Human in a Crowd
- Body-Face Joint Detection via Embedding and Head Hook
- Deformable DETR: Deformable Transformers for End-to-End Object Detection
- DETR for Crowd Pedestrian Detection
- An Extendable, Efficient and Effective Transformer-based Object Detector