jystin/ATBench

@BENCH: Benchmarking Vision-Language Models for Human-centered Assistive Technology (WACV 2025)

by Xin Jiang*, Junwei Zheng*, Ruiping Liu, Jiahang Li, Jiaming Zhang†, Sven Matthiesen, Rainer Stiefelhagen

* denotes equal contribution and † denotes corresponding author

News

  • [2024.09.17] ATBench (Assistive Technology Benchmark) has been accepted to WACV 2025.
  • [2024.10.13] We are excited to release the ATModel (Assistive Technology Model) training code (INSTALL.md, DATASET.md, TRAIN.md, EVALUATION.md).

[Figure: pipeline]

Introduction

[Figure: multi_task_result]

ATBench is designed on the basis of a pre-design user study with people with visual impairments (PVIs) and covers the five most crucial vision-language tasks: Panoptic Segmentation, Image Captioning, Visual Question Answering (VQA), Depth Estimation, and Optical Character Recognition (OCR). We also propose ATModel, a novel model that addresses all five tasks simultaneously.

More details can be found in our arXiv paper.

Getting Started

Checkpoints and Numbers:

| Model | PS (ADE-150) PQ | DE (NYU-V2) RMSE | OCR (6 datasets avg) Acc(%) | IC (VizWiz_Cap) CIDEr | VQA (VizWiz_VQA) Acc(%) | #Params |
|---|---|---|---|---|---|---|
| Unified-IO (S) | - | 0.649 | - | - | 42.4 | 71M |
| Unified-IO (B) | - | 0.469 | - | - | 45.8 | 241M |
| Unified-IO (L) | - | 0.402 | - | - | 47.7 | 776M |
| X-Decoder (T) | 41.6 | - | - | - | - | 164M |
| GIT (T) | - | - | - | 113.1 | 68.0 | 0.7B |
| PaLI (T) | - | - | - | 117.2 | 67.5 | 3.0B |
| ATModel | 38.5 | 0.425 | 80.1 | 52.5 | 53.7 | 62M |
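
For readers unfamiliar with the depth estimation column, RMSE is the root-mean-square error between predicted and ground-truth depth, so lower is better. The snippet below is a minimal sketch of the standard definition only. It is not the repository's evaluation code (see EVALUATION.md for that). The function name `rmse`, the rule of skipping non-positive ground-truth values, and the toy numbers are all assumptions made for illustration.

```python
import math


def rmse(pred, target):
    """Root-mean-square error over paired depth values (standard definition)."""
    # Keep only pairs whose ground-truth depth is positive. Treating
    # non-positive values as "missing" is an assumption of this sketch.
    pairs = [(p, t) for p, t in zip(pred, target) if t > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depth values")
    # Mean of squared errors, then square root.
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


# Toy depth values in meters (made up for illustration, not from NYU-V2).
predicted = [1.0, 2.0, 3.0, 4.0]
ground_truth = [1.0, 2.5, 0.0, 3.5]
print(round(rmse(predicted, ground_truth), 4))
```

Here the third pair is skipped because its ground-truth value is 0.0, so the error is averaged over the remaining three pairs.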

Installation, Dataset, Training and Evaluation Guide:

  • INSTALL.md
  • DATASET.md
  • TRAIN.md
  • EVALUATION.md

Acknowledgement

  • Our work builds on X-Decoder and uses its code. We thank the X-Decoder authors for open-sourcing their repository.

Citation

If you find our work useful in your research, please cite:

@inproceedings{jiang2025atbench,
title={@BENCH: Benchmarking Vision-Language Models for Human-centered Assistive Technology},
author={Jiang, Xin and Zheng, Junwei and Liu, Ruiping and Li, Jiahang and Zhang, Jiaming and Matthiesen, Sven and Stiefelhagen, Rainer},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year={2025}
}
