On the Value of Out-of-Distribution Testing: An Example of Goodhart’s Law

## In a nutshell
Points out problems with out-of-distribution (OOD) testing benchmarks for the VQA task and discusses how evaluation should be done.

### Paper link
https://papers.nips.cc/paper/2020/file/045117b0e0a11a242b9765e79cbf113f-Paper.pdf

### Authors / Affiliation
Damien Teney et al.
(Australian Institute for Machine Learning, University of Adelaide, Australia)

### Date posted (yyyy/MM/dd)
2020/12

## Overview

> Goodhart’s law: When a measure becomes a target, it ceases to be a good measure.

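OOD testing has attracted a great deal of attention as one way to address biases in training datasets.
OOD benchmarks are designed so that the training data and the test data follow different joint distributions.
VQA-CP is one of the standard OOD benchmarks for visual question answering.
However, the authors found that this dataset, as it is used in practice, has three problems.

1. Most published methods rely on explicit knowledge of how the OOD split was constructed.
2. The OOD test set is used for model selection.
3. A model's in-domain performance is reported only after retraining it on an in-domain dataset.

The paper discusses evaluation practices that resolve these problems.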
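
To make problems 2 and 3 concrete, here is a minimal sketch of one possible evaluation protocol: the checkpoint is chosen on a validation split held out from the training set, and in-domain and OOD accuracy are then reported for that same checkpoint without retraining. The `Checkpoint` class, the function names, and every number below are hypothetical placeholders for illustration. They are not taken from the paper or from any VQA-CP result.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Accuracies of one saved model (all values here are made up)."""
    name: str
    val_acc: float             # validation split held out from the training set
    in_domain_test_acc: float  # in-domain test set, same weights, no retraining
    ood_test_acc: float        # OOD test set, same weights


def select_by_validation(checkpoints):
    """Model selection that never looks at the OOD test set."""
    return max(checkpoints, key=lambda c: c.val_acc)


def select_by_ood_test(checkpoints):
    """The practice criticized in problem 2: pick the best OOD test score."""
    return max(checkpoints, key=lambda c: c.ood_test_acc)


# Hypothetical numbers, chosen only to show that the two rules can disagree.
checkpoints = [
    Checkpoint("epoch-05", val_acc=0.61, in_domain_test_acc=0.60, ood_test_acc=0.38),
    Checkpoint("epoch-10", val_acc=0.64, in_domain_test_acc=0.63, ood_test_acc=0.41),
    Checkpoint("epoch-15", val_acc=0.63, in_domain_test_acc=0.59, ood_test_acc=0.45),
]

honest = select_by_validation(checkpoints)
leaky = select_by_ood_test(checkpoints)

# Report both numbers for the single selected checkpoint.
print(f"validation-selected: {honest.name} "
      f"in-domain={honest.in_domain_test_acc} ood={honest.ood_test_acc}")
print(f"test-selected:       {leaky.name} "
      f"in-domain={leaky.in_domain_test_acc} ood={leaky.ood_test_acc}")
```

In this toy setting, selecting on the OOD test set reports a higher OOD score but hides the drop in in-domain accuracy of that checkpoint, which is why reporting both numbers for one model gives a clearer picture.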

<img width="811" alt="Screen Shot 2021-02-26 at 2 49 35" src="https://user-images.githubusercontent.com/10952293/109195014-60c60500-77dd-11eb-944c-fd6779106999.png">


## Novelty / Differences
- Points out problems with OOD testing benchmarks for the VQA task and discusses how evaluation should be done.

## Method

<img width="804" alt="Screen Shot 2021-02-26 at 2 49 43" src="https://user-images.githubusercontent.com/10952293/109195034-66bbe600-77dd-11eb-89fb-9cda993642cd.png">

<img width="490" alt="Screen Shot 2021-02-26 at 2 49 51" src="https://user-images.githubusercontent.com/10952293/109195045-691e4000-77dd-11eb-83ba-e4d64871bb04.png">


## Results

<img width="804" alt="Screen Shot 2021-02-26 at 2 49 59" src="https://user-images.githubusercontent.com/10952293/109195066-6d4a5d80-77dd-11eb-9956-8cbdff32e6e8.png">

<img width="688" alt="Screen Shot 2021-02-26 at 2 50 11" src="https://user-images.githubusercontent.com/10952293/109195081-70454e00-77dd-11eb-855c-c7bc2c0839db.png">


## Comments

