
Add metric toggles to leaderboard + Remove failed FPR scores#50

Merged
liamdugan merged 19 commits into main from toggle_metrics
Jul 25, 2025
Conversation

@liamdugan (Owner)

This pull request adds:

  • Support for multiple FPR values & AUROC in run_evaluation
  • Leaderboard toggle to select one of three metrics: TPR@FPR=5%, TPR@FPR=1%, and AUROC
  • Recalculation of all existing detector evaluation scores to include these new metrics
  • Logic to remove detector predictions that do not meet a particular FPR threshold
  • Warnings on submission if your detector fails to meet the FPR threshold
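The FPR-threshold logic described above could be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual implementation: the function name, the tolerance parameter, and the quantile-based thresholding are all assumptions. The idea is that a threshold is chosen so the empirical FPR on human-written texts meets the target, and if the detector's scores are too coarse (e.g. constant or binary) to hit that target, the result is dropped (returned as `None`, serialized as `null`).

```python
import numpy as np

def tpr_at_fpr(human_scores, machine_scores, target_fpr, tol=0.005):
    """Hypothetical sketch of TPR@FPR: threshold at the (1 - target_fpr)
    quantile of human scores, then measure TPR on machine-generated texts.
    Returns None when score ties make the target FPR unattainable within
    the (assumed) tolerance `tol`."""
    human = np.sort(np.asarray(human_scores))
    # Scores strictly above the threshold count as positives.
    thresh = np.quantile(human, 1.0 - target_fpr)
    achieved_fpr = np.mean(human > thresh)
    if abs(achieved_fpr - target_fpr) > tol:
        return None  # e.g. constant or binary scores cannot hit this FPR
    machine = np.asarray(machine_scores)
    tp = int(np.sum(machine > thresh))
    fn = len(machine) - tp
    return {"tp": tp, "fn": fn, "accuracy": tp / len(machine)}
```

With well-spread scores this returns a `tp`/`fn`/`accuracy` record like those in results.json; with degenerate scores (all identical) it returns `None`, mirroring the `null` entries described below.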

Non-breaking interface changes

  • evaluate_cli.py now accepts multiple values for target_fpr.
  • The default for evaluate_cli.py is now both 0.05 and 0.01 FPR.
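A minimal sketch of how a CLI might accept multiple target FPR values, using argparse's `nargs`. The flag name `--target_fpr` and the exact signature are assumptions; the actual evaluate_cli.py interface may differ.

```python
import argparse

# Hypothetical argument parser mirroring the interface change above:
# multiple FPR values, defaulting to [0.05, 0.01].
parser = argparse.ArgumentParser()
parser.add_argument(
    "--target_fpr",
    type=float,
    nargs="+",              # accept one or more values
    default=[0.05, 0.01],   # new default described above
)

args = parser.parse_args(["--target_fpr", "0.05", "0.01"])
print(args.target_fpr)  # [0.05, 0.01]
```

Omitting the flag entirely falls back to the `[0.05, 0.01]` default, matching the stated behavior.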

Potentially breaking changes

The output format of results.json now has a slightly altered structure. Instead of the accuracy field holding the TPR@FPR=5% directly, it now points to a dictionary indexed by FPR value, where each entry contains the true positives (tp), false negatives (fn), and TPR (accuracy) for that FPR.

When detectors are unable to achieve a given target FPR, the resulting field in the accuracy dictionary will be given a null value (as seen below).

Old
{
  "domain": "abstracts",
  "model": "llama-chat",
  "decoding": "greedy",
  "repetition_penalty": "no",
  "attack": "none",
  "tp": 200,
  "fn": 0,
  "accuracy": 1.0
},
New
{
  "domain": "abstracts",
  "model": "llama-chat",
  "decoding": "greedy",
  "repetition_penalty": "no",
  "attack": "none",
  "accuracy": {
    "0.05": {
      "tp": 200,
      "fn": 0,
      "accuracy": 1.0
    },
    "0.01": null
  },
  "auroc": 0.9989833333333333
},

This is a BREAKING CHANGE for any code built on evaluate_cli or run_evaluation that directly accesses results.json. Please take care to catch these null values and to index the accuracy dictionary correctly.
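One way to read the new schema defensively is to treat each per-FPR entry as optional. This is an illustrative sketch (the helper name `tpr_at` is made up); the inline record mirrors the "New" example above.

```python
import json

# Record mirroring the "New" results.json structure shown above,
# including a null entry for an unmet FPR target.
record = json.loads("""
{
  "domain": "abstracts",
  "accuracy": {
    "0.05": {"tp": 200, "fn": 0, "accuracy": 1.0},
    "0.01": null
  },
  "auroc": 0.9989833333333333
}
""")

def tpr_at(record, fpr_key):
    """Return the TPR at the given FPR key, or None when the detector
    failed to meet that FPR threshold (null in the JSON)."""
    entry = record["accuracy"].get(fpr_key)
    return None if entry is None else entry["accuracy"]

print(tpr_at(record, "0.05"))  # 1.0
print(tpr_at(record, "0.01"))  # None
```

Guarding on `None` before indexing into an entry avoids the `TypeError` that naive code written against the old flat `accuracy` field would hit.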

@liamdugan liamdugan merged commit 28e96e8 into main Jul 25, 2025
4 of 5 checks passed
@github-actions

Eval run succeeded! Link to run: link

Here are the results of the submission(s):

e5-small-lora

Release date: 2024-11-07

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 85.69% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 93.87% at FPR=5%.

LLMDet

Release date: 2023-05-24

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 26.70% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 33.40% at FPR=5%.

Luminar

Release date: 2025-05-17

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 100.00% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 100.00% at FPR=5%.

SpeedAI

Release date: 2025-05-08

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 99.62% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 99.86% at FPR=5%.

It's AI

Release date: 2025-04-01

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 94.15% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 95.75% at FPR=5%.

RoBERTa-base (GPT2)

Release date: 2019-08-24

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 51.77% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 59.19% at FPR=5%.

RoBERTa (ChatGPT)

Release date: 2023-01-18

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 26.64% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 42.52% at FPR=5%.

Desklib

Release date: 2024-10-03

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 83.76% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 92.38% at FPR=5%.

RoBERTa-large (GPT2)

Release date: 2019-08-24

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 50.70% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 55.73% at FPR=5%.

SuperAnnotate AI Detector

Release date: 2024-10-27

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 64.87% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 70.34% at FPR=5%.

GLTR

Release date: 2019-06-10

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 51.48% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 59.69% at FPR=5%.

Desklib AI Text Detector v1.01

Release date: 2025-02-16

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 91.17% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 94.87% at FPR=5%.

Binoculars

Release date: 2024-01-22

I've committed detailed results of this detector's performance on the test set to this PR.

Warning

No aggregate score across all settings is reported here as some domains/generator models/decoding strategies/repetition penalties/adversarial attacks were not included in the submission. This submission will not appear in the main leaderboard; it will only be visible within the splits in which all samples were evaluated.
Without adversarial attacks, it achieved a TPR of 78.98% at FPR=5%.

RADAR

Release date: 2023-07-07

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 63.91% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 65.61% at FPR=5%.

Gaussian Extreme

Release date: 2025-05-17

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 97.10% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 97.08% at FPR=5%.

FastDetectGPT

Release date: 2023-10-08

I've committed detailed results of this detector's performance on the test set to this PR.

Warning

No aggregate score across all settings is reported here as some domains/generator models/decoding strategies/repetition penalties/adversarial attacks were not included in the submission. This submission will not appear in the main leaderboard; it will only be visible within the splits in which all samples were evaluated.

Warning

No aggregate score across all non-adversarial settings is reported here as some domains/generator models/decoding strategies/repetition penalties were not included in the submission.

If all looks well, a maintainer will come by soon to merge this PR and your entry/entries will appear on the leaderboard. If you need to make any changes, feel free to push new commits to this PR. Thanks for submitting to RAID!
