
Add metric toggles to leaderboard + Remove failed FPR scores#50

Merged
liamdugan merged 19 commits into main from toggle_metrics
Jul 25, 2025
Conversation

@liamdugan (Owner)

This pull request adds:

  • Support for multiple FPR values & AUROC in run_evaluation
  • Leaderboard toggle to select one of three metrics: TPR@FPR=5%, TPR@FPR=1%, and AUROC
  • Recalculation of all existing detector evaluation scores to include these new metrics
  • Logic to remove detector predictions that do not meet a particular FPR threshold
  • Warnings on submission if your detector fails to meet the FPR threshold
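The FPR-threshold logic described above could be sketched roughly as follows. This is a hypothetical illustration, not the repository's actual implementation: the function name, the tolerance parameter, and the quantile-based thresholding are all assumptions. The idea is that a threshold is chosen so the empirical FPR on human-written texts meets the target, and if the detector's scores are too coarse (e.g. constant or binary) to hit that target, the result is dropped (returned as `None`, serialized as `null`).

```python
import numpy as np

def tpr_at_fpr(human_scores, machine_scores, target_fpr, tol=0.005):
    """Hypothetical sketch of TPR@FPR: threshold at the (1 - target_fpr)
    quantile of human scores, then measure TPR on machine-generated texts.
    Returns None when score ties make the target FPR unattainable within
    the (assumed) tolerance `tol`."""
    human = np.sort(np.asarray(human_scores))
    # Scores strictly above the threshold count as positives.
    thresh = np.quantile(human, 1.0 - target_fpr)
    achieved_fpr = np.mean(human > thresh)
    if abs(achieved_fpr - target_fpr) > tol:
        return None  # e.g. constant or binary scores cannot hit this FPR
    machine = np.asarray(machine_scores)
    tp = int(np.sum(machine > thresh))
    fn = len(machine) - tp
    return {"tp": tp, "fn": fn, "accuracy": tp / len(machine)}
```

With well-spread scores this returns a `tp`/`fn`/`accuracy` record like those in results.json; with degenerate scores (all identical) it returns `None`, mirroring the `null` entries described below.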

Non-breaking interface changes

  • evaluate_cli.py now accepts multiple values for target_fpr.
  • The default for evaluate_cli.py is now both 0.05 and 0.01 FPR.
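A minimal sketch of how a CLI might accept multiple target FPR values, using argparse's `nargs`. The flag name `--target_fpr` and the exact signature are assumptions; the actual evaluate_cli.py interface may differ.

```python
import argparse

# Hypothetical argument parser mirroring the interface change above:
# multiple FPR values, defaulting to [0.05, 0.01].
parser = argparse.ArgumentParser()
parser.add_argument(
    "--target_fpr",
    type=float,
    nargs="+",              # accept one or more values
    default=[0.05, 0.01],   # new default described above
)

args = parser.parse_args(["--target_fpr", "0.05", "0.01"])
print(args.target_fpr)  # [0.05, 0.01]
```

Omitting the flag entirely falls back to the `[0.05, 0.01]` default, matching the stated behavior.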

Potentially breaking changes

The output format of results.json now has a slightly altered structure. Instead of the accuracy field holding the TPR@FPR=5% directly, it now points to a dictionary indexed by FPR value, where each entry contains the true positives (tp), false negatives (fn), and TPR (accuracy) for that FPR.

When detectors are unable to achieve a given target FPR, the resulting field in the accuracy dictionary will be given a null value (as seen below).

Old
{
  "domain": "abstracts",
  "model": "llama-chat",
  "decoding": "greedy",
  "repetition_penalty": "no",
  "attack": "none",
  "tp": 200,
  "fn": 0,
  "accuracy": 1.0
},
New
{
  "domain": "abstracts",
  "model": "llama-chat",
  "decoding": "greedy",
  "repetition_penalty": "no",
  "attack": "none",
  "accuracy": {
    "0.05": {
      "tp": 200,
      "fn": 0,
      "accuracy": 1.0
    },
    "0.01": null
  },
  "auroc": 0.9989833333333333
},

This is a BREAKING CHANGE for any code built on evaluate_cli or run_evaluation that directly accesses results.json. Please take care to catch these null values and to index the accuracy dictionary correctly.
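One way to read the new schema defensively is to treat each per-FPR entry as optional. This is an illustrative sketch (the helper name `tpr_at` is made up); the inline record mirrors the "New" example above.

```python
import json

# Record mirroring the "New" results.json structure shown above,
# including a null entry for an unmet FPR target.
record = json.loads("""
{
  "domain": "abstracts",
  "accuracy": {
    "0.05": {"tp": 200, "fn": 0, "accuracy": 1.0},
    "0.01": null
  },
  "auroc": 0.9989833333333333
}
""")

def tpr_at(record, fpr_key):
    """Return the TPR at the given FPR key, or None when the detector
    failed to meet that FPR threshold (null in the JSON)."""
    entry = record["accuracy"].get(fpr_key)
    return None if entry is None else entry["accuracy"]

print(tpr_at(record, "0.05"))  # 1.0
print(tpr_at(record, "0.01"))  # None
```

Guarding on `None` before indexing into an entry avoids the `TypeError` that naive code written against the old flat `accuracy` field would hit.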

@liamdugan liamdugan merged commit 28e96e8 into main Jul 25, 2025
4 of 5 checks passed
@github-actions

Eval run succeeded! Link to run: link

Here are the results of the submission(s):

e5-small-lora

Release date: 2024-11-07

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 85.69% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 93.87% at FPR=5%.

LLMDet

Release date: 2023-05-24

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 26.70% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 33.40% at FPR=5%.

Luminar

Release date: 2025-05-17

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 100.00% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 100.00% at FPR=5%.

SpeedAI

Release date: 2025-05-08

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 99.62% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 99.86% at FPR=5%.

It's AI

Release date: 2025-04-01

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 94.15% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 95.75% at FPR=5%.

RoBERTa-base (GPT2)

Release date: 2019-08-24

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 51.77% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 59.19% at FPR=5%.

RoBERTa (ChatGPT)

Release date: 2023-01-18

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 26.64% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 42.52% at FPR=5%.

Desklib

Release date: 2024-10-03

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 83.76% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 92.38% at FPR=5%.

RoBERTa-large (GPT2)

Release date: 2019-08-24

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 50.70% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 55.73% at FPR=5%.

SuperAnnotate AI Detector

Release date: 2024-10-27

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 64.87% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 70.34% at FPR=5%.

GLTR

Release date: 2019-06-10

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 51.48% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 59.69% at FPR=5%.

Desklib AI Text Detector v1.01

Release date: 2025-02-16

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 91.17% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 94.87% at FPR=5%.

Binoculars

Release date: 2024-01-22

I've committed detailed results of this detector's performance on the test set to this PR.

Warning

No aggregate score across all settings is reported here as some domains/generator models/decoding strategies/repetition penalties/adversarial attacks were not included in the submission. This submission will not appear in the main leaderboard; it will only be visible within the splits in which all samples were evaluated.
Without adversarial attacks, it achieved a TPR of 78.98% at FPR=5%.

RADAR

Release date: 2023-07-07

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 63.91% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 65.61% at FPR=5%.

Gaussian Extreme

Release date: 2025-05-17

I've committed detailed results of this detector's performance on the test set to this PR.

On the RAID dataset as a whole (aggregated across all generation models, domains, decoding strategies, repetition penalties, and adversarial attacks), it achieved a TPR of 97.10% at FPR=5%.
Without adversarial attacks, it achieved a TPR of 97.08% at FPR=5%.

FastDetectGPT

Release date: 2023-10-08

I've committed detailed results of this detector's performance on the test set to this PR.

Warning

No aggregate score across all settings is reported here as some domains/generator models/decoding strategies/repetition penalties/adversarial attacks were not included in the submission. This submission will not appear in the main leaderboard; it will only be visible within the splits in which all samples were evaluated.

Warning

No aggregate score across all non-adversarial settings is reported here as some domains/generator models/decoding strategies/repetition penalties were not included in the submission.

If all looks well, a maintainer will come by soon to merge this PR and your entry/entries will appear on the leaderboard. If you need to make any changes, feel free to push new commits to this PR. Thanks for submitting to RAID!
