lynote-ai/AIImageDetector

AI Image Detector

A small, friendly open-source detector for AI-generated images. It is designed in the yt-dlp / rembg spirit: install it, run one command, get a probability and a reproducible report.

AI image detection is probabilistic. Treat the output as one signal, not as proof.

Model Choice

The default backend is UnivFD / UniversalFakeDetect: CLIP ViT-L/14 image features plus a tiny linear fake/real head. This is a strong practical default because the task-specific weight is tiny, the code path is understandable, and the CVPR 2023 paper showed good cross-generator generalization compared with older GAN-trained detectors.
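The "tiny linear head" can be pictured as one dot product followed by a sigmoid. The sketch below is illustrative only: `fake_probability` and the toy numbers are hypothetical, the real head operates on the CLIP ViT-L/14 image embedding (768 dimensions) with weights loaded from a checkpoint, and it is an assumption that this repo converts the logit to a probability with a plain sigmoid.

```python
import math


def fake_probability(features, weights, bias):
    """Map an image embedding to P(fake) with a linear layer plus a sigmoid."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    # Linear head: a single dot product plus a bias term.
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    # Sigmoid squashes the logit into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 4-dimensional stand-ins for an embedding and a trained head (hypothetical values).
toy_features = [0.5, -1.0, 0.25, 2.0]
toy_weights = [0.2, 0.1, -0.4, 0.3]
toy_bias = -0.1

print(round(fake_probability(toy_features, toy_weights, toy_bias), 3))
```

Because the head is only a weight vector and a bias, the task-specific checkpoint stays small while the frozen CLIP encoder does the heavy lifting.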

This repo also ships a hybrid backend that blends UnivFD with a lightweight Hugging Face image classifier. It is useful when you want a stronger practical ensemble without training a new detector from scratch.
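The README does not spell out the blending rule. The simplest reading of a single UnivFD weight is a convex combination of the two fake-probabilities, sketched below. `blend_scores` is a hypothetical helper, not the repo's API, and the real backend may blend logits or apply calibration instead.

```python
def blend_scores(p_univfd, p_hf, univfd_weight=0.8):
    """Convex combination of two fake-probabilities (assumed blending rule)."""
    if not 0.0 <= univfd_weight <= 1.0:
        raise ValueError("univfd_weight must be between 0 and 1")
    # The Hugging Face classifier receives the remaining weight.
    return univfd_weight * p_univfd + (1.0 - univfd_weight) * p_hf


# UnivFD is fairly sure the image is fake, the HF classifier is not.
print(round(blend_scores(0.9, 0.4, univfd_weight=0.8), 3))
```

With a weight of 1.0 this reduces to pure UnivFD, and with 0.0 to the Hugging Face classifier alone.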

Recent research has moved further. AIDE combines CLIP semantics with low-level frequency/noise features and reports gains on GenImage and AIGCDetectBenchmark. That is a good research target for a future backend, but UnivFD is currently the simplest robust default for an installable open-source tool.

Install

Use Python 3.10+.

python -m venv .venv
source .venv/bin/activate
pip install -e .

Optional extras:

pip install -e '.[eval]'      # Hugging Face dataset benchmarks
pip install -e '.[hf]'        # generic Hugging Face image-classification backend
pip install -e '.[api]'       # FastAPI server
pip install -e '.[web]'       # Gradio UI
pip install -e '.[dev]'       # tests and linting

CLI Usage

Detect one image:

aidetect detect image.jpg

Detect a folder recursively:

aidetect detect ./images --csv report.csv

JSON lines output:

aidetect detect ./images --json
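JSON lines output is convenient for post-processing because each line is an independent JSON object. The sketch below parses such lines and keeps the records above a threshold. The field names `path` and `probability` are assumptions made for illustration; check the actual output of your installed version before relying on them.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSON-lines strings, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        records.append(json.loads(line))
    return records


def likely_fake(records, threshold=0.5, score_key="probability"):
    """Keep records whose score is at or above the threshold."""
    return [r for r in records if r.get(score_key, 0.0) >= threshold]


# Hypothetical sample lines; real field names may differ.
sample_output = [
    '{"path": "a.jpg", "probability": 0.91}',
    "",
    '{"path": "b.jpg", "probability": 0.12}',
]

for record in likely_fake(parse_jsonl(sample_output)):
    print(record["path"])
```

If the command writes its JSON lines to standard output (also an assumption), redirecting it to a file and passing the opened file object to `parse_jsonl` should work the same way.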

Use a Hugging Face image-classification model instead of UnivFD:

aidetect detect image.jpg --backend hf --hf-model capcheck/ai-image-detection

Use the hybrid backend:

aidetect detect image.jpg --backend hybrid --hybrid-univfd-weight 0.8

Python API

from aidetector import create_detector

detector = create_detector("univfd", device="auto")
result = detector.predict_path("image.jpg")
print(result.as_dict())

Web UI

pip install -e '.[web]'
aidetect serve

FastAPI

pip install -e '.[api]'
aidetect api --host 127.0.0.1 --port 8000

Then call:

curl -F "file=@image.jpg" http://127.0.0.1:8000/detect
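The same upload can be made from Python with only the standard library. This is a minimal sketch mirroring the curl call above: the form field name `file` and the URL come from that command, while the helper names are hypothetical and the assumption that the server answers with JSON is not confirmed by this README.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, payload, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body; returns (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def detect_via_api(image_path, url="http://127.0.0.1:8000/detect"):
    """POST an image to the local server and decode the reply (assumed to be JSON)."""
    with open(image_path, "rb") as handle:
        payload = handle.read()
    body, header = build_multipart("file", os.path.basename(image_path), payload)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `detect_via_api("image.jpg")` would return whatever structure the endpoint emits.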

Benchmarks

Evaluate a GenImage-style folder where nature/ contains real images and ai/ contains generated images:

aidetect benchmark-folder /path/to/GenImage/Midjourney/val \
  --real-dir nature \
  --fake-dir ai \
  --output benchmarks/midjourney-val.json

Evaluate a Hugging Face dataset such as Tiny-GenImage:

pip install -e '.[eval]'
aidetect benchmark-hf TheKernel01/Tiny-GenImage \
  --split validation \
  --image-field image \
  --label-field label \
  --fake-label 1 \
  --max-samples 200 \
  --output benchmarks/tiny-genimage-univfd-200.json

The JSON report includes accuracy, balanced accuracy, precision, recall, F1, ROC AUC, confusion counts, a diagnostic threshold sweep, model metadata, dataset metadata, and per-image predictions.
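For readers who want to sanity-check a report, the thresholded metrics all follow from the four confusion counts. The sketch below uses the standard definitions with "fake" as the positive class; `classification_metrics` and the counts are hypothetical and not taken from any benchmark in this README.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion counts (positive class = fake)."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0     # true negative rate
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 50 fake images and 50 real images.
example = classification_metrics(tp=30, fp=10, tn=40, fn=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

On a balanced set, accuracy and balanced accuracy coincide, which is why those two columns match in the calibrated tables.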

For more defensible evaluation, calibrate a threshold on one split and evaluate on another:

aidetect benchmark-calibrated-folder /path/to/exported-folder \
  --backend univfd \
  --output benchmarks/univfd-calibrated.json
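The idea behind calibrated evaluation can be sketched in a few lines: pick the threshold that maximizes a metric on the calibration split, then score the test split with that fixed threshold. The helper names and toy scores below are hypothetical, and the repo's actual search strategy may differ (for example a fixed grid rather than the observed scores).

```python
def confusion(scores, labels, threshold):
    """Count tp, fp, tn, fn when predicting fake for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_fake = score >= threshold
        if predicted_fake and label == 1:
            tp += 1
        elif predicted_fake and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1(tp, fp, tn, fn):
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def calibrate_threshold(scores, labels, metric=f1):
    """Return the observed score that maximizes the metric (lowest wins ties)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: metric(*confusion(scores, labels, t)))


# Toy calibration and test splits (1 = fake, 0 = real).
cal_scores, cal_labels = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6], [0, 0, 1, 0, 1, 1]
test_scores, test_labels = [0.15, 0.45, 0.32, 0.7], [0, 0, 1, 1]

threshold = calibrate_threshold(cal_scores, cal_labels)
print("threshold:", threshold)
print("test F1:", f1(*confusion(test_scores, test_labels, threshold)))
```

Keeping the test split out of threshold selection is what makes the reported test numbers less optimistic than a threshold tuned on the same images.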

For multi-shard Tiny-GenImage evaluation with per-generator slices:

aidetect benchmark-tiny-genimage-local \
  /path/to/validation-00000-of-00004.parquet \
  /path/to/validation-00001-of-00004.parquet \
  /path/to/validation-00002-of-00004.parquet \
  /path/to/validation-00003-of-00004.parquet \
  --backend hybrid \
  --optimize-metric f1 \
  --max-per-class-per-shard 100 \
  --output benchmarks/tiny-genimage-hybrid-multishard-800-f1.json

If Hugging Face dataset metadata requests are flaky, you can work from a local Tiny-GenImage parquet shard:

aidetect prepare-tiny-genimage .cache/tiny-genimage-validation-200 \
  --local-parquet /path/to/validation-00000-of-00004.parquet \
  --max-per-class 100

aidetect benchmark-calibrated-folder .cache/tiny-genimage-validation-200 \
  --backend univfd \
  --real-dir real \
  --fake-dir ai \
  --output benchmarks/tiny-genimage-univfd-calibrated-200.json

Current local benchmark evidence is split into two levels.

Smoke benchmark on Tiny-GenImage validation shard data/validation-00000-of-00004.parquet, 20 real + 20 fake images:

| Backend | Threshold | Accuracy | Balanced Acc | F1 | ROC AUC | Images/s |
| --- | --- | --- | --- | --- | --- | --- |
| UnivFD / CLIP ViT-L/14 | 0.5 | 0.500 | 0.500 | 0.000 | 0.715 | 2.31 |
| capcheck/ai-image-detection | 0.5 | 0.600 | 0.600 | 0.692 | 0.743 | 32.03 |
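The UnivFD row in the smoke benchmark is worth a second look. An F1 of 0.000 means no fake image was flagged correctly, and an accuracy of 0.500 on 20 real + 20 fake images then implies all 20 real images were classified correctly. That pattern is consistent with every score falling below the 0.5 threshold, while a ROC AUC of 0.715 shows the scores still rank fakes above reals more often than not. The sketch below reproduces the effect on hypothetical scores with a pairwise AUC computation; it is not the repo's implementation.

```python
def roc_auc(scores, labels):
    """Probability that a random fake outscores a random real (ties count half)."""
    fake = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    if not fake or not real:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for f in fake:
        for r in real:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake) * len(real))


def accuracy_at(scores, labels, threshold):
    """Fraction of correct predictions when score >= threshold means fake."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(scores)


# Hypothetical scores: all below 0.5, but fakes tend to score higher than reals.
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.15, 0.25, 0.35, 0.45]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

print("accuracy at 0.5:", accuracy_at(toy_scores, toy_labels, 0.5))
print("ROC AUC:", roc_auc(toy_scores, toy_labels))
```

This is the situation where recalibrating the threshold helps: the ranking is informative, but the default cut-off sits in the wrong place for this data.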

Calibrated hold-out benchmark on the same shard family, exported as 100 real + 100 fake images and split deterministically into calibration/test sets:

| Backend | Calibration | Test Accuracy | Test Balanced Acc | Test F1 | Test ROC AUC |
| --- | --- | --- | --- | --- | --- |
| UnivFD / CLIP ViT-L/14 | threshold-only | 0.760 | 0.760 | 0.721 | 0.811 |
| Hybrid (UnivFD 0.8 + HF 0.2) | threshold + blend weight | 0.670 | 0.670 | 0.629 | 0.752 |
| capcheck/ai-image-detection | threshold-only | 0.580 | 0.580 | 0.580 | 0.610 |

Interpretation:

  • The 40-image run is only a smoke test.
  • The 200-image calibrated split is a stronger local benchmark because threshold selection happens on a separate calibration split before the test split is scored.
  • It is still not a publication-grade claim. It is one shard, one deterministic split, and one local environment.
  • These calibrated runs were executed on CPU in this workspace.
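A deterministic calibration/test split can be obtained without storing any state by hashing each file name. The README does not describe how this repo performs its split, so the sketch below is one possible approach; `assign_split` and the `salt` parameter are hypothetical.

```python
import hashlib


def assign_split(name, calibration_fraction=0.5, salt="v1"):
    """Assign a file to 'calibration' or 'test' based on a stable hash of its name."""
    digest = hashlib.sha256(f"{salt}:{name}".encode("utf-8")).hexdigest()
    # Map the first 32 bits of the digest to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "calibration" if bucket < calibration_fraction else "test"


# The same name always lands in the same split, on any machine.
for file_name in ["real/0001.png", "real/0002.png", "ai/0001.png", "ai/0002.png"]:
    print(file_name, "->", assign_split(file_name))
```

Changing the salt yields a different but equally reproducible split, which is one way to check that a result does not hinge on a single partition.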

Current strongest local benchmark, calibrated on 4 Tiny-GenImage validation shards with up to 100 real + 100 fake images sampled per shard:

| Backend | Test N | Test Accuracy | Test Balanced Acc | Precision | Recall | Test F1 | Test ROC AUC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Hybrid (UnivFD 0.85 + HF 0.15), optimize=f1 | 400 | 0.773 | 0.773 | 0.779 | 0.760 | 0.770 | 0.843 |
| Hybrid (UnivFD 0.85 + HF 0.15), optimize=balanced_accuracy | 400 | 0.745 | 0.745 | 0.802 | 0.650 | 0.718 | 0.843 |
| UnivFD / CLIP ViT-L/14 | 300 | 0.690 | 0.690 | 0.806 | 0.500 | 0.617 | 0.784 |

Selected generator-vs-real slices from that same held-out split:

| Generator | N | Accuracy | Balanced Acc | F1 | ROC AUC |
| --- | --- | --- | --- | --- | --- |
| BigGAN vs Real | 231 | 0.810 | 0.876 | 0.577 | 0.962 |
| ADM vs Real | 232 | 0.810 | 0.877 | 0.585 | 0.974 |
| VQDM vs Real | 224 | 0.804 | 0.872 | 0.511 | 0.941 |
| GLIDE vs Real | 227 | 0.775 | 0.744 | 0.427 | 0.805 |
| Wukong vs Real | 228 | 0.776 | 0.750 | 0.440 | 0.815 |
| SD15 vs Real | 228 | 0.768 | 0.714 | 0.404 | 0.763 |
| Midjourney vs Real | 230 | 0.730 | 0.576 | 0.262 | 0.638 |
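The slice sizes hint at how these numbers should be read. The seven N values exceed 200 by a combined total of exactly 200, which is consistent with each slice pairing all 200 real test images with one generator's share of the 200 fakes. This is an inference from the table, not something stated here. Under that reading each slice is heavily imbalanced, and F1 drops even when the per-class error rates stay the same. The sketch below shows the effect with hypothetical rates; `f1_given_rates` is an illustrative helper.

```python
def f1_given_rates(recall, false_positive_rate, n_fake, n_real):
    """Expected F1 for fixed per-class rates and a given class mix."""
    tp = recall * n_fake
    fn = n_fake - tp
    fp = false_positive_rate * n_real
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Same hypothetical detector (80% recall, 20% false positives), two class mixes.
balanced = f1_given_rates(0.8, 0.2, n_fake=200, n_real=200)
imbalanced = f1_given_rates(0.8, 0.2, n_fake=30, n_real=200)

print(f"F1 with 200 fake vs 200 real: {balanced:.3f}")
print(f"F1 with  30 fake vs 200 real: {imbalanced:.3f}")
```

For comparing generators, balanced accuracy and ROC AUC are therefore the more stable columns, since neither depends on the class mix in the way F1 does.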

Switching the calibration objective to f1 gives the strongest thresholded result so far. Compared with pure UnivFD, the hybrid trades a little precision (0.779 vs 0.806) for much higher recall (0.760 vs 0.500). It also lifts weaker generators such as Midjourney and SD15, though they remain much harder than ADM, BigGAN, or VQDM. None of this amounts to a universal detector guarantee.

Model Weights

On first use, the UnivFD backend downloads:

  • CLIP ViT-L/14 OpenAI weights through open_clip_torch
  • UniversalFakeDetect linear head from siddharthksah/deepsafe-weights/universalfakedetect/fc_weights.pth

You can also pass a local head checkpoint:

aidetect detect image.jpg --weight-path ./fc_weights.pth

Development

pip install -e '.[dev,eval,hf,api]'
pytest
ruff check .

Limitations

  • No detector is universal. New generators, heavy recompression, screenshots, crops, edits, upscaling, and adversarial post-processing can change results.
  • Benchmarks can overstate real-world reliability if the deployment data differs from the benchmark distribution.
  • The tool currently detects whole-image synthetic likelihood. It does not localize edited regions.

Citation

If this helps your work, cite the original UnivFD paper:

@InProceedings{Ojha_2023_CVPR,
  author = {Ojha, Utkarsh and Li, Yuheng and Lee, Yong Jae},
  title = {Towards Universal Fake Image Detectors That Generalize Across Generative Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2023},
  pages = {24480-24489}
}
