AI Image Detector

A small, friendly open-source detector for AI-generated images. It is designed in the yt-dlp / rembg spirit: install it, run one command, get a probability and a reproducible report.

AI image detection is probabilistic. Treat the output as one signal, not as proof.

Model Choice

The default backend is UnivFD / UniversalFakeDetect: CLIP ViT-L/14 image features plus a tiny linear fake/real head. This is a strong practical default because the task-specific weights are tiny, the code path is easy to follow, and the CVPR 2023 paper showed better cross-generator generalization than older GAN-trained detectors.
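To make the "frozen features plus linear head" idea concrete, here is a minimal sketch in plain Python. It assumes a 768-dimensional feature vector for ViT-L/14 and uses random stand-in values for the features, weights, and bias. The name `fake_probability` is hypothetical and is not part of this package's API.

```python
import math
import random


def fake_probability(features, weights, bias):
    """Linear head followed by a sigmoid: maps a feature vector to P(fake)."""
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Stand-in values only. In the real backend the features would come from the
# frozen CLIP image encoder and the weights from the downloaded head checkpoint.
rng = random.Random(0)
dim = 768  # assumed feature size for ViT-L/14
features = [rng.gauss(0.0, 1.0) for _ in range(dim)]
weights = [rng.gauss(0.0, 0.01) for _ in range(dim)]
bias = 0.0

p = fake_probability(features, weights, bias)
print(f"P(fake) = {p:.3f}")
```

The only trained parameters in this picture are the weight vector and the bias, which is why the task-specific checkpoint stays so small.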

This repo also ships a hybrid backend that blends UnivFD with a lightweight Hugging Face image classifier. It is useful when you want a stronger practical ensemble without training a new detector from scratch.
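As a rough illustration of what blending two detectors can look like, the sketch below takes a weighted average of two fake-probabilities. The convex-combination rule and the name `blend_scores` are assumptions made for this example; the actual hybrid backend may combine scores differently.

```python
def blend_scores(p_univfd, p_hf, univfd_weight=0.8):
    """Weighted average of two fake-probabilities (assumed blending rule)."""
    if not 0.0 <= univfd_weight <= 1.0:
        raise ValueError("univfd_weight must be in [0, 1]")
    return univfd_weight * p_univfd + (1.0 - univfd_weight) * p_hf


# Illustrative inputs: UnivFD is unsure, the Hugging Face classifier is confident.
blended = blend_scores(p_univfd=0.30, p_hf=0.90, univfd_weight=0.8)
print(round(blended, 3))
```

With a weight of 0.8 the UnivFD score dominates, so a confident second classifier nudges the result rather than overriding it.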

Recent research has moved further. AIDE combines CLIP semantics with low-level frequency/noise features and reports gains on GenImage and AIGCDetectBenchmark. That is a good research target for a future backend, but UnivFD is currently the simplest robust default for an installable open-source tool.

Useful references:

UniversalFakeDetect paper: https://openaccess.thecvf.com/content/CVPR2023/html/Ojha_Towards_Universal_Fake_Image_Detectors_That_Generalize_Across_Generative_Models_CVPR_2023_paper.html
UniversalFakeDetect code: https://github.com/WisconsinAIVision/UniversalFakeDetect
AIDE paper: https://arxiv.org/abs/2406.19435
GenImage benchmark: https://github.com/GenImage-Dataset/GenImage
Tiny-GenImage runnable subset: https://huggingface.co/datasets/TheKernel01/Tiny-GenImage
CIFAKE benchmark paper: https://huggingface.co/papers/2303.14126

Install

Use Python 3.10+.

python -m venv .venv
source .venv/bin/activate
pip install -e .

Optional extras:

pip install -e '.[eval]'      # Hugging Face dataset benchmarks
pip install -e '.[hf]'        # generic Hugging Face image-classification backend
pip install -e '.[api]'       # FastAPI server
pip install -e '.[web]'       # Gradio UI
pip install -e '.[dev]'       # tests and linting

CLI Usage

Detect one image:

aidetect detect image.jpg

Detect a folder recursively and write a CSV report:

aidetect detect ./images --csv report.csv

JSON lines output:

aidetect detect ./images --json
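JSON lines output is convenient to post-process one record at a time. The sketch below filters records by score. The field names `path` and `fake_probability` are placeholders chosen for this example, not the documented output schema, so check a real output line and adjust `score_field` accordingly.

```python
import json


def flagged_records(lines, score_field="fake_probability", threshold=0.5):
    """Yield parsed JSON-lines records whose score meets the threshold."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get(score_field, 0.0) >= threshold:
            yield record


# Made-up sample lines; the real field names may differ.
sample = [
    '{"path": "a.jpg", "fake_probability": 0.91}',
    '{"path": "b.jpg", "fake_probability": 0.12}',
]
flagged = list(flagged_records(sample))
for record in flagged:
    print(record["path"], record["fake_probability"])
```

In a pipeline, the same function could be fed `sys.stdin` instead of the sample list.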

Use a Hugging Face image-classification model instead of UnivFD:

aidetect detect image.jpg --backend hf --hf-model capcheck/ai-image-detection

Use the hybrid backend:

aidetect detect image.jpg --backend hybrid --hybrid-univfd-weight 0.8

Python API

from aidetector import create_detector

detector = create_detector("univfd", device="auto")
result = detector.predict_path("image.jpg")
print(result.as_dict())
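For more than one image, a small helper can loop over a folder. This sketch relies only on the `predict_path` and `as_dict` calls shown above. The helper name `scan_folder` and the suffix list are assumptions for illustration.

```python
from pathlib import Path

# Assumed set of image suffixes; extend it to match your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def scan_folder(detector, folder):
    """Run detector.predict_path on every image under folder, recursively."""
    results = {}
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[str(path)] = detector.predict_path(str(path)).as_dict()
    return results


# Usage with the real package (downloads model weights on first use):
#   from aidetector import create_detector
#   detector = create_detector("univfd", device="auto")
#   for path, report in scan_folder(detector, "./images").items():
#       print(path, report)
```

Creating the detector once and reusing it avoids reloading the model for every image.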

Web UI

pip install -e '.[web]'
aidetect serve

FastAPI

pip install -e '.[api]'
aidetect api --host 127.0.0.1 --port 8000

Then call:

curl -F "file=@image.jpg" http://127.0.0.1:8000/detect
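The same request can be sent from Python without extra dependencies. This is a sketch that mirrors the curl call: it posts the image as a multipart form field named `file` and assumes the server replies with JSON, which this README does not specify. The helper names are hypothetical.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request


def build_multipart(field, filename, payload):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def detect_remote(image_path, url="http://127.0.0.1:8000/detect"):
    """POST an image to the running API server and parse the reply as JSON."""
    path = Path(image_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    req = request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (needs `aidetect api` running locally):
#   print(detect_remote("image.jpg"))
```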

Benchmarks

Evaluate a GenImage-style folder where nature/ contains real images and ai/ contains generated images:

aidetect benchmark-folder /path/to/GenImage/Midjourney/val \
  --real-dir nature \
  --fake-dir ai \
  --output benchmarks/midjourney-val.json

Evaluate a Hugging Face dataset such as Tiny-GenImage:

pip install -e '.[eval]'
aidetect benchmark-hf TheKernel01/Tiny-GenImage \
  --split validation \
  --image-field image \
  --label-field label \
  --fake-label 1 \
  --max-samples 200 \
  --output benchmarks/tiny-genimage-univfd-200.json

The JSON report includes accuracy, balanced accuracy, precision, recall, F1, ROC AUC, confusion counts, a diagnostic threshold sweep, model metadata, dataset metadata, and per-image predictions.
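For readers who want to sanity-check the thresholded metrics against the confusion counts, here is a minimal sketch using the standard definitions with "fake" as the positive class. The function name and the counts are illustrative and are not taken from any report in this repo.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Thresholded metrics from confusion counts, with 'fake' as positive."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = 2 * tp + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * tp / denom if denom else 0.0,
    }


# Illustrative counts only.
m = metrics_from_counts(tp=30, fp=5, tn=45, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

When the real and fake classes are the same size, accuracy and balanced accuracy coincide, which is why those two columns match in the balanced tables in this section.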

For more defensible evaluation, calibrate a threshold on one split and evaluate on another:

aidetect benchmark-calibrated-folder /path/to/exported-folder \
  --backend univfd \
  --output benchmarks/univfd-calibrated.json
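The idea behind calibrated evaluation can be sketched in a few lines: choose the threshold on a calibration split, then score the test split once with that fixed threshold. The example below uses synthetic scores, a simple grid search, and F1 as the objective. All of these are assumptions for illustration and may not match what `benchmark-calibrated-folder` does internally.

```python
import random


def f1_at(scores, labels, threshold):
    """F1 for 'fake' (label 1) when predicting fake if score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def calibrate_threshold(scores, labels, steps=101):
    """Pick the grid threshold in [0, 1] with the best F1 on the given split."""
    grid = [i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: f1_at(scores, labels, t))


# Synthetic scores: fakes skew higher than reals, but most sit below 0.5.
rng = random.Random(0)
labels = [0, 1] * 100
scores = [min(1.0, max(0.0, rng.gauss(0.35 if y else 0.20, 0.08))) for y in labels]

# Deterministic split: the first half calibrates, the second half is scored once.
cal_s, cal_y = scores[:100], labels[:100]
test_s, test_y = scores[100:], labels[100:]

threshold = calibrate_threshold(cal_s, cal_y)
print(f"calibrated threshold: {threshold:.2f}")
print(f"test F1 at 0.5:        {f1_at(test_s, test_y, 0.5):.3f}")
print(f"test F1 at calibrated: {f1_at(test_s, test_y, threshold):.3f}")
```

Because the threshold never sees the test split, the reported test metrics are not inflated by tuning on the same images they are measured on.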

For multi-shard Tiny-GenImage evaluation with per-generator slices:

aidetect benchmark-tiny-genimage-local \
  /path/to/validation-00000-of-00004.parquet \
  /path/to/validation-00001-of-00004.parquet \
  /path/to/validation-00002-of-00004.parquet \
  /path/to/validation-00003-of-00004.parquet \
  --backend hybrid \
  --optimize-metric f1 \
  --max-per-class-per-shard 100 \
  --output benchmarks/tiny-genimage-hybrid-multishard-800-f1.json

If Hugging Face dataset metadata requests are flaky, you can work from a local Tiny-GenImage parquet shard:

aidetect prepare-tiny-genimage .cache/tiny-genimage-validation-200 \
  --local-parquet /path/to/validation-00000-of-00004.parquet \
  --max-per-class 100

aidetect benchmark-calibrated-folder .cache/tiny-genimage-validation-200 \
  --backend univfd \
  --real-dir real \
  --fake-dir ai \
  --output benchmarks/tiny-genimage-univfd-calibrated-200.json

Current local benchmark evidence comes at three levels of rigor: a smoke test, a single-shard calibrated hold-out, and a multi-shard calibrated run.

Smoke benchmark on Tiny-GenImage validation shard data/validation-00000-of-00004.parquet, 20 real + 20 fake images:

Backend	Threshold	Accuracy	Balanced Acc	F1	ROC AUC	Images/s
UnivFD / CLIP ViT-L/14	0.5	0.500	0.500	0.000	0.715	2.31
capcheck/ai-image-detection	0.5	0.600	0.600	0.692	0.743	32.03

Calibrated hold-out benchmark on the same shard family, exported as 100 real + 100 fake images and split deterministically into calibration/test sets:

Backend	Calibration	Test Accuracy	Test Balanced Acc	Test F1	Test ROC AUC
UnivFD / CLIP ViT-L/14	threshold-only	0.760	0.760	0.721	0.811
Hybrid (UnivFD 0.8 + HF 0.2)	threshold + blend weight	0.670	0.670	0.629	0.752
capcheck/ai-image-detection	threshold-only	0.580	0.580	0.580	0.610

Interpretation:

The 40-image run is only a smoke test.
The 200-image calibrated split is a stronger local benchmark because threshold selection happens on a separate calibration split before the test split is scored.
It is still not a publication-grade claim. It is one shard, one deterministic split, and one local environment.
These calibrated runs were executed on CPU in this workspace.
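One plausible reading of the smoke-test row where UnivFD shows 0.500 accuracy and 0.000 F1 but 0.715 ROC AUC is that its scores rank fakes above reals while mostly sitting below the fixed 0.5 threshold. The sketch below reproduces that pattern with made-up scores; it is an illustration of the effect, not an analysis of the actual predictions.

```python
def roc_auc(scores, labels):
    """Probability that a random fake outscores a random real (ties count half)."""
    fakes = [s for s, y in zip(scores, labels) if y == 1]
    reals = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((f > r) + 0.5 * (f == r) for f in fakes for r in reals)
    return wins / (len(fakes) * len(reals))


def accuracy_at(scores, labels, threshold):
    """Accuracy when predicting fake (1) if score >= threshold."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


# Made-up scores: fakes tend to rank above reals, but every score is below 0.5.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.15, 0.25, 0.35, 0.40]

print("ROC AUC:", roc_auc(scores, labels))
print("accuracy at 0.5:", accuracy_at(scores, labels, 0.5))
print("accuracy at 0.225:", accuracy_at(scores, labels, 0.225))
```

ROC AUC depends only on the ordering of scores, so it can look reasonable even when a fixed threshold labels every image as real. That is the motivation for calibrating the threshold on a separate split.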

Current strongest local benchmark, calibrated on 4 Tiny-GenImage validation shards with up to 100 real + 100 fake images sampled per shard:

Backend	Test N	Test Accuracy	Test Balanced Acc	Precision	Recall	Test F1	Test ROC AUC
Hybrid (UnivFD 0.85 + HF 0.15), `optimize=f1`	400	0.773	0.773	0.779	0.760	0.770	0.843
Hybrid (UnivFD 0.85 + HF 0.15), `optimize=balanced_accuracy`	400	0.745	0.745	0.802	0.650	0.718	0.843
UnivFD / CLIP ViT-L/14	300	0.690	0.690	0.806	0.500	0.617	0.784

Selected generator-vs-real slices from that same held-out split:

Generator	N	Accuracy	Balanced Acc	F1	ROC AUC
BigGAN vs Real	231	0.810	0.876	0.577	0.962
ADM vs Real	232	0.810	0.877	0.585	0.974
VQDM vs Real	224	0.804	0.872	0.511	0.941
GLIDE vs Real	227	0.775	0.744	0.427	0.805
Wukong vs Real	228	0.776	0.750	0.440	0.815
SD15 vs Real	228	0.768	0.714	0.404	0.763
Midjourney vs Real	230	0.730	0.576	0.262	0.638

This is the honest picture: switching the calibration objective to f1 gives us the strongest thresholded result so far. Compared with pure UnivFD in the table above, the f1-optimized hybrid trades a little precision (0.779 vs 0.806) for much higher recall (0.760 vs 0.500). It also lifts weaker generators such as Midjourney and SD15, though they remain much harder than ADM, BigGAN, or VQDM. None of this is a universal detector guarantee.

Model Weights

On first use, the UnivFD backend downloads:

CLIP ViT-L/14 OpenAI weights through open_clip_torch
UniversalFakeDetect linear head from siddharthksah/deepsafe-weights/universalfakedetect/fc_weights.pth

You can also pass a local head checkpoint:

aidetect detect image.jpg --weight-path ./fc_weights.pth

Development

pip install -e '.[dev,eval,hf,api]'
pytest
ruff check .

Limitations

No detector is universal. New generators, heavy recompression, screenshots, crops, edits, upscaling, and adversarial post-processing can change results.
Benchmarks can overstate real-world reliability if the deployment data differs from the benchmark distribution.
The tool currently detects whole-image synthetic likelihood. It does not localize edited regions.

Citation

If this helps your work, cite the original UnivFD paper:

@InProceedings{Ojha_2023_CVPR,
  author = {Ojha, Utkarsh and Li, Yuheng and Lee, Yong Jae},
  title = {Towards Universal Fake Image Detectors That Generalize Across Generative Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2023},
  pages = {24480-24489}
}
