| Hi HN, YC W24 company here. We just open-sourced an AI image detection model that beats the SOTA commercial detectors.

AI-generated images and videos have become incredibly good in the last few months and are flooding the internet. Being able to detect them reliably gives some power back to consumers and companies that care about high-quality, genuine content.

Detecting AI-generated images is a very hard problem: there are many different techniques for generating images; image compression, noise, and other distortions destroy generator artifacts; Android phones apply auto-correction to images; and so on. None of the detectors we tried (sightengine.com, decopy.ai, etc.) works reliably, even on basic examples (try them with this pirate Rick Astley made with Flux Kontext: https://imgur.com/a/iL3paE8).

We released two models: the full version (~600M params) and a smaller version (~20M params) that can even run in your browser on mobile (see demo)! We've also put up code for running things locally or via an API (free but rate-limited) using JavaScript/Node and Python.

-------- Details --------

The full model was trained on 1M+ images scraped off the internet, and the small model is a distillation of it. We're actively working on extending the dataset and further improving the models.

Classification accuracy: sightengine.com seems to be the best commercial solution out there, as confirmed by this paper (https://arxiv.org/pdf/2404.14581), which they also cite on their website. Of course, they cherry-picked the results: they claim 98.3% accuracy while only achieving a (still impressive) ~82.8% over the full dataset.

I've downloaded the dataset used in the paper and tested my models against it. The code for running the tests, as well as a usable version of the dataset (the original was a big pain to download from OneDrive), is included in the repo. Here are our benchmark results for comparison:

Total samples: 144,088
Real images: 17,044
Synthetic images: 127,044
Average precision: 0.991

[threshold: 0.5]
Total accuracy: 0.864
PER-CATEGORY ACCURACY:
Real 0.875 (17,044 samples)
DALL-E_T2I 0.982 (16,110 samples)
DreamStudio_T2I 0.968 (16,278 samples)
Midjourney_T2I 0.961 (16,148 samples)
StarryAI_T2I 0.847 (13,515 samples)
DALL-E_IT2I 0.774 (16,665 samples)
DreamStudio_IT2I 0.666 (16,139 samples)
Midjourney_IT2I 0.897 (15,371 samples)
StarryAI_IT2I 0.805 (16,818 samples)

[threshold: 0.65]
Total accuracy: 0.827
PER-CATEGORY ACCURACY:
Real 0.914 (17,044 samples)
DALL-E_T2I 0.971 (16,110 samples)
DreamStudio_T2I 0.949 (16,278 samples)
Midjourney_T2I 0.940 (16,148 samples)
StarryAI_T2I 0.803 (13,515 samples)
DALL-E_IT2I 0.698 (16,665 samples)
DreamStudio_IT2I 0.576 (16,139 samples)
Midjourney_IT2I 0.845 (15,371 samples)
StarryAI_IT2I 0.743 (16,818 samples) |
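
For anyone who wants to sanity-check the numbers: the total accuracy above should be the sample-weighted mean of the per-category accuracies. Here is a minimal Python sketch that recomputes it from the figures listed in this post. The helper names are made up for illustration and this is not the test code from the repo. The balanced accuracy (mean of the accuracy on real images and the pooled accuracy on synthetic images) is an extra metric that is not part of the reported results; it is included only because the dataset has roughly 7.5x more synthetic than real images.

```python
# Per-category (accuracy, sample count) pairs copied from the results above.
# "Real" is the only non-synthetic category.
RESULTS_T050 = {
    "Real": (0.875, 17_044),
    "DALL-E_T2I": (0.982, 16_110),
    "DreamStudio_T2I": (0.968, 16_278),
    "Midjourney_T2I": (0.961, 16_148),
    "StarryAI_T2I": (0.847, 13_515),
    "DALL-E_IT2I": (0.774, 16_665),
    "DreamStudio_IT2I": (0.666, 16_139),
    "Midjourney_IT2I": (0.897, 15_371),
    "StarryAI_IT2I": (0.805, 16_818),
}

RESULTS_T065 = {
    "Real": (0.914, 17_044),
    "DALL-E_T2I": (0.971, 16_110),
    "DreamStudio_T2I": (0.949, 16_278),
    "Midjourney_T2I": (0.940, 16_148),
    "StarryAI_T2I": (0.803, 13_515),
    "DALL-E_IT2I": (0.698, 16_665),
    "DreamStudio_IT2I": (0.576, 16_139),
    "Midjourney_IT2I": (0.845, 15_371),
    "StarryAI_IT2I": (0.743, 16_818),
}


def total_accuracy(results):
    """Sample-weighted mean of the per-category accuracies."""
    n_total = sum(n for _, n in results.values())
    return sum(acc * n for acc, n in results.values()) / n_total


def balanced_accuracy(results):
    """Mean of real-image accuracy and pooled synthetic-image accuracy.

    Not a metric reported in the post; added here for illustration only.
    """
    real_acc, _ = results["Real"]
    synthetic = {k: v for k, v in results.items() if k != "Real"}
    return (real_acc + total_accuracy(synthetic)) / 2


if __name__ == "__main__":
    for label, results in (("0.5", RESULTS_T050), ("0.65", RESULTS_T065)):
        print(
            f"threshold {label}: "
            f"total={total_accuracy(results):.3f} "
            f"balanced={balanced_accuracy(results):.3f}"
        )
```

The recomputed totals round to the reported 0.864 and 0.827. Raising the threshold from 0.5 to 0.65 trades synthetic-image accuracy for real-image accuracy (0.875 to 0.914 on real images), which is why every synthetic category drops in the second block.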