by Ukv 612 days ago
> https://edintegrity.biomedcentral.com/articles/10.1007/s4097...

> GPTZero was correct in most scenarios where they used basic prompts, and only had one false positive.

One false positive out of only "five human-written samples", unless I'm misreading.

Say 50 papers are checked, with 5 being generated by AI. By the rates of GPTZero in the paper, 3 AI-generated papers would be correctly flagged and 9 of the 45 human-written papers would be incorrectly flagged, meaning a flagged paper is only 25% likely (3 out of 12) to actually be AI-generated.
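For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function name and parameters are made up for illustration, and the rates (3 of 5 AI samples caught, 1 of 5 human samples wrongly flagged) are just the numbers used in this comment, not something I have verified beyond that tiny sample.

```python
from fractions import Fraction


def flagged_breakdown(total, ai_count, detection_rate, false_positive_rate):
    """Return (true flags, false flags, share of flags that are really AI).

    Hypothetical helper: the rates are assumptions passed in by the caller.
    """
    human_count = total - ai_count
    true_flags = ai_count * detection_rate          # AI papers correctly flagged
    false_flags = human_count * false_positive_rate  # human papers wrongly flagged
    precision = true_flags / (true_flags + false_flags)
    return true_flags, false_flags, precision


# Rates assumed from the comment above: 3/5 detected, 1/5 false positives.
true_flags, false_flags, precision = flagged_breakdown(
    total=50,
    ai_count=5,
    detection_rate=Fraction(3, 5),
    false_positive_rate=Fraction(1, 5),
)
false_discovery_rate = 1 - precision

print(
    f"{true_flags} true flags, {false_flags} false flags, "
    f"precision {float(precision):.0%}, "
    f"false discovery rate {float(false_discovery_rate):.0%}"
)
# prints: 3 true flags, 9 false flags, precision 25%, false discovery rate 75%
```

Note how a 20% false positive rate turns into a 75% false discovery rate here, purely because human-written papers outnumber AI-generated ones 9 to 1 in this example.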

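Realistically the sample size in the paper is just far too small to draw any real conclusion one way or another, but I think people fail to appreciate the difference between false positive rate (the share of human-written papers that get flagged) and false discovery rate (the share of flagged papers that are actually human-written).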