> [can’t] be reliably detected… only ~90% effective

I’m surprised to see these comments in conjunction. 90% is pretty good, and much higher than I expected. I wonder what the breakdown of false positives and false negatives is.

Edit: from the linked paper

> Of the 90 samples in which AI was used, it correctly identified 77 of them as having >1% AI generated text, an 86% success rate. The fact that the tool is more accurate in identifying human-generated text than AI-generated text is by design. The company realized that users would be unwilling to use a tool that produced significant numbers of false positives, so they “tuned” the tool to give human writers the benefit of the doubt.

This all seems exceptionally reasonable. Of the samples with AI, they correctly identify 86%. Of the samples without AI, they correctly identify a higher proportion, because of the nature of their service. This implies that if they _wanted_ to make a more balanced AI detection tool, they could get that 86% somewhat higher.
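
For anyone who wants to check the quoted arithmetic, here is a minimal sketch using only the two figures from the paper excerpt (90 AI samples, 77 flagged). The variable names are mine, and the excerpt gives no count for the human-written samples, so the false positive rate is left out.

```python
# Figures quoted from the linked paper in the comment above.
ai_samples = 90   # samples in which AI was used
ai_flagged = 77   # of those, flagged as having >1% AI-generated text

sensitivity = ai_flagged / ai_samples      # share of AI samples caught
false_negative_rate = 1 - sensitivity      # share of AI samples missed

print(f"sensitivity: {sensitivity:.1%}")                  # sensitivity: 85.6%
print(f"false negative rate: {false_negative_rate:.1%}")  # false negative rate: 14.4%
```

The 86% in the quote is this 85.6% rounded, and it describes missed AI text (false negatives) only. It says nothing directly about how often human writing is wrongly flagged.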
What standard of proof is appropriate to expel someone from college? After they've taken on, say, $40,000 of debt to attend?
Assuming you had a class of 100 students, "90% effective" would mean expelling 10 students wrongly. Personally, I'd expect a higher standard of proof.
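
How many students get wrongly flagged depends on the false positive rate and on how many students used AI, not on a single "effectiveness" number. A rough sketch, not the tool's actual behavior: `expected_flags` is a hypothetical helper, the 0.86 sensitivity is the figure quoted upthread, and the 10% and 1% false positive rates and the 20% AI-use share are made-up inputs for illustration only.

```python
def expected_flags(class_size, ai_share, sensitivity, false_positive_rate):
    """Return expected (correct_flags, wrongful_flags) for one class.

    All inputs are assumptions supplied by the caller.
    """
    ai_users = class_size * ai_share
    honest = class_size - ai_users
    correct = ai_users * sensitivity          # AI users who get flagged
    wrongful = honest * false_positive_rate   # honest students who get flagged
    return correct, wrongful

# Reading "90% effective" as a 10% error on every student, none of whom used AI:
_, wrongful = expected_flags(100, ai_share=0.0, sensitivity=0.86,
                             false_positive_rate=0.10)
print(f"wrongly flagged: {wrongful:.1f}")  # wrongly flagged: 10.0

# Hypothetical alternative: 20% of the class used AI, 1% false positive rate.
correct_alt, wrongful_alt = expected_flags(100, ai_share=0.20, sensitivity=0.86,
                                           false_positive_rate=0.01)
print(f"correctly flagged: {correct_alt:.1f}, wrongly flagged: {wrongful_alt:.1f}")
```

The first call reproduces the 10 wrongly flagged students from the comment above. The second shows how far that number moves once the false positive rate is treated separately from the 86% figure.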