by jerf 1058 days ago
I think for this sort of problem it is more productive to think in terms of the amount of text necessary for detection, and how reliable such a detection would be, than in terms of a binary can/can't. I think about how "photorealistic" a particular graphics tech is in a similar way: many techs long ago passed the point where I can tell at 320x200, but they're not necessarily all there yet at 4K.

LLMs clearly pass the single-sentence test. If you generate far more text than their context window, I'm pretty sure they'd clearly fail as they start getting repetitive or losing track of what they've written. In between, it varies depending on how much text you get to look at. A single paragraph is pretty darned hard. With a full essay, I start becoming more confident in my assessment.

It's also worth reminding people that LLMs are more than just "ChatGPT in its standard form". As a human who sometimes tries to do bot detection, I've noticed some tells in ChatGPT's "standard voice", which almost everyone is still using, but once people graduate from "Write a blog post about $TOPIC related to $LANGUAGE" to "Write a blog post about $TOPIC related to $LANGUAGE in the style of Ernest Hemingway" in their prompts, it's going to become very difficult to tell by style alone.