You can't detect LLM output at any reasonable error rate. You'd have both false positives and false negatives all over the place. If you solved that part on its own, that would be a SOTA method.
This is a dangerous falsehood. OpenAI's since-cancelled polygraph had a 9% false positive rate and a 26% true positive rate. If I can lose a quarter of toxic bytes and only need to enable JavaScript on one site in ten? Count me in!
Then don't use any website - 100% false positives. But seriously, it's a 9% rate for specific models at the time. It's a cat-and-mouse game, and any fine-tuning or new release will throw it off. Also, they don't say which 9% was misclassified, but I suspect it's the most important ones - the well-written articles. If I see a dumb tweet with a typo, it's unlikely to come from an LLM (and if it does, who cares), but a well-written long-form article may have been lightly edited with an LLM and get caught. The 9% is not evenly distributed.
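To put numbers on the two rates quoted above: how many flagged pages are actually LLM-written depends on how much of what you browse is LLM-written in the first place, which nobody in this thread has measured. A rough sketch using Bayes' rule, where the prevalence values are pure assumptions for illustration, `flagged_precision` is a made-up helper name, and the error rates are treated as uniform across content (which is exactly what the comment above doubts):

```python
def flagged_precision(tpr: float, fpr: float, prevalence: float) -> float:
    """Share of flagged pages that really are LLM-written (Bayes' rule).

    tpr: fraction of LLM-written pages the classifier flags
    fpr: fraction of human-written pages the classifier flags
    prevalence: assumed fraction of all pages that are LLM-written
    """
    true_flags = tpr * prevalence            # LLM pages correctly flagged
    false_flags = fpr * (1.0 - prevalence)   # human pages wrongly flagged
    return true_flags / (true_flags + false_flags)


# Rates quoted in the thread for OpenAI's classifier.
TPR = 0.26
FPR = 0.09

# Hypothetical prevalence values; these are assumptions, not measurements.
for prevalence in (0.05, 0.25, 0.50):
    precision = flagged_precision(TPR, FPR, prevalence)
    print(f"prevalence={prevalence:.2f} -> precision={precision:.2f}")
```

Under these assumptions, the lower the share of LLM text in what you read, the more the flags are dominated by human-written pages, so both sides of the argument hinge on a number that isn't known.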
It was a cat-and-mouse game before; spam always is. The inevitable reality that spam is a slog of a war isn’t a good argument for giving up.
I don’t know the current meta on LLM vs LLM detector, but if I had to pick one job or the other, I’d rather train a binary classifier to detect a giant randomized nucleus sampling decoder thing than fool a binary classifier with said Markov process thing.
Please don’t advocate for giving up on spam, that affects us all.
I want more false positives.
https://openai.com/index/new-ai-classifier-for-indicating-ai...