by mistercow
1235 days ago
"However some advantages can disappear when you put constraints on the output such as quality and correctness." Only if you suppose that the ideal output is superhuman. In the case of OpenAI et al, that's arguably the case, but those aren't the players that are going to get into an arms race with detection anyway. They want it to be relatively easy to detect AI generated content, because they're not in the plagiarism business, and anti-plagiarism measures will get the public and media off their backs. And nobody who is interested in targeting plagiarism has nearly the funding to build their own LLM on a level that matters. So if there's an arms race in the near term, I expect it will be with postprocessors instead. These will be much smaller models (i.e. runs in your browser, or at least on a small backend machine) that take the output of ChatGPT and tweak it to fool detectors. They won't care about maximizing quality or accuracy, but will just care about preserving meaning while erasing statistical signs of AI generation. I don't know if the business case for that will be there. It's there for selling papers, and almost certainly some people will try their hand at these models just for the challenge and/or to prove a point. |
|
Most humans can’t write, say, an essay to save their lives.
And those who do write very well tend to have their own signature.
Whilst it’s not 100% accurate, we’ve managed to fairly successfully attribute a lot of unknown works to specific authors based on their known works.
So if you create a generator that produces output equal to that of, say, the top 1% of human authors, I’m not entirely sure that you can get one that doesn’t have its own signature.
Because whilst, as you said, most humans produce output that is statistically indistinguishable from that of most other humans, the output that tends to survive selection bias and become known works is quite distinguishable by definition.
So you don’t even need to get to superhuman capability; you just need to get to a high enough output quality that it would narrow the statistical search space from billions to millions or even thousands.
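
To make the "signature" idea concrete, here is a rough sketch of how attribution by writing style can work: build a frequency profile of common function words for each known author, then pick the author whose profile is closest to the unknown text. This is a toy illustration, not how any particular detector or attribution study works. The word list and the names `FUNCTION_WORDS`, `profile`, `cosine`, and `attribute` are all made up for the example, and real stylometry uses far larger feature sets and much more text.

```python
import math
import re
from collections import Counter

# Assumption: a tiny, hand-picked set of English function words.
# Real stylometric work uses hundreds of features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "is", "but"]


def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown, known_works):
    """Return the candidate author whose profile is closest to the unknown text.

    known_works maps an author name to a sample of their known writing.
    """
    target = profile(unknown)
    return max(
        known_works,
        key=lambda author: cosine(target, profile(known_works[author])),
    )
```

Calling `attribute(text, {"author_a": sample_a, "author_b": sample_b})` returns the name of the nearest candidate. The point of the sketch is only that a consistent style leaves a measurable profile, which is what lets the search space shrink.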