by dogma1138
1235 days ago
Sort of. I'm aware that in principle the generator has the advantage and that the detector will eventually average out to a coin flip at best. However, some of that advantage can disappear when you put constraints on the output, such as quality and correctness. So whilst the end result might be statistically harder to classify as human- or AI-generated, it can also be less useful overall to the end user.
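To put a rough number on the "coin flip" point, here is a toy sketch, not a model of any real detector. It assumes the detector sees a single score that is Gaussian with unit variance for both human and AI text, with equal class priors; the function name `best_detector_accuracy` and the `separation` parameter are made up for illustration. Under those assumptions the best possible accuracy depends only on how far apart the two means are, and it falls to 0.5 as the generator closes the gap.

```python
import math


def best_detector_accuracy(separation: float) -> float:
    """Best achievable accuracy when human and AI scores are unit-variance
    Gaussians whose means differ by `separation` (equal priors assumed)."""
    # The optimal threshold sits midway between the two means, so the
    # accuracy is the standard normal CDF evaluated at separation / 2.
    return 0.5 * (1.0 + math.erf(separation / (2.0 * math.sqrt(2.0))))


# As the generator's score distribution approaches the human one,
# the best detector approaches a coin flip.
for separation in (2.0, 1.0, 0.5, 0.1, 0.0):
    accuracy = best_detector_accuracy(separation)
    print(f"separation={separation:.1f}  accuracy={accuracy:.3f}")
```

Constraints such as quality and correctness would show up here as a floor on how small the separation can get without the output degrading, which is the caveat raised above.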
|
Only if you suppose that the ideal output is superhuman. In the case of OpenAI et al., that's arguably the case, but those aren't the players that are going to get into an arms race with detection anyway. They want it to be relatively easy to detect AI-generated content, because they're not in the plagiarism business, and anti-plagiarism measures will get the public and media off their backs. And nobody who is interested in targeting plagiarism has nearly the funding to build their own LLM on a level that matters.
So if there's an arms race in the near term, I expect it will be with postprocessors instead. These will be much smaller models (i.e. ones that run in your browser, or at least on a small backend machine) that take the output of ChatGPT and tweak it to fool detectors. They won't care about maximizing quality or accuracy; they will just care about preserving meaning while erasing statistical signs of AI generation.
I don't know if the broader business case for that will be there. It's there for selling papers, though, and almost certainly some people will try their hand at these models just for the challenge and/or to prove a point.