by astrange 1508 days ago
That would be interesting if it were true, but I don't think it can be, because LLMs' main advantage is that they memorize text in their weights, so your discriminator model would need to be the same size as the LLM.

That said, the smaller GPT-3 models break down quite often, so they're probably detectable.


In the same way we can train models that can identify people from their choice of words, phrasing, grammar, etc., we can train models that identify other models.

That's anthropomorphizing them. A large language model doesn't have a bottleneck the way a human does (in terms of being able to express things); it can get on a path where it just outputs memorized text directly, and that output won't be consistent with what it usually seems to know at all.

Also, you could break a discriminator model by running a filter over the output that changes a few words around or misspells things, etc. Basically an adversarial attack.
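
To make the "filter over the output" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `SYNONYMS` table, the `perturb` function, and its rate parameters are hypothetical, and a real evasion filter would need a far larger vocabulary and some check that the meaning survives.

```python
import random

# Hypothetical, tiny synonym table; a real filter would use a much larger one.
SYNONYMS = {"use": "employ", "big": "large", "show": "demonstrate", "help": "assist"}

def perturb(text, swap_rate=0.3, typo_rate=0.1, seed=None):
    """Randomly swap known words for synonyms or transpose two adjacent letters."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    out = []
    for word in text.split():
        key = word.lower()
        if key in SYNONYMS and rng.random() < swap_rate:
            # Replace the word with its listed synonym.
            word = SYNONYMS[key]
        elif len(word) > 3 and rng.random() < typo_rate:
            # Introduce a "misspelling" by swapping two neighbouring characters.
            i = rng.randrange(len(word) - 1)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)

if __name__ == "__main__":
    print(perturb("we use big models to show what they can do", seed=0))
```

Whether small edits like these would actually fool a given discriminator depends on which features it relies on; the sketch only shows how cheap the filter itself is.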

I agree it is not exactly the same as a human, but the content it produces is based on its specific training data, how it was fed the training data, how long it was trained, the size and shape of the network, etc. These are unique characteristics of a model that directly impact what it produces. A model could have a unique proclivity for using specific groups of words, for example.
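
As a rough sketch of what a "proclivity for specific groups of words" could look like in code, the following Python compares word-bigram frequency profiles with cosine similarity. The function names and the toy reference texts are made up for illustration; this is plain stylometry on n-gram counts, not a trained discriminator, and it assumes you already have sample text known to come from each candidate model.

```python
from collections import Counter
import math
import re

def ngram_profile(text, n=2):
    """Count word n-grams (bigrams by default) in lowercased text."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def attribute(sample, references, n=2):
    """Return the name of the reference text whose profile is closest to the sample."""
    profile = ngram_profile(sample, n)
    return max(
        references,
        key=lambda name: cosine_similarity(profile, ngram_profile(references[name], n)),
    )

if __name__ == "__main__":
    # Toy reference texts standing in for known output of two hypothetical models.
    references = {
        "model_a": "it is worth noting that the result is worth noting",
        "model_b": "to be honest the thing kind of works to be honest",
    }
    print(attribute("it is worth noting that this works", references))
```

With real data you would want much longer reference samples per model, and this kind of surface-level profile is exactly what a word-swapping filter would target.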

But yes, you could break the discriminator model, in the same way people disguise their own writing patterns by using synonyms, making different grammar/syntax choices, etc. Building a better evader and building a better detector is an eternal cat-and-mouse game, but that doesn't reduce the need to participate in it.