by gojomo
1156 days ago

A watermark is absolutely possible: see, for example, some of the work Scott Aaronson has mentioned doing for OpenAI. But it's very fragile, especially if people are specifically trying to hide their GPT use, or have access to the watermarking algorithm or an online detection oracle. And other methods, like remembering all output ever, or keeping fuzzy summary representations of all output ever, seem to me similarly fragile, & they introduce other problems & impracticalities.

A guess: OpenAI internally initially shared the common concern that "consuming its own junk outputs" could be a problem. But their own experiments so far, private & public, may have convinced them it's not as much of a problem in practice as it seems in theory. The model outputs have a mix of good and bad text, just like the pre-LLM internet. And the same filterings/weightings that have worked on pre-LLM content keep working. And, counter to some early intuitions, one LLM's quality output is often very useful input for later LLMs.