by obblekk
991 days ago
For written text, the problem may be even harder. Identifying the human author of a text is a field called "stylometry", but this result shows that some simple transformations reduce its success rate to random chance [1].

Similarly, I suspect watermarking LLM output is unworkable. The output of a smart model could be de-watermarked by fine-tuning a dumb open source model on the initial output, then regenerating the original output token by token, selecting an alternate word whenever multiple completions have close probabilities and are semantically equivalent. It would be a bit tedious to dial in perfectly, but I suspect it could be done.

And ultimately, short text selections can carry a lot of meaning with very little entropy available to uniquely tag them (e.g., covfefe).

[1] https://dl.acm.org/doi/abs/10.1145/2382448.2382450

Curious if Scott Aaronson solved this challenge...
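
A rough sketch of the regeneration step described above, to make the idea concrete. Everything here is hypothetical: `rewrite_tokens`, the `margin` threshold, and the per-position candidate lists are illustrative stand-ins for what a fine-tuned model would score, and the sketch simply assumes the listed candidates are already semantically equivalent rather than checking that.

```python
import random


def rewrite_tokens(candidates, margin=0.05, rng=None):
    """Pick one token per position, swapping among near-tied candidates.

    candidates: list of positions, each a list of (token, probability)
    pairs as a hypothetical paraphrasing model might score them.
    margin: how close to the top probability a candidate must be to
    count as interchangeable (an assumed tuning knob).
    """
    rng = rng or random.Random(0)
    output = []
    for options in candidates:
        # Rank candidates from most to least likely.
        ranked = sorted(options, key=lambda pair: pair[1], reverse=True)
        best_prob = ranked[0][1]
        # Keep every candidate whose probability is within `margin` of the best.
        near_ties = [tok for tok, prob in ranked if best_prob - prob <= margin]
        # Choosing at random among near-ties is what would scramble
        # any token-level statistical signal in the original text.
        output.append(rng.choice(near_ties))
    return output


if __name__ == "__main__":
    scored = [
        [("the", 0.90), ("a", 0.05)],
        [("quick", 0.41), ("fast", 0.40)],
        [("fox", 0.99)],
    ]
    print(" ".join(rewrite_tokens(scored)))
```

With `margin=0.0` this reduces to plain greedy decoding; raising it swaps more words, at the cost of drifting further from the original wording. That trade-off is the "tedious to dial in" part.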
Current LLMs have stylistic quirks imprinted on them by RLHF (ChatGPT's endless "it should be noted" and "it is important to remember that" verbiage is a good example), but they learned those from human writing.