
by obblekk 991 days ago

For written text, the problem may be even harder. Identifying the human author of a text is a field called "stylometry", but this result [1] shows that some simple transformations reduce its success rate to random chance.

Similarly, I suspect watermarking LLM output is probably unworkable. The output of a smart model could be de-watermarked by fine-tuning a dumb open source model on the initial output, and then regenerating the original output token by token, selecting alternate words whenever multiple completions have close probabilities and are semantically equivalent. It would be a bit tedious to dial in perfectly, but I suspect it could be done.
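A rough sketch of the regeneration loop described above, as a toy rather than a working attack. Everything here is an assumption for illustration: `toy_probs` stands in for the fine-tuned open model's next-token probabilities, `EQUIV` stands in for a real semantic-equivalence check, and `margin` is an arbitrary threshold for "close probabilities".

```python
import random
from typing import Callable, Dict, List, Optional, Set

# Hypothetical interface: given the tokens emitted so far, return
# next-token probabilities from the (fine-tuned) open model.
NextTokenFn = Callable[[List[str]], Dict[str, float]]


def rewrite(
    tokens: List[str],
    next_token_probs: NextTokenFn,
    equivalents: Dict[str, Set[str]],
    margin: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Re-emit `tokens`, swapping in an alternate word whenever it is
    both close in probability and marked as semantically equivalent."""
    rng = rng or random.Random(0)
    out: List[str] = []
    for tok in tokens:
        probs = next_token_probs(out)
        p_orig = probs.get(tok, 0.0)
        candidates = [
            alt
            for alt, p in probs.items()
            if alt != tok
            and abs(p - p_orig) <= margin          # "close probabilities"
            and alt in equivalents.get(tok, set())  # "semantically equivalent"
        ]
        out.append(rng.choice(candidates) if candidates else tok)
    return out


def toy_probs(context: List[str]) -> Dict[str, float]:
    # Made-up distributions keyed only by position, purely for the demo.
    table = {
        0: {"the": 0.9, "a": 0.1},
        1: {"quick": 0.45, "fast": 0.40, "slow": 0.15},
        2: {"fox": 0.95, "dog": 0.05},
    }
    return table.get(len(context), {})


# Made-up equivalence table; a real check would be far harder to build.
EQUIV = {"quick": {"fast"}, "fast": {"quick"}}

print(rewrite(["the", "quick", "fox"], toy_probs, EQUIV))
```

In this toy only "quick" has a close, equivalent alternative, so it is the only token that changes. The tedious part is exactly what the toy skips: choosing `margin` and building an equivalence check that doesn't mangle the meaning.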

And then ultimately, short text selections can carry a lot of meaning while offering very little entropy in which to embed a unique tag (e.g., covfefe).
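To put a rough number on "very little entropy": a watermark can only hide in choices the generator actually had, so the Shannon entropy of each next-token distribution is, roughly, a ceiling on the bits available at that position. A minimal sketch, with made-up distributions (`forced` and `open_choice` are illustrative, not measured from any model):

```python
import math
from typing import Sequence


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical next-token distributions.
forced = [0.99, 0.01]                    # the next word is nearly determined
open_choice = [0.25, 0.25, 0.25, 0.25]   # four equally plausible words

print(f"forced:      {entropy_bits(forced):.3f} bits")
print(f"open choice: {entropy_bits(open_choice):.3f} bits")
```

A short, specific string is mostly made of "forced" positions, so summing these per-token figures leaves only a handful of bits to work with.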

[1] https://dl.acm.org/doi/abs/10.1145/2382448.2382450

Curious if Scott Aaronson solved this challenge...

2 comments

COAGULOPATH 991 days ago

The idea of telling a human-generated "the quick brown fox..." from a machine-generated one was always a fantasy. Text has no birthmark.

Current LLMs have stylistic quirks imprinted on them by RLHF (ChatGPT's endless "it should be noted" and "it is important to remember that" verbiage is a good example), but they learned those from human writing.


kromem 991 days ago

Also, most stylometry work isn't well suited to handling active attempts to forge another author's style, and is more about distinguishing authorship in works with uncertain attribution.
