| HN Mirror


by nullc 1165 days ago

This is fairly theoretical work. It assumes that the parties (and the adversary) know the distribution precisely.

Its direct practical use may be somewhat limited, because people aren't going around communicating randomly selected LLM outputs... and if you use LLM output in a context where real (human-written) text would be expected, it could be distinguished.

It's not useful for watermarking, since the first change to the text will destroy the rest of the embedding.

I can make a contrived example where it's directly useful: imagine you have agents in the field. You could send out LLM-generated spam to communicate with them. Everyone expects the spam to be LLM-generated, so it's not revealing that it's detectable as such. This work discusses how you can make the spam carry secret messages to the agents in a way that is impossible to detect (without the key, of course), even by an attacker that has the exact spam LLM.

Less contrived: a sufficiently short message from a sufficiently advanced LLM is probably indistinguishable from a real post in practice, but that's outside the scope of the paper. It's hard (impossible?) to rigorously analyze that security model because we can't say much about the distribution of "real text", so we can't say how far from it LLM output is. The best models we have of the distribution of real text are these LLMs; if you take them to BE the distribution of real text, then that approach is perfectly secure by definition.
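One common way to write this down is the information-theoretic definition of steganographic security in terms of relative entropy. I'm assuming that is the kind of notion meant here; the symbols are mine. With $P_C$ the distribution of innocent covertext, $P_S$ the distribution of stegotext, and $P_M$ the LLM's output distribution:

```latex
% epsilon-security: stegotext is close to covertext in relative entropy
D(P_C \,\|\, P_S) \;=\; \sum_{x} P_C(x) \log \frac{P_C(x)}{P_S(x)} \;\le\; \varepsilon

% "Perfectly secure" is the case \varepsilon = 0.
% If real text is *defined* to be the model, P_C := P_M, and the scheme
% outputs exact model samples, P_S = P_M, then
D(P_M \,\|\, P_M) \;=\; 0
```

The hard part is the first assumption: for actual human text $P_C$ is unknown, so $D(P_C \,\|\, P_M)$ can't be bounded.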

But really, even if the situation it solves is too contrived to be a gain over alternatives, the context provides an opportunity to explore the boundary of what is possible.