by kimburgess 1208 days ago
Watermarking. From an outsider's perspective, the issue appears to be reaching consensus on how this can be implemented (but not in the technical sense). There's a game-theoretic challenge: if models define and publish detection mechanisms, this creates a motivation for people to use other systems that don't include them.

On the technical front, there's a good paper here: https://arxiv.org/pdf/2301.10226.pdf, and a nice, very approachable video explaining it here: https://www.youtube.com/watch?v=XZJc1p6RE78


The problem with watermarking like this, which is incredibly clever, is that it’s trivial to break. All you have to do is change one word in the text, and the watermarking of all subsequent tokens is spoiled. So if you change the first word, or rephrase the first sentence, or extract text from the middle or end of a response, the watermark is completely spoiled.
There can be redundancy in the watermark, meaning you'll have to change more than one word. See e.g. how error-correcting codes work.
There are definitely paths of attack. The trivial ones that you call out - insertion, deletion, substitution - are covered in section 7 of that paper (along with mitigations).
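
To make the substitution case concrete, here is a toy sketch in Python. It assumes the scheme seeds each token's "green list" from only the preceding token, which is my reading of the paper's basic setup. The vocabulary size, the SHA-256 keying, the fake "model", and every function name below are hypothetical stand-ins, not the paper's code.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000  # hypothetical toy vocabulary
GAMMA = 0.5        # assumed fraction of the vocabulary on the green list


def is_green(prev_token: int, token: int) -> bool:
    """Pseudo-randomly put `token` on the green list, keyed on the previous token."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def z_score(tokens: list[int]) -> float:
    """One-proportion z-test on how many (prev, current) pairs are green."""
    pairs = len(tokens) - 1
    green = sum(is_green(p, c) for p, c in zip(tokens, tokens[1:]))
    return (green - GAMMA * pairs) / math.sqrt(pairs * GAMMA * (1 - GAMMA))


def generate_watermarked(length: int, seed: int = 0) -> list[int]:
    """Toy stand-in for a model: sample random tokens, keep only green ones."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE)]
    while len(tokens) < length:
        candidate = rng.randrange(VOCAB_SIZE)
        if is_green(tokens[-1], candidate):
            tokens.append(candidate)
    return tokens


if __name__ == "__main__":
    text = generate_watermarked(201)
    print("original z-score:", z_score(text))  # 200 green pairs, about 14.1

    edited = list(text)
    edited[100] = (edited[100] + 1) % VOCAB_SIZE  # substitute one token mid-text
    print("after one substitution:", z_score(edited))
```

Under these assumptions, a single substitution only touches the two pairs that contain the changed token, so the green count drops by at most 2 and the z-score barely moves. A scheme that seeds on a longer window of previous tokens would presumably lose more pairs per edit.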