

	by kimburgess 1208 days ago
	Watermarking. From an outsider's perspective, the issue appears to be reaching consensus on how this can be implemented (not in the technical sense). There's a game-theoretic challenge: if models define and publish detection mechanisms, this creates a motivation for people to use other systems that don't include them. On the technical front there's a good paper here: https://arxiv.org/pdf/2301.10226.pdf, and a nice, very approachable video explaining it here: https://www.youtube.com/watch?v=XZJc1p6RE78.

1 comments

simonh 1208 days ago

The problem with watermarking like this, which is incredibly clever, is that it’s trivial to break. All you have to do is change one word in the text, and the watermarking of all subsequent tokens is spoiled. So if you change the first word, rephrase the first sentence, or extract text from the middle or end of a response, the watermark is completely spoiled.


amelius 1208 days ago

There can be redundancy in the watermark, meaning you'll have to change more than one word. See e.g. how error-correcting codes work.
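To make the error-correcting-code analogy concrete, here is a minimal sketch of the simplest such code, a 3x repetition code with majority-vote decoding. It is only an illustration of redundancy, not the watermarking scheme from the linked paper; the function names `encode` and `decode` and the repetition factor `r` are made up for this example.

```python
from collections import Counter

def encode(bits, r=3):
    """Repeat every bit r times (r should be odd so majority votes never tie)."""
    return [b for b in bits for _ in range(r)]

def decode(coded, r=3):
    """Recover each bit by majority vote over its block of r copies."""
    return [
        Counter(coded[i:i + r]).most_common(1)[0][0]
        for i in range(0, len(coded), r)
    ]

message = [1, 0, 1, 1]
coded = encode(message)

# One flipped copy inside a block is outvoted by the other two copies.
one_error = list(coded)
one_error[4] ^= 1

# Two flipped copies inside the same block win the vote, so decoding fails.
two_errors = list(coded)
two_errors[3] ^= 1
two_errors[4] ^= 1
```

With `r=3`, a single corrupted copy per block is absorbed, while two corruptions in the same block flip the decoded bit. The point carried over to watermarking is only that redundancy raises the number of edits an attacker needs.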


kimburgess 1208 days ago

There are definitely paths of attack. The trivial ones that you call out - insertion, deletion, substitution - are covered in section 7 of that paper (along with mitigations).
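How far a substitution reaches depends on how the scheme seeds its per-token "green list". As a rough sketch, assume a toy variant in which whether a token counts as green depends only on the token immediately before it. The hash-of-the-pair rule, the value of `GAMMA`, and all function names below are simplifying assumptions for illustration, not the paper's implementation.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of tokens that count as "green" at each step

def is_green(prev_token: str, token: str) -> bool:
    """Toy rule: hash the (previous token, token) pair into [0, 1)."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA

def green_flags(tokens):
    """One flag per scored position; flag i covers the pair (tokens[i], tokens[i+1])."""
    return [is_green(p, t) for p, t in zip(tokens, tokens[1:])]

def z_score(tokens):
    """One-proportion z-test: how far the green count sits above chance."""
    flags = green_flags(tokens)
    n = len(flags)
    return (sum(flags) - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

def changed_positions(original, edited):
    """Indices of scored positions whose green/red flag differs after an edit."""
    a, b = green_flags(original), green_flags(edited)
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

tokens = "the quick brown fox jumps over the lazy dog again and again".split()
edited = list(tokens)
edited[5] = "across"  # substitute a single token
affected = changed_positions(tokens, edited)
```

Under this assumption, replacing the token at index 5 can only alter the flags at positions 4 and 5 (the pair ending at the new token and the pair starting from it), so the detector's z-score shifts by a bounded amount rather than collapsing. A scheme that seeds from a longer window of preceding tokens would spread the damage over more positions.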
