HN Mirror



	by simonh 1209 days ago
	The problem with watermarking like this, incredibly clever as it is, is that it’s trivial to break. All you have to do is change one word in the text, and the watermarking of all subsequent tokens is spoiled. So if you change the first word, rephrase the first sentence, or extract text from the middle or end of a response, the watermark is completely lost.
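Whether a single edit spoils every later token depends on how much context seeds each token's green list. Below is a minimal sketch that assumes a hypothetical scheme in which a token's membership is decided by hashing only the (previous token, token) pair. `GAMMA`, `is_green`, `context_pairs`, `green_flags`, `positions_affected`, and the `<s>` start marker are all invented for illustration and are not taken from the paper under discussion. Under that assumption, a same-length substitution re-draws the test for the edited position and the one right after it, and nothing further.

```python
import hashlib

GAMMA = 0.5  # hypothetical fraction of tokens expected to land on the green list

def is_green(prev_token: str, token: str, gamma: float = GAMMA) -> bool:
    """Hypothetical green-list test seeded only by the previous token.

    Hashes the (previous token, token) pair to a number in [0, 1) and calls
    the token green if it falls below gamma, so roughly a gamma fraction of
    candidate tokens are green for any given predecessor.
    """
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def context_pairs(tokens: list[str]) -> list[tuple[str, str]]:
    """Pair each token with its predecessor; "<s>" stands in before the first token."""
    return list(zip(["<s>"] + tokens[:-1], tokens))

def green_flags(tokens: list[str]) -> list[bool]:
    """Green/red verdict for every position under the hypothetical test."""
    return [is_green(prev, tok) for prev, tok in context_pairs(tokens)]

def positions_affected(original: list[str], edited: list[str]) -> list[int]:
    """Positions whose green-list test is re-drawn by a same-length substitution.

    A position is re-drawn when its (previous token, token) pair changes; the
    re-drawn token may still come out green by chance.
    """
    if len(original) != len(edited):
        raise ValueError("this sketch only handles same-length substitutions")
    return [
        i
        for i, (a, b) in enumerate(zip(context_pairs(original), context_pairs(edited)))
        if a != b
    ]

if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog".split()
    edited = ["a"] + text[1:]  # change only the first word
    print(positions_affected(text, edited))  # [0, 1]
    print(green_flags(text))
```

In a scheme that seeds each green list from a longer window of preceding tokens, one edit would re-draw correspondingly more positions. The single-token window here is the assumption that keeps the damage local.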


amelius 1209 days ago

There can be redundancy in the watermark, meaning you'll have to change more than one word. See e.g. how error-correcting codes work.
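One way to read "redundancy" here is statistical. In a green-list scheme, every token carries a small amount of evidence, and detection aggregates it, so a handful of edits lowers the score without erasing it. The following is a minimal sketch that assumes a one-proportion z-test over green-token counts, with a hypothetical green fraction `gamma = 0.5` and a hypothetical detection threshold of 4. The function names are invented for illustration. This is an analogy to the error-correcting idea, not an error-correcting code in the coding-theory sense, and not necessarily the detector in the paper being discussed.

```python
import math

def watermark_z_score(green_count: int, total: int, gamma: float = 0.5) -> float:
    """One-proportion z-test on the number of green tokens.

    Unwatermarked text should put about gamma * total tokens on the green
    list; the score measures how many standard deviations above that the
    observed count sits.
    """
    expected = gamma * total
    std = math.sqrt(total * gamma * (1 - gamma))
    return (green_count - expected) / std

def min_green_to_detect(total: int, gamma: float = 0.5, threshold: float = 4.0) -> int:
    """Smallest green count whose z-score reaches the (hypothetical) threshold."""
    return math.ceil(gamma * total + threshold * math.sqrt(total * gamma * (1 - gamma)))

if __name__ == "__main__":
    total = 200  # hypothetical response length in tokens
    for green in (180, 160, 140):  # hypothetical counts before and after edits
        z = watermark_z_score(green, total)
        print(green, round(z, 2), z >= 4.0)
    print(min_green_to_detect(total))  # 129
```

With these made-up numbers, an attacker starting from 180 green tokens out of 200 has to push the count down to 128 or fewer before the score drops below 4. If each substitution re-draws at most two positions (as it would if green lists were seeded by the previous token alone), that takes at least 26 substitutions. In practice it takes more, because re-drawn positions can still land green.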


kimburgess 1209 days ago

There are definitely paths of attack. The trivial ones that you call out - insertion, deletion, substitution - are covered in section 7 of that paper (along with mitigations).
