Hacker News
by simonh 1209 days ago
The problem with watermarking like this, incredibly clever as it is, is that it's trivial to break. All you have to do is change one word in the text, and the watermark on every subsequent token is spoiled. So if you change the first word, rephrase the first sentence, or extract text from the middle or end of a response, the watermark is completely destroyed.

There can be redundancy in the watermark, meaning you'd have to change more than one word. See, e.g., how error-correcting codes work.
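To make the redundancy point concrete, here is a toy sketch. It assumes a scheme in which each token's "green" status is decided by a hash keyed only on the immediately preceding token, and detection is a z-test on the total green count. That is an assumption for illustration, not necessarily the exact scheme in the paper. All names (`VOCAB`, `GAMMA`, `is_green`, `generate_watermarked`, `z_score`) are hypothetical. Hashing the (previous, current) pair is also a simplification of seeding a random partition of the vocabulary.

```python
import hashlib
import math
import random

VOCAB = [f"w{i}" for i in range(1000)]  # hypothetical toy vocabulary
GAMMA = 0.5  # assumed fraction of tokens that count as "green" in each context


def is_green(prev_token: str, token: str, gamma: float = GAMMA) -> bool:
    """Toy rule: a token is green if a hash keyed on the previous token lands below gamma."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def generate_watermarked(n: int, seed: int = 0) -> list[str]:
    """Toy 'model' that only ever emits green tokens (a hard watermark)."""
    rng = random.Random(seed)
    tokens = ["<s>"]  # start-of-text context for the first token
    while len(tokens) < n + 1:
        candidate = rng.choice(VOCAB)
        if is_green(tokens[-1], candidate):
            tokens.append(candidate)
    return tokens[1:]


def z_score(tokens: list[str], gamma: float = GAMMA) -> float:
    """Compare the observed green count with what unwatermarked text would give."""
    context = ["<s>"] + tokens
    hits = sum(is_green(p, t, gamma) for p, t in zip(context, context[1:]))
    n = len(tokens)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


text = generate_watermarked(200, seed=1)
edited = list(text)
edited[50] = "substituted_word"  # change a single word
print(z_score(text), z_score(edited))
```

Under these assumptions, substituting one word can flip at most two green checks: the substituted token itself and the token after it, because each check looks back only one token. With 200 tokens the score therefore drops from about 14.1 to no lower than about 13.9, still far above typical detection thresholds. Removing the watermark means editing a large fraction of the text, not a single word.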
There are definitely paths of attack. The trivial ones you call out (insertion, deletion, substitution) are covered, along with mitigations, in section 7 of that paper.