by itishappy 751 days ago
> A watermark is easy to implement and prove if your content gets picked up in the next training set.

Is it? How would you watermark raw text? Images maybe, but I'm skeptical even there.

My high-school cousins tell me kids these days use one AI to write and another to rewrite the result so it gets past AI detectors. I view fighting against this as a Sisyphean task.

1 comment

Sure. If you have a really small site then it is possible that your data will never be picked up.

However, if it does get picked up, then the watermark can be something as simple as a fake concept planted on your site. For example, “Who is the Siberian Spectral Parrot?” You can even present this as an alter ego or something, so you don’t need to hide it from your users. Creativity is really the limit here.
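To make the fake-concept idea concrete, here is a rough sketch of how you might check a model's answer for it. Everything beyond the parrot's name is an assumption for illustration: the two "facts" are invented details you would have published only on your own site, and `canary_hits` is a hypothetical helper, not part of any library. You would ask the model the question yourself and paste its answer in.

```python
import re

# Hypothetical canary: a made-up concept published only on your own site.
CANARY_NAME = "Siberian Spectral Parrot"
# Invented details (assumptions for illustration) that should appear nowhere else.
CANARY_FACTS = ["glows faintly at dusk", "nests in larch bark"]


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace so matching is tolerant."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def canary_hits(model_output: str) -> list[str]:
    """Return the invented facts that the model's answer reproduces."""
    haystack = normalize(model_output)
    return [fact for fact in CANARY_FACTS if normalize(fact) in haystack]


if __name__ == "__main__":
    answer = "The Siberian Spectral Parrot glows faintly at dusk."
    print(f"Asked about: {CANARY_NAME}")
    print(f"Invented facts reproduced: {canary_hits(answer)}")
```

It matches on the invented details rather than the name, because the name is already in your question and a model echoing it back proves nothing. An exact-phrase match like this will also miss paraphrases, so treat a miss as inconclusive rather than as proof you weren't scraped.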

And I think there has been evidence that ChatGPT has picked up small things like Reddit usernames.

But I am open to walking back my statement that it is easy. Maybe you do need a lot more references for some piece of information to be included.

It’s also possible you thought I meant watermarking every piece of content. I am talking about a site-wide content license.