There are situations where it doesn't matter whether the model output is watermarked. For instance, I see people on HN asking LLMs to explain things to them all the time (which I think is a bad idea, but YMMV), and people use LLMs to write code quickly (which I think is at least possibly a good idea). There are also content farms that churn out low-quality books on Amazon on various topics, and I don't think they care whether they get caught using LLM output.
Thus it might reduce usage somewhat, but it certainly wouldn't block all usage. Additionally, there are only a few providers of truly massive LLMs on the market right now. If they decided that doing this would be a social good, or, more likely, that not doing it would bring bad PR once their competitors do, then they could at least watermark all output from the massive LLMs.
> There will always be a good offering who doesn't watermark
There's a possible future where this gets legislated, right? Of course, there are lots of implementation challenges, and it's probably a bad idea...
>There will always be a good offering who doesn't watermark.
I wouldn't bet on that! I can see legislation requiring this for many reasons: intellectual property, cheating, detecting the source of hate speech or harassment, "stealing" from employers by not doing the work or putting them at legal risk, "stealing" from artists by duplicating their style, political speech that cannot be traced (it could be from a bad actor!), tracking down generated revenge porn (or much worse!), tracking down people using LLMs to grift the elderly, and on and on. Why, if you are not using a watermarked LLM, it could be an op by Russia, China, or Iran! In fact, the legislation could require social media, office, government, political, or educational tools to check for a watermark and refuse to work if an approved one is not found. Ideally the list of approved watermarks would be private, because you want companies to be able to automate away workers and do the least possible for customers; you just want to make sure they're doing it above-board, you see.
>And there is no good reason for a provider to watermark - they aren't helping the customer.
No one cares about customers; they care about money. And you know what helps make a lot of money? A legally defined moat for yourself and a couple of others that blocks everyone else.
>They'd be helping some other party who isn't paying them.
I'm totally happy having much of my LLM use identifiable as coming from an LLM. I don't see many important cases where I'd need to pretend it wasn't from an LLM.
I'll happily give up those cases for increased performance; that's the thing I care about.
Are there normal cases where you picture this being an issue?
And I am not against LLM output being identifiable as such (although I think an argument could be made based on the ruling about the monkey and the camera, which IIRC would say that the copyright belongs to whoever created the situation).
But after
1. the British Post Office scandal and
2. some really high-profile cases of educational institutions here in Norway abusing plagiarism detectors,
I do not feel ready to trust either
1. complex software (and especially not closed-source software) to tell us who is cheating or not,
2. or any human's ability to use such a system sensibly.
While cheating usually isn't a criminal court matter, students also usually don't get a free defense.
For this reason, I suggest that cheating should have to be proven to have occurred, not merely "suggested to probably have occurred" by the same people who create the not-very-reliable and extremely hard-to-reproduce LLMs.
Increased performance? Watermarking will not increase performance. They are talking about tilting the decoding process in minor ways; that won't help performance (or hurt it much).
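To make "tilting the decoding process" concrete, here is a minimal sketch of one published scheme, the "green list" soft watermark from Kirchenbauer et al. (2023). The thread doesn't say which scheme any provider would actually use, so treat this as an illustration only. The names `GAMMA`, `DELTA`, `KEY`, `green_list`, `bias_logits`, `sample_next`, and `z_score` are all hypothetical, and random logits stand in for a real model. At each step, the previous token plus a secret key seeds a pseudo-random split of the vocabulary, and a small constant is added to the logits of the "green" half before sampling. The detector recomputes the splits and checks whether green tokens appear more often than chance would predict.

```python
import hashlib
import math
import random

# Hypothetical parameters: GAMMA is the fraction of the vocabulary marked "green",
# DELTA is the bonus added to green logits, KEY is a secret shared by generator and detector.
GAMMA = 0.5
DELTA = 2.0
KEY = "example-secret"


def green_list(prev_token: int, vocab_size: int, gamma: float = GAMMA, key: str = KEY) -> set[int]:
    """Pseudo-randomly pick a gamma-fraction of the vocabulary, seeded by the previous token and the key."""
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def bias_logits(logits: list[float], prev_token: int, delta: float = DELTA) -> list[float]:
    """The 'tilt': add a fixed delta to green-token logits and leave the rest unchanged."""
    green = green_list(prev_token, len(logits))
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def sample_next(logits: list[float], rng: random.Random) -> int:
    """Sample a token id from softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def z_score(tokens: list[int], vocab_size: int, gamma: float = GAMMA) -> float:
    """Detector: how many standard deviations the green-token count sits above chance."""
    n = len(tokens) - 1
    if n < 1:
        raise ValueError("need at least two tokens")
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev, vocab_size, gamma))
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


if __name__ == "__main__":
    rng = random.Random(0)
    vocab_size = 50
    tokens = [0]
    for _ in range(200):
        # Stand-in for a real model: fresh random logits at every step.
        logits = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
        tokens.append(sample_next(bias_logits(logits, tokens[-1]), rng))
    print(f"watermarked z-score: {z_score(tokens, vocab_size):.2f}")
```

The tilt is a bounded additive shift. A token the model strongly prefers can still win even if it is "red", which is why this kind of scheme is not expected to change output quality much in either direction. Note also that detection produces a z-score, which is a statistical likelihood rather than proof of who wrote a given text.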