| HN Mirror


	by danielmarkbruce 594 days ago
	Why would anyone ever use such a model? And then, given the significant reduction in users, why would any closed model service do this? Seems like a cool theoretical trick that has little practical implication.

genrilz 594 days ago

There are situations where the model output being watermarked doesn't matter. For instance, I hear people on HN asking LLMs to explain things to them all the time (which I think is a bad idea, but YMMV), and people use LLMs to write code quickly (which I think is at least possibly a good idea). There are also some content farms which churn out low-quality books on Amazon on various topics, and I don't think they care if they get caught using LLM outputs.

Thus it might reduce usage some, but it certainly wouldn't block all usage. Additionally, there are only a few providers of truly massive LLMs on the market right now. If they decided that doing this would be a social good, or more likely that it would bring bad PR to not do this when their competitors do, then they would at least be able to watermark all of the massive LLM outputs.

danielmarkbruce 594 days ago

You say that as though there isn't a choice though. There will always be a good offering who doesn't water mark.

And there is no good reason for a provider to watermark - they aren't helping the customer. They'd be helping some other party who isn't paying them.

This will never be a thing.

alach11 594 days ago

> There will always be a good offering who doesn't water mark

There's a possible future where this gets legislated, right? Of course, there are lots of implementation challenges to this and it's probably a bad idea...

danielmarkbruce 593 days ago

Sure, it's possible. I'd lay 100-1 against though.

mhuffman 594 days ago

>There will always be a good offering who doesn't water mark.

I wouldn't bet on that! I can see legislation to require this for many reasons ... related to intellectual property, cheating, detecting the root of hate speech or harassment, "stealing" from employers by not performing work or putting them at legal risk, "stealing" from artists by duplicating their style, political speech that cannot be traced (it could be from a bad actor!), tracking down generated revenge porn (or much worse!), tracking down people using LLMs to grift the elderly, and on and on. Why, if you are not using a watermarked LLM, it could be an op by Russia, China, or Iran! In fact, part of the legislation could be a requirement of social media or office tools or government tools or political tools or educational tools to check for a watermark and not work if an approved one is not found. Ideally this list will be private, because you want companies to be able to automate away workers and do the least possible for customers; you just want to make sure you're doing it above-board, you see.

>And there is no good reason for a provider to watermark - they aren't helping the customer.

No one cares about customers, they care about money. And you know what helps make a lot of money? A legally defined moat for yourself and a couple of others that blocks anyone else.

>They'd be helping some other party who isn't paying them.

Yes! That party is themselves!

danielmarkbruce 593 days ago

It seems extremely unlikely in the US that such legislation could come about.

Even then, open source will almost certainly always exist. Services running offshore will exist. It would seem impossible to enforce.

IanCal 594 days ago

I'm totally happy having huge amounts of my use of llms identifiable as from an llm. I don't see many important cases for me where I need to pretend it wasn't from an llm.

I will happily lose those cases for increased performance, that's the thing I care about.

Are there normal cases where you picture this as an issue?

eitland 594 days ago

Not a problem for me. I am not a student anymore.

And I am not against LLM output being identifiable as such. (although I think an argument could be made based on the ruling about the monkey and the camera, which IIRC would say that the copyright belongs to whoever created the situation).

But after the

1. British Post Office scandal and

2. some really high-profile cases of education institutions here in Norway abusing plagiarism detectors

I do not feel ready to trust either

1. complex software (and especially not closed-source software) to tell us who is cheating or not

2. or any human's ability to use such a system in a sensible way

While cheating cases usually don't end up in criminal court, students also usually do not get a free defense.

For this reason I suggest cheating should have to be proven to have occurred, not "suggested to probably have occurred" by the same people who create the not very reliable and extremely hard-to-reproduce LLMs.

danielmarkbruce 594 days ago

Increased performance? Watermarking will not increase performance. They are talking about tilting the decoding process in minor ways. It won't help (or hurt much) performance.
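
To make "tilting the decoding process" concrete, here is a minimal sketch, assuming a green-list logit-bias scheme. That is one published family of watermarks and not necessarily the one being discussed here. Everything in it is hypothetical: the toy vocabulary, the random stand-in "model", and the names `green_ids`, `GAMMA` and `DELTA`.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50  # toy vocabulary size (assumption)
GAMMA = 0.5      # fraction of the vocabulary marked "green" at each step (assumption)
DELTA = 2.0      # logit boost given to green tokens (assumption)


def green_ids(prev_token: int) -> set[int]:
    """Pseudo-randomly pick the green subset, seeded by the previous token."""
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def sample_token(logits: list[float], prev_token: int,
                 rng: random.Random, watermark: bool) -> int:
    """Sample from softmax(logits), optionally boosting green tokens by DELTA."""
    if watermark:
        green = green_ids(prev_token)
        logits = [x + DELTA if i in green else x for i, x in enumerate(logits)]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(n: int, seed: int, watermark: bool) -> list[int]:
    """Generate n tokens from a stand-in 'model' that emits random logits."""
    rng = random.Random(seed)
    tokens = [0]  # arbitrary start token
    for _ in range(n):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        tokens.append(sample_token(logits, tokens[-1], rng, watermark))
    return tokens


def green_fraction(tokens: list[int]) -> float:
    """Detector side: fraction of tokens that land in their step's green set."""
    hits = sum(tok in green_ids(prev) for prev, tok in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)
```

In this toy setting, `green_fraction(generate(400, 1, True))` should sit well above `green_fraction(generate(400, 1, False))`, which hovers around `GAMMA`. The sketch says nothing about output quality on a real model; it only shows where the "tilt" happens.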

IanCal 592 days ago

Increased relative to other providers of different LLMs. So I'd pick watermarked X over non-watermarked Y if X performs better than Y.
