| HN Mirror


by preciz 724 days ago

You are DEEPLY WRONG on all issues you mentioned.

Open-weight models do not require great investment. In fact, I can run them on my 400 EUR computer.

Also, why do you want to regulate text output from machines in the name of "public good"? That's insanity.

2 comments

lewhoo 724 days ago

Why exactly is it insane? Being able to reliably differentiate (let's assume it's possible for the sake of argument) between "you made this" and "you didn't make this", or at least "a human made this", seems to carry mostly (if not only) benefits.


fragmede 724 days ago

the problem is your parenthetical - it's not possible, so there's no real point in attempting it. what's worse than a watermark? one that doesn't actually work.


lores 724 days ago

OpenAI literally said they have a semi-resilient method with 99.9% accuracy. It will become fully resilient for practical purposes if all LLMs implement something similar.


jeanlucas 724 days ago

> OpenAI literally said they have a semi-resilient method with 99.9% accuracy.

They also said many other things that never happened. And they never showed it. I bet $100 they do not have a semi-resilient method with 99.9% accuracy, especially with all the evolving issues around "human vs computer" made content.

I'd also bet that the `semi-` prefix leaves a lot of room for interpretation, and that they are holding this back for more reasons than "our model is too good".


lores 724 days ago

I really don't see what's in it for them to brag about a non-existent feature that's not in their commercial interest when its non-implementation can be turned into a stick to beat them with, so I believe they have something, yes. I don't necessarily believe the 99.9%, but with that proviso I'll take your bet.


mike_hearn 723 days ago

The Verge doesn't report this, but other reports have said that the watermark is easily beatable by doing things like a Google Translate roundtrip, or asking the model to add emoji and then deleting them.


tivert 724 days ago

> the problem is your parenthetical - it's not possible, so there's no real point in attempting it. what's worse than a watermark? one that doesn't actually work.

If it's not possible to watermark, then just ban LLMs.

Tech people have this weird, self-serving assumption that the tech must be developed and must be used, and that if it causes harms that can't be mitigated, then we must accept the harm and live with it. It's really an anti-humanist, tech-first POV.


pona-a 723 days ago

The comment was referring to models close to the recent releases from Meta and Mistral, reaching up to 405B parameters with performance competitive with large commercial vendors. These models absolutely can't be trained without significant investment, and running inference on them without a cloud provider isn't cheap either. As I mentioned, nothing short of never releasing the weights could have stopped the abuse, but a fraction of it could still be deterred, hopefully adding up to a few billion fewer spam pages for search engines to serve back to you.

As for the rationality of watermarking itself, first I'd like to reiterate: no spam wave of this magnitude and undetectability has ever happened in the history of the web. A word processor cannot write a petabyte of propaganda on its own. A Markov chain can't generate anything convincing enough to fool a human. Transformer-based LLMs are the first of their kind and should be treated as such. There is no quick analogy or rule of thumb to point to.

If statistical watermarking is proven to have sufficient recall and a low enough error rate, there'll be nothing to lose in implementing it. A demand already exists for detecting AI slop; half-working BERT classifiers and prejudiced human sniff tests already serve it, with little incentive to reduce false positives. With watermarks, there'll be a less painful, more certain way to catch the worst offenders. Do you really think the same operations that produce papers with titles like "Sorry, as an AI model..." or papers containing pieces of ChatGPT UI text will care to roundtrip-translate or rewrite entire paragraphs?
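For anyone unsure what "statistical watermarking" means in practice, here is a minimal sketch of one publicly described family of schemes (a "green list" test). It is not OpenAI's method, which has not been published. Everything in it is an assumption for illustration: the names `is_green`, `watermark_z_score` and `generate_watermarked`, the SHA-256 keying on the previous token, and the `GAMMA = 0.5` split. The generator biases each next token toward a pseudorandom "green" half of the vocabulary, and the detector counts green tokens and computes a z-score against the unwatermarked expectation.

```python
import hashlib
import math
import random

GAMMA = 0.5  # assumed fraction of the vocabulary that is "green" at each step


def is_green(prev_token: str, token: str, gamma: float = GAMMA) -> bool:
    """Pseudorandomly assign `token` to the green list, keyed on the previous token."""
    digest = hashlib.sha256(f"{prev_token}\x00{token}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to [0, 1) and compare against gamma.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def watermark_z_score(tokens, gamma: float = GAMMA) -> float:
    """z-score of the green-token count versus what unwatermarked text would give."""
    n = len(tokens) - 1  # number of (previous, current) pairs scored
    if n < 1:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def generate_watermarked(vocab, length, rng, gamma: float = GAMMA):
    """Toy 'model': picks uniformly at random, but only from green tokens."""
    tokens = [rng.choice(vocab)]
    while len(tokens) < length:
        candidates = [t for t in vocab if is_green(tokens[-1], t, gamma)]
        # Fall back to the full vocabulary if no token happens to be green.
        tokens.append(rng.choice(candidates or vocab))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = [f"w{i}" for i in range(50)]
    marked = generate_watermarked(vocab, 200, rng)
    plain = [rng.choice(vocab) for _ in range(200)]
    print("watermarked z:", round(watermark_z_score(marked), 2))
    print("plain z:", round(watermark_z_score(plain), 2))
```

A detector would flag text whose z-score clears some threshold (4 is a common illustrative choice), which is where the recall and false-positive trade-off comes from. Because the green list here depends on the preceding token, anything that changes the token sequence, such as a translation roundtrip, also erodes the score. That is the weakness being argued about upthread.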

We already had this exact dilemma back when email spammers tried Bayesian poisoning [0]. It turns out that it actually creates an identifiable pattern, if not for the system, then for the user on the other side. People will train themselves to look for oddly phrased sentences or the outright nonsense that roundtripping produces, abrupt shifts in writing style, and other heuristics. Once a large enough corpus is there, we can talk about training a new classifier, this time on a much more stable pattern with fewer type-I errors.

[0] https://en.wikipedia.org/wiki/Bayesian_poisoning
