
	by naillo 986 days ago
	It's funny that one argument OpenAI used for keeping their models closed and centralized was that they could prevent things like this. And yet they're doing basically nothing to stop it (and letting the web deteriorate) now that profit has come into play.

1 comments

wildrhythms 986 days ago

I don't understand how OpenAI thinks they could stop that from happening.

tornato7 986 days ago

Not saying they should, but if they wanted to they could have an API that allows you to check whether some text was generated by them or not. Then Google would be able to check search results and downrank.

simonhughes22 986 days ago

It's not that simple. OpenAI originally released a model to try to detect whether content was generated by an LLM. They later dropped the service because it wasn't accurate. Today's models are so good at text generation that in most cases it's not possible to tell human-written text from machine-generated text.

naillo 986 days ago

Well they could just not allow prompts that seem to participate in blogspam. If they wanted to stop it they definitely could.

Their argument is that since it's centralized, things like that are possible (while with Llama 2 you can't). They do "patch" things all the time. But since blogspam is contributing to paying back the billions Microsoft expects, they're not going to.

Meekro 986 days ago

> Well they could just not allow prompts that seem to participate in blogspam.

Unfortunately, any question that lots of people legitimately ask will also be a prompt for blogspam.

kolinko 985 days ago

It would be easy to work around using other open-source models: you use GPT-4 to generate content and then Llama 2 or something else to change the style slightly.

Also, it would require OpenAI to store the history of everything that its API has produced. That would conflict with their privacy policy and privacy protections.

tornato7 985 days ago

I mean they would literally just store hashes of all the content that they generate and check any submitted text against that database. No AI involved.
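
For concreteness, here is a rough sketch of what such a hash lookup might look like. Everything in it is hypothetical: `GeneratedTextIndex` and its methods are made-up names, the whitespace and case normalization is an assumption, and nothing here reflects how OpenAI actually stores or checks anything.

```python
import hashlib


class GeneratedTextIndex:
    """Hypothetical store of digests for every completion a provider has served."""

    def __init__(self):
        self._digests = set()

    @staticmethod
    def _digest(text: str) -> str:
        # Assumption: lowercase and collapse whitespace so trivial
        # formatting differences still produce the same digest.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def record(self, text: str) -> None:
        """Remember that this text was generated."""
        self._digests.add(self._digest(text))

    def was_generated(self, text: str) -> bool:
        """Exact-match lookup: True only if the normalized text was recorded."""
        return self._digest(text) in self._digests


index = GeneratedTextIndex()
index.record("Ten tips for faster morning routines.")
print(index.was_generated("ten tips for  faster morning routines."))  # True
print(index.was_generated("Ten tips for quicker morning routines."))  # False
```

Only digests are kept, not the text itself, but the lookup only succeeds on a (normalized) exact match.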

thewakalix 985 days ago

If it's a straightforward hash, that's easy to evade by modifying the output slightly (even programmatically).
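
A quick illustration of the evasion, assuming the "straightforward hash" is something like SHA-256 over the raw text (the helper name `sha256_hex` and the sample sentence are made up for this sketch):

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = "The quick brown fox jumps over the lazy dog."
# A trivial programmatic tweak: swap one word for a synonym.
modified = original.replace("quick", "fast")

# The digests no longer match, so an exact-hash lookup misses the modified text.
print(sha256_hex(original) == sha256_hex(modified))  # False
```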

If it's a perceptual hash, that's easy to _exploit_: just ask the AI to repeat something back to you, and it typically does so with few errors. Now you can mark anything you like as "AI-generated". (Compare to Schneier's take on SmartWater at https://www.schneier.com/blog/archives/2008/03/the_security_...).
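
The exploit can be sketched with a toy fuzzy matcher. This uses word shingles and Jaccard similarity as a crude stand-in for a perceptual hash, which is my substitution and not any real detector; `FuzzyIndex`, the 0.5 threshold, and the sample text are all assumptions for illustration.

```python
def shingles(text, k=3):
    """Set of k-word windows from the text (at least one window, even if short)."""
    words = text.lower().split()
    return frozenset(tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1)))


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


class FuzzyIndex:
    """Hypothetical index that flags text resembling anything it has recorded."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self._seen = []

    def record(self, text):
        self._seen.append(shingles(text))

    def was_generated(self, text):
        probe = shingles(text)
        return any(jaccard(probe, s) >= self.threshold for s in self._seen)


human_text = "We hold these truths to be self-evident, that all men are created equal."
index = FuzzyIndex()
print(index.was_generated(human_text))  # False: nothing recorded yet

# Attacker asks the model to repeat the text; the provider records the
# near-verbatim echo (here it only drops the final period) as its own output.
echoed = "We hold these truths to be self-evident, that all men are created equal"
index.record(echoed)
print(index.was_generated(human_text))  # True: the human text is now flagged
```

The same tolerance that lets the matcher survive small edits is what lets someone else's text be marked as "AI-generated".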
