| HN Mirror



	by squigz 695 days ago
	> 2023 will be the last time we had original content, as soon we will have stop producing new content and just recycling existing content.

	This is just an absurd idea. We're going to just stop producing new content?

4 comments

tbatchelli 695 days ago

The incentives will be largely gone when SEO-savvy AI bots produce 10K articles in the time it takes you to write one, so your article will be mostly unfindable in search engines.

Human-generated content will be outpaced by AI-generated content by a large margin, so even though there'll still be human content, it'll be meaningless in aggregate.


mglz 695 days ago

No, but the scrapers cannot tell it apart from LLM output.


epidemian 695 days ago

We can adapt. There are already invite-only and semi-closed online communities. If the "mainstream" web becomes AI-flooded, where would you like to hang out / get information: the mainstream AI sludge, or the curated human communities?


grugagag 695 days ago

I think the safest space away from the gen AI sludge will be offline. But even that will be vulnerable to its influence.


deathanatos 695 days ago

Back to webrings, then.

Yet

The LLM is trained by measuring its error compared to the training data. It is literally optimizing to not be recognizable. Any improvement you can make to detect LLM output can immediately be used to train them better.
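
To make "measuring its error compared to the training data" concrete, here is a rough sketch. It assumes the usual next-token cross-entropy objective, which is my reading of the comment and not something it spells out. The function names, the toy vocabulary, and the probabilities are all made up for illustration.

```python
import math

def next_token_loss(predicted_probs, target_token):
    """Cross-entropy at one position: -log of the probability given to the true token."""
    return -math.log(predicted_probs[target_token])

def sequence_loss(per_position_probs, target_tokens):
    """Average next-token loss over a sequence of training text."""
    losses = [
        next_token_loss(probs, token)
        for probs, token in zip(per_position_probs, target_tokens)
    ]
    return sum(losses) / len(losses)

# Hypothetical model outputs for the training text "the cat".
targets = ["the", "cat"]
confident = [{"the": 0.9, "a": 0.1}, {"cat": 0.8, "dog": 0.2}]
unsure = [{"the": 0.5, "a": 0.5}, {"cat": 0.5, "dog": 0.5}]

# The model that puts more probability on the real text gets the lower loss.
print(sequence_loss(confident, targets) < sequence_loss(unsure, targets))
```

Training pushes this number down, so the model is rewarded for putting probability on whatever the human-written text actually says next.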


ben_w 695 days ago

GANs do that, I don't think LLMs do. I think LLMs are mostly trained on "how do I reckon a human would rate this answer?", or at least the default ChatGPT models are and that's the topic at the root of this thread. That's allowed to be a different distribution to the source material.

Observable: ChatGPT quite often used to just outright say "As a large language model trained by OpenAI…", which is a dead giveaway.


sebastiennight 695 days ago

This is the result of RLHF (which is fine-tuning to make the output more palatable), but this is not what training is about.

The actual training process makes the model output be the likeliest output, and the introduction phrase you quoted would not come out of this process if there was no RLHF. See GPT3 (text-davinci-003 via API) which didn't have RLHF and would not say this, vs. ChatGPT which is fine-tuned for human preferences and thus will output such giveaways.


dartos 694 days ago

And then you can train a new detector.

I see no reason to believe it wouldn’t be a pendulum situation.

That’s how GANs work, after all.
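
A toy sketch of that back-and-forth, under heavy assumptions: text is reduced to a single made-up scalar "tell" score, human and AI scores are unit-variance Gaussians, the detector is a midpoint threshold, and the generator moves a fixed fraction toward the human mean each round. None of this comes from a real detector or a real GAN; it only shows the alternating updates, and it does not settle whether the real thing swings back and forth.

```python
import math

def detector_accuracy(human_mean, ai_mean):
    """Accuracy of the best threshold between two unit-variance Gaussians."""
    gap = abs(ai_mean - human_mean)
    # The best threshold is the midpoint, which gives accuracy Phi(gap / 2).
    return 0.5 * (1.0 + math.erf(gap / (2.0 * math.sqrt(2.0))))

def arms_race(human_mean=0.0, ai_mean=4.0, rounds=5, step=0.5):
    """Alternate detector and generator turns; return detector accuracy per round."""
    history = []
    for _ in range(rounds):
        # Detector turn: retrain on the current AI output and measure accuracy.
        history.append(detector_accuracy(human_mean, ai_mean))
        # Generator turn: use the detector's signal to move closer to human text.
        ai_mean += step * (human_mean - ai_mean)
    return history

for round_number, accuracy in enumerate(arms_race(), start=1):
    print(f"round {round_number}: detector accuracy {accuracy:.3f}")
```

In this toy setup each retrained detector is the best one available for that round, and the generator's next update makes its job harder again.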


camdenreslink 695 days ago

Non-AI content will probably become a marketing angle for certain websites and apps.


binary132 695 days ago

it’ll be utterly drowned out for the vast majority of users
