by squigz 695 days ago
> 2023 will be the last time we had original content, as soon we will stop producing new content and just recycle existing content.

This is just an absurd idea. We're going to just stop producing new content?


The incentives will be largely gone once SEO-savvy AI bots can produce 10K articles in the time it takes you to write one, because your article will be mostly unfindable in search engines.

Human-generated content will be outpaced by AI-generated content by a large margin, so even though there'll still be human content, it'll be meaningless in aggregate.

No, but the scrapers cannot tell it apart from LLM output.
We can adapt. There are already invite-only and semi-closed online communities. If the "mainstream" web becomes AI-flooded, where would you like to hang out / get information: the mainstream AI sludge, or the curated human communities?
I think the safest space away from the gen AI sludge will be offline. But even that will be vulnerable to its influence.
Back to webrings, then.
Yet
The LLM is trained by measuring its error compared to the training data. It is literally optimizing to not be recognizable. Any improvement you can make to detect LLM output can immediately be used to train them better.
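To make "measuring its error compared to the training data" concrete, here is a minimal sketch. It assumes a toy count-based bigram model standing in for an LLM; `train_bigram` and `cross_entropy` are hypothetical helper names, and real models minimize this kind of loss with gradient descent over far larger contexts rather than by counting.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count-based bigram model: P(next | prev) from raw frequencies."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {tok: c / sum(nxts.values()) for tok, c in nxts.items()}
        for prev, nxts in counts.items()
    }

def cross_entropy(model, tokens, floor=1e-9):
    """Average negative log-probability assigned to each actual next token."""
    pairs = list(zip(tokens, tokens[1:]))
    total = 0.0
    for prev, nxt in pairs:
        # Unseen pairs get a tiny floor probability instead of zero.
        p = model.get(prev, {}).get(nxt, floor)
        total += -math.log(p)
    return total / len(pairs)

# Toy "training data" (an assumption for illustration only).
human_text = "the cat sat on the mat and the cat ran".split()
vocab = sorted(set(human_text))

fitted = train_bigram(human_text)
# Baseline that knows nothing: every token equally likely after any token.
uniform = {prev: {tok: 1 / len(vocab) for tok in vocab} for prev in vocab}

loss_fitted = cross_entropy(fitted, human_text)
loss_uniform = cross_entropy(uniform, human_text)
print(f"fitted: {loss_fitted:.3f}  uniform: {loss_uniform:.3f}")
```

The fitted model scores a lower loss than the uniform baseline because it puts more probability on what the text actually says next. That is the sense in which the objective pushes output toward the statistics of the training text.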
GANs do that, I don't think LLMs do. I think LLMs are mostly trained on "how do I reckon a human would rate this answer?", or at least the default ChatGPT models are and that's the topic at the root of this thread. That's allowed to be a different distribution to the source material.

Observable: ChatGPT quite often used to just outright say "As a large language model trained by OpenAI…", which is a dead giveaway.

This is the result of RLHF (which is fine-tuning to make the output more palatable), but this is not what training is about.

The actual training process makes the model output the likeliest output, and the introductory phrase you quoted would not come out of this process if there were no RLHF. See GPT-3 (text-davinci-003 via API), which didn't have RLHF and would not say this, vs. ChatGPT, which is fine-tuned for human preferences and thus will output such giveaways.

And then you can train a new detector.

I see no reason to believe it wouldn’t be a pendulum situation.

That’s how GANs work, after all.
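The pendulum can be sketched as a toy loop. This is only an analogy under loud assumptions: there are no neural networks or gradients here, so it is not a real GAN, and the "habit" phrases, `generate`, and `detection_rate` are all hypothetical. The detector learns one telltale phrase per round, then the generator is "retrained" to drop whatever the detector flags.

```python
def detection_rate(samples, tells):
    """Fraction of samples containing at least one known tell."""
    flagged = sum(any(t in s for t in tells) for s in samples)
    return flagged / len(samples)

def generate(habits, n=4):
    """Hypothetical generator: every sample carries all of its current habits."""
    return [" ".join(habits) + f" sample {i}" for i in range(n)]

# Made-up stylistic habits, for illustration only.
habits = ["as a large language model", "delve into", "rich tapestry"]
tells = set()
history = []

for _ in range(3):
    # Detector step: learn one habit it is not yet keying on.
    new_tell = next(h for h in habits if h not in tells)
    tells.add(new_tell)
    after_detector = detection_rate(generate(habits), tells)

    # Generator step: drop every habit the detector currently flags.
    habits = [h for h in habits if h not in tells]
    after_generator = detection_rate(generate(habits), tells)

    history.append((after_detector, after_generator))

for rnd, (d, g) in enumerate(history, start=1):
    print(f"round {rnd}: after detector update {d:.0%}, after generator update {g:.0%}")
```

Each round the detection rate swings up when the detector updates and back down when the generator adapts. In this toy the swinging ends once the generator has no habits left; whether real detectors keep finding new tells is exactly what the thread is arguing about.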

Non-AI content will probably become a marketing angle for certain websites and apps.
It’ll be utterly drowned out for the vast majority of users.