by valine 1152 days ago
Presumably the ChatGPT content that makes it onto the web is, at the very least, curated by humans, making that text on average slightly higher quality than ChatGPT's raw output. If that's the case, then you would expect model performance to keep improving even if the dataset is polluted.

That's a bold assumption. I can imagine a world where 99.999% of the web is filled with AI-generated text that no human has curated.

The rate at which AI can generate text will be far greater than the rate at which humans can.

Doesn't matter. We want high-quality text - it's not necessary for it to be human-written. Social signals like upvotes or PageRank will still remain useful even if most text is AI-generated.
I certainly don't want most discussion forums to be generated by bots. I'd rather there were none of it. High-quality generated text is good for fiction and summaries, but not when you want to hear what actual humans have to say.
The point is that AIs will run out of human-generated text, or that they won't be able to tell AI-generated from human-generated text when choosing what to train on.

You're already assuming PageRank and upvote systems won't break down in the future.

You just gotta get the AIs to do the upvoting, then cut the humans out of the loop altogether and only have AIs read the AI-generated text, and then everything will be fine. Just an endless death spiral of AI generation, AI filtering, and AI consumption, forever and ever.

Presumably at some point computers will become (already are for all I know?) the largest consumers of content on the internet as well as its producers.

"bold assumption" says the guy who assumes $2 worth of energy will be spent on AI-generated text for every single word written by humans.

Now go ahead and spend $50 on AI-generated text nobody is ever going to read, just like almost nobody is going to read this comment.

Bold assumption that AI-generated text won't get exponentially cheaper. It already costs orders of magnitude less than human-generated text of the same quality.
Costs a lot more than free text written by thinking humans.
I think you're very confused about the costs required in operating a human... Or are you assuming because the human was going to be doing it anyway the cost is free?