Hacker News
by sebosp 975 days ago
Is there an approximation or ratio at which the amount of digital garbage and hallucinations generated by AI online becomes so large that the internet can no longer be used to train AI itself? Are AI companies racing against the clock because, say, in 5 years the internet will be flooded with false information to such an extent that it becomes an invalid training ground? That would, in a way, require a snapshot of the pre-AI internet. It feels like the clickbait problem times infinity.
2 comments

It's already too late if you just want to scrape random horseshit off the internet. There will be real money in large, expert-generated data sets. AI is also a potential epistemological nightmare: it can cement bad knowledge and bury newer, more up-to-date knowledge in a sea of bullshit.
Aka "T-minus how many days until OpenAI wants to buy archive.org?"