Hacker News
by greyface- 2167 days ago
This is an interesting thought. GPT-3 used 45TB of raw CommonCrawl data (which was filtered down to 570GB prior to training). The Internet Archive has 48PB of raw data.
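For a rough sense of scale, the figures above can be compared directly. This is only a back-of-the-envelope sketch: it assumes decimal units (1 PB = 1000 TB, 1 TB = 1000 GB), and the last number assumes the Internet Archive's data would survive filtering at the same rate as CommonCrawl, which is purely hypothetical and probably optimistic given how much of the archive is not text.

```python
# Assumption: decimal (SI) units, 1 TB = 1000 GB and 1 PB = 1000 TB.
GB_PER_TB = 1000
TB_PER_PB = 1000

commoncrawl_raw_tb = 45        # raw CommonCrawl data cited in the comment
commoncrawl_filtered_gb = 570  # size after filtering, prior to training
archive_raw_pb = 48            # Internet Archive raw data cited in the comment

# Fraction of the raw CommonCrawl data that survived filtering.
retention = commoncrawl_filtered_gb / (commoncrawl_raw_tb * GB_PER_TB)

# How many times larger the Internet Archive is than the raw CommonCrawl set.
size_ratio = (archive_raw_pb * TB_PER_PB) / commoncrawl_raw_tb

# Hypothetical only: filtered size if the archive had the same retention rate.
hypothetical_filtered_tb = archive_raw_pb * TB_PER_PB * retention

print(f"retention: {retention:.2%}")
print(f"size ratio: {size_ratio:.0f}x")
print(f"hypothetical filtered: {hypothetical_filtered_tb:.0f} TB")
```

Under those assumptions, about 1.27% of the raw CommonCrawl data was kept, the Internet Archive is roughly 1067 times larger than the raw CommonCrawl set, and the same retention rate would leave on the order of 608 TB.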
1 comment

That 48PB is mostly just old video game ROMs and ISOs, though.