Hacker News
by greyface- 2167 days ago
This is an interesting thought. GPT-3 used 45TB of raw CommonCrawl data (which was filtered down to 570GB prior to training). The Internet Archive has 48PB of raw data.
1 comment
GreenHeuristics 2167 days ago
That 48PB is mostly just old video game ROMs and ISOs, though.