by greens 1749 days ago
This is a common misconception. GPT-3 was trained on a 300B-token (~300 GB) subset of Common Crawl and friends. With 175B parameters (~350 GB at 2 bytes each), the model is larger than the dataset.
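For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It takes the ~300 GB dataset figure above at face value rather than deriving it, and it assumes the weights are stored at 2 bytes (fp16) or 4 bytes (fp32) per parameter. The helper name `model_size_gb` is made up for illustration.

```python
GB = 10**9  # decimal gigabyte

N_PARAMS = 175 * 10**9   # GPT-3 parameter count (175B)
DATASET_GB = 300         # rough dataset size quoted above (assumption, not derived)


def model_size_gb(n_params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in GB for a given storage precision."""
    return n_params * bytes_per_param / GB


fp16_gb = model_size_gb(N_PARAMS, 2)  # assumed half-precision storage
fp32_gb = model_size_gb(N_PARAMS, 4)  # assumed single-precision storage

print(f"fp16 weights: {fp16_gb:.0f} GB, dataset: {DATASET_GB} GB")
print(f"fp32 weights: {fp32_gb:.0f} GB, dataset: {DATASET_GB} GB")
print("model larger than dataset (fp16):", fp16_gb > DATASET_GB)
```

Under these assumptions the weights exceed the dataset size at either precision. A different bytes-per-token estimate for the text would move the dataset figure, so treat this as a sanity check rather than a precise comparison.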