


	by greens 1749 days ago
	This is a common misconception. GPT-3 was trained on a 300B-token (~300 GB) subset of Common Crawl and friends. The model is larger than the dataset.
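	A rough back-of-envelope check of that size comparison, as a sketch only. The 175B parameter count is the published figure for the largest GPT-3 model. The 2 bytes per parameter (fp16 weights) and the 1 byte per token (which is what the "~300 GB" figure implies) are assumptions made for illustration, not measured values.

```python
# Back-of-envelope comparison of model size vs. training-set size.
# All constants below are illustrative assumptions except where noted.

GPT3_PARAMS = 175e9     # published parameter count of the largest GPT-3 model
BYTES_PER_PARAM = 2     # assumption: weights stored as fp16
TRAIN_TOKENS = 300e9    # token count quoted in the comment
BYTES_PER_TOKEN = 1     # assumption implied by the comment's "~300 GB" estimate


def size_gb(count: float, bytes_each: float) -> float:
    """Convert an item count and per-item byte size to decimal gigabytes."""
    return count * bytes_each / 1e9


model_gb = size_gb(GPT3_PARAMS, BYTES_PER_PARAM)
data_gb = size_gb(TRAIN_TOKENS, BYTES_PER_TOKEN)

print(f"model:   ~{model_gb:.0f} GB")
print(f"dataset: ~{data_gb:.0f} GB")
print("model larger than dataset:", model_gb > data_gb)
```

	Under these assumptions the weights come out larger than the training subset. The result is sensitive to the bytes-per-token guess, so treat it as an order-of-magnitude argument rather than a precise one.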