by greens 1749 days ago
This is a common misconception. GPT-3 was trained on a 300B-token (~300 GB) subset of Common Crawl and friends. The model is larger than the dataset.
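A rough back-of-the-envelope check of that size comparison, as a sketch only. The 175B figure is GPT-3's parameter count. The 2 bytes per parameter (fp16 weights) is an assumption not stated above. The ~300 GB dataset figure is the estimate from this comment, not a measured value.

```python
# Back-of-the-envelope size comparison (a sketch, not an exact accounting).
PARAMS = 175e9        # GPT-3 parameter count (175B)
BYTES_PER_PARAM = 2   # assumption: fp16 weights; fp32 would double this
DATASET_GB = 300      # the comment's rough estimate for the 300B-token subset


def model_size_gb(params: float, bytes_per_param: int) -> float:
    """Raw weight storage in GB, taking 1 GB = 1e9 bytes."""
    return params * bytes_per_param / 1e9


model_gb = model_size_gb(PARAMS, BYTES_PER_PARAM)
print(f"model ~{model_gb:.0f} GB vs dataset ~{DATASET_GB} GB")
```

Under those assumptions the weights alone come out somewhat larger than the dataset estimate; a different bytes-per-token or bytes-per-parameter assumption would shift the comparison.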