by timpetri
1536 days ago
Question related to the Chinchilla paper [0], which says that the compute-optimal amounts of training data for ~500B, 1T, and 10T parameter models are 11T, 21.2T, and 216.2T tokens, respectively. The PaLM paper [1] says it used 780B tokens. How many tokens of training data have humans produced across the entire internet, all our written works, etc.? Is there such a thing as a 216-trillion-token dataset?

[0] https://arxiv.org/abs/2203.15556
[1] https://arxiv.org/abs/2204.02311
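A quick sanity check on the figures quoted above: they all work out to roughly 20 tokens per parameter. This is a minimal sketch using only the numbers in the question. The `ratio=20.0` default in `approx_optimal_tokens` is an assumed rule of thumb, not an exact value from the paper.

```python
# (params, tokens) pairs as quoted in the question, attributed to the Chinchilla paper.
QUOTED = {
    500e9: 11.0e12,   # ~500B params -> 11T tokens
    1e12: 21.2e12,    # 1T params   -> 21.2T tokens
    10e12: 216.2e12,  # 10T params  -> 216.2T tokens
}


def tokens_per_param(params: float, tokens: float) -> float:
    """Ratio of training tokens to model parameters."""
    return tokens / params


def approx_optimal_tokens(params: float, ratio: float = 20.0) -> float:
    """Rule-of-thumb estimate: tokens ~= ratio * params (ratio ~20 is an assumption)."""
    return ratio * params


if __name__ == "__main__":
    for n, d in QUOTED.items():
        print(f"{n / 1e9:>8.0f}B params: {tokens_per_param(n, d):.1f} tokens/param")
```

Every quoted pair lands between about 21 and 22 tokens per parameter, so the required dataset grows linearly with model size.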
But the real question you should be asking is: where would you get the compute to train a model that needs 216T tokens?
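A back-of-the-envelope estimate of that compute, assuming the common rule of thumb that training costs about 6 FLOPs per parameter per token (C ≈ 6·N·D). The sustained throughput of 1e18 FLOP/s (one exaFLOP/s) is a hypothetical cluster, chosen only for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute, C ~= 6 * N * D FLOPs (rule of thumb)."""
    return 6.0 * params * tokens


def years_to_train(flops: float, sustained_flops_per_s: float) -> float:
    """Wall-clock years at a given sustained throughput (hypothetical input)."""
    return flops / sustained_flops_per_s / SECONDS_PER_YEAR


if __name__ == "__main__":
    c = training_flops(10e12, 216.2e12)  # 10T params, 216.2T tokens
    print(f"~{c:.2e} FLOPs")
    # Hypothetical cluster sustaining 1 exaFLOP/s (1e18 FLOP/s).
    print(f"~{years_to_train(c, 1e18):.0f} years at 1e18 FLOP/s")
```

Under these assumptions the 10T-parameter run needs on the order of 1.3e28 FLOPs, which is roughly four centuries at a sustained exaFLOP/s.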