by xtreme
1146 days ago
Before you learned how to code from a book, you had to learn how to read and write English. You also had to learn how to follow instructions, how to absorb and compose information, and so on. How many books and hours of instruction did that take?
> Let us consider the GPT-3 model with 𝑃 = 175 billion parameters as an example. This model was trained on 𝑇 = 300 billion tokens. On 𝑛 = 1024 A100 GPUs using batch-size 1536, we achieve 𝑋 = 140 teraFLOP/s per GPU. As a result, the time required to train this model is 34 days.
https://arxiv.org/pdf/2104.04473.pdf
I'm not sure expressing brain capacity in FLOPs makes much sense, but if it can be expressed that way, I'm sure the total number of floating-point operations a typical human spends on learning is less than what that training run used.
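
For anyone who wants to check the 34-day figure in the quote, here is a rough back-of-the-envelope sketch. It assumes the approximation training time ≈ 8·T·P / (n·X), which is how I remember the linked paper estimating it; the factor of 8 is not in the quoted passage, so treat it as an assumption. The function name `training_estimate` is just made up for illustration.

```python
SECONDS_PER_DAY = 86_400

def training_estimate(params, tokens, n_gpus, flops_per_gpu):
    """Return (total FLOP, days) for a training run.

    Assumes total compute is about 8 * tokens * params. The factor 8 is
    an assumption taken from memory of the linked paper, not from the
    quoted passage itself.
    """
    total_flop = 8 * tokens * params
    seconds = total_flop / (n_gpus * flops_per_gpu)
    return total_flop, seconds / SECONDS_PER_DAY

# Numbers from the quote: P = 175e9, T = 300e9, n = 1024, X = 140e12 FLOP/s
total_flop, days = training_estimate(175e9, 300e9, 1024, 140e12)
print(f"{total_flop:.2e} FLOP, {days:.1f} days")  # 4.20e+23 FLOP, 33.9 days
```

So under that assumption "that" is on the order of 4e23 floating-point operations in total, which is the number a brain-FLOPs estimate would have to be compared against.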