by turmeric_root
1167 days ago
> the "number B" stands for "number of billions" of parameters... trained on?

No, it's just the size of the network (i.e. number of learnable parameters). The 13/30/65B models were each trained on ~1.4 trillion tokens of training data (each token is around half a word).
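
To make the distinction concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the approximate figures from this comment (~1.4 trillion tokens, about half a word per token) and reads the parameter counts straight off the model names. The names `TRAINING_TOKENS`, `WORDS_PER_TOKEN`, `model_params` and `tokens_per_parameter` are made up for illustration.

```python
# Rough figures taken from the comment above; treat them as approximations.
TRAINING_TOKENS = 1.4e12   # ~1.4 trillion tokens of training data
WORDS_PER_TOKEN = 0.5      # "each token is around half a word"

# The "B" in the model name is the network size: billions of learnable parameters.
model_params = {"13B": 13e9, "30B": 30e9, "65B": 65e9}


def tokens_per_parameter(params, tokens=TRAINING_TOKENS):
    """Training tokens seen per learnable parameter."""
    return tokens / params


# Approximate number of words in the training data, under the half-a-word assumption.
approx_words = TRAINING_TOKENS * WORDS_PER_TOKEN

for name, params in model_params.items():
    ratio = tokens_per_parameter(params)
    print(f"{name}: {params:.1e} parameters, ~{ratio:.0f} training tokens per parameter")

print(f"Training data: {TRAINING_TOKENS:.1e} tokens, roughly {approx_words:.1e} words")
```

The point is just that the two numbers measure different things: the parameter count is fixed by the architecture, while the token count describes how much text the network was shown during training.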