Hacker News new | ask | show | jobs
by turmeric_root 1167 days ago
> the "number B" stands for "number of billions" of parameters... trained on?

No, it's just the size of the network (i.e. number of learnable parameters). The 13/30/65B models were each trained on ~1.4 trillion tokens of training data (each token is around half a word).