| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by turmeric_root 1214 days ago
	> the "number B" stands for "number of billions" of parameters... trained on? No, it's just the size of the network (i.e. number of learnable parameters). The 13/30/65B models were each trained on ~1.4 trillion tokens of training data (each token is around half a word).