


	by riku_iki 806 days ago
	I didn't get one detail: they selected a 6B transformer as the baseline and compared it to a 7B Griffin. Why wouldn't they select equal-size models?


szundi 806 days ago

They probably already had them for some reason, and it was cheaper not to retrain one of them.


riku_iki 806 days ago

It's just that the performance comparison is misleading then: they report marginal improvements, which are expected simply because of the difference in model size.


GaggiX 806 days ago

It also performs better at every other size.


riku_iki 806 days ago

The baseline transformer in the tables has a maximum size of 6B. The other models are trained on very different data, and probably trained differently.


GaggiX 806 days ago

All the MQA transformers, Hawk, and Griffin are trained on the same MassiveText dataset, so no.


riku_iki 806 days ago

Yes, but MQA is limited to 6B, while the "other" larger non-RNN models in the table (Llama-2) are not trained on the same dataset, and Hawk and Griffin are 7B. Sorry, I don't understand your point.
