| HN Mirror


	by alekandreev 721 days ago
	I think it makes sense to compare models trained with the same recipe on token count: more tokens will usually give you a better model. However, I wouldn't draw conclusions about different model families, like Llama and Gemma, based on token count alone. Many other variables are at play (the quality of those tokens, number of epochs, model architecture, hyperparameters, distillation, etc.), and all of them influence training efficiency.