	by whimsicalism 928 days ago
	DPO is pretty good as well. I think the '7b beating 70b' result is mostly because Mistral was likely trained on considerably more tokens than is Chinchilla-optimal for its size. So was llama-70b, but not to the same degree.
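	To make the "not to the same degree" point concrete, here is a rough back-of-the-envelope sketch. It assumes the common rule of thumb of about 20 training tokens per parameter as "Chinchilla optimal", assumes "llama-70b" means Llama 2 70B (reported as trained on about 2T tokens), and uses a purely hypothetical 2T-token budget for a 7B model, since Mistral's actual token count is not given here. The function name and constants are illustrative only.

```python
# Rule of thumb (an approximation, not an exact law): compute-optimal
# training uses roughly 20 tokens per model parameter.
CHINCHILLA_TOKENS_PER_PARAM = 20


def overtraining_factor(params: float, train_tokens: float) -> float:
    """Return how many times more tokens were used than the ~20 tokens/param heuristic."""
    optimal_tokens = CHINCHILLA_TOKENS_PER_PARAM * params
    return train_tokens / optimal_tokens


# Assumption: Llama 2 70B, about 2T training tokens.
llama_70b = overtraining_factor(70e9, 2e12)

# Hypothetical: a 7B model given the same 2T-token budget.
# This is NOT Mistral's real figure, only an illustration.
small_7b = overtraining_factor(7e9, 2e12)

print(f"70B model: {llama_70b:.1f}x the heuristic token budget")
print(f"7B model:  {small_7b:.1f}x the heuristic token budget")
```

	Under these assumptions the 70B model sits only modestly past the heuristic, while a 7B model on the same token budget is an order of magnitude past it, which is the asymmetry the comment is pointing at.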