HN Mirror



	by paradite 473 days ago
	My burning question: why not also make a slightly larger model (100B) that could perform even better? Is there some bottleneck that prevents RL from scaling up performance on a larger non-MoE model?

2 comments

They have a larger model that is in preview and still training.