	by ioulaum 518 days ago
	It's not actually a 600B+ model. It's a mixture of experts. The actual models are pretty small and thus don't require as much training to reach a decent point. Similarly, Mixtral got good performance without anywhere near OpenAI-class money or compute.

1 comments

> It's not actually a 600B+ model. It's a mixture of experts.

Is this described in the paper, or was it inferred from the model itself?

Just curious, especially if the latter.

It's a 600B+ mixture of experts, and yes, it's described in the paper, on GitHub, etc.
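
For anyone following the "600B+ or not" back-and-forth, here is a rough sketch of the arithmetic behind both readings. It assumes a generic top-k mixture of experts; `MoEConfig`, its fields, and every number are hypothetical and picked only so the total lands in the 600B+ range. None of them are taken from the paper or the model discussed here.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    # All values are hypothetical, chosen only to illustrate the arithmetic.
    shared_params: int       # parameters every token passes through (attention, embeddings, ...)
    params_per_expert: int   # parameters in one expert, summed over all layers
    num_experts: int         # experts the router can choose from
    experts_per_token: int   # top-k experts actually run for each token


def total_params(cfg: MoEConfig) -> int:
    """Parameters stored in the checkpoint: shared weights plus every expert."""
    return cfg.shared_params + cfg.num_experts * cfg.params_per_expert


def active_params(cfg: MoEConfig) -> int:
    """Parameters used for a single token: shared weights plus only the routed experts."""
    return cfg.shared_params + cfg.experts_per_token * cfg.params_per_expert


# Made-up configuration, not the real model's.
cfg = MoEConfig(
    shared_params=20_000_000_000,
    params_per_expert=2_500_000_000,
    num_experts=256,
    experts_per_token=8,
)

if __name__ == "__main__":
    print(f"total:  {total_params(cfg) / 1e9:.0f}B")   # total:  660B
    print(f"active: {active_params(cfg) / 1e9:.0f}B")  # active: 40B
```

Under these made-up numbers the model really does hold 600B+ parameters, while each token only touches a small fraction of them, which is roughly where the two comments above are talking past each other.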