by ur-whale 512 days ago
> It's not actually a 600B+ model. It's a mixture of experts.

Is this described in the paper, or was it inferred from the model itself?

Just curious, especially if the latter.

1 comment

It's a 600B+ mixture-of-experts model, and yes, it's described in the paper, the GitHub repo, etc.