by 1R053 589 days ago

They use

- 16 experts, of which one is activated per token

- 1 shared expert that is always active

In summary, that makes around 52B active parameters per token, compared with the 405B of Llama 3.1, which is a dense model and so uses all of its parameters for every token.
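To make the routing concrete, here is a minimal toy sketch of one such layer: 16 routed experts with top-1 selection plus one shared expert that always runs. Only the counts come from the description above. The names (`make_expert`, `route`, `moe_layer`), the toy scaling "experts", and the plain dot-product router are my own assumptions for illustration, not their actual implementation. Real MoE layers typically also weight the routed output by the gate probability, which is left out here.

```python
NUM_ROUTED_EXPERTS = 16  # from the comment: 16 routed experts
# Top-1 routing: exactly one routed expert runs per token.

def make_expert(scale):
    """Hypothetical toy expert: multiplies every element by a constant."""
    return lambda x: [scale * v for v in x]

# Assumption: stand-ins for real feed-forward experts.
routed_experts = [make_expert(i + 1) for i in range(NUM_ROUTED_EXPERTS)]
shared_expert = make_expert(0.5)

def route(token, router_weights):
    """Return the index of the highest-scoring routed expert (top-1)."""
    scores = [sum(w * v for w, v in zip(row, token)) for row in router_weights]
    return max(range(len(scores)), key=scores.__getitem__)

def moe_layer(token, router_weights):
    """Run the selected routed expert plus the always-on shared expert."""
    idx = route(token, router_weights)
    routed_out = routed_experts[idx](token)  # only 1 of the 16 runs
    shared_out = shared_expert(token)        # runs for every token
    return idx, [a + b for a, b in zip(routed_out, shared_out)]

# Usage: a router that strongly prefers expert 3 for this token.
weights = [[0.0, 0.0] for _ in range(NUM_ROUTED_EXPERTS)]
weights[3] = [1.0, 1.0]
idx, out = moe_layer([1.0, 2.0], weights)
print(idx, out)  # 3 [4.5, 9.0]
```

So each token only touches 2 of the 17 expert blocks in a layer, which is why the active parameter count is so much smaller than the total.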