| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by l9o 176 days ago
	RAM requirements stay the same. You need all 358B parameters loaded in memory, as which experts activate depends on each token dynamically. The benefit is compute: only ~32B params participate per forward pass, so you get much faster tok/s than a dense 358B would give you.

1 comments

atq2119 175 days ago

The benefit is also RAM bandwidth. That probably adds to the confusion, but it matters a lot for decode. But yes, RAM capacity requirements stay the same.

link