	by rishabhjain1198 923 days ago
	In a MoE model with experts_per_token = 2 and each expert having 7B params, after picking the experts it should run as fast as the slowest 7B expert, not as slowly as a comparable 14B model.

1 comments

Only assuming it's able to hide the faster one in free parallelism.
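
A rough sketch of that caveat, with made-up numbers: if both selected experts really do run at the same time, the token costs as much as the slower expert; if they have to share one execution slot, the times add up. The function name, the `parallel_slots` parameter, and the millisecond figures are all hypothetical, and the model ignores everything except expert compute time.

```python
def moe_token_latency_ms(expert_times_ms, parallel_slots):
    """Estimate per-token latency of the selected experts, in milliseconds.

    expert_times_ms: time each selected expert takes alone (illustrative values).
    parallel_slots:  how many experts the hardware can genuinely run at once
                     (an assumption of this toy model, not a real hardware setting).
    """
    if parallel_slots < 1:
        raise ValueError("parallel_slots must be at least 1")
    if not expert_times_ms:
        return 0

    # One running total per slot; never more slots than experts.
    slots = [0] * min(parallel_slots, len(expert_times_ms))

    # Greedy scheduling: place the longest expert first, each time on the
    # slot that is currently least loaded.
    for t in sorted(expert_times_ms, reverse=True):
        least_loaded = slots.index(min(slots))
        slots[least_loaded] += t

    # The token is done when the busiest slot finishes.
    return max(slots)


if __name__ == "__main__":
    experts = [10, 12]  # two selected 7B experts, hypothetical timings
    print("2 slots:", moe_token_latency_ms(experts, parallel_slots=2), "ms")
    print("1 slot: ", moe_token_latency_ms(experts, parallel_slots=1), "ms")
```

With two slots the result is the slower expert's time; with one slot it is the sum, which is the "comparable 14B" case the parent comment was hoping to avoid.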

My CPU trying its best to run inference: parallelwhat?