


	by AlexCoventry 415 days ago
	MoE models route each token, in each MoE transformer layer, to a small subset of specialized feed-forward networks (the "experts"; fully-connected perceptrons, basically), based on a score derived from the token's current representation.
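
	To make the routing concrete, here is a rough sketch of one MoE layer handling a single token, in plain Python. Everything in it is an assumption for illustration, not any particular model's implementation: the names (`moe_layer`, `make_expert`, `router`), the toy sizes, the random weights, the ReLU experts, and the choice to softmax over only the top-k scores (one common gating variant; others exist).

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # Multiply a row-major matrix (list of rows) by a vector.
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_expert(rng, d_model, d_hidden):
    # Hypothetical expert: a two-layer feed-forward network with ReLU.
    w_in = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    w_out = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]

    def expert(x):
        hidden = [max(0.0, h) for h in matvec(w_in, x)]
        return matvec(w_out, hidden)

    return expert


def moe_layer(x, router, experts, top_k=2):
    # Score every expert from the token's current representation.
    scores = matvec(router, x)
    # Keep only the top_k highest-scoring experts for this token.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Assumed gating: softmax over the selected scores only.
    gates = softmax([scores[i] for i in chosen])
    # Output is the gate-weighted sum of the chosen experts' outputs.
    out = [0.0] * len(x)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen, gates


rng = random.Random(0)
d_model, d_hidden, n_experts = 8, 16, 4  # toy sizes, chosen arbitrarily
router = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_experts)]
experts = [make_expert(rng, d_model, d_hidden) for _ in range(n_experts)]
token = [rng.gauss(0, 1) for _ in range(d_model)]

out, chosen, gates = moe_layer(token, router, experts, top_k=2)
print("experts used:", chosen, "gate weights:", gates)
```

	Only the `top_k` chosen experts are ever evaluated for the token, which is where the compute savings come from: parameter count grows with the number of experts, but per-token work grows only with `top_k`.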

1 comment

neom 415 days ago

Good visual explainer in here: https://deepgram.com/learn/mixture-of-experts-ml-model-guide