Hacker News
by AlexCoventry 415 days ago
MoE models route each token, in every transformer layer, to a set of specialized feed-forward networks (fully-connected perceptrons, basically), based on a score derived from the token's current representation.
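To make the routing concrete, here is a rough sketch of a single MoE layer handling one token. It is an illustration under assumptions, not any particular model's implementation: the linear router, the top-2 selection, the softmax taken over only the selected scores, the tiny ReLU experts, and all names (`route`, `moe_layer`, `make_expert`, the dimensions) are made up for the example. Real models differ in details such as where the softmax is applied, and they add things like load-balancing losses and residual connections that are left out here.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_expert(rng, d_model, d_hidden):
    """Build one hypothetical expert: a two-layer feed-forward net with ReLU."""
    w_in = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    w_out = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]

    def expert(x):
        hidden = [max(0.0, h) for h in matvec(w_in, x)]
        return matvec(w_out, hidden)

    return expert


def route(token, router_weights, top_k):
    """Score every expert from the token's representation and keep the top_k.

    Assumption: the score is a plain linear projection of the token, and the
    gate weights are a softmax over the selected scores only.
    """
    scores = matvec(router_weights, token)  # one score per expert
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([scores[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_layer(token, router_weights, experts, top_k=2):
    """Run only the chosen experts and mix their outputs by gate weight."""
    output = [0.0] * len(token)
    for idx, gate in route(token, router_weights, top_k):
        expert_out = experts[idx](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output


rng = random.Random(0)  # fixed seed so the toy weights are reproducible
D_MODEL, D_HIDDEN, N_EXPERTS = 4, 8, 4
experts = [make_expert(rng, D_MODEL, D_HIDDEN) for _ in range(N_EXPERTS)]
router_weights = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(N_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]  # stand-in for the token's current representation
assignments = route(token, router_weights, top_k=2)
output = moe_layer(token, router_weights, experts, top_k=2)
print(assignments)
print(output)
```

The point of the sketch is that only `top_k` of the `N_EXPERTS` feed-forward networks run for a given token, so compute per token stays roughly constant while total parameter count grows with the number of experts. In a full model this routing happens independently in each MoE layer, so the same token can land on different experts from one layer to the next.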
1 comment
neom 415 days ago
Good visual explainer in here: https://deepgram.com/learn/mixture-of-experts-ml-model-guide