Hacker News
by
UncleOxidant
192 days ago
Or maybe models that are much more task-focused? Like models that are trained on just math & coding?
agoodusername63
192 days ago
Isn't that what the mixture-of-experts trick all the big players use already does? A bunch of smaller, tightly focused models.
irthomasthomas
191 days ago
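Not exactly. MoE uses a small router network to select a subset of experts (feed-forward sub-networks inside each MoE layer) per token, not separate standalone models. This makes inference faster, because only the selected experts run, but every expert still has to be loaded, so it needs the same amount of RAM as a dense model with the same total parameter count.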
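To make the compute-versus-memory point concrete, here is a rough sketch of top-k routing in plain Python. Everything in it is an assumption for illustration: the class name `ToyMoELayer`, the sizes, and the choice of a single linear map per expert are hypothetical, and real MoE layers use MLP experts plus extras like load balancing. Taking a softmax over only the selected experts' scores is one common variant, not the only one.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMoELayer:
    """Hypothetical toy MoE layer: a router picks top_k experts per token."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.top_k = top_k
        # Router: one score vector per expert (a small linear layer).
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Every expert's weights are allocated up front and stay resident,
        # whether or not the router ever selects that expert.
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def resident_params(self):
        """Expert parameters held in memory (the RAM cost)."""
        return sum(len(row) for expert in self.experts for row in expert)

    def forward(self, token):
        # Score every expert for this token, then keep the top_k.
        scores = [sum(w * x for w, x in zip(row, token))
                  for row in self.router]
        chosen = sorted(range(len(scores)),
                        key=lambda i: scores[i], reverse=True)[:self.top_k]
        gates = softmax([scores[i] for i in chosen])

        # Only the chosen experts do any work (the compute cost).
        out = [0.0] * self.dim
        for gate, idx in zip(gates, chosen):
            y = [sum(w * x for w, x in zip(row, token))
                 for row in self.experts[idx]]
            out = [o + gate * v for o, v in zip(out, y)]

        active_params = self.top_k * self.dim * self.dim
        return out, chosen, active_params


layer = ToyMoELayer(dim=4, num_experts=8, top_k=2)
out, chosen, active = layer.forward([1.0, 0.5, -0.5, 2.0])
print("experts used for this token:", chosen)
print("active expert params:", active)
print("resident expert params:", layer.resident_params())
```

With 8 experts and top_k=2, each token touches a quarter of the expert weights, but all 8 experts have to sit in memory because the router can pick a different pair for the next token.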