Hacker News
by pico_creator 539 days ago
Not an MoE, but we have already done hybrid models, and found them to be highly performant relative to their training budget:
https://arxiv.org/abs/2407.12077