Hacker News
by pico_creator 539 days ago
Not an MoE, but we have already done hybrid models, and found them to be highly performant for the training budget.

https://arxiv.org/abs/2407.12077