Hacker News
by aheilbut 507 days ago
Is it possible to distill a large model into an even smaller MoE model, like OLMoE?