Hacker News
by anilgulecha 219 days ago
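It's a mixture-of-experts model. Basically N smaller expert sub-networks put together, and when inference occurs, a router picks only a few of them (often just 1 or 2) for each token, so most of the parameters sit idle on any given step. Each expert ends up better at some kinds of input than others, though the split is learned during training and doesn't necessarily line up with neat human topics.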
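
To make the routing idea concrete, here's a rough toy sketch in plain Python. It is not how any particular model implements it: `ToyMoE`, its linear "experts", and the `top_k` parameter are all made-up names for illustration, and the weights are random rather than trained. It only shows the shape of the idea: score the experts, run just the selected ones, and mix their outputs.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class ToyMoE:
    """Hypothetical toy MoE layer: a linear router plus N linear experts."""

    def __init__(self, n_experts, dim, top_k=1, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # Router: one weight vector per expert, giving one score per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_experts)]
        # Each expert is a dim x dim matrix (stand-in for a feed-forward block).
        self.experts = [[[rng.gauss(0, 1) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(n_experts)]

    def forward(self, x):
        scores = [dot(w, x) for w in self.router]
        # Keep only the top_k highest-scoring experts for this input.
        chosen = sorted(range(len(scores)),
                        key=lambda i: scores[i], reverse=True)[:self.top_k]
        # Gate weights are normalised over the chosen experts only.
        gates = softmax([scores[i] for i in chosen])
        out = [0.0] * len(x)
        for g, i in zip(gates, chosen):
            # Only the chosen experts do any work; the rest are skipped.
            y = [dot(row, x) for row in self.experts[i]]
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, chosen


moe = ToyMoE(n_experts=8, dim=4, top_k=1)
out, chosen = moe.forward([0.5, -1.0, 2.0, 0.1])
print("experts used for this input:", chosen)
```

With `top_k=1` exactly one expert runs per input; bumping it to 2 blends the two highest-scoring experts. Either way the compute per input scales with `top_k`, not with the total number of experts.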