by p1esk 843 days ago
I think they went MoE purely because straight-up dense scaling from 175B to 1.8T parameters is just too expensive. But it’s still roughly 10x scaling in total parameter count, right?
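
Quick sanity check on that ratio, as a rough sketch. The two parameter counts are taken at face value from the comment above; the function and variable names are just mine for illustration.

```python
def scale_factor(old_params: float, new_params: float) -> float:
    """Return how many times larger new_params is than old_params."""
    return new_params / old_params


# Parameter counts as quoted in the comment (assumed, not verified here).
dense_params = 175e9       # 175B
moe_total_params = 1.8e12  # 1.8T, total across all experts

ratio = scale_factor(dense_params, moe_total_params)
print(f"{ratio:.1f}x")  # prints "10.3x"
```

Note this only compares total parameter counts. With MoE only some experts run per token, so the compute per token grows by less than this ratio, which is the whole point of the cost argument.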