by m_w_ 2 days ago
Mythos is rumored to be ~10T parameters, so in this case I think the answer is yes, although I'm sure MoE, looped models, etc. play a role in the improvements as well.