Hacker News
by swyx 8 days ago
> If you can see that these models empirically get better with scale, why would you swap the main architecture? Those events will be pretty rare

cf. the hardware lottery: https://arxiv.org/abs/2009.06489