Hacker News new | ask | show | jobs
by bjornsing 450 days ago
That doesn’t make sense though if scaling is actually stalling. The reason so much compute goes into training now is scaling, which keeps base model lifetime short.