Hacker News
by suddenlybananas 490 days ago
Well, it's true that all of the most recent advances come from changing the architecture to do inference scaling instead of model scaling. Scaling laws as people talked about them in 2022 (where you take a base LLM and make it bigger) are dead.

I think you want both. To scale the model, e.g. train it with more and more data, you also need to scale your inference step. Otherwise, it just takes too long and it's too costly, no?