Hacker News
by thntk 701 days ago
Anyone know what caused the very big performance jump from Large1 to Large2 in just a few months?

Besides, this looks like evidence of parameter redundancy. Frontier models used to be 1.8T, then 405B, and now 123B. If future frontier models were <10B or even <1B, that would be a game changer.

2 comments

Counter-intuitively, larger models are cheaper to train to a given quality level. However, smaller models are cheaper to serve. At first, everyone was focused on training, so the models were much larger. Now that so many people use AI every day, companies spend more on training smaller models to save on serving.
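
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes the common approximations of about 6 * N * D FLOPs to train a dense model with N parameters on D tokens, and about 2 * N FLOPs per token served. The two model configurations are hypothetical, picked only so that the smaller model needs more training compute; none of these figures come from this thread or from any real model.

```python
# Rough training-vs-serving cost sketch.
# Assumptions (not from the thread): training costs ~6 * N * D FLOPs and
# serving costs ~2 * N FLOPs per token, for a dense model with N parameters
# trained on D tokens.

def training_flops(params: float, train_tokens: float) -> float:
    """Approximate total FLOPs to train the model."""
    return 6.0 * params * train_tokens

def serving_flops(params: float, served_tokens: float) -> float:
    """Approximate total FLOPs to serve `served_tokens` tokens."""
    return 2.0 * params * served_tokens

def total_flops(model: dict, served_tokens: float) -> float:
    """Training plus serving FLOPs over the model's lifetime."""
    return training_flops(**model) + serving_flops(model["params"], served_tokens)

def break_even_served_tokens(large: dict, small: dict) -> float:
    """Served-token count above which the small model is cheaper overall."""
    extra_training = training_flops(**small) - training_flops(**large)
    saving_per_token = 2.0 * (large["params"] - small["params"])
    return extra_training / saving_per_token

# Hypothetical pair, assumed (not measured) to reach similar quality:
# the small model is trained on far more tokens to make up for its size.
large = {"params": 400e9, "train_tokens": 2e12}
small = {"params": 100e9, "train_tokens": 12e12}

if __name__ == "__main__":
    print(f"large training FLOPs: {training_flops(**large):.2e}")
    print(f"small training FLOPs: {training_flops(**small):.2e}")
    print(f"break-even served tokens: {break_even_served_tokens(large, small):.2e}")
```

With these made-up numbers the small model costs more to train, but it becomes the cheaper option overall once total served tokens pass the break-even point, which is the shift described above.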
Lots and lots of synthetic data from the bigger models training the smaller ones would be my guess.