

	by thntk 701 days ago
	Does anyone know what caused the very large performance jump from Large1 to Large2 in just a few months? Also, this looks like evidence of parameter redundancy: frontier models used to be 1.8T parameters, then 405B, and now 123B. If future frontier models were <10B or even <1B, that would be a game changer.

2 comments

duchenne 701 days ago

Counter-intuitively, larger models are cheaper to train. However, smaller models are cheaper to serve. At first, everyone was focused on training, so the models were much larger. Now that so many people use AI every day, companies spend more on training smaller models to save on serving.
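To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It uses the common rule of thumb that training a dense transformer costs about 6 FLOPs per parameter per training token and serving costs about 2 FLOPs per parameter per generated token. The model sizes and token counts are made-up illustrations, not figures for any real model, and the sketch simply assumes both models end up at similar quality.

```python
# Rough compute comparison between a larger and a smaller model.
# Rule-of-thumb approximations for dense transformers:
#   training  ~ 6 * params * training_tokens FLOPs
#   inference ~ 2 * params * served_tokens FLOPs
# All sizes below are hypothetical, chosen only for illustration.

def training_flops(params: float, train_tokens: float) -> float:
    """Approximate FLOPs to train a model once."""
    return 6.0 * params * train_tokens


def inference_flops(params: float, served_tokens: float) -> float:
    """Approximate FLOPs to serve a given number of tokens."""
    return 2.0 * params * served_tokens


def total_flops(params: float, train_tokens: float, served_tokens: float) -> float:
    """Training plus serving compute over the model's lifetime."""
    return training_flops(params, train_tokens) + inference_flops(params, served_tokens)


def break_even_served_tokens(big: tuple, small: tuple) -> float:
    """Served tokens at which both models have the same total compute.

    Each argument is (params, train_tokens). Solving
    6*Nb*Db + 2*Nb*T == 6*Ns*Ds + 2*Ns*T for T gives the formula below.
    """
    n_big, d_big = big
    n_small, d_small = small
    return 3.0 * (n_small * d_small - n_big * d_big) / (n_big - n_small)


# Hypothetical pair, assumed (not shown) to reach similar quality:
BIG = (400e9, 2e12)     # 400B params trained on 2T tokens
SMALL = (100e9, 12e12)  # 100B params trained on 12T tokens

if __name__ == "__main__":
    print("big training FLOPs:  ", training_flops(*BIG))
    print("small training FLOPs:", training_flops(*SMALL))
    print("break-even served tokens:", break_even_served_tokens(BIG, SMALL))
```

With these made-up numbers the smaller model costs more to train, but once enough tokens have been served past the break-even point, it is the cheaper option overall.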


nuz 701 days ago

Lots and lots of synthetic data from the bigger models training the smaller ones would be my guess.
