| HN Mirror



	by highfrequency 360 days ago
	I would guess the “secret sauce” here is distillation: pretraining on an extremely high-quality synthetic dataset from the prompted output of their state-of-the-art models like o3 rather than generic internet text. A number of research results have shown that highly curated technical problem solving data is unreasonably effective at boosting smaller models’ performance. This would be much more efficient than relying purely on RL post-training on a small model; with low baseline capabilities, the insights would be very sparse and the training very inefficient.

1 comment

asadm 360 days ago

> research results have shown that highly curated technical problem solving data is unreasonably effective at boosting smaller models’ performance.

same seems to be true for humans


throw310822 360 days ago

Yes, if I understand correctly, what it means is "a very smart teacher can do wonders for their pupils' education".


tempaccount420 360 days ago

Wish they gave us access to learn from those grandmother models instead of distilled slop.


ashdksnndck 360 days ago

It behooves them to keep the best stuff internal, or at least greatly limit any API usage to avoid giving the goods away to other labs they are racing with.


saurik 360 days ago

Which, presumably, is the reason they removed 4.5 from the API... mostly the only people willing to pay that much for that model were their competitors. (I mean, I would pay even more than they were charging, but I imagine even if I scale out my use cases--which, for just me, are mostly satisfied by being trapped in their UI--it would be a pittance vs. the simpler stuff people keep using.)
