by jph00 303 days ago
That's not widely true. E.g., the GPT-4 tech report pointed out that nearly all their experiments were done on models 1000x smaller than the final model.

Fair point, though I’d argue that there’s an inherent selection bias here toward improvements that fit a scaling-law curve in the small-model regime.