Hacker News
by
jph00
303 days ago
That's not widely true. E.g., the GPT-4 technical report pointed out that nearly all of their experiments were done on models 1000x smaller than the final model.
tmule
303 days ago
Fair point, though I’d argue there’s an inherent selection bias here toward improvements that fit a scaling-law curve in the small-model regime.