Hacker News
by buildbot 1191 days ago
I think the most interesting thing is their ability to predict performance, both loss and results on a wide range of tasks, using a much smaller model. This lets them tune their architecture and hyperparameters, then run a single large training run to get full-scale GPT-4. From the paper it sounds like they only trained the large model once, then did a reinforcement learning from human feedback (RLHF) finetune.

Disclaimer: I work at Microsoft, in AI, and have no internal knowledge about GPT-4.


This isn’t that interesting imo. It’s the basic outcome of the scaling laws from the Kaplan and Chinchilla papers, pushed to a larger gap between the small models and the final model.

They likely built a lot of small models on the GPT-4 architecture to establish hyperparameter scaling laws, and then did a predicted build in exactly the same way Chinchilla did.
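
To make the "fit on small models, then predict the big one" idea concrete, here is a minimal sketch. It assumes a plain power law, loss ≈ a * compute^(-b), with no irreducible-loss term, and it uses made-up synthetic numbers, not data from any paper. The function names `fit_power_law` and `predict_loss` are hypothetical, and this is not claimed to be what OpenAI or DeepMind actually ran.

```python
import numpy as np

def fit_power_law(compute, loss):
    """Fit loss ~= a * compute**(-b) by least squares in log-log space."""
    # A power law is a straight line in log-log coordinates:
    # log(loss) = log(a) - b * log(compute)
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)

def predict_loss(a, b, compute):
    """Extrapolate the fitted power law to a new compute budget."""
    return a * compute ** (-b)

# Synthetic "small model" runs. These numbers are invented for illustration.
small_compute = np.array([1e17, 1e18, 1e19, 1e20])
small_loss = 50.0 * small_compute ** (-0.05)

a, b = fit_power_law(small_compute, small_loss)

# Predict the loss of a run with 10,000x the largest small-run compute.
predicted = predict_loss(a, b, 1e24)
print(f"a={a:.3f}, b={b:.4f}, predicted loss at 1e24: {predicted:.3f}")
```

With real runs the points are noisy and the curve usually flattens out, so a fit like this is only a starting point; the extrapolation is only as good as the assumed functional form.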

I guess, but it’s actually not simple to do that, in my experience. There’s another paper on that: https://arxiv.org/abs/2203.03466

Why isn’t Chinchilla running Google’s AI chat or whatever then?