by buildbot
1191 days ago
I think the most interesting thing is their ability to predict performance, both final loss and results on a wide range of tasks, using a much smaller model. This lets them tune the architecture and hyperparameters at small scale, then run a single large training run to get full-scale GPT-4. From the paper it sounds like they only trained the large model once, then did a reinforcement learning from human feedback (RLHF) fine-tune. Disclaimer: I work at Microsoft, in AI, and have no internal knowledge about GPT-4.
They likely trained many small models on the GPT-4 architecture to establish hyperparameter scaling laws, and then did a single predicted large build in exactly the same way Chinchilla did.
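
To make the "fit on small runs, extrapolate to the big one" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the compute budgets and losses are synthetic numbers, the function names are hypothetical, and the pure power law `loss = a * compute**(-b)` is a simplification (a real fit would likely also include an irreducible-loss constant). It is not the method from the paper.

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss ~= a * compute**(-b) by least squares in log-log space.

    Taking logs gives log(loss) = log(a) - b * log(compute), which is a
    straight line, so an ordinary degree-1 polynomial fit is enough.
    """
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return float(np.exp(intercept)), float(-slope)


def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a (possibly much larger) budget."""
    return a * compute ** (-b)


# Synthetic, made-up small-scale runs (assumption: not real training data).
# Compute is in FLOPs; the losses are generated from a known power law so
# the fit can be checked against the values that produced it.
true_a, true_b = 50.0, 0.07
small_compute = np.array([1e17, 1e18, 1e19, 1e20])
small_loss = true_a * small_compute ** (-true_b)

a, b = fit_power_law(small_compute, small_loss)

# Extrapolate four orders of magnitude beyond the largest small run.
large_compute = 1e24
predicted = predict_loss(a, b, large_compute)

print(f"fitted a={a:.3f}, b={b:.4f}")
print(f"predicted loss at {large_compute:.0e} FLOPs: {predicted:.3f}")
```

With real runs the points would be noisy, so the interesting question is how tight the extrapolation stays as the gap between the largest small run and the target run grows.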