by tempusalaria
1190 days ago
This isn’t that interesting imo. It’s the basic outcome of the scaling laws from the Kaplan and Chinchilla papers, pushed to a larger delta between the small models and the final model. They likely trained a lot of small models on the GPT-4 architecture to establish hyperparameter scaling laws, and then did a predicted build in exactly the same way Chinchilla did.
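For anyone who hasn't seen what a "predicted build" looks like, here is a rough sketch of the idea: fit a power law to the losses of a few small runs, then extrapolate to a much bigger compute budget. Everything in it is an assumption for illustration. The numbers are synthetic, the names `fit_power_law` and `predict_loss` are made up, and the plain form loss = a * compute^(-b) leaves out the irreducible-loss term that the published fits include. It is not how OpenAI or DeepMind actually did it.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b), done in log-log space."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept of log(loss) vs log(compute).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** (-b)


# Synthetic "small model" runs (hypothetical values, not from any paper).
small_compute = [1e17, 1e18, 1e19, 1e20]
small_loss = [predict_loss(10.0, 0.05, c) for c in small_compute]

# Fit on the small runs, then extrapolate four orders of magnitude up.
a, b = fit_power_law(small_compute, small_loss)
predicted_final_loss = predict_loss(a, b, 1e24)
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e24 FLOPs: {predicted_final_loss:.3f}")
```

With clean synthetic data the fit recovers the exponent exactly; the hard part in practice is that real small runs are noisy and the extrapolation gap is large.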
Why isn’t Chinchilla running Google’s AI chat or whatever, then?