Hacker News
by tempusalaria 1190 days ago
This isn’t that interesting, imo. It’s the basic outcome of the scaling laws from the Kaplan and Chinchilla papers, just pushed to a bigger gap between the small models used for fitting and the final model.
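
For reference, the Chinchilla paper models final loss as a function of parameter count N and training tokens D with a parametric form like the one below. E, A, B, α and β are fitted from many smaller training runs, and training compute is approximated as C ≈ 6ND FLOPs:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}, \qquad C \approx 6ND
```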

They likely trained a lot of small models on the GPT-4 architecture to establish hyperparameter scaling laws, then trained the full-size model at the predicted configuration, exactly the way Chinchilla was done.
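
A minimal sketch of that "fit on small runs, extrapolate to the big one" idea. It assumes a pure power law in training compute, loss = a * C^(-b), fitted by least squares in log-log space. That is a simplification: real fits (e.g. Chinchilla's) also include an irreducible-loss term and separate parameter and data terms. The constants A_TRUE and ALPHA_TRUE and the compute sweep are made up for illustration and say nothing about OpenAI's actual numbers.

```python
import math

# Hypothetical "true" law, used ONLY to generate synthetic small-run results.
# These constants are invented for illustration.
A_TRUE, ALPHA_TRUE = 50.0, 0.05


def fit_power_law(computes, losses):
    """Fit loss = a * C**(-b) via ordinary least squares on (log C, log loss)."""
    xs = [math.log(c) for c in computes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # log(loss) = log(a) - b * log(C), so a = exp(intercept) and b = -slope.
    return math.exp(intercept), -slope


def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** (-b)


# Synthetic "small model" sweep: compute budgets of 1e18 .. 1e21 FLOPs.
small_computes = [10.0 ** k for k in range(18, 22)]
small_losses = [A_TRUE * c ** (-ALPHA_TRUE) for c in small_computes]

a, b = fit_power_law(small_computes, small_losses)

# Extrapolate to a budget 1000x beyond the largest small run.
target = 1e24
print(f"fitted a={a:.3f}, b={b:.4f}, "
      f"predicted loss at {target:.0e} FLOPs: {predict_loss(a, b, target):.4f}")
```

With noise-free synthetic data the fit recovers the generating constants exactly; the hard part in practice is that real small runs are noisy and the hyperparameters have to transfer cleanly across scale.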


I guess, but it’s actually not simple to do that, in my experience. There’s another paper on that: https://arxiv.org/abs/2203.03466

Why isn’t Chinchilla running Google’s AI chat or whatever, then?