by redox99
1170 days ago
I don't know about GPT4, but GPT3.5, I'd bet, is pretty traditional and boring. Its power comes from a really good, properly curated dataset (including the RLHF). GPT3.5 Turbo is probably much more interesting, because they seem to have figured out how to make it much more efficient (some kind of distillation?). For GPT4, if I had to make a very rough guess: probably flash attention, 100% of the (useful) internet/books for its dataset, and highly optimized hyperparameters. I'd say with GPT4 they probably reached the limit of how big the dataset can be, because they are already using all the data that exists. Thus for GPT5 they'll have to scale in other ways.