by orwin
1066 days ago
Yeah, that's where I thought it would go shortly after I tried GPT-4 from OpenAI. We're clearly at the limits of the transformer, imho: comparing the effectiveness of 3.5 and 4 against the number of parameters in each model is why I think we've reached a soft cap. Since it'll be hard to go deeper, going broader by interlacing different model types might be a way to break through.
|
GPT-4 did not scale up substantially in depth, going from 175B to 220B parameters per transformer.