Hacker News new | ask | show | jobs
by orwin 1066 days ago
Yeah, that's where I thought it would go shortly after I tried GPT-4 from openAI. We're clearly at the transformer limits imho (comparing the effectiveness between 3.5 and 4, and the number of parameter in each model is why I think we reached a soft cap).

So since it'll be hard to go deeper, going broader by interlacing different model types might be a way to pierce through.

1 comments

> We're clearly at the transformer limits imho

GPT-4 did not scale up substantially in depth, going from 175 b to 220 b per transformer.

Wouldn't making the model multimodal require scaling the models significantly?

Or is the idea to keep the network the same size and trade off some of its nodes for image, video, etc. data?

If so has anyone shown that doing so results in better overall performance?

My lay-observation is that GPT-4 seems to be on the border of usability for most applications so if nothing is gained by simply changing the input data type as opposed to expanding the model then it feels like it won't be of much use yet.

Also apologies if I'm not making sense, I'm almost certainly not using to correct technical terms to articulate what I'm thinking.

> Wouldn't making the model multimodal require scaling the models significantly?

Just width if that makes sense. Basically, you add another encoder model but you are not actually increasing the depth that much.