| HN Mirror



	by orwin 1066 days ago
	Yeah, that's where I thought it would go shortly after I tried GPT-4 from OpenAI. We're clearly at the transformer limits imho (comparing the effectiveness of 3.5 and 4 against the number of parameters in each model is why I think we've reached a soft cap). So since it'll be hard to go deeper, going broader by interlacing different model types might be a way to break through.


whimsicalism 1066 days ago

> We're clearly at the transformer limits imho

GPT-4 did not scale up substantially in depth, going from 175B to 220B parameters per transformer.


CSMastermind 1066 days ago

Wouldn't making the model multimodal require scaling the models significantly?

Or is the idea to keep the network the same size and trade off some of its nodes for image, video, etc. data?

If so, has anyone shown that doing so results in better overall performance?

My lay observation is that GPT-4 seems to be on the border of usability for most applications, so if nothing is gained by simply changing the input data type, as opposed to expanding the model, then it feels like it won't be of much use yet.

Also, apologies if I'm not making sense; I'm almost certainly not using the correct technical terms to articulate what I'm thinking.


whimsicalism 1066 days ago

> Wouldn't making the model multimodal require scaling the models significantly?

Just width if that makes sense. Basically, you add another encoder model but you are not actually increasing the depth that much.
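
To make the width-vs-depth point concrete, here is a rough back-of-the-envelope sketch. The layer counts and hidden sizes are made up for illustration (they are not GPT-4's or any real model's), and the per-layer estimate of about 12 * d_model^2 parameters assumes a standard transformer block with a 4x MLP expansion, ignoring biases, norms, and embeddings.

```python
def block_params(d_model: int) -> int:
    # Rough per-layer count: 4*d^2 for the attention projections (Q, K, V, output)
    # plus 8*d^2 for an MLP with 4x expansion. Biases and norms are ignored.
    return 12 * d_model * d_model


def stack_params(n_layers: int, d_model: int) -> int:
    # Total parameters for a stack of identical transformer blocks.
    return n_layers * block_params(d_model)


# Hypothetical sizes, chosen only for illustration.
text_decoder = {"n_layers": 48, "d_model": 4096}
image_encoder = {"n_layers": 24, "d_model": 1024}

text_only = stack_params(**text_decoder)
multimodal = text_only + stack_params(**image_encoder)

print(f"text-only params:  {text_only:,}")
print(f"multimodal params: {multimodal:,}")
print(f"language stack depth in both cases: {text_decoder['n_layers']} layers")
```

Under these assumed sizes the extra encoder sits beside the language stack: the total parameter count goes up a little, while the language stack keeps the same number of layers.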
