by algo_trader 843 days ago
> at least provide evidence that scaling up no longer works for transformers.

Isn't the Mixture-of-Experts trend (GPT-4 is MoE?) kind of a proof?

2 comments

Of scale? I would think not. I would say they are evidence against scale, because they are more an argument for multi-agent systems. Scale is about a singular framework. What that means is debatable, though (anything we call a singular network can be decomposed into sub-networks; it's messy), hence the other part of my comment about non-scale solutions being claimed as scale.

> I said “scale is all you need for transformers”

No, you didn't. What kicked this all off was:

> What’s the main difference between an ape’s brain and a human brain? Scale.

Don't retcon.

I think they went MoE purely because straight-up dense scaling from 175B to 1.8T parameters is just too expensive. But it’s still roughly 10x scaling, right?