

	by mmmllm 278 days ago
	Isn't that essentially how MoE models already work? And if it were infinitely scalable, wouldn't we already have a tier of super-smart models available at very high cost? Besides, this would only matter for a few use cases. For a lot of basic customer-care work, programming, and quick research, I'd say LLMs are already quite good without running them 100x.


mcrutcher 278 days ago

MoE models are pretty poorly named, since all the "experts" are "the same". They're probably better described as "sparse activation" models. MoE implies a set of heterogeneous experts that a "thalamus router" is trained to pick from, but that's not how they work.
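To make the "sparse activation" point concrete, here is a minimal sketch of a top-k routed layer under toy assumptions: every expert has the same shape (a plain linear map here; real MoE experts are usually feed-forward blocks), and a router scores all experts for each input so only the top `k` actually run. All names (`moe_layer`, `make_expert`, the sizes) are hypothetical, and the weights are random rather than trained.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(w, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def make_expert(dim, rng):
    # Hypothetical toy expert: every expert is the same dim x dim linear map.
    return [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]

def moe_layer(x, router_w, experts, k=2):
    # The router produces one score per expert for this input.
    logits = matvec(router_w, x)
    # Only the k highest-scoring experts are evaluated; the rest stay idle.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = matvec(experts[i], x)
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top

rng = random.Random(0)
dim, n_experts = 8, 16
experts = [make_expert(dim, rng) for _ in range(n_experts)]
router_w = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
x = [rng.gauss(0, 1) for _ in range(dim)]
y, used = moe_layer(x, router_w, experts, k=2)
print(f"ran {len(used)} of {n_experts} experts: {used}")
```

The experts differ only in their learned weights, not in their role or architecture, which is why "sparse activation" describes this better than "mixture of experts".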

amelius 278 days ago

> if that were infinitely scalable, wouldn't we have a subset of super-smart models already at very high cost

The compute/intelligence curve is not a straight line. It's probably more like a curve that saturates at around 70% of human intelligence. More compute still means more intelligence, but you'll never reach 100% of human intelligence; it saturates well below that.
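The saturation claim can be written as a toy formula. This is a minimal sketch assuming an exponential-saturation shape, where capability rises monotonically with compute but is capped at a ceiling. `CEILING = 0.7` is only the commenter's 70% guess and `SCALE` is an arbitrary hypothetical unit; neither is fitted to any data.

```python
import math

CEILING = 0.7  # the commenter's guess (fraction of human level), not a measurement
SCALE = 10.0   # hypothetical compute scale in arbitrary units

def capability(compute):
    # Increases monotonically with compute and approaches CEILING without exceeding it.
    return CEILING * (1.0 - math.exp(-compute / SCALE))

for c in (1, 10, 100, 1000):
    print(c, round(capability(c), 3))
```

Each tenfold increase in compute still raises the score, but by less each time, which is the "more compute still means more intelligence, but it saturates" shape described in the comment.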

eMPee584 277 days ago

How would you know it converges on human limits? Why wouldn't it be able to go beyond them, especially if it gets its own world-sim sandbox?

amelius 277 days ago

I didn't say that. It converges well below human limits. That's what we see.

Thinking it will go beyond human limits is just wishful thinking at this point. There is no reason to believe it.

mirekrusin 278 days ago

MoE is something different - it's a technique to activate just a small subset of parameters during inference.

Whatever is good enough now can be much better for the same cost (time, compute, money). People will always choose better over worse.

mmmllm 278 days ago

Thanks, I wasn't aware of that. Still, why isn't there a super-expensive OpenAI model that uses 1,000 experts and comes up with far better answers? Technically, that would be possible to build today. I imagine it just doesn't deliver dramatically better results.

Leynos 278 days ago

That's what GPT-5 Pro and Grok 4 Heavy do. Those are the ones you pay triple-digit USD a month for.
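One generic way to spend more compute at inference time, closer to "running it 100x" than to adding experts, is to sample several answers in parallel and take a majority vote (often called self-consistency). Below is a toy simulation under stated assumptions. `sample_answer` is a hypothetical stand-in for one model call that is right 40% of the time and spreads its wrong answers over four alternatives. This is not a description of how GPT-5 Pro or Grok 4 Heavy work internally; the thread doesn't say.

```python
import random
from collections import Counter

def sample_answer(rng, correct="42", p_correct=0.4):
    # Hypothetical stand-in for one model call.
    if rng.random() < p_correct:
        return correct
    # Assumption: wrong answers are spread out rather than concentrated.
    return rng.choice(["41", "43", "24", "420"])

def majority_vote(rng, n):
    # Draw n independent answers and return the most common one.
    answers = [sample_answer(rng) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

def accuracy(n, trials=2000, seed=0):
    # Fraction of trials in which the voted answer is the correct one.
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, n) == "42" for _ in range(trials))
    return hits / trials

for n in (1, 5, 25):
    print(n, accuracy(n))
```

Voting only helps when the correct answer is the single most likely output. If the model's top answer is systematically wrong, more samples just make it more consistently wrong, which is one way the returns on extra compute can flatten out.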