| HN Mirror


	by konaraddi 79 days ago
	> applying this compression algorithm at scale may significantly relax the memory bottleneck issue.

	I don’t think they’re going to downsize, though. I think the big players are just going to use the freed-up memory for more workflows or larger models, because the big players want to scale up. It’s a cat-and-mouse race for the best models.

2 comments

miohtama 79 days ago

It will also help with local inference, making AI without big players possible.

otabdeveloper4 79 days ago

It's already possible. Post-training is vastly more important than model size. (There are big-time diminishing returns with increasing model size.)

plagiarist 79 days ago

Is there a size cutoff you would say where diminishing returns really kick in?

My experience doesn't disagree, at least. I've been using Qwen for coding locally a bit. It is much better than I thought it would be, but it still falls short in some obvious ways compared to the frontier models.

otabdeveloper4 79 days ago

> Is there a size cutoff you would say where diminishing returns really kick in?

No idea yet. But also it's obvious that making LLMs without MoE is stupid.

Verdex 79 days ago

Known in the business as 'pulling a Jevons'.
