by konaraddi 79 days ago
> applying this compression algorithm at scale may significantly relax the memory bottleneck issue.

I don’t think they’re going to downsize, though. I think the big players are just going to use the freed-up memory for more workflows or larger models, because they want to scale up. It’s an arms race for the best models.
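
As a rough illustration of the "freed-up memory goes into larger models" point: assuming the compression behaves like weight quantization (an assumption, since the quoted algorithm isn't identified here), the weights alone take params × bits / 8 bytes, so a fixed memory budget fits proportionally more parameters at lower precision. The 80 GB budget and the helper names `weight_bytes` and `max_params` are hypothetical, and KV cache and activations are ignored.

```python
# Rough weights-only memory arithmetic.
# Assumption: "compression" means storing each parameter in fewer bits
# (e.g. quantization); KV cache and activations are ignored entirely.

def weight_bytes(num_params: float, bits_per_param: float) -> float:
    """Bytes needed to hold the weights alone."""
    return num_params * bits_per_param / 8


def max_params(memory_bytes: float, bits_per_param: float) -> float:
    """Largest parameter count whose weights fit in the given memory."""
    return memory_bytes * 8 / bits_per_param


GB = 1e9
budget = 80 * GB  # hypothetical accelerator memory budget

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: up to {max_params(budget, bits) / 1e9:.0f}B params")
```

On the same hypothetical budget, going from 16 to 4 bits quadruples the parameter count that fits, which is headroom a big player can spend on a larger model rather than on less hardware.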


It will also help with local inference, making AI without big players possible.
It's already possible. Post-training is vastly more important than model size. (There are big diminishing returns as model size increases.)
Is there a size cutoff you would say where diminishing returns really kick in?

My experience doesn't disagree, at least. I've been using Qwen for coding locally a bit, and it is much better than I thought it would be. But it still falls short in some obvious ways compared to the frontier models.

> Is there a size cutoff you would say where diminishing returns really kick in?

No idea yet. But it's obvious that making LLMs without MoE (mixture of experts) is stupid.
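
Part of why MoE matters for the size debate is that total parameters can grow with the number of experts while each token only runs a few of them. Below is a minimal sketch of generic top-k routing, assuming a linear router and softmax weights renormalized over the chosen experts; it is not any specific model's implementation (Qwen's or otherwise), and `route`, `moe_forward`, the toy experts and the router weights are all hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(x: Vector, router: Sequence[Vector]) -> List[float]:
    """Hypothetical linear router: one score per expert (router row . x)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in router]


def moe_forward(x: Vector, router: Sequence[Vector],
                experts: Sequence[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Top-k mixture of experts: evaluate only the k highest-scoring experts
    and mix their outputs with softmax weights over those k scores."""
    logits = route(x, router)
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # the other experts are never called
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Toy demo (hypothetical numbers): four experts that just scale their input.
calls: List[int] = []


def make_expert(scale: float, idx: int) -> Callable[[Vector], Vector]:
    def expert(v: Vector) -> Vector:
        calls.append(idx)  # record which experts actually ran
        return [scale * vi for vi in v]
    return expert


experts = [make_expert(s, i) for i, s in enumerate((1.0, 2.0, 3.0, 4.0))]
router = [[0.0, 0.0], [0.0, 0.0], [10.0, 0.0], [10.0, 0.0]]
y = moe_forward([1.0, 2.0], router, experts, k=2)
print(y, "experts run:", sorted(calls))
```

Here four experts' worth of parameters exist, but only two run for this input, so per-token compute tracks the active experts rather than the total parameter count.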

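Known in the business as 'pulling a Jevons' (after the Jevons paradox: efficiency gains tend to increase total consumption of a resource rather than reduce it).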