by onlyrealcuzzo 3 days ago
> How do you know that?

Historical trends: every 18 months, the price for the same level of quality has dropped 90%.
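To make the compounding concrete, here is a minimal Python sketch. It assumes a perfectly smooth trend of a 90% drop every 18 months, while real price data is much noisier. `cost_after` is a hypothetical helper and does not come from any of the linked sources.

```python
def cost_after(months: float, start_cost: float = 1.0,
               drop_per_period: float = 0.90, period_months: float = 18.0) -> float:
    """Cost of a fixed quality level after `months`, assuming a constant
    fractional drop every `period_months` (hypothetical smooth trend)."""
    remaining_fraction = 1.0 - drop_per_period  # 0.10 of the cost survives each period
    return start_cost * remaining_fraction ** (months / period_months)


if __name__ == "__main__":
    for m in (0, 12, 18, 36, 54):
        print(f"{m:>2} months: {cost_after(m):.4f}x today's cost")
```

Under that assumption, 36 months leaves 1% of today's cost, and a 90% drop per 18 months works out to roughly a 78% drop per year (0.1^(12/18) ≈ 0.215).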

See: https://www.reddit.com/r/LocalLLaMA/comments/1gpr2p4/llms_co...

And Chart 13 here: https://www.rdworldonline.com/ais-great-compression-20-chart...

And here: https://epoch.ai/data-insights/llm-inference-price-trends

On the algorithmic front, the technology for the next 10x drop already exists if everyone adopts DeepSeek's MLA (Multi-head Latent Attention), MoE (mostly done already), Medusa (a better version of Google's speculative decoding), Kimi's Attention Residuals, MiMo's Sliding Window Attention, and possibly Microsoft's 1.58-bit weights (BitNet b1.58, which may turn out to be a nothing burger).
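For anyone wondering where "1.58" comes from: BitNet b1.58 restricts each weight to one of three values {-1, 0, +1}, and log2(3) ≈ 1.58 bits. Below is a minimal pure-Python sketch of the absmean quantization step as I understand it from the BitNet b1.58 description: scale by the mean absolute weight, round, clip to [-1, 1]. `absmean_ternary` is a hypothetical name. Real implementations work on tensors and train with quantization in the loop, which this does not show.

```python
import math


def absmean_ternary(weights, eps=1e-6):
    """Quantize a flat list of floats to {-1, 0, +1} with a per-tensor scale
    equal to the mean absolute value. Dequantize with q * scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    # round() returns an int for floats; clamp the result to the ternary range.
    q = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return q, scale


if __name__ == "__main__":
    print(f"bits per ternary weight: {math.log2(3):.3f}")
    w = [0.9, -0.05, 0.4, -1.2, 0.02, 0.3]
    q, s = absmean_ternary(w)
    print("ternary:", q, "scale:", round(s, 4))
    print("dequantized:", [qi * s for qi in q])
```

The appeal is that matrix multiplies against {-1, 0, +1} reduce to additions and subtractions. Whether quality holds up at scale is exactly the open question the comment flags.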

Historically, algorithmic gains are only ~30% of the pie, but what's already available is enough to get to 10x. The other ~70% comes from better training data (often synthetic) and distilling frontier knowledge, and there's no sign we're tapped out on that front.

> In 2026 the prices have been spiking.

That's not for the SAME level of output...

Reply:

MoE isn’t the magical improvement you think it is. The logprobs of MoE models are always worse than those of the dense equivalent, and MoE models struggle more with quality at very long context than equivalent dense models do. This is why Chinese labs like Qwen release dense and MoE versions of their models at near-equivalent sizes. I always use and prefer the dense one.
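The disagreement turns on how MoE differs from dense: a router sends each token to only a few experts, so per-token compute tracks the active parameters rather than the total. That is where the cost savings come from. Below is a minimal pure-Python sketch of top-k routing, the scheme many MoE layers use. The toy experts and router logits are hypothetical, and the sketch says nothing either way about the quality claims above.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts for one token and return
    (expert_index, gate_weight) pairs whose weights sum to 1."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the chosen experts' outputs; unchosen experts never run."""
    return sum(gate * experts[i](x) for i, gate in top_k_route(router_logits, k))


if __name__ == "__main__":
    # Hypothetical toy experts: scalar functions standing in for FFN blocks.
    experts = [lambda x, a=a: a * x for a in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.1, 2.0, -1.0, 1.5]
    print(top_k_route(logits, k=2))            # experts 1 and 3 are selected
    print(moe_forward(1.0, experts, logits, k=2))
```

With 4 experts and k=2, each token pays for half the expert parameters while the model stores all of them. Real MoE models use many more experts and a smaller active fraction.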

Speculative decoding usually only speeds up decode and can actually hurt prefill, and for agentic coding, prefill matters more.
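The decode-only point is easy to see in the algorithm itself. A cheap draft model guesses the next few tokens, and the target model verifies them, keeping the longest agreeing prefix plus one token of its own. That loop only exists once the prompt has been processed. Prefill already evaluates every prompt token in one parallel pass, so there is nothing to speculate on, and one plausible source of the prefill overhead is that the draft model has to process the prompt too. Below is a minimal greedy sketch in pure Python, assuming models are plain "next token" functions. All names and the toy models are hypothetical, and real systems (including Medusa, which uses extra decoding heads instead of a separate draft model) differ in the details.

```python
from typing import Callable, List

Token = int
# A greedy "model": maps the sequence so far to its single most likely next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft: NextToken,
                     target: NextToken, k: int = 4) -> List[Token]:
    """One round of greedy speculative decoding; returns the accepted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    # 2. The target checks each proposal. A real system scores all k positions
    #    in one batched forward pass; separate calls keep this readable.
    accepted: List[Token] = []
    for tok in proposal:
        expected = target(prefix + accepted)
        if expected != tok:
            accepted.append(expected)  # the target's correction ends the round
            return accepted
        accepted.append(tok)

    # 3. Every draft matched, so the verification pass also yields a bonus token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prompt: List[Token], draft: NextToken, target: NextToken,
             n_new: int, k: int = 4) -> List[Token]:
    """Decode n_new tokens; the prompt is treated as already prefilled."""
    out = list(prompt)
    while len(out) - len(prompt) < n_new:
        out += speculative_step(out, draft, target, k)
    return out[: len(prompt) + n_new]


if __name__ == "__main__":
    # Hypothetical toy models: the target counts up by one; the draft
    # guesses wrong whenever the sequence length is a multiple of 3.
    target = lambda seq: seq[-1] + 1
    draft = lambda seq: seq[-1] + 1 if len(seq) % 3 else seq[-1] + 2
    print(generate([0], draft, target, n_new=10))
```

Because every accepted token is one the target would have chosen anyway, the greedy output is identical to running the target alone. The draft only changes how many target passes decode needs.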

You’re right about the rest, but I need to set the record straight on these details.