| HN Mirror

by alecco 31 days ago

> Local models are 6 months to 18 months behind frontier.

I wish this were true, but it is not. And I am working on open source models, so if anything I would have a bias towards agreeing with you.

Frontier closed models (GPT/Claude) are pulling away from everybody else, even Google, once the king.

Your claim is a meme that comes from benchmark results, and sadly a lot of models are benchmaxxed: Llama 4, and most notably the Grok 3 drama with a lot of layoffs. And Chinese big tech... well, they have some cultural issues.

"Qwen's base models live in a very exam-heavy basin - distinct from other base models like llama/gemma. Shown below are the embeddings from randomly sampled rollouts from ambiguous initial words like 'The' and 'A':"

https://xcancel.com/N8Programs/status/2044408755790508113

---

But thank god at least we have DeepSeek. They keep releasing good models in spite of being so seriously resource constrained. Punching well above their weight. But they are not just 6 months behind, either.

4 comments

crystal_revenge 31 days ago

I’ve worked professionally in the open model space for 3 years, and up until 2 months ago I would have agreed with you. But it’s empirically not the case today. These models (combined with a good harness) have dramatically improved in both power and performance.

Gemma 4 was a major improvement in self-hostable local models, and Qwen-3.6-A34B is a beast that runs great on an MBP (and insanely well on a 4090).

The biggest lift is combining these models with a good agent harness (I personally prefer Hermes agent). But I’ve found in practice they’re really not benchmaxxing. I’ve had these agents successfully handle a few non-trivial research projects that I wouldn’t have been able to accomplish as successfully even last year.

When you add in the open-but-not-local models (Kimi, GLM, Minimax), you have a lot of very nice options. For personal use, anything I don’t use local models for, I give to my Kimi 2.6 powered agent.

alecco 30 days ago

For specific use cases, absolutely, a harness and other techniques help (this is literally what I'm working on). But GP was talking about general use.

Over-promising is a very stupid thing. Nobody will value the intermediate steps. Nobody will value all the effort because they will always compare us with frontier models made with billions and we will become a running joke. So please stop.

alsetmusic 30 days ago

Over-promising is what the frontier companies are doing. I'm not pretending open weight models are gonna do your homework and pay your taxes and remember your wife's bday with a super personalized gift. I'm just saying that they seem pretty good for what they are. There's no promise being made here.

dools 31 days ago

Kimi k2.6 is about on par with GPT 5.2 so I’d say open weight models are about 6 months behind.

cbg0 31 days ago

The Q4 quantization requires about 600GB of RAM without context, not exactly consumer hardware friendly.
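As a rough sanity check on that figure, here is a back-of-the-envelope sketch of weight memory for a quantized model. The parameter count (about one trillion) and the effective bits per weight are assumptions for illustration; neither is stated in this thread, and real Q4 formats carry some per-block overhead above 4 bits.

```python
def quantized_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate storage for model weights alone, in decimal gigabytes.

    Ignores KV cache / context, activations, and runtime overhead.
    """
    if n_params < 0 or bits_per_weight <= 0:
        raise ValueError("n_params must be >= 0 and bits_per_weight > 0")
    return n_params * bits_per_weight / 8 / 1e9


# ASSUMPTION: ~1 trillion total parameters (not stated in the thread).
ASSUMED_PARAMS = 1.0e12

# ASSUMPTION: effective bits per weight for a "Q4"-style format,
# from an idealized 4.0 up to 5.0 with quantization metadata.
for bits in (4.0, 4.5, 5.0):
    gb = quantized_weight_gb(ASSUMED_PARAMS, bits)
    print(f"{bits:.1f} bits/weight: about {gb:.0f} GB")
```

Under those assumptions the weights alone land somewhere between 500 and 625 GB, which is in the same ballpark as the ~600GB quoted above, before any context is allocated.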

janderland 31 days ago

Has Kimi found a way to vastly reduce the amount of VRAM required without running at 3 tokens per second? That’s the real concern.

dools 30 days ago

I said "open weight" rather than "local". I mean, it's local if you have $240k to drop on GPUs, but you can also run Kimi k2.6 on a B300 cluster for ~$50/hour.
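To put those two numbers side by side, here is a minimal break-even sketch using only the figures above ($240k up front versus ~$50/hour rented). It deliberately ignores power, hosting, depreciation, and resale value, so treat it as a rough comparison rather than a real cost model; the function name is just illustrative.

```python
def break_even_hours(purchase_usd: float, rental_usd_per_hour: float) -> float:
    """Hours of rental after which renting has cost as much as buying."""
    if rental_usd_per_hour <= 0:
        raise ValueError("rental_usd_per_hour must be positive")
    return purchase_usd / rental_usd_per_hour


# Figures taken from the comment above.
hours = break_even_hours(240_000, 50)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days of continuous use")
```

That works out to 4,800 rented hours, or roughly 200 days of round-the-clock use, before the purchase price is matched.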

tyre 31 days ago

The Chinese models should stay close behind, on a lag. They’re doing a ton of distillation that, realistically, I’m not sure the American frontier labs can stop.

alecco 31 days ago

US labs got tough on "adversarial" distillation [1]. I suspect that's one of several reasons why Chinese big labs are lagging again.

[1] US AI firms team up in bid to counter Chinese 'distillation' (Apr 7) https://finance.yahoo.com/sectors/technology/articles/us-ai-...

tyre 30 days ago

Yeah, I mean the US has gotten tough on, like, foreign interference in elections and cyber security, but if you have the Chinese state behind you (which these labs absolutely do, and, speaking as an observer, obviously have to), no company can stop you.

Case in point: North Korea, with far, far fewer resources.

datadrivenangel 30 days ago

Local models are ~18-24 months behind the frontier on approximate intelligence, and then like 36-48 months behind the frontier on inference speed for nice hardware.
