by stymaar 23 days ago

> Lastly, there is a massive difference in capabilities, determinism, and error handling between 5T SOTA models like Opus

What's your source for Opus being a 5T model?

> and tiny distillations from DeepSeek that perform well only in benchmarks.

I don't think you know what you're talking about. Local models aren't “distillations from DeepSeek”.

And they don't perform well “only in benchmarks”: Qwen 3.6 is a very decent model (obviously it's not Opus, but it's also much faster, and speed is a quality of its own).

3 comments

gpugreg 23 days ago

> What's your source for Opus being a 5T model?

Elon Musk tweeted that Grok is 0.5T or 1/10th the size of Opus. https://xcancel.com/elonmusk/status/2042123561666855235#m

While this source's reliability is certainly debatable, the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/

stymaar 23 days ago

> While this source's reliability is certainly debatable

Massive understatement. Nowadays it has become hard to find a single Musk statement that doesn't contain at least one lie.

> the size matches the results of this paper, in which researchers estimated the parameter count from model knowledge. https://01.me/research/ikp/

Thanks for the pointer. This estimation has Grok 6 times bigger than Musk claims it is, so maybe that's where the lie is.

(I'm quite skeptical about that number though. It would be quite disappointing for US tech if their flagship models had to be that much larger than the Chinese ones for such a small edge in performance. Because I don't think US labs are incompetent, I'd bet that US flagships aren't more than 2 or 3 times bigger than Chinese flagships. Otherwise it really doesn't bode well.)

striking 22 days ago

In tiny gray text right above the table is written "90% PI ≈ ±3.00× either side." Is GPT-5.5-Pro 3.4T or 30.8T in size, or somewhere in between? We just don't know.
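
To make that arithmetic concrete, here is a rough sketch of what a "×3 either side" interval implies. It assumes the interval is symmetric on a log scale (point / k to point × k), which is one reading of the quoted note and not something confirmed by the paper; the variable names are illustrative only.

```python
import math

# Bounds quoted above for GPT-5.5-Pro, in trillions of parameters.
low, high = 3.4, 30.8

# Assumption: the interval is [point / k, point * k].
# Under that assumption the point estimate is the geometric mean of the
# bounds, and k is the square root of their ratio.
point = math.sqrt(low * high)
factor = math.sqrt(high / low)

print(f"implied point estimate: {point:.1f}T")
print(f"implied factor either side: x{factor:.2f}")
print(f"upper bound / lower bound: {high / low:.1f}x")
```

Under this reading, the upper bound is about nine times the lower bound, so the interval spans close to an order of magnitude.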

fluidcruft 22 days ago

Musk has a lot of incentive to explain away how horrible Grok is relative to Opus.

It's certainly a better sell that Grok sucks because it's small and Opus is impressive because it's large than the alternative: that Grok is also large and sucks, which points to xAI incompetence and mismanagement.

Particularly when you're trying to IPO a rocket company based on rosy forecasted valuations of Grok dominating the market.

UltraSane 22 days ago

Elon Musk has absolutely no credibility anymore. I'm more likely to believe the opposite of what he claims to be true.

kakacik 22 days ago

aka the Russian strategy

orphea 22 days ago

> Elon Musk tweeted

Come on. The Onion would be a more credible source.

Chyzwar 22 days ago

From this paper: https://arxiv.org/abs/2604.24827

stymaar 22 days ago

That's not what the paper says though:

    Claude Opus 4.6 Anthropic 68.0% ∼5.3T [1.8–15.6T]
    Claude Opus 4.7 Anthropic 66.4% ∼4.0T [1.4–12.0T]
    Claude Opus 4.5 Anthropic 65.2% ∼3.4T [1.1–10.0T]
    Claude Opus 4.1 Anthropic 64.9% ∼3.2T [1.1–9.5T]
    Claude Opus 4 Anthropic 59.7% ∼1.4T [478B–4.2T]

According to their estimation, Opus is likely between 1T and 15T, which really doesn't tell you much that you couldn't have guessed otherwise. It doesn't say “Opus is a 5T model”.
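
As a quick check on how little these intervals separate the models, the sketch below copies the five rows quoted above and computes each interval's width plus the range that all five intervals share. The tuple layout and variable names are illustrative, not from the paper.

```python
# (model, point estimate, low, high) in trillions of parameters,
# copied from the rows quoted above (478B written as 0.478T).
estimates = [
    ("Claude Opus 4.6", 5.3, 1.8, 15.6),
    ("Claude Opus 4.7", 4.0, 1.4, 12.0),
    ("Claude Opus 4.5", 3.4, 1.1, 10.0),
    ("Claude Opus 4.1", 3.2, 1.1, 9.5),
    ("Claude Opus 4", 1.4, 0.478, 4.2),
]

# How wide is each interval, as a ratio of upper to lower bound?
for name, point, low, high in estimates:
    print(f"{name}: high/low = {high / low:.1f}x")

# The range common to every interval: the largest lower bound up to
# the smallest upper bound. If low <= high, all intervals overlap.
overlap_low = max(low for _, _, low, _ in estimates)
overlap_high = min(high for _, _, _, high in estimates)
all_overlap = overlap_low <= overlap_high

print(f"all intervals overlap: {all_overlap}")
print(f"shared range: {overlap_low}T to {overlap_high}T")
```

If the quoted numbers are right, every interval shares a common range, so the intervals alone cannot rank these models by size.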

The fact that there's absolutely no consistency in the predicted size between models from the same lab should tell you all you need to know about the predictive power of this method. (They aren't really lying about their numbers: their confidence interval is huge enough to fit anything in it. But their prose makes very strong claims out of a statistical nothingburger.)

(Somebody already posted this paper earlier, and I spent some time reading it. It's really not that good, even though there are a bunch of interesting ideas in it.)

layer8 23 days ago

> What's your source for Opus being a 5T model?

Probably Elon Musk: https://eu.36kr.com/en/p/3760679047267075

UltraSane 22 days ago

I don't know why stymaar's comment is flagged and dead; he is 100% correct.
