Hacker News new | ask | show | jobs
by refulgentis 481 days ago
It's unfortunate it is named 4.5 -- it is next generation scale, and it's a 1.0 of next-generation scale.

Sonnet is on its 3rd iteration, i.e. has considerably more post-training, most notably, reasoning via reinforcement learning.

2 comments

It's not really the beginning (1.0) of anything - more like the end given that OpenAI have said this'll be the last of their non-reasoning models - basically the last scale-up pre-training experiment.

As far as the version number, OpenAI's "Chief Research Officier" Mark Chen said, on Alex Kantrowitz's YouTube channel, that it "felt" like a 4.5 in terms of level of improvement over 4.0.

That's a lot of other stuff, and you express disagreement.

I'm sure we both agree it's the first model at this scale, hence the price.

> It's not really the beginning (1.0) of anything

It is a LLM w/o reasoning training.

Thus, the public decision to make 5.0 = 4.5 + reasoning.

> "more like the end...the last scale-up pre-training experiment."

It won't be the last scaled-up pre-training model.

I assume you mean, what I expect, and you go on to articulate: it'll be last scaled-up-pre-training-without-reasoning-training-too-relesed-publicly model.

As we observe, the value to benchmarks of, in your parlance, scaled-down pretraining, with reasoning training, is roughly the same as scaled-up pre-training without reasoning training.

> Yes it is. It's the first model at this scale.

Is it? Bigger than Grok 3? How do you know - just because it's expensive?

At some point, I have to say to myself: "I do know things."

I'm not even sure what the alternative theory would be: no one stepped up to dispute OpenAI's claim that it is, and X.ai is always eager to slap OpenAI around.

Let's say Grok is also a pretraining scale experiment. And they're scared to announce they're mogging OpenAI on inference cost because (some assertion X, which we give ourselves the charity of not having to state to make an argument).

What's your theory?

Steelmanning my guess: The price is high because OpenAI thinks they can drive people to Model A, 50x the cost of Model B.

Hmm...while publicly proclaiming, it's not worth it, even providing benchmarks that Model A gets the same scores 50x cheaper?

That doesn't seem reasonable.

OpenAI have apparently said that GPT 4.5 has a knowledge cutoff date of October 2023, and their System Card for it says "GPT 4.5 is NOT a frontier model" (my emphasis).

It seems this may be an older model that they chose not to release at the time, and are only doing so now due to feeling pressure to release something after recent releases by DeepSeek, Grok, Google and Anthropic. Perhaps they did some post-training to "polish the turd" and give it the better personality that seems to be one of it's few improvements.

Hard to say why it's so expensive - because it's big and expensive to serve, or for some marketing/PR reason. It seems that many sources are confirming that the benefits of scaling up pre-training (more data, bigger model) are falling off, so maybe this is what you get when you scale up GPT 4.0 by a factor of 10x - bigger, more expensive, and not significantly better. Cost to serve could also be high because, not intending to release it, they have never put the effort in to optimize it.

See, you get it: if we want to know nothing, we can know nothing.

For all we know, Beezlebub Herself is holding Sam Altman's conciousness captive at the behest of Nadella. The deal is Sam has to go "innie" and jack up OpenAI costs 100x over the next year so it can go under and Microsoft can get it all for free.

Have you seen anything to disprove that? Or even casting doubt on it?

Versions numbers for LLMs don't mean anything consistent. They don't even publicly announce at this point which models are built from new base models and which aren't. I'm pretty sure Claude 3.5 was a new set of base models since Claude 3.

What do mean by "it's a 1.0" and "3rd iteration"? I'm having trouble parsing those in context.

If Claude 3.5 was a base model*, 3.7 is a third iteration** of that model.

GPT-4.5 is a 1.0, or, the first iteration of that model.

* My thought process when writing: "When evaluating this, I should assume the least charitable position for GPT-4.5 having headroom. I should assume Claude 3.5 was a completely new model scale, and it was the same scale as GPT-4.5." (this is rather unlikely, can explain why I think that if you're interested)

** 3.5 is an iteration, 3.6 is an iteration, 3.7 is an iteration.