| That's a lot of other stuff, and you express disagreement. I'm sure we both agree it's the first model at this scale, hence the price. > It's not really the beginning (1.0) of anything It is a LLM w/o reasoning training. Thus, the public decision to make 5.0 = 4.5 + reasoning. > "more like the end...the last scale-up pre-training experiment." It won't be the last scaled-up pre-training model. I assume you mean, what I expect, and you go on to articulate: it'll be last scaled-up-pre-training-without-reasoning-training-too-relesed-publicly model. As we observe, the value to benchmarks of, in your parlance, scaled-down pretraining, with reasoning training, is roughly the same as scaled-up pre-training without reasoning training. |
Is it? Bigger than Grok 3? How do you know - just because it's expensive?