| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Flux159 148 days ago
	I'm a bit confused by this branding (never even noticed that there was a 5.2-Instant), it's not a super fast 1000tok/s Cerebras based model which they have for codex-spark, it's just 5.2 w/out the router / "non-thinking" mode? I feel like openai is going to get right back to where they were pre GPT-5 with a ton of different options and no one knows which model to use for what.

6 comments

tedsanders 148 days ago

Yeah, for a while ChatGPT Plus has been powered by two series of models under the hood.

One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.

The second series is the Thinking series, which is more accurate and more tuned to professional knowledge work, but slower (because it uses more reasoning tokens).

We'd also prefer to have simple experience with just one option, but picking just one would pull back the pareto frontier for some group of people/preferences. So for now we continue to serve two models, with manual control for people who want to choose and an imperfect auto switcher for people who don't want to be bothered. Could change down the road - we'll see.

(I work at OpenAI.)

vessenes 148 days ago

By the way, I imagine you know this, but the product split is not obvious, even to my 20-something kids that are Plus subscribers - I saw one of them chatting with the instant model recently and I was like "No!! Never do that!!" and they did not understand they were getting the (I'm sorry to say) much less capable model.

I think it's confusing enough it's a brand harm. I offer no solutions, unfortunately. I guess you could do a little posthoc analysis for plus subscribers on up and determine if they'd benefit from default Thinking mode; that could be done relatively cheaply at low utilization times. But maybe you need this to keep utilization where it's at -- either way, I think it ends up meaning my kids prefer Claude. Which is fine; they wouldn't prefer Haiku if it was the default, but they don't get Haiku, they get Sonnet or Opus.

pants2 148 days ago

I agree -- we're on the ChatGPT Enterprise plan at work and every time someone complains about it screwing up a task it turns out they were using the instant model. There needs to be a way to disable it at the bare minimum.

sebmellen 148 days ago

I mean, they must know this. Imagine how many tokens they're saving.

lifis 148 days ago

You could perhaps show the "instant" reply right away and provide a button labeled "Think longer and give me a better answer" that starts the thinking model and eventually replaces the answer.

For this to work well, the instant reply must be truly instant and the button must always be visible and at the same position in the screen (i.e. either at the top or bottom, of the answer, scrolling such that it is also at the top or bottom of the screen), and once the thinking answer is displayed, there should be a small icon button to show the previous instant answer.

michaelmrose 148 days ago

Wouldn't this be 1.5x as expensive?

jimbokun 148 days ago

Not if the Instant answer is sufficient.

resters 148 days ago

That's assuming that the instant answer is even directionally correct. A misleading instant answer could pollute the context and lead the thinking model astray.

ssl-3 148 days ago

Can the context of the pre-revision, Instant response be simply be discarded -- or forked or branched or [insert appropriate nomenclature here] -- instead of being included as potential poison?

(It seems absurd that to consider that there may be no undo button that the machine can push.)

Defenestresque 148 days ago

For those who are unaware, this is exactly what Grok does. The default is an auto mode, then when you ask a question it starts researching (which is visible to the user) and if it's using the expert mode but you don't really need all that jazz, it has a "Quick Answer" button right above the prom entry field, and if it's using a "Quick Answer" mode then it has "Expert" button and the same place, and you are able to toggle between them mid answer and it will adjust the model (or model parameters, I'm not sure how it works under the hood).

It's pretty good with the auto chooser, but I appreciate the manual choice available so in-your-face and especially not having it restart the query completely but rather convert the output to either Quick or Expert.

This is on the Web UI, can't speak for other harnesses. I do find that it's quite good with the citations and has a fairly generous free tier, even on Expert mode. (As for who sits at the top, I am indeed put off by Musk's clear interference in several cases involving Grok, nor do my personal values align with the majority of his, but today's Grok is definitely less MechaHitler and more reliable than it was before.)

Flux159 148 days ago

Thanks for clarifying! I guess the default for most users is going to be to use the router / auto switcher which is fine since most people won't change the default.

Just noting that I'm not against differentiation in products, but it gets very confusing for users when there's too many options (in the case of the consumer ChatGPT at least this is still more limited than in pre-GPT 5 days). The issue is that there's differentiation at what I pay monthly (free vs plus vs pro) and also at the model layer - which essentially becomes this matrix of different options / limits per model (and we're not even getting into capabilities).

For someone who uses codex as well, there are 5 models there when I use /model (on Plus plan, spark is only available for Pro plan users), limits also tied to my same consumer ChatGPT plan.

I imagine the model differentiation is only going to get worse as well since with more fine tuned use cases, there will be many different models (ie health care answers, etc.) - is it really on the user to figure out what to use? The only saving grace is that it's not as bad as Intel or AMD cpu naming schemes / cloud provider instance naming, but that's a very low bar.

redox99 148 days ago

Auto will never work, because for the exact same prompt sometimes you want a quick answer because it's not something very important to you, and sometimes you want the answer to be as accurate as possible, even if you have to wait 10 minutes.

In my case it would be more useful to have a slider of how much I'm willing to wait. For example instant, or think up to 1 minute, or think up to 15 minutes.

nearbuy 148 days ago

That's pretty close to what they have. They just named them Instant, Thinking (Standard), and Thinking (Extended), and they're discrete presets instead of a slider.

redox99 148 days ago

But the time it takes is too variable. Even standard can sometimes take 15+ minutes.

cj 148 days ago

They have an "answer now" button that stops the reasoning and starts the reply. Same with Gemini.

redox99 148 days ago

Yeah I use that, but it's not really a solution that allows to only have auto. It doesn't help when it chooses Instant instead of Thinking, and it's also much slower than using Instant outright because the Skip button doesn't immediately show, and it's generally slow to restart.

lxgr 148 days ago

Thank you for confirming!

I've long suspected as much, but I always found the API model name <-> ChatGPT UI selector <-> actual model used correspondence very confusing, and whether I was actually switching models or just some parameters of the harness/model invocation.

> One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.

That's putting it mildly. In my experience, the "instant/chat" model is absolute slop tier, while the "thinking" one is genuinely useful and also has a much more palatable tone (even for things not really requiring a lot of thought).

Fortunately, the latter clearly identifies itself with an absurd amout of emoji reminiscent of other early chatbots that shall not be named, so I know how to detect and avoid it.

xiphias2 148 days ago

Is there a way to get sticky model selection back, or the reason is that it is just too expensive to serve alternative models?

For coding I love codex-5.3-xhigh, but for non-coding prompts I still far prefer o3 even if it's considered a legacy model.

I can imagine that its higher tool use is too expensive to serve, but as a pro user I would love it to come back.

bananaflag 148 days ago

Before GPT-5 was launched, and after sama had said they would unify the ordinary and reasoning models, I think we all expected more than an (auto-)switcher, we expected some small innovation (smaller than the ordinary-to-reasoning one, but still a significant one) that would make both kinds of replies be in a way generated by a single model (don't know exactly how, I expected OpenAI to surprise us with something that would feel obvious in retrospect).

merlindru 148 days ago

but why not have "sane defaults but configurable"?

hide away the extra complexity for everyone. give power users a way to get it back.

dotancohen 148 days ago

The model doesn't even need to be exposed in the UI. Let the user specify "use model foobar-4" or "use a coding model" or "use a middle-tier attorney model".

VIM does this well: no UI, magic incantations to use features.

discardable_dan 148 days ago

How's the war effort?

JCharante 148 days ago

are they really two models? are they available via the API or are they wrappers built on top of models available via the API?

mrcwinn 148 days ago

Do your fully autonomous offensive weapons and domestic surveillance systems use Instant?

Computer0 148 days ago

Not today, but response time would be a lot better if they did.

seejayseesjays 148 days ago

Forgiveness but while you're here can you look into why the Notion connector in chat doesn't have the capability to write pages but the MCP (which I use via Codex) can? it looks like it's entirely possible, just mostly a missing action in the connector.

idiotsecant 148 days ago

none granted.

0xbadcafebee 148 days ago

It's because people like choice and control, and "5.2" vs "5.2 thinking" is confusing. Making them "5.2 instant" and "5.2 thinking" is less confusing to more people. Their competitors already do this (Gemini 3 Fast & Gemini 3 Thinking).

Terretta 148 days ago

ChatGPT 5.2 Intuitive

ChatGPT 5.2 Ponderous

“I had this dream the other night…” – https://www.youtube.com/watch?v=6gYIbMwswKM

NitpickLawyer 148 days ago

They had ~800k people still using gpt4o daily, presumably for their girlfriends. They need to address them somehow. Plus, serving "thinking" models is much more expensive than "instant" models. So they want to keep the horny people hornying on their platform, but at a cheaper cost.

mrits 148 days ago

Are you not vibe coding in girlfriend mode?

kilroy123 148 days ago

I can't fathom using LLMs like this. Does ChatGPT actually do this? I thought people who were into this stuff used dedicated apps or Grok?

bananaflag 148 days ago

https://old.reddit.com/r/ChatGPTNSFW/

Sabinus 148 days ago

https://www.reddit.com/r/MyBoyfriendIsAI/

TrainedMonkey 148 days ago

Will need to wait for real benchmarks, but based on OpenAI marketing Instant is their latency optimized offering. For voice interface, you don't actually need high tok/s because speech is slow, time to first token matters much more.

az226 148 days ago

Instant is a traditional LLM (non-reasoning). Thinking is a reasoning model. The name instant isn’t “instant” lol.

josalhor 148 days ago

Reminder that OpenAI serves a lot of customers for free, most of the people I know use the free tier. There is a big limit on thinking queries on free tier, so a decent non thinking model is probably a positive ROI for them.