| HN Mirror



	by podnami 316 days ago
	Wow this was actually blazing fast. I prompted "how can the 45th and 47th presidents of america share the same parents?" On ChatGPT.com o3 thought for 13 seconds, on OpenRouter GPT OSS 120B thought for 0.7 seconds - and they both had the correct answer.

5 comments

swores 316 days ago

I'm not sure that's a particularly good question for concluding something positive about the "thought for 0.7 seconds" - the answer is so simple that ChatGPT 4o (with no thinking time) immediately answered correctly. The only surprising thing in your test is that o3 wasted 13 seconds thinking about it.


Workaccount2 316 days ago

A current major outstanding problem with thinking models is how to get them to think an appropriate amount.


dingnuts 316 days ago

The providers disagree. You pay per token. Verbose models are the most profitable. Have fun!


willy_k 316 days ago

For API users, yes, but for the average person with a subscription or using the free tier it’s the inverse.


conradkay 316 days ago

Nowadays a pretty large % of usage must be going through monthly subscriptions


nisegami 316 days ago

Interesting choice of prompt. None of the local models I have in ollama (mid-range consumer GPU) were able to get it right.


golergka 316 days ago

When I pay attention to o3 CoT, I notice it spends a few passes thinking about my system prompt. Hard to imagine this question is hard enough to spend 13 seconds on.


Imustaskforhelp 316 days ago

Not gonna lie but I got sorta goosebumps

I am not kidding but such progress from a technological point of view is just fascinating!


xpe 316 days ago

How many people are discussing this after one person did 1 prompt with 1 data point for each model and wrote a comment?

What is being measured here? For each model, the end-to-end time is:

t_total = t_network + t_queue + t_batch_wait + t_inference + t_service_overhead
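
To make that breakdown concrete, here is a minimal Python sketch. The class name `LatencyBreakdown`, the helper `summarize`, and every number in it are made up for illustration; none of them come from the thread or from any provider's measurements. It only shows that a "thought for N seconds" figure is at most one term of the sum, and that several runs say more than one.

```python
from dataclasses import dataclass, fields
from statistics import median


@dataclass
class LatencyBreakdown:
    """One request's end-to-end time split into the terms above (seconds)."""
    t_network: float
    t_queue: float
    t_batch_wait: float
    t_inference: float
    t_service_overhead: float

    def total(self) -> float:
        # t_total is the plain sum of every component.
        return sum(getattr(self, f.name) for f in fields(self))

    def inference_share(self) -> float:
        # Fraction of the end-to-end time actually spent on inference.
        return self.t_inference / self.total()


def summarize(runs):
    """Median total time and median inference share over repeated runs."""
    return (
        median(r.total() for r in runs),
        median(r.inference_share() for r in runs),
    )


# Hypothetical numbers, NOT measurements of any real model or provider.
runs = [
    LatencyBreakdown(0.25, 0.5, 0.125, 0.5, 0.125),
    LatencyBreakdown(0.25, 2.0, 0.125, 0.5, 0.125),
    LatencyBreakdown(0.25, 0.0, 0.125, 0.5, 0.125),
]
median_total, median_share = summarize(runs)
print(f"median t_total = {median_total:.2f}s, inference share = {median_share:.0%}")
```

In these invented runs the inference term never changes, yet the total moves from 1.0s to 3.0s purely because of queueing, which is why a single timing per model is hard to interpret.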
