Hacker News
by podnami 316 days ago
Wow this was actually blazing fast. I prompted "how can the 45th and 47th presidents of america share the same parents?"

On ChatGPT.com o3 thought for 13 seconds, on OpenRouter GPT OSS 120B thought for 0.7 seconds - and they both had the correct answer.

5 comments

I'm not sure that's a particularly good question for concluding something positive about the "thought for 0.7 seconds" - it's such a simple question that ChatGPT 4o (with no thinking time) immediately answered correctly. The only surprising thing in your test is that o3 wasted 13 seconds thinking about it.
A current major outstanding problem with thinking models is how to get them to think an appropriate amount.
The providers disagree. You pay per token. Verbose models are the most profitable. Have fun!
For API users, yes, but for the average person with a subscription or using the free tier it’s the inverse.
Nowadays it must be a pretty large % of usage going through monthly subscriptions.
Interesting choice of prompt. None of the local models I have in Ollama (consumer mid-range GPU) were able to get it right.
When I pay attention to o3 CoT, I notice it spends a few passes thinking about my system prompt. Hard to imagine this question is hard enough to spend 13 seconds on.
Not gonna lie but I got sorta goosebumps

I am not kidding but such progress from a technological point of view is just fascinating!

How many people are discussing this after one person did 1 prompt with 1 data point for each model and wrote a comment?

What is being measured here? For a single request, end-to-end time is:

t_total = t_network + t_queue + t_batch_wait + t_inference + t_service_overhead
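
To make that concrete, here is a rough Python sketch of the breakdown. The class name, the helper, and every number in it are made up for illustration; none of them are measurements from the thread. It only shows that the wall-clock number you see is the sum of all five terms, and that you would want several requests per model before comparing anything.

```python
from dataclasses import dataclass, fields
from statistics import median


@dataclass
class LatencyBreakdown:
    """One request's end-to-end latency in seconds, split by component.

    Hypothetical container: in practice only the total is visible
    to a user timing a chat UI or an API call.
    """
    t_network: float
    t_queue: float
    t_batch_wait: float
    t_inference: float
    t_service_overhead: float

    @property
    def t_total(self) -> float:
        # Sum every declared component, mirroring the formula above.
        return sum(getattr(self, f.name) for f in fields(self))


def median_total(samples: list[LatencyBreakdown]) -> float:
    """Median end-to-end time over repeated requests."""
    return median(s.t_total for s in samples)


# Invented numbers: three requests to the same hypothetical model.
samples = [
    LatencyBreakdown(0.05, 0.10, 0.05, 0.40, 0.10),
    LatencyBreakdown(0.05, 0.30, 0.05, 0.50, 0.10),
    LatencyBreakdown(0.05, 5.00, 0.05, 0.45, 0.10),  # one slow queue wait
]

for s in samples:
    print(f"total={s.t_total:.2f}s inference={s.t_inference:.2f}s")
print(f"median total={median_total(samples):.2f}s")
```

In this invented data the inference time barely moves while the total swings by seconds because of queueing, which is the reason one prompt with one data point per model says little about the model itself.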