Hacker News
by bugglebeetle 517 days ago
I used to think this, but using o1 quite a bit lately has convinced me otherwise. It’s been 1-shotting the fairly non-trivial coding problems I throw at it and is good about outputting large, complete code blocks. By contrast, Claude starts nagging you about hitting usage limits after only a few back-and-forths and has some kind of hack in place that abbreviates code when conversations get too long, even when explicitly instructed to do otherwise. I would imagine that Anthropic can produce a good test-time compute model as well, but until they have something publicly available, OpenAI has stolen back the lead.
1 comment

"Their model" here refers to 4o, as o1 is unviable for many production use cases due to latency.