by slooonz 368 days ago
They failed hard with Claude 4 IMO. I just can't get any feedback other than "What a fascinating insight" followed by a reformulation (and, to be generous, an exploration) of what I said, even when Opus 3 has no trouble finding limitations.

By comparison o3 is brutally honest (I regularly get answers that flatly start with "No, that’s wrong") and it’s awesome.

2 comments

Agreed that o3 can be brutally honest. If you ask it for direct feedback, even on personal topics, it will make observations that, if a person made them, would be borderline rude.
Isn't that what "direct feedback" means?

I firmly believe you should be able to hit your fingers with a hammer, and in the process learn whether that's a good idea or not :)

Yes. It's definitely a good thing.
o3 can be very honest.

But I also find it can get very fixated on the idea that some position it has adopted is right. It will then start hallucinating like crazy in defence of that fixation, and get stuck in a loop of defending its hallucinations with even more hallucinations. By hallucinations I mean stuff like producing lengthy citation lists of invented articles and then, when you point out they don’t exist, claiming stuff like “well when I search PubMed they do”. And when you point out its DOIs are made up, it apologises for the “mistake” and just makes up some more.

Thank god.
Thanks for this, I just tried the same "give me feedback on this text" prompt against both o3 and Claude 4 and o3 was indeed much more useful and much less sycophantic.
Do knowledge cutoff dates matter anymore? The cutoff for o3 was 12 months ago, while the cutoff for Claude 4 was five months ago. I use these models mostly for development (Swift, SwiftUI, and Flutter), and these frameworks are constantly evolving. But with the ability to pull in up-to-date docs and other context, is the knowledge cutoff date still any kind of relevant factor?
I understood from the ancestor comments that they are specifically talking about aspects of answer quality that are very unlikely to be related to the training cut-off date.

Unless you're talking about AI-generated training data, maybe.

Um, yeah... I made a faulty context switch there.