| HN Mirror

	by slooonz 368 days ago
	They failed hard with Claude 4 IMO. I just can't get any feedback other than "What a fascinating insight" followed by a reformulation (and, to be generous, an exploration) of what I said, even when Opus 3 has no trouble finding limitations. By comparison, o3 is brutally honest (I regularly get answers that flatly start with "No, that’s wrong") and it’s awesome.

SamPatt 368 days ago

Agreed that o3 can be brutally honest. If you ask it for direct feedback, even on personal topics, it will make observations that, if a person made them, would be borderline rude.

silversmith 368 days ago

Isn't that what "direct feedback" means?

I firmly believe you should be able to hit your fingers with a hammer, and in the process learn whether that's a good idea or not :)

SamPatt 368 days ago

Yes. It's definitely a good thing.

skissane 368 days ago

o3 can be very honest.

But I also find it can get very fixated on the idea that some position it has adopted is right. It will then start hallucinating like crazy in defence of that fixation, and get stuck in a loop of defending its hallucinations with even more hallucinations. By hallucinations I mean stuff like producing lengthy citation lists of invented articles, and then when you point out they don’t exist, claiming stuff like “well when I search PubMed they do”, and when you point out its DOIs are made-up it apologises for the “mistake” and just makes up some more.

Thank god.

Thanks for this, I just tried the same "give me feedback on this text" prompt against both o3 and Claude 4 and o3 was indeed much more useful and much less sycophantic.

WaltPurvis 368 days ago

Do knowledge cutoff dates matter anymore? The cutoff for o3 was 12 months ago, while the cutoff for Claude 4 was five months ago. I use these models mostly for development (Swift, SwiftUI, and Flutter), and these frameworks are constantly evolving. But with the ability to pull in up-to-date docs and other context, is the knowledge cutoff date still any kind of relevant factor?

moritzwarhier 368 days ago

I understood from the ancestor comments that they are specifically talking about aspects of answer quality that are very unlikely to be related to the training cut-off date.

Unless you're talking about AI-generated training data, maybe.

WaltPurvis 368 days ago

Um, yeah... I made a faulty context switch there.
