Hacker News
by stavros 724 days ago
Opus is better than GPT-4? I've heard mixed experiences.
3 comments

That's probably because the sample sizes are small and limited to niche prompts or topics.

It's very hard to evaluate whether one model is better than another; doing it in a scientifically sound way is especially hard and time-consuming.

This is why I find comments like "model X is so much better than model Y" about as useful as "chocolate ice cream is so much better than vanilla".

And both flavors have a base flavor of excrement... Still, since I started using Claude 3 Opus (and now 3.5 Sonnet) a couple of months back, I don't see myself switching away from them or giving up LLM-based AI tech; it's just made me feel like the computer is actually working for and with me and even that alone can be enough to get me motivated and accomplish what I set out to do.
"it's just made me feel like the computer is actually working for and with me and even that alone can be enough to get me motivated and accomplish what I set out to do."

This is a great way to describe what I've been feeling / experiencing as well.

Just an update on my initial impressions of Claude 3.5 Sonnet. It's a better programmer than I am in Python; that's not saying much, but this is now two nights in a row I've been impressed with what I've created with it.
True, I just tried it for generating a book summary, and Sonnet 3.5 was very bad. GPT-4o is equally bad at that; gpt-4-turbo is great.
Isn't this more likely to do with context length?
No, all the information is there, but gpt-4o tends to produce bullet points (https://www.thesummarist.net/summary/the-making-of-a-manager...), whereas gpt-4-turbo tends to produce much more readable prose (https://www.thesummarist.net/summary/supercommunicators/the-...).
How is prose more readable than bullets?
* Clearer narrative

* Connection between points

* Flows better

* Eyes don't start-stop as much

It really depends on the type of question, but generally I'm between Gemini and Claude these days for most things.
Opus 3.5 is not yet released.
I assume the GP was talking about 3.0.