| HN Mirror


	by bottlepalm 724 days ago
	It'd be interesting to see how Sonnet 3.5 does at this. I've found Sonnet a step change better than Opus, and for a fraction of the cost. Opus for me is already far better than GPT-4. And same as the poster found, GPT-4o is plain worse at reasoning. Edit: Better at chain of thought, long running agentic tasks, following rigid directions.

2 comments

DHaldane 724 days ago

That's an interesting question - I'll take a few pokes at it now to see if there's improvement.

DHaldane 724 days ago

Update: Sonnet 3.5 is better than any other model for the circuit design and part finding tasks. Going to iterate a bit on the prompts to see how much I can push the new model on performance.

Figures that any article written on LLM limits is immediately out of date. I'll write an update piece to summarize new findings.

CamperBob2 724 days ago

That name threw me for a loop. 'Sonnet' already means something to EEs ( https://www.sonnetsoftware.com/ ).

RF_Savage 723 days ago

Yeah same here. Thought Sonnet had added some ML stuff into their EM simulator.

stavros 724 days ago

Opus is better than GPT-4? I've heard mixed experiences.

imperio59 724 days ago

That's probably because the sample sizes are small and limited to niche prompts or topics.

It's very hard to evaluate whether one model is better than another; doing it in a scientifically sound way is especially time-consuming.

This is why I find comments like "model X is so much better than model Y" about as useful as "chocolate ice cream is so much better than vanilla."

r2_pilot 724 days ago

And both flavors have a base flavor of excrement... Still, since I started using Claude 3 Opus (and now 3.5 Sonnet) a couple of months back, I don't see myself switching from them nor stopping use of LLM-based AI tech; it's just made me feel like the computer is actually working for and with me and even that alone can be enough to get me motivated and accomplish what I set out to do.

skapadia 724 days ago

"it's just made me feel like the computer is actually working for and with me and even that alone can be enough to get me motivated and accomplish what I set out to do."

This is a great way to describe what I've been feeling / experiencing as well.

r2_pilot 723 days ago

Just an update on my initial impressions of Claude 3.5 Sonnet. It's a better programmer than I am in Python; that's not saying much, but this is now two nights in a row I've been impressed with what I've created with it.

stavros 724 days ago

True, I just tried it for generating a book summary, and Sonnet 3.5 was very bad. GPT-4o is equally bad at that; gpt-4-turbo is great.

netsec_burn 724 days ago

This more likely has to do with context length?

stavros 724 days ago

No, all the information is there, but gpt-4o tends to produce bullet points (https://www.thesummarist.net/summary/the-making-of-a-manager...), whereas gpt-4-turbo tends to produce much more readable prose (https://www.thesummarist.net/summary/supercommunicators/the-...).

Obscurity4340 723 days ago

How is prose more readable than bullets?