Hacker News
by adam_arthur 694 days ago
I've found Claude 3.5 Sonnet actually much worse on average for coding than Claude 3 Opus.

At least for my use case, and when accessed through Kagi, it has a much higher hallucination rate.

GPT-4o hallucinates far less than Claude 3 Opus but also seems to have less niche knowledge (I was using it to assist with a Groovy+Spock+Spring upgrade).

So I question a lot of the benchmarks published for the newest models. They don't seem to track linearly or accurately with my use cases.