by adam_arthur
694 days ago
I've found Claude 3.5 Sonnet actually much worse on average for coding than Claude 3 Opus, at least for my use case and when accessed through Kagi. The hallucination rate is much higher. GPT-4o hallucinates far less than Claude 3 Opus but also seems to have less niche knowledge (I was using it to assist with a Groovy+Spock+Spring upgrade). So I question a lot of the benchmarks published for the newest models. They don't seem to track linearly or accurately with my use cases.