Hacker News
by avbanks 488 days ago
I still find 3.5 Sonnet the best for my coding tasks (better than o1, o3-mini, and R1). The makers of the other models might be trying to game the system by fine-tuning them for the benchmarks.
1 comment

Would love to know just how overfit a lot of them are on these benchmarks.