by deyiao 537 days ago
The benchmark results seem unrealistically good, but I'm not sure from which angles I should challenge them.
1 comment

I think they're real. The model is performing better than claude-3-5-sonnet-20241022 on the aider leaderboard:

https://aider.chat/docs/leaderboards/