Hacker News
by
deyiao
537 days ago
The benchmark results seem unrealistically good, but I'm not sure from which angles I should challenge them.
1 comment
ai-christianson
537 days ago
I think they're real. The model performs better than claude-3-5-sonnet-20241022 on the aider leaderboard:
https://aider.chat/docs/leaderboards/