Hacker News
by jstummbillig, 542 days ago
In the wake of the o1 release, and with the old aider benchmark saturating, Paul from aider has created a new, much harder benchmark. o1 dominates by a substantial margin.
https://aider.chat/docs/leaderboards/
https://aider.chat/2024/12/21/polyglot.html