Hacker News
by jstummbillig 542 days ago
In the wake of the o1 release, and with the old aider benchmark saturating, Paul from aider has created a new, much harder benchmark. o1 dominates by a substantial margin.

https://aider.chat/docs/leaderboards/
https://aider.chat/2024/12/21/polyglot.html