by NitpickLawyer 304 days ago
There's swe-rebench, where they collect "bugs/issues" by date, and you can drag a slider on their top scores to restrict the results to issues created after the model was released (obviously this only truly works for open models).
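A rough sketch of the date-filtering idea, not swe-rebench's actual code: the `Task` fields, the `post_release_solve_rate` helper, and the sample dates are all made up for illustration. It scores a model only on issues created after its release date, so those issues could not have been in its training data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    task_id: str    # hypothetical identifier
    created: date   # when the issue was filed
    solved: bool    # whether the model resolved it


def post_release_solve_rate(tasks, release_date):
    """Solve rate over tasks created strictly after the model's release date."""
    fresh = [t for t in tasks if t.created > release_date]
    if not fresh:
        return None  # no post-release tasks, so nothing to score
    return sum(t.solved for t in fresh) / len(fresh)


# Made-up sample data: one issue predates the release, two come after it.
tasks = [
    Task("a", date(2024, 11, 2), True),
    Task("b", date(2025, 2, 10), False),
    Task("c", date(2025, 3, 5), True),
]
rate = post_release_solve_rate(tasks, date(2025, 1, 15))
```

Moving the slider corresponds to changing `release_date`: the later the cutoff, the fewer tasks remain, so the score gets noisier as the sample shrinks.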