Hacker News
by boroboro4 437 days ago
They updated the paper to include Gemini 2.5. It's the only model that got a non-trivial score: 10/42 (it mostly solved one problem).