Hacker News
by zone411 372 days ago
Improves on the Extended NYT Connections benchmark compared to both Gemini 2.5 Pro Exp (03-25) and Gemini 2.5 Pro Preview (05-06), scoring 58.7. The decline observed between 03-25 and 05-06 has been reversed: https://github.com/lechmazur/nyt-connections/