by adverbly 435 days ago
This does look like a large relative increase in score, but it seems to come from going from zero correct out of 6 to 1 and 1/2 correct. I think it's fair to say the sample size here is relatively small. Still, a record is a record! Congrats to the team!
1 comment

In my small sample (tens of queries per day), Gemini 2.5 seems like a noticeable improvement in (almost) every way compared to previous Gemini models.

Answers do seem to take longer to generate, but it's well worth the cost.