by thegeomaster 308 days ago
Gemini 2.5 Pro is severely kneecapped in this evaluation. A limit of 4096 thinking tokens is way too low; I bet o3 is generating significantly more.
1 comment

For o3, I set reasoning_effort to "high", and it usually uses 1000-2000 reasoning tokens for routine coding questions.

I've only seen it go above 5000 for very difficult style transfer problems, where it has to wrestle with the micro-placement of lots of text, or for difficult math problems.
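
If anyone wants to check their own numbers, here is a rough sketch of how I'd compare reasoning-token usage against a cap like the 4096 mentioned above. It assumes the usage object has been converted to a plain dict with a `completion_tokens_details.reasoning_tokens` field; the helper names and the sample payload are made up for illustration and are not real measurements.

```python
# Sketch: read the reasoning-token count from a usage dict and compare it
# to a thinking-token cap. The dict shape is an assumption about how the
# API reports usage; the helper names are hypothetical.

def reasoning_tokens(usage: dict) -> int:
    """Return the reported reasoning tokens, or 0 if the field is absent."""
    details = usage.get("completion_tokens_details") or {}
    return int(details.get("reasoning_tokens") or 0)


def exceeds_budget(usage: dict, budget: int = 4096) -> bool:
    """True if the response used more reasoning tokens than the cap."""
    return reasoning_tokens(usage) > budget


# Illustrative payload only (invented values, not a real API response).
sample_usage = {
    "prompt_tokens": 200,
    "completion_tokens": 1900,
    "completion_tokens_details": {"reasoning_tokens": 1500},
}

print(reasoning_tokens(sample_usage))
print(exceeds_budget(sample_usage))
```

Logging this per request is enough to see how often a given cap would actually bind for your own workload.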