|
I assume you're using the "regular" Pro version of Gemini 3.1 for the above, rather than the Deep Think mode, which is more comparable to GPT-5.5 Pro. To my knowledge, regular 3.1 Pro is a tier below and often makes mistakes. Moreover, there's no reason to believe the progress of LLMs, which couldn't reliably solve high-school math problems just 3–4 years ago, will stop anytime soon. You might want to track the progress of these models on the CritPt benchmark, which is built on *unpublished, research-level* physics problems: https://critpt.com/ Frontier models are still nowhere near solving it, but progress has been rapid: o3 (high), less than 1.5 years ago, was at 1.4%; GPT-5.4 (xhigh) is at 23.4%; GPT-5.5 (xhigh) at 27.1%; and GPT-5.5 Pro (xhigh) at 30.6%. Source: https://artificialanalysis.ai/evaluations/critpt |
Wrong. Every advancement has followed an S-curve. Where we are on that curve is anyone's guess. Or maybe "this time it's different".