One would think. I suppose OpenAI threw the majority of their compute budget at producing and verifying solutions. It would certainly be interesting to see whether this new model can distill its responses down to just the steps needed to convey its result to a given audience.
It reads like the work of someone who found the correct answer but seemingly had no understanding of what they did, and just handed in their draft paper.
Which seems odd. Shouldn't an LLM be better at prose?