The International Mathematical Olympiad challenges should be pretty safely out of distribution. Gemini and OpenAI's best research models both scored gold on that this year.
When they make a model with those abilities publicly available, I'll happily experiment with it, and I'd anticipate reporting that it is a lot better than what I experienced in the past.