https://microsoft.ai/wp-content/uploads/2026/06/main_2026060...
https://microsoft.ai/news/building-a-hillclimbing-machine-la...
Unless they specifically clarify that the testing and training benchmarks are completely separate, we have to assume they test on the same 'hill' the model climbs.
https://microsoft.ai/wp-content/uploads/2026/06/main_2026060...