That claim keeps being contradicted hard by other parties, who say Mythos beats 5.5 resoundingly on both autonomous search and discovery and creation of complex exploit chains.
There might be a harness difference, but also, this CTF-type benchmark might not capture the capability difference fully.
It's doubtful they have the compute to make Mythos publicly available, even after the SpaceX datacenter deal. And why sell it publicly if people are still willing to pay for Opus 4.7?
Hold on, I think this claim needs some hard data. Here you go, gentlemen:
https://www.aisi.gov.uk/blog/our-evaluation-of-openais-gpt-5...