Yeah, right. If this benchmark was truly developed in an independent manner, and the timing just “lined up”, how did Anthropic even know to include results in their model release documentation the day after the benchmark is revealed? It seems like there must have been some collaboration or influence from Anthropic behind the scenes.
People game benchmarks for fake internet points to get their favorite web framework to the top of the list. I'm pretty sure they will do it for billions of dollars.