I just finished running the 100x benchmark across 4 frontier models. Gemini 3.1 Pro, GPT 5.5, and Opus 4.7 all behave quite similarly, but Grok is an oddball: https://zonted.com/posts/three-of-four-ais-same-person/