I keep seeing these Grok 4 intelligence claims, so I tried something very simple: "Animate a round robin tournament for 10 people."
Results:
Claude: ~10s, perfect working demo
ChatGPT: ~20s, solid solution
Grok 4: ~1000s, failed completely and returned only a truncated base64 blob
This wasn't some obscure edge case... it was a basic data visualization task that any decent model should handle. Yet somehow Grok 4 is supposedly "competing with humans" and has "99% tool accuracy"...
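
For a sense of how small the core of this task is, here's a minimal sketch of just the scheduling part, using the standard circle method. This is my own illustration in Python, not the output of any of the models above; the `round_robin` function name is made up for the example, and the animation layer is left out entirely.

```python
def round_robin(players):
    """Return a list of rounds; each round is a list of (a, b) pairings.

    Circle method: keep the first player fixed and rotate everyone else
    by one seat after each round.
    """
    players = list(players)
    if len(players) % 2:
        players.append(None)  # odd count: None acts as a "bye" slot
    n = len(players)
    rounds = []
    for _ in range(n - 1):
        # Pair seat i with the seat mirrored across the table.
        pairs = [(players[i], players[n - 1 - i]) for i in range(n // 2)]
        # Drop the pairing that involves the bye slot, if there is one.
        rounds.append([p for p in pairs if None not in p])
        # Rotate: first seat stays put, the last seat moves to second.
        players = [players[0], players[-1]] + players[1:-1]
    return rounds


schedule = round_robin(range(1, 11))  # 10 people, numbered 1..10
for number, pairs in enumerate(schedule, start=1):
    print(f"Round {number}: {pairs}")
```

For 10 people that comes to 9 rounds of 5 matches each, with every pair meeting exactly once. An animation would then only need to step through `schedule` one round at a time.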