This outperforms Gemini 3 Pro Image (Nano Banana Pro) on both the Text-to-Image Arena and Image Edit Arena leaderboards. I'm surprised they didn't mention these leaderboards in the blog post.
I like this benchmark because it's based on user votes, so it's harder to overfit to (after all, if users prefer your results, you've won).