Hacker News
by encroach 191 days ago
This outperforms Gemini 3 Pro Image (Nano Banana Pro) on Text-to-Image Arena and Image Edit Arena. I'm surprised they didn't mention this leaderboard in the blog post.

I like this benchmark because it's based on user votes, so overfitting is not as easy (after all, if users prefer your result, you've won).

https://lmarena.ai/leaderboard/text-to-image

https://lmarena.ai/leaderboard/image-edit
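
For anyone unfamiliar with how a vote-based leaderboard turns "user preferred A over B" into a score, here is a minimal sketch assuming a plain online Elo update. This is an illustration only, not necessarily what LMArena actually runs; the model names, the starting rating of 1000, and the K-factor of 32 are all placeholder assumptions.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one user vote in which `winner`'s image was preferred."""
    # Unseen models start at an assumed default rating of 1000.
    ra = ratings.setdefault(winner, 1000.0)
    rb = ratings.setdefault(loser, 1000.0)
    # The winner gains more when the win was less expected.
    delta = k * (1.0 - expected_score(ra, rb))
    ratings[winner] = ra + delta
    ratings[loser] = rb - delta


# Hypothetical votes between two placeholder models (not real arena data).
votes = [("model_a", "model_b"), ("model_b", "model_a"), ("model_a", "model_b")]
ratings = {}
for winner, loser in votes:
    update(ratings, winner, loser)
```

Under this curve a small rating gap means close to a coin flip: `expected_score(1010, 1000)` works out to roughly 0.51, so a 10-point lead corresponds to users preferring the leader only slightly more than half the time.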

2 comments

The scores are really, really close; that might be why.
The arena concept doesn’t work for image models due to watermarks.
There are no watermarks in the arena.
There are no visible watermarks, but model makers can use steganographic codes to identify outputs from their own models.
Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security

https://arxiv.org/pdf/2510.06525

This is true; however, LMArena does employ some methods to mitigate attempts to manipulate the leaderboard. See https://openreview.net/forum?id=zf9zwCRKyP

They also control for style: https://news.lmarena.ai/sentiment-control/