by encroach 191 days ago

This outperforms Gemini 3 Pro Image (Nano Banana Pro) on Text-to-Image Arena and Image Edit Arena. I'm surprised they didn't mention this leaderboard in the blog post.

I like this benchmark because it's based on user votes, so overfitting is not as easy (after all, if users prefer your result, you've won).

https://lmarena.ai/leaderboard/text-to-image

https://lmarena.ai/leaderboard/image-edit

2 comments

ygouzerh 190 days ago

The scores are really, really close, which might be why.

nycdatasci 191 days ago

The arena concept doesn’t work for image models due to watermarks.

encroach 191 days ago

There are no watermarks in the arena.

nycdatasci 190 days ago

There are no visible watermarks, but model makers can use steganographic codes to identify outputs from their own models.

nycdatasci 190 days ago

Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security

https://arxiv.org/pdf/2510.06525

encroach 190 days ago

This is true; however, LMArena does employ some methods to mitigate attempts to manipulate the leaderboard. See https://openreview.net/forum?id=zf9zwCRKyP

They also control for style: https://news.lmarena.ai/sentiment-control/