by b7894 254 days ago
Gemini 3.0 Pro (or what is presumed to be 3.0 Pro; you can get access to it via A/B testing on AI Studio) does a noticeably better job:

https://x.com/cannn064/status/1972349985405681686

https://x.com/whylifeis4/status/1974205929110311134

https://x.com/cannn064/status/1976157886175645875

4 comments

It was Google that featured a bicycling pelican in a presentation a few months back:

https://simonwillison.net/2025/Jun/6/six-months-in-llms/#ai-...

So I think the benchmark can be considered dead as far as Gemini goes.

There’s obviously been no improvement on this metric for a while.

How do people trigger A/B testing?

As far as I can tell, they just keep hammering the same prompt in https://aistudio.google.com/ until they get lucky and the A/B test triggers for them.

That 2nd one is wild.

Ugh. I hate this hype train. I'll be foaming at the mouth with excitement for the first couple of days until the shine is off.