by b7894 254 days ago

Gemini 3.0 Pro (or what is deemed to be 3.0 Pro; you can get access to it via A/B testing on AI Studio) does a noticeably better job:

https://x.com/cannn064/status/1972349985405681686

https://x.com/whylifeis4/status/1974205929110311134

https://x.com/cannn064/status/1976157886175645875

4 comments

rozab 253 days ago

It was Google that featured a bicycling pelican in a presentation a few months back:

https://simonwillison.net/2025/Jun/6/six-months-in-llms/#ai-...

So I think the benchmark can be considered dead as far as Gemini goes.

fellowmartian 254 days ago

There’s obviously no improvement on this metric, and there hasn’t been for a while.

jiggawatts 254 days ago

How do people trigger A/B testing?

simonw 254 days ago

As far as I can tell they just keep on hammering the same prompt in https://aistudio.google.com/ until they get lucky and the A/B test triggers for them on one of those prompts.

qingcharles 254 days ago

That 2nd one is wild.

Ugh. I hate this hype train. I'll be foaming at the mouth with excitement for the first couple of days until the shine is off.