| HN Mirror



	by BinRoo 543 days ago
	Are you insinuating Gemini is similar in performance to o3-mini?

3 comments

panarky 543 days ago

I've only had o3-mini for a day, but Gemini 2.0 Flash Thinking is still clearly better for my use cases.

And it's currently free in aistudio.google.com and in the API.

And it handles a million tokens.


xnx 543 days ago

Definitely varies by application, but the blind "taste test" vibes are very good for Gemini: https://lmarena.ai/?leaderboard


anabab 543 days ago

That reminds me: a week ago there was a post on Reddit (now deleted, but a copy of the content is available in the comments) whose author claimed to have manipulated, or at least attempted to manipulate, voting on lmarena in favor of Gemini. The goal was to tip the scale on Polymarket, where a question like "which AI model will be the best one by $date" is decided by the scoring on lmarena. They have supposedly made O(USD10k).

Original deleted post: https://old.reddit.com/r/MachineLearning/comments/1i83mhj/lm...

A copy of the content: https://old.reddit.com/r/MachineLearning/comments/1i83mhj/lm...


gerdesj 543 days ago

Are you implying it isn't?

(evidence please, everyone)


BinRoo 543 days ago

Simple example: o3-mini-high gets this [1] right, whereas Gemini 2.0 Flash 01-21 gets it wrong.

[1] https://chatgpt.com/share/679d9579-5bb8-8008-ac4a-38cef65b45...


xnx 543 days ago

Great example. Thank you. Can confirm that none of the Gemini models warned about the exception without prompting.


maeil 543 days ago

This agrees with my limited testing so far, but in a different way: o3 is better at coding and objective tasks, while the most recent Flash 2.0-thinking is stronger at subjective tasks. Similarly, o3 seems better at shorter output sizes, but drops off on longer ones, tending to be lazy.
