by CSMastermind 425 days ago

Okay, I decided to benchmark a bunch of AI models with GeoGuessr. One round each on A Diverse World; here's how they scored out of 25,000:

Claude 3.7 Sonnet: 22,759

Qwen2.5-Max: 22,666

o3-mini-high: 22,159

Gemini 2.5 Pro: 18,479

Llama 4 Maverick: 14,316

mistral-large-latest: 10,405

Grok 3: 5,218

Deepseek R1: 0

command-a-03-2025: 0

Nova Pro: 0

3 comments

nemo1618 425 days ago

Neat, thanks for doing this!

msephton 425 days ago

How does Google Lens compare?

CSMastermind 425 days ago

I tried it but as far as I can tell Google Lens doesn't give you a location - it just describes generally what you're looking at.

msephton 422 days ago

I had cause to try Google Lens today and it found the location down to the exact address, thanks to a veterinary clinic in the background of the image. ChatGPT got the country but the wrong city.

arresin 425 days ago

What about o4-mini-high?

CSMastermind 425 days ago

OpenAI's naming confuses me but I ran o4-mini-2025-04-16 through a game and it got 23,885

arresin 424 days ago

Interesting. That supports what they said (this is the model with good visual reasoning).