by mickeyp 69 days ago

This test would be a lot more useful if the author used images the models obviously hadn't seen before. Pulling images from Wikipedia? They'll have seen 'em before, and the metadata, and all the pages they were casually linked to.

The observation that the long prompt only made the model think 'a second longer' may have more to do with the model already knowing the images. Why think harder if you know the answer?

At no point does the author contemplate that.

2 comments

vessenes 69 days ago

It might be more useful, but as is, it is still dispositive: 5.5 is significantly worse than o3 at geo-guessing. And the “magic” prompt doesn’t matter that much, at least in o3’s case.

vintermann 69 days ago

They say they threw in some indoor images, presumably from around where they were.
