Hacker News
by johnfn 502 days ago
Hm, interesting. The way I tried it was by pasting an image into Claude directly as the start of the conversation, plus a simple prompt ("What do you see here?"). It got the specific image wrong (it thought it was Baby Yoda, lol), but it did understand that it was an image.

I wonder if the author got different results because they had been talking a lot about a data set before showing the image, which possibly predisposed the AI to think that it was a normal data set. In any case, I think that "Your AI Can't See Gorillas" isn't really a valid conclusion.


Please read TFA. The conclusion of the article isn't nearly so simplistic. They're just suggesting that you have to be aware of the natural strengths and weaknesses of LLMs, even multimodal ones, particularly around visual pattern recognition vs. quantitative pattern recognition.

And yes, the idea that the initial context can sometimes predispose the LLM to consider things more narrowly than a user might otherwise want is definitely well known.

The title of the article is "Your AI Can't See Gorillas". That seems demonstrably false.

The article says:

> Furthermore, their data analysis capabilities seem to focus much more on quantitative metrics and summary statistics, and less on the visual structure of the data

Again, this seems false - or, at best, misleading. I had no problem getting the AI to focus on the visual structure of the data without any tricks. A fairer statement would be "If you ask an AI a bunch of questions about summary statistics and then show it a scatterplot with an image, then it might continue to focus on summary statistics". But that's not what the concluding paragraph states, and it's not what the title states, either.

You knew that there was a visual gag in there before asking it to look.

If you didn't know it was there and looked only at the text output, the LLM would not have found it and told you it's there.

Yeah, the "it gives you the answer when you give it the answer" people have kind of ruined my morning. Oh well.