by vunderba 503 days ago
Please read TFA. The article's conclusion isn't nearly so simplistic. They're just suggesting that you have to be aware of the natural strengths and weaknesses of LLMs, even multimodal ones, particularly around visual pattern recognition vs. quantitative pattern recognition.

And yes, the idea that the initial context can sometimes predispose the LLM to consider things in a more narrow manner than a user might otherwise want is definitely well known.

1 comments

The title of the article is "Your AI Can't See Gorillas". That seems demonstrably false.

The article says:

> Furthermore, their data analysis capabilities seem to focus much more on quantitative metrics and summary statistics, and less on the visual structure of the data

Again, this seems false - or, at best, misleading. I had no problem getting AI to focus on the visual structure of the data without any tricks. A fairer statement would be "If you ask an AI a bunch of questions about summary statistics and then show it a scatterplot with an image, then it might continue to focus on summary statistics". But that's not what the concluding paragraph states, and it's not what the title states, either.

You knew there was a visual gag in there before you asked it to look.

If you didn't know it was there and looked only at the text output, the LLM would not have found it and told you it's there.

Yeah, the "it gives you the answer when you give it the answer" people have kind of ruined my morning. Oh well.