by taberiand 502 days ago
I don't know why you would expect it to see a gorilla without an image to look at. Humans can't.

Without an image? No, not at all. It's supposed to make its own image. And it did make its own image. But it didn't properly analyze the image it made.
That's a feature that would need to be implemented. There's no reason to think it would automatically look at the plot it generated, but feeding that image back to it is no different from it viewing the image automatically.
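For anyone wondering what "feeding the image back" could look like in practice, here is a minimal sketch, not anyone's actual setup. It assumes the plot is rasterized by hand with the standard library only (no plotting package) and that the model accepts an image as a base64 data URL, which is a common convention but an assumption here. The names `render_scatter_png` and `as_data_url` are hypothetical, and the step that actually sends the image to a model is deliberately left out.

```python
import base64
import struct
import zlib


def render_scatter_png(points, width=64, height=64):
    """Rasterize (x, y) points into an 8-bit grayscale PNG and return its bytes."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1.0  # avoid dividing by zero on flat data
    y_span = (y_max - y_min) or 1.0

    # White canvas (255); each data point becomes one black pixel (0).
    rows = [bytearray([255] * width) for _ in range(height)]
    for x, y in points:
        col = int((x - x_min) / x_span * (width - 1))
        row = int((y_max - y) / y_span * (height - 1))  # flip: larger y is higher up
        rows[row][col] = 0

    def chunk(tag, data):
        # PNG chunk: length, type, data, CRC over type + data.
        body = tag + data
        return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))

    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # then compression, filter and interlace methods all 0.
    header = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + bytes(r) for r in rows)
    return (
        b"\x89PNG\r\n\x1a\n"
        + chunk(b"IHDR", header)
        + chunk(b"IDAT", zlib.compress(raw))
        + chunk(b"IEND", b"")
    )


def as_data_url(png_bytes):
    """Wrap PNG bytes as a data URL (assumed input format for a vision model)."""
    return "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")


# Made-up sample data, only to exercise the functions.
points = [(0, 0), (1, 2), (2, 1), (3, 3)]
png = render_scatter_png(points)
image_url = as_data_url(png)
# The missing step: pass image_url to a vision-capable model along with the
# question. Without that step the model only ever sees the numbers and the code.
```

The point of the sketch is just that the rendered pixels are a separate artifact from the plotting code, and something has to hand them back to the model explicitly.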
The point of telling it to explore the data is so I don't have to think of every angle myself. Humans can get an understanding from visuals that LLMs can't match, apparently, even without gimmicks.
The LLM is able to see the gorilla when shown the image, the same way you would show an image to a human.

Imagine if you gave someone the raw data and told them to write code to graph the output, but onto a screen they couldn't see. They would not be able to tell you it's a gorilla until you turned the monitor around and showed them.

Humans are still better at seeing the image, sure (for now), but the LLM is a tool with certain features and abilities. You can't make up a scenario that is misusing the tool and then pretend that it doesn't work, especially when it seems you want to use it without applying your own brain power to the process.

And to be clear, I'm open to criticism of LLMs and exploration of their limitations, but I'm tired of hearing complaints that amount to PEBKAC.

When I tell a human to analyze the data, I sure don't expect them to interpret it as "write code to graph it to a screen you can't see". You found the problem but glossed right over it.

> misusing the tool and then pretend that it doesn't work

It was told to analyze and then it did a bad job of analyzing. I don't care if an LLM expert expects this already, it's worth pointing out to everyone else. It's not PEBKAC.

If a user misunderstands the purpose and value of a tool, this is PEBKAC.