by nok22kon 2 hours ago
LLMs are very good at looking at images and reasoning about them. This goes well beyond object recognition/segmentation: they can explain the physics in the image, infer intents, plan actions, ...
1 comment

That's because post-training optimizes for benchmarks that test exactly that.

They tend to collapse into nonsense and hallucinations pretty quickly once you move slightly outside the envelope covered by the current visual-reasoning benchmaxxing.