| HN Mirror



	by nok22kon 2 hours ago
	LLMs are very good at looking at images and reasoning about them. They go well beyond object recognition and segmentation: they can explain the physics in the image, infer intent, plan actions, ...

1 comment

Chu4eeno 2 hours ago

That's because post-training optimizes for benchmarks that test exactly that.

They tend to collapse into nonsense and hallucinations pretty quickly if you move slightly out of the envelope of the current visual reasoning benchmaxxing.
