| HN Mirror

	by SoftTalker 185 days ago
	LLMs are trained on text. Why would we expect them to understand a visual and tactile 3D world?

1 comment

Because many of them are also multimodal vision-language models (VLMs).