by SoftTalker 138 days ago
LLMs are trained on text. Why would we expect them to understand a visual and tactile 3D world?

Because many of them are also multimodal vision-language models (VLMs).