Hacker News
by beklein 815 days ago
A closer cousin of LLMs would be Vision-Language-Action (VLA) models such as RT-2 [1]. In addition to text and vision data, they are also trained on robot action data, which is encoded as tokens and treated as "another language", so the model can output movement actions for robots.
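
To make the "actions as another language" idea a bit more concrete, here is a rough sketch of how a continuous robot action could be turned into discrete tokens and back. This is only an illustration, not RT-2's actual code: the function names, the value ranges, and the choice of 256 bins per action dimension are all assumptions made for the example.

```python
# Hypothetical sketch: names, ranges, and bin count are illustrative assumptions.
NUM_BINS = 256  # assumed number of discrete bins per action dimension


def action_to_tokens(action, low, high, num_bins=NUM_BINS):
    """Map each continuous action value to an integer bin index (a "token")."""
    tokens = []
    for value, lo, hi in zip(action, low, high):
        clipped = min(max(value, lo), hi)      # keep the value inside [lo, hi]
        frac = (clipped - lo) / (hi - lo)      # normalise to [0, 1]
        tokens.append(min(int(frac * num_bins), num_bins - 1))
    return tokens


def tokens_to_action(tokens, low, high, num_bins=NUM_BINS):
    """Map bin indices back to the centre value of each bin."""
    return [
        lo + (token + 0.5) / num_bins * (hi - lo)
        for token, lo, hi in zip(tokens, low, high)
    ]


# Example: a made-up 3-dimensional action with each dimension in [-1, 1].
low, high = [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]
tokens = action_to_tokens([0.0, -1.0, 1.0], low, high)
recovered = tokens_to_action(tokens, low, high)
```

Once actions look like this, a sequence model can emit them the same way it emits word tokens, and a decoder like `tokens_to_action` turns the output back into motor commands, at the cost of a small rounding error (at most half a bin width per dimension).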

[1]: https://robotics-transformer2.github.io