by beklein
815 days ago
A closer cousin of LLMs would be Vision-Language-Action (VLA) models such as RT-2 [1].
In addition to text and vision data, they treat robot actions as "another language": actions are encoded as tokens, so the model can output movement commands for robots.

[1]: https://robotics-transformer2.github.io
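As a rough illustration of the "actions as another language" idea, here is a minimal sketch that uniformly discretizes continuous action values into integer bins and renders them as a token string a language model could emit. The 256-bins-per-dimension figure follows my reading of the RT-2 paper. The [-1, 1] action range and all function names are hypothetical assumptions, and this is not RT-2's actual code.

```python
# Hypothetical sketch of action tokenization in the spirit of RT-2.
# Assumption: each action dimension lies in [low, high] = [-1, 1].

NUM_BINS = 256  # uniform bins per action dimension (as described for RT-2)


def action_to_tokens(action, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map each continuous action value to an integer bin index (a 'token')."""
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)  # keep value inside the range
        idx = int((clipped - low) / (high - low) * num_bins)
        tokens.append(min(idx, num_bins - 1))  # value == high maps to last bin
    return tokens


def tokens_to_action(tokens, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Decode bin indices back to the center of each bin."""
    width = (high - low) / num_bins
    return [low + (t + 0.5) * width for t in tokens]


def tokens_to_text(tokens):
    """Render tokens as a space-separated string, like ordinary text output."""
    return " ".join(str(t) for t in tokens)


if __name__ == "__main__":
    action = [0.0, -1.0, 1.0, 0.5]  # e.g. a few end-effector deltas
    tokens = action_to_tokens(action)
    print(tokens_to_text(tokens))  # prints "128 0 255 192"
    print(tokens_to_action(tokens))
```

Decoding is lossy: each recovered value is off by at most half a bin width (here 1/256), which is the trade-off for letting the same output vocabulary cover both words and robot actions.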