Y
Hacker News
new
|
ask
|
show
|
jobs
by
Geee
360 days ago
Vision-language-action models seem to be the broad category for the best approach, which basically combines a large vision-language model with robotic actions. For example, see
https://www.physicalintelligence.company/blog/pi0