Hacker News new | ask | show | jobs
by Geee 360 days ago
Vision-language-action models seem to be the broad category for the best approach, which basically combines a large vision-language model with robotic actions. For example, see https://www.physicalintelligence.company/blog/pi0