Hacker News new | ask | show | jobs
by sumo43 750 days ago
seems like an improvement on the aloha approach? You still need to finetune it on roughly the same amount of OOD examples. Contrast this with google's approach over 2023, which was training large vision-language models with the goal of generalizing on OOD.