Hacker News
by ilaksh 512 days ago
I think this is the obvious path to more robust models: grounding language in video.