VideoCLIP: Contrastive Pre-Training for Zero-Shot Video-Text Understanding (arxiv.org)
1 point by LuisMondragon 1727 days ago
1 comment

Is this the same thing as OpenAI's CLIP?

https://github.com/openai/CLIP

Not the same. CLIP is trained on image-text pairs, whereas VideoCLIP is trained on video-text pairs.
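
To make the "pairs" point concrete, here is a rough sketch of a generic symmetric contrastive (InfoNCE-style) loss over a batch of matched pairs. It is not taken from either project's code. The function names, the toy embeddings, and the temperature of 0.07 are all assumptions for illustration. The loss only sees embedding vectors, so the same shape of objective applies whether the visual side comes from an image encoder or a video encoder; what changes is the encoder and the training data.

```python
import math

def normalize(v):
    # Scale a vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    # Mean cross-entropy where the correct class for row i is column i.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def contrastive_loss(visual_embs, text_embs, temperature=0.07):
    # visual_embs[i] and text_embs[i] are assumed to be a matched pair;
    # every other combination in the batch is treated as a negative.
    v = [normalize(x) for x in visual_embs]
    t = [normalize(x) for x in text_embs]
    logits = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
              for vi in v]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-visual direction
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(logits_t))

# Toy stand-ins for encoder outputs (hypothetical values).
visual = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
shuffled_text = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = contrastive_loss(visual, aligned_text)
shuffled_loss = contrastive_loss(visual, shuffled_text)
```

With these toy inputs the aligned pairs should score a much lower loss than the shuffled ones, which is the signal that pulls matching visual and text embeddings together during pre-training.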