VideoCLIP: Contrastive Pre-Training for Zero-Shot Video-Text Understanding


	VideoCLIP: Contrastive Pre-Training for Zero-Shot Video-Text Understanding (arxiv.org)
	1 points by LuisMondragon 1727 days ago

1 comments

Not sure if this is the same thing?

Not the same. CLIP is trained with pairs of images and texts, whereas VideoCLIP uses pairs of videos and texts.
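
To make the difference concrete, here is a minimal sketch of a symmetric contrastive loss over paired embeddings. It is not either paper's implementation: the function names, the toy vectors, and the temperature of 0.07 are assumptions for illustration, and the real models differ in how they encode inputs and sample pairs. The point is only that the objective has the same shape in both cases; what changes is whether the "visual" side comes from an image encoder or a video encoder.

```python
import math

def _normalize(vec):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def _diagonal_cross_entropy(logits):
    # Mean cross-entropy where the correct class for row i is column i,
    # i.e. each item should match its own partner in the batch.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(visual, text, temperature=0.07):
    # visual[i] and text[i] are assumed to be embeddings of a matched pair.
    # "visual" could hold image embeddings or video embeddings; the loss
    # itself does not care which. The temperature value is an assumption.
    v = [_normalize(x) for x in visual]
    t = [_normalize(x) for x in text]
    logits = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
              for vi in v]
    logits_transposed = [list(col) for col in zip(*logits)]
    # Average the visual-to-text and text-to-visual directions.
    return 0.5 * (_diagonal_cross_entropy(logits)
                  + _diagonal_cross_entropy(logits_transposed))

# Toy example: two matched pairs versus the same pairs with texts swapped.
visual = [[1.0, 0.0], [0.0, 1.0]]
aligned_text = [[1.0, 0.0], [0.0, 1.0]]
swapped_text = [[0.0, 1.0], [1.0, 0.0]]
aligned_loss = symmetric_contrastive_loss(visual, aligned_text)
swapped_loss = symmetric_contrastive_loss(visual, swapped_text)
```

With the toy vectors, `aligned_loss` comes out much smaller than `swapped_loss`, which is the behaviour training pushes toward: matched pairs score high, mismatched pairs score low.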