| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by radarsat1 1282 days ago
	> trained only on Text-Image pairs and unlabeled videos This is fascinating. It's able to pick up sufficiently on the fundamentals of 3D motion from 2D videos, while only needing static images with descriptions to infer semantics.