| HN Mirror

	by smj-edison 120 days ago
	Could it be because a lot of the training data is captions? If I'm remembering correctly, captions are what they use to create the association between what's seen and what's said.