HN Mirror



	by exe34 543 days ago
	> CNN will beat ViT on small data tasks, but that flips with enough scale because ViT imposes less inductive bias

	Any idea why this is the case? CNNs have the built-in bias that neighbouring pixels are related simply because they are neighbours; ViTs have to learn this from scratch. So why do ViTs end up doing better than CNNs once there is enough data?
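	To make the "locality bias" concrete, here is a toy sketch (my own illustration, not how any real ViT or CNN is implemented). It nudges one input of a 1D signal and checks which outputs move. It assumes a hand-picked 3-tap kernel for the convolution and a single attention head over scalar tokens with made-up projection weights (`wq`, `wk`, `wv` are hypothetical), and it omits patch embeddings and position embeddings entirely. It only shows *which* inputs each layer can see in one step, not why one architecture wins at scale.

```python
import math

def conv1d(x, w):
    """Same-padded 1D convolution (cross-correlation, as in CNN layers).
    Output i only sees x[i-r .. i+r], where r = len(w) // 2."""
    r = len(w) // 2
    n = len(x)
    out = []
    for i in range(n):
        s = 0.0
        for k, wk in enumerate(w):
            j = i + k - r
            if 0 <= j < n:  # zero padding outside the signal
                s += wk * x[j]
        out.append(s)
    return out

def self_attention(x, wq=0.7, wk=1.3, wv=1.0):
    """Single-head self-attention over scalar tokens.
    The weights are hypothetical toy values. Every output is a
    softmax-weighted mix of *all* inputs, so there is no built-in locality."""
    q = [wq * t for t in x]
    k = [wk * t for t in x]
    v = [wv * t for t in x]
    out = []
    for qi in q:
        scores = [qi * kj for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        e = [math.exp(s - m) for s in scores]
        z = sum(e)
        out.append(sum(ei / z * vj for ei, vj in zip(e, v)))
    return out

def influenced_outputs(layer, x, j, eps=1e-3):
    """Indices of outputs that change when input j is nudged by eps."""
    base = layer(x)
    x2 = list(x)
    x2[j] += eps
    moved = layer(x2)
    return [i for i, (a, b) in enumerate(zip(base, moved)) if abs(a - b) > 1e-12]

x = [math.sin(0.9 * i) for i in range(9)]
conv = lambda s: conv1d(s, [0.25, 0.5, 0.25])
print(influenced_outputs(conv, x, 4))            # [3, 4, 5]
print(influenced_outputs(self_attention, x, 4))  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

	The convolution can only relate neighbours within one layer (wider context needs stacking), whereas attention can relate any pair of positions from the start and has to learn from data which pairs matter.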