	by rfoo 483 days ago
	Do they even have an optimized backward pass? It looks like optimizations like this aren't needed during training. Their V2 paper suggests as much.