HN Mirror

	by jl2718 708 days ago
	I think you need higher algorithmic intensity. Gradient descent is best for monolithic GPUs. There could be other possibilities for layer-distributed training.