by jl2718 660 days ago
I think you need higher arithmetic intensity, i.e. more computation per byte moved. Gradient descent is best suited to monolithic GPUs. There could be other possibilities for layer-distributed training.