Hacker News
by jl2718, 660 days ago
I think you need higher algorithmic intensity. Gradient descent is best for monolithic GPUs. There could be other possibilities for layer-distributed training.
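To put a rough number on "intensity": a minimal sketch, assuming it means FLOPs of compute per byte of gradient a worker has to exchange each step (that reading is an assumption, as is the usual 6 FLOPs per weight per sample estimate for forward plus backward). The function name `dense_layer_intensity` is made up for illustration.

```python
def dense_layer_intensity(batch, d_in, d_out, bytes_per_param=4):
    """Approximate FLOPs computed per byte of gradient exchanged, one dense layer."""
    params = d_in * d_out
    # Assumed cost model: forward matmul is ~2 FLOPs per weight per sample,
    # backward is ~4 (gradients w.r.t. inputs and w.r.t. weights).
    flops = 6 * batch * params
    # Plain data-parallel gradient descent sends one gradient value per weight.
    comm_bytes = params * bytes_per_param
    return flops / comm_bytes


if __name__ == "__main__":
    for batch in (1, 32, 1024):
        print(batch, dense_layer_intensity(batch, 4096, 4096))
```

Under these assumptions the layer size cancels out and the ratio depends only on the per-worker batch and the gradient precision, so on slow links the only knobs are bigger local batches, fewer syncs, or smaller gradients.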