| HN Mirror



	by RocketSyntax 2420 days ago
	Deep learning doesn't parallelize well. It would be cool if you could lend CPU cycles from your phone or home computers while you're at work.

2 comments

pheug 2420 days ago

Actually it parallelizes extremely well, which is how large companies are able to create monster models like the one mentioned in the article in the first place: by throwing money at the problem with TPUs and similar highly parallel accelerators. It just doesn't lend itself well to distributed computing, due to e.g. throughput requirements.


RocketSyntax 2420 days ago

That's just vertical scale. Distributed is what I was referring to. See comment below.


question_away 2420 days ago

In what way does it not parallelize well? There are mounds of research in federated learning.


kyle_grove 2420 days ago

In fact, one of the chief advantages of the BERT/Transformer architecture over ELMo/LSTM is the ability to parallelize.


RocketSyntax 2420 days ago

I've read that you can't split up large layers to be trained on separate processors either horizontally (one layer per processor) or vertically (parts of many layers).


pheug 2420 days ago

On a shared memory system there's little need to do that - there's much more parallelism to be had from accelerating fine grained operations, like matrix multiplications to compute each layer's output.
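
To make the "fine grained operations" point concrete, here is a minimal sketch, assuming a toy dense layer with made-up weights. Each output element is an independent dot product, so the rows can be handed to separate workers. The function names are hypothetical, and Python threads are used only to show the independence; they won't give a real speedup the way a GPU/TPU kernel does.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, x):
    # One output unit: a dot product that depends only on its own weight row and x.
    return sum(w * v for w, v in zip(row, x))


def layer_output_sequential(weights, x):
    # Reference version: compute the output units one after another.
    return [dot(row, x) for row in weights]


def layer_output_parallel(weights, x, workers=4):
    # Same result, but each row is submitted as an independent task.
    # pool.map returns results in input order, so the output layout is unchanged.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: dot(row, x), weights))


# Made-up 3x3 weight matrix and input vector, for illustration only.
weights = [[1.0, 2.0, 3.0], [0.0, -1.0, 0.5], [2.0, 0.0, 1.0]]
x = [1.0, 0.5, 2.0]

print(layer_output_sequential(weights, x))
print(layer_output_parallel(weights, x))
```

Both calls give the same vector, since no output unit needs another unit's result.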

On a distributed system, splitting up layers between machines to do distributed training is pretty much what Google initially designed TensorFlow for. Generally it scales less well, because nodes need to exchange massive amounts of data and network throughput is much lower than what GPU/TPU memory provides.
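
A rough back-of-envelope version of the throughput argument, as a sketch only. The model size and both bandwidth figures below are assumptions picked for illustration, not measurements of any real network or accelerator.

```python
def sync_seconds(num_params, bytes_per_param, bandwidth_bytes_per_s):
    """Time to move one full copy of the parameters (or gradients) over a link."""
    return num_params * bytes_per_param / bandwidth_bytes_per_s


# All values below are hypothetical.
NUM_PARAMS = 100_000_000      # assumed model size: 100M parameters
BYTES_PER_PARAM = 4           # 32-bit floats
HOME_UPLINK = 10e6 / 8        # assumed 10 Mbit/s home uplink, in bytes per second
ACCEL_MEMORY = 500e9          # assumed 500 GB/s accelerator memory bandwidth

home = sync_seconds(NUM_PARAMS, BYTES_PER_PARAM, HOME_UPLINK)
local = sync_seconds(NUM_PARAMS, BYTES_PER_PARAM, ACCEL_MEMORY)

print(f"over the home uplink:     {home:.1f} s per exchange")
print(f"within accelerator memory: {local * 1000:.3f} ms per exchange")
print(f"ratio: {home / local:,.0f}x")
```

Under these assumed numbers, a single exchange over the home link takes minutes while the same bytes move through accelerator memory in under a millisecond, and training needs such an exchange at every step.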


bitL 2420 days ago

RNNs (LSTM/GRU) tend to have issues with scaling. Attention-based models like the Transformer, on the other hand, scale extremely well.
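
A toy sketch of why that is, assuming scalar inputs and made-up weights (this is not a real LSTM or Transformer, and the function names are hypothetical). The recurrent state at step t needs the state from step t-1, while each attention output depends only on the inputs, so positions can be computed in any order or all at once.

```python
import math


def rnn_states(xs, w_h=0.5, w_x=1.0):
    # Toy scalar RNN: each step consumes the previous state,
    # so the loop over time cannot be reordered or split up.
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states


def attention_at(i, xs):
    # Toy scalar self-attention for position i: a softmax-weighted
    # average of all inputs, using only xs (no previous outputs).
    scores = [xs[i] * x for x in xs]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


def attention_outputs(xs, order=None):
    # Positions may be visited in any order; the result is the same.
    order = range(len(xs)) if order is None else order
    out = [None] * len(xs)
    for i in order:
        out[i] = attention_at(i, xs)
    return out


xs = [0.1, -0.3, 0.7, 0.2]
print(rnn_states(xs))
print(attention_outputs(xs) == attention_outputs(xs, order=[3, 1, 0, 2]))
```

The order-independence of `attention_outputs` is what lets hardware compute every position in the sequence simultaneously.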
