by shagie 1181 days ago

SETI@home (and similar projects) fall into the domain of embarrassingly parallel problems ( https://en.wikipedia.org/wiki/Embarrassingly_parallel ).

My own experience with this was a distributed ray tracer where the server sent the full model to the machines, and then each machine would ask for one scan line to render, report back, ask for another scan line, and repeat.

There was no interaction between the machines - what was on one scan line didn't need any coordination with what was on another scan line.

Likewise, with SETI@home, the server could give you a chunk of data and you could analyze that chunk - the contents of another chunk of data didn't change the analysis being done on this one.

Furthermore, these can be done asynchronously and then assembled when everything is done. Only the very final product / analysis / artifact needs all of the data and nothing other than the end process is waiting on any sub process.
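As a rough sketch of that scan line scheme (not the original ray tracer): a thread pool stands in for the remote machines, and `render_scanline`, `WIDTH`, and `HEIGHT` are made-up placeholders. The point is only that each line is computed without looking at any other line, results come back in any order, and assembly happens once at the end.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

WIDTH, HEIGHT = 8, 4  # hypothetical tiny image, for illustration only


def render_scanline(y):
    """Placeholder for tracing one scan line; uses only y, never another line."""
    return [(x * y) % 256 for x in range(WIDTH)]


def render_image(workers=4):
    rows = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Hand out every scan line as an independent unit of work.
        futures = {pool.submit(render_scanline, y): y for y in range(HEIGHT)}
        # Lines finish in whatever order the workers get to them.
        for done in as_completed(futures):
            rows[futures[done]] = done.result()
    # Only this final assembly step needs all of the results.
    return [rows[y] for y in range(HEIGHT)]
```

A slow worker here only delays the lines it was handed; nothing else waits on it until the final assembly.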

For doing gradient descent ( https://www.3blue1brown.com/lessons/gradient-descent ), as I understand it, each iteration is dependent on the previous one.
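A toy illustration of that dependency, assuming a one-variable loss f(w) = (w - 3)^2 that I picked only for the example (`grad` and `descend` are hypothetical names):

```python
def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def descend(w0, lr, steps):
    history = [w0]
    for _ in range(steps):
        w = history[-1]
        # Step t+1 needs the w produced by step t, so the steps
        # cannot be handed out to independent machines up front.
        history.append(w - lr * grad(w))
    return history
```

Unlike the scan lines, there is no way to start iteration 50 before iteration 49 has reported back.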

Doing the 13,002 dimensional matrix math (for the example of a 784 -> 16 -> 16 -> 10 neuron net digit recognizer in the 3b1b page) is the parallel part... and if you get into the billions of parameters it gets much larger. Matrix multiplication is difficult to do across a network. For example - http://www.lac.inpe.br/~stephan/CAP-372/Fox_example.pdf and http://www.cs.csi.cuny.edu/~gu/teaching/courses/csc76010/sli...
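For reference, the 13,002 figure is the weights plus biases of that fully connected 784 -> 16 -> 16 -> 10 net. A quick check (`parameter_count` is just a helper name for this example):

```python
def parameter_count(layers):
    """Weights plus biases for a fully connected net with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


print(parameter_count([784, 16, 16, 10]))  # 13002
```

That breaks down as 12,560 + 272 + 170 across the three layers.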

> We are now ready for the second stage. In this stage, we broadcast the next column (mod n) of A across the processes and shift-up (mod n) the B values.

Note that use of "broadcast": the matrix multiplication is limited by the speed of the slowest node, and it needs to send all the data from the previous calculation to all the nodes, which makes it difficult to use across a network that experiences latency.
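To make the stages concrete, here is a single-process simulation of the broadcast-and-shift pattern from the quote, as I understand Fox's algorithm, with one matrix element per simulated process. It is a sketch, not code from the linked slides, and the `messages` tally assumes a naive model where a broadcast costs n - 1 sends and every process sends one value per shift.

```python
def fox_multiply(A, B):
    """Simulate Fox's algorithm on an n x n grid holding one element per process."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    b = [row[:] for row in B]  # b[i][j]: the B value currently held by process (i, j)
    messages = 0               # rough tally under the naive model described above
    for stage in range(n):
        for i in range(n):
            k = (i + stage) % n
            a_bcast = A[i][k]  # process (i, k) broadcasts its A value along row i
            messages += n - 1
            for j in range(n):
                # Every process in the row needs the broadcast before it can multiply.
                C[i][j] += a_bcast * b[i][j]
        # Shift up (mod n): process (i, j) receives the B value from (i + 1, j).
        b = [b[(i + 1) % n] for i in range(n)]
        messages += n * n
    return C, messages


C, messages = fox_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, messages)  # [[19, 22], [43, 50]] 12
```

Every stage ends with all processes exchanging data before the next one can begin, so one slow or high-latency node stalls the whole grid.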

When doing ML training, the hardware mostly works with TB/sec of bandwidth... and the high end extremes are in PB/sec ( https://www.cerebras.net/product-chip/ ) ... and I'm sitting here watching Steam download.
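Some back-of-the-envelope arithmetic on that gap. Every number below is an assumption chosen for illustration (1 billion parameters, 32-bit floats, a 100 Mbit/s home connection, a 1 TB/sec interconnect), not a measurement:

```python
def transfer_seconds(num_params, bytes_per_param, bytes_per_second):
    """Time to move one full copy of the parameters over a link."""
    return num_params * bytes_per_param / bytes_per_second


PARAMS = 1_000_000_000   # assumed: 1 billion parameters
BYTES_PER_PARAM = 4      # assumed: 32-bit floats
HOME_LINK = 100e6 / 8    # assumed: 100 Mbit/s home connection, in bytes/sec
FAST_LINK = 1e12         # assumed: 1 TB/sec interconnect

home = transfer_seconds(PARAMS, BYTES_PER_PARAM, HOME_LINK)
fast = transfer_seconds(PARAMS, BYTES_PER_PARAM, FAST_LINK)
print(f"{home:.0f} s vs {fast * 1000:.0f} ms")  # 320 s vs 4 ms
```

Under those assumptions, a single exchange of the parameters takes minutes at home versus milliseconds on the fast link, and training needs that exchange on every iteration.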

The inefficiencies of the network, slow computers, and the amount of data transfer needed to perform the next calculation make network distributed machine learning "not a good choice" at this time.