Hacker News
by kingcai 1156 days ago
ML training is not as easily parallelizable as the problems those @home projects have tackled. I'm not familiar with SETI, but I know this to be true for Folding@home.

As you mentioned, ML training can be parallelized, but this requires either model or data parallelism.

Data parallelism means spreading the data over many compute units, each holding a copy of the model, and then synchronizing (typically averaging) their gradients after every step. The heterogeneous nature of @home computing makes this particularly challenging: with synchronous updates, every step has to wait for the slowest compute unit. I've personally only ever seen data (and model) parallelism done on a homogeneous compute cluster (e.g. 8x GPUs in one machine).
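
To make the "slowest worker" point concrete, here is a minimal, purely illustrative Python sketch of synchronous data parallelism. Everything in it is hypothetical: a 1-D linear model, `local_gradient` and `average` standing in for real per-worker backprop and an all-reduce, and made-up throughput numbers. The workers run one after another here; a real setup would run them on separate machines.

```python
"""Minimal sketch of synchronous data parallelism (illustrative only).

Each hypothetical "worker" holds a shard of the data, computes a local
gradient, and the gradients are averaged before one shared update.
The wall-clock time of a step is set by the slowest worker.
"""
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def local_gradient(w: float, shard: Sequence[Sample]) -> float:
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def average(grads: List[float]) -> float:
    """Stand-in for an all-reduce: every worker ends up with the mean gradient."""
    return sum(grads) / len(grads)


def step_time(shard_sizes: List[int], samples_per_sec: List[float]) -> float:
    """A synchronous step finishes only when the slowest worker finishes."""
    return max(n / s for n, s in zip(shard_sizes, samples_per_sec))


def train(shards: List[List[Sample]], lr: float = 0.01, steps: int = 200) -> float:
    """Synchronous SGD: gather all local gradients, average, update once."""
    w = 0.0
    for _ in range(steps):
        grads = [local_gradient(w, shard) for shard in shards]  # parallel in a real system
        w -= lr * average(grads)
    return w


if __name__ == "__main__":
    # Synthetic data from y = 3x, split evenly across 4 hypothetical workers.
    data = [(float(x), 3.0 * x) for x in range(1, 9)]
    shards = [data[i::4] for i in range(4)]
    print(round(train(shards), 3))  # converges toward 3.0
    # Hypothetical throughputs: three fast machines and one slow laptop.
    print(step_time([2, 2, 2, 2], [100.0, 100.0, 100.0, 10.0]))  # 0.2
```

With three fast workers and one 10x slower one, each step takes 0.2 s even though the fast machines are done after 0.02 s, which is the heterogeneity problem in a nutshell.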

For model parallelism, we split the model across different compute units. However, this means the parts have to exchange data constantly (activations on the forward pass, gradients on the backward pass), which gets very expensive when you do it across the internet. With 8x GPUs in one machine, that communication goes over PCIe; in a distributed @home cluster it goes over TCP/IP on consumer internet links, with far higher latency and far lower bandwidth.
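
A rough, illustrative estimate of why the link matters. The tensor shape (batch 32, sequence 1024, hidden 4096, fp16), the ~32 GB/s figure for PCIe 4.0 x16, the 20 Mbit/s home upload, and both latencies are assumptions picked for the example; the model also ignores protocol overhead and any overlap of compute with communication.

```python
"""Back-of-envelope cost of moving activations between model-parallel stages.

All sizes and link speeds below are illustrative assumptions, not measurements.
"""


def activation_bytes(batch: int, seq_len: int, hidden: int, bytes_per_elem: int = 2) -> int:
    """Size of one activation tensor handed from one stage to the next (fp16 by default)."""
    return batch * seq_len * hidden * bytes_per_elem


def transfer_seconds(num_bytes: int, bandwidth_bytes_per_sec: float, latency_sec: float) -> float:
    """Simple link model: fixed latency plus size divided by bandwidth."""
    return latency_sec + num_bytes / bandwidth_bytes_per_sec


if __name__ == "__main__":
    size = activation_bytes(batch=32, seq_len=1024, hidden=4096)  # 256 MiB
    # Assumed links: ~32 GB/s PCIe 4.0 x16 inside one box vs. a 20 Mbit/s home upload.
    pcie = transfer_seconds(size, 32e9, 5e-6)
    home = transfer_seconds(size, 20e6 / 8, 0.05)
    print(f"activation size: {size / 2**20:.0f} MiB")
    print(f"PCIe:     {pcie * 1e3:.1f} ms")
    print(f"internet: {home:.1f} s")
    print(f"ratio:    ~{home / pcie:,.0f}x slower")
```

Under these assumptions one hand-off takes milliseconds over PCIe and well over a minute over the home link, and it happens at every stage boundary on both the forward and the backward pass.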

That said, I wouldn't call it impossible; someone clever could definitely figure it out.


Why wouldn't it work for CPU models?