by dekhn
1180 days ago
A few people have built frameworks to do this. How to federate large numbers of loosely coupled computers to speed up training "interesting" models is still a very large open problem. I've worked in both domains (protein folding on Folding@Home and on supercomputers, and ML training on single nodes and on supercomputers), and at least so far, ML hasn't really been a good match for embarrassingly parallel compute. Even in protein folding, Folding@Home has a number of limitations that are much better addressed on supercomputers (for example, if your problem requires extremely long individual simulations of large proteins). All that could change, but I think for the time being, interesting/big models need to be trained on tightly coupled GPUs.