by mirekrusin
767 days ago
No. Each training run is an offset relative to its starting point. If you distribute it from the same starting point, you'll get a bunch of unrelated offsets. It has to be serial: the output state of one training run is the input state of the next. If you could do it, we'd already have SETI-like networks for AI.
Most modern large models cannot be trained on a single device (GPU, accelerator, whatever), so there's no alternative to distributed training. They also wouldn't even fit in the memory of one GPU or accelerator, so there are even more complex ways to split the model itself across devices.
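
To make the point concrete, here is a minimal sketch of one synchronous data-parallel step, assuming a toy 1-D linear model with mean squared error and two equal-sized shards. The names `grad_mse` and `data_parallel_step` are made up for illustration and are not from any library. Each worker starts from the same weights and computes a gradient on its own shard; averaging those gradients before the update gives the same step as a single worker on the full batch, so the offsets are not unrelated.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the toy model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_step(w, shards, lr):
    """One synchronous step (hypothetical helper): every worker starts from
    the same w, computes a gradient on its shard, and the gradients are
    averaged before a single shared update."""
    grads = [grad_mse(w, xs, ys) for xs, ys in shards]
    avg_grad = sum(grads) / len(grads)  # stands in for an all-reduce
    return w - lr * avg_grad


# Toy data, split into two equal-sized shards (an assumption of this sketch;
# with unequal shards the average would need to be weighted by shard size).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]

w0 = 0.0
lr = 0.01
w_parallel = data_parallel_step(w0, shards, lr)
w_single = w0 - lr * grad_mse(w0, xs, ys)

# Both routes produce the same updated weight (up to float rounding).
print(abs(w_parallel - w_single) < 1e-12)
```

This only covers splitting the data. Splitting the model itself across devices, as mentioned above, is a separate and more involved technique that this sketch does not attempt to show.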