|
|
|
|
|
by mirekrusin
1180 days ago
|
|
Unfortunately training is not emberassingly parallelisable [0] problem. It would require new architecture. Current models diverge too fast. By the time you'd download and/or calculate your contribution the model would descend somewhere else and your delta would not be applicable - based off wrong initial state. It would be great if merge-ability would exist. It would also likely apply to efficient/optimal shrinking for models. Maybe you could dispatch tasks to train on many variations of similar tasks and take average of results? It could probably help in some way, but you'd still have large serialized pipeline to munch through and you'd likely require some serious hardware ie. dual gtx 4090 on client side. [0] https://en.wikipedia.org/wiki/Embarrassingly_parallel |
|
merge-ability does exist and you can average the results.