by alchemist1e9
1163 days ago
Can they be mathematically “mushed” together to create an improved model? I have yet to understand the difference between fine-tuning and training, and therefore have yet to understand whether a distributed, decentralized, eventually consistent training approach is a possibility or simply not realistic.

It becomes an empirical engineering question: how many parallel nodes can you train on, and for how long, before averaging them back together? It's an expensive question to answer, since you have to train many variations to get the data.
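
To make the "train in parallel, then average back together" idea concrete, here is a minimal sketch on a toy linear regression problem using NumPy. Everything in it is an assumption for illustration: the function names (`local_sgd_round`, `mse`), the four-way data split, the learning rate, and the number of local steps are all made up, and a convex toy problem says little about how far real neural network replicas can drift before averaging stops helping.

```python
import numpy as np

def local_sgd_round(w, shards, local_steps, lr=0.1):
    """One round: every node starts from the same weights w, trains on its
    own shard for local_steps gradient steps, then the replicas are averaged."""
    replicas = []
    for X, y in shards:
        w_k = w.copy()
        for _ in range(local_steps):
            # Gradient of mean squared error for a linear model.
            grad = 2 * X.T @ (X @ w_k - y) / len(y)
            w_k -= lr * grad
        replicas.append(w_k)
    # "Mushing" the models together: a plain element-wise mean of the weights.
    return np.mean(replicas, axis=0)

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

# Synthetic data (hypothetical): 400 samples split across 4 simulated nodes.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(400, 3))
y = X @ true_w + 0.01 * rng.normal(size=400)
shards = [(X[i::4], y[i::4]) for i in range(4)]

w = np.zeros(3)
initial_loss = mse(w, X, y)
for _ in range(20):
    w = local_sgd_round(w, shards, local_steps=5)
final_loss = mse(w, X, y)
print(initial_loss, final_loss)
```

The two knobs from the comment above show up directly: `len(shards)` is the number of parallel nodes and `local_steps` is how long each one trains before averaging. Answering the question empirically would mean sweeping both and comparing the final loss, which is cheap here and expensive for a real model.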