by Zambyte
3 days ago
Excuse my ignorance if by "distributed training" you mean a specific process, but couldn't this be considered a step toward distributed training? If nations train models independently and later distill them into a single model, all the work (both the compute and the research processes) is distributed during the initial training phase.
I don't think your approach would work, because you can't create a strong model by distilling several weak models.
https://www.primeintellect.ai/blog/intellect-1
https://www.primeintellect.ai/blog/intellect-2-release