by idkdotcom
736 days ago
These seem like classic challenges of running distributed-systems workloads, not ones specific to training LLMs. Any one of the supercomputers listed at https://en.wikipedia.org/wiki/TOP500 suffers from the same issues. Think about it: while the national labs use these systems to model serious stuff, such as climate or nuclear weapons, Meta uses them to train LLMs. What a joke, honestly!