Hey there! We're Higgsfield AI. We have a massive GPU cluster, and we developed our own infrastructure to manage it and to train massive models. Here's how it works:

- Upload your dataset to HuggingFace in the preconfigured format [1].
- Choose your LLM (e.g. LLaMa 70B, Mistral 7B).
- Put your submission in the queue.
- Wait for it to be trained.
- Get your trained model back on HuggingFace.

Why are we doing this? We already have experience training big LLMs, and we were able to achieve near-perfect infrastructure performance for training. But sometimes our GPUs simply have nothing to train. So we thought it would be cool to utilize our GPU cluster at 100% and to give back to the open-source community (we've already built an end-to-end distributed training framework [2]).

This is at an early stage, so expect some bugs. Any thoughts, opinions, or ideas are very welcome!

[1]: https://github.com/higgsfield-ai/higgsfield/blob/main/tutori...
[2]: https://github.com/higgsfield-ai/higgsfield