by ftxbro 1150 days ago
How expensive is it? My understanding is that it's not reasonable to train an LLM from scratch by yourself, and that if you want one that isn't just very stupid then you need to spend between hundreds of thousands and hundreds of millions of dollars. But if you don't want to train from scratch then you can fine-tune existing models for cheaper.
2 comments

Disclaimer: I work for MosaicML (MosaicML is the creator of the training platform used by Replit).

Training these models from scratch on your domain-specific data is not as expensive as one might think. We have provided some cost estimates in our blog posts.

https://www.mosaicml.com/blog/mosaicbert

https://www.mosaicml.com/blog/training-stable-diffusion-from...

https://www.mosaicml.com/blog/gpt-3-quality-for-500k

Do you have any examples of how to train a model that can write code, but only in a specific domain? E.g., I only want to train it on a specific set of code, let's say functional React components in TypeScript.
We recently released a 1B parameter model trained on a mix of data.[1] If you have your domain-specific data, our platform can cover the rest.

[1]: https://twitter.com/jefrankle/status/1649060478910357504?s=4...

But do you have any examples of how to do this? I am a pretty seasoned dev, but I have never trained a model before :)
Thank you, this is very interesting!
Looking at what they're doing here, probably not as much as you think.

As you note, with the plethora of open/open-ish LLMs today and LoRA + PEFT, you can fine-tune with low VRAM and pretty quickly, so even a single A100 or whatever cloud GPU you can get is just fine. I've even seen people pull it off in reasonable time on super cheap T4s, A10s, etc.
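
To put a rough number on why LoRA needs so little VRAM: instead of updating a full d_out x d_in weight matrix, it trains two small matrices of shapes d_out x r and r x d_in. Here is a back-of-the-envelope sketch in plain Python. The 4096 x 4096 layer size and rank 8 are illustrative assumptions, not figures from any particular model, and `lora_param_counts` is just a hypothetical helper name.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable parameter counts for one weight matrix."""
    full = d_out * d_in            # full fine-tuning updates every weight
    lora = rank * (d_out + d_in)   # LoRA trains B (d_out x r) and A (r x d_in)
    return full, lora


# Assumed, illustrative sizes: one 4096 x 4096 projection with LoRA rank 8.
full, lora = lora_param_counts(4096, 4096, 8)
fraction = lora / full

print(f"full fine-tune: {full:,} trainable params")
print(f"LoRA (r=8):     {lora:,} trainable params ({fraction:.2%} of full)")
```

For that one matrix it works out to 65,536 trainable parameters instead of 16,777,216, about 0.39%. Gradients and optimizer state are only kept for the trainable part, which is a big piece of why this fits on smaller GPUs. The frozen base weights still have to sit in memory, though.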

I doubt anyone reading a blog post is attempting to train a "true" multi-billion param LLM from scratch.