
	by ftxbro 1196 days ago
	How expensive is it? My understanding is that it's not reasonable to train an LLM from scratch by yourself, and that if you want one that isn't just very stupid then you need to spend between hundreds of thousands and hundreds of millions of dollars. But if you don't want to train from scratch then you can fine-tune existing models for cheaper.

dskhudia 1196 days ago

Disclaimer: I work for MosaicML (MosaicML is the creator of the training platform used by Replit).

Training these models from scratch on your domain-specific data is not as expensive as one might think. We have provided some cost estimates in our blog posts.

https://www.mosaicml.com/blog/mosaicbert

https://www.mosaicml.com/blog/training-stable-diffusion-from...

https://www.mosaicml.com/blog/gpt-3-quality-for-500k
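
For a rough sense of scale, here is a back-of-the-envelope sketch. It is not taken from the posts linked above. It relies on the commonly used approximation that training a dense transformer takes about 6 × parameters × tokens FLOPs. The function name, the 20B-token budget, the 40% utilization, and the $2 per GPU-hour price are all illustrative assumptions; 312 TFLOPS is the A100's peak dense BF16 throughput.

```python
def estimate_training_cost(n_params, n_tokens, peak_flops_per_gpu,
                           utilization, usd_per_gpu_hour):
    """Rough from-scratch training cost (hypothetical helper).

    Uses the ~6 * N * D FLOPs rule of thumb for dense transformers.
    Returns (gpu_hours, cost_usd).
    """
    total_flops = 6 * n_params * n_tokens
    # Real runs reach only a fraction of peak hardware throughput.
    effective_flops_per_sec = peak_flops_per_gpu * utilization
    gpu_hours = total_flops / effective_flops_per_sec / 3600
    return gpu_hours, gpu_hours * usd_per_gpu_hour


# Illustrative inputs only: 1B parameters, 20B tokens, A100 at 40% utilization.
hours, cost = estimate_training_cost(
    n_params=1e9,
    n_tokens=20e9,
    peak_flops_per_gpu=312e12,   # A100 peak dense BF16
    utilization=0.4,             # assumed
    usd_per_gpu_hour=2.0,        # assumed cloud price
)
print(f"{hours:.0f} GPU-hours, about ${cost:.0f}")
```

The estimate scales linearly in both model size and token count, so a 10x larger model trained on 10x more tokens costs roughly 100x more under the same assumptions.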

moltar 1196 days ago

Do you have any examples of how to train a model that can write code, but in a specific domain? E.g. I only want to train it on a specific set of code, say functional React components in TypeScript.

dskhudia 1196 days ago

We recently released a 1B-parameter model trained on a mix of data.[1] If you have your own domain-specific data, our platform can cover the rest.

[1]: https://twitter.com/jefrankle/status/1649060478910357504?s=4...

moltar 1196 days ago

But do you have any examples of how to do this? I am a pretty seasoned dev, but I have never trained a model before :)

ftxbro 1196 days ago

Thank you, this is very interesting!

kkielhofner 1196 days ago

Looking at what they're doing here, probably not as much as you think.

As you note, with the plethora of open/open-ish LLMs today and LoRA + PEFT, you can fine-tune with low VRAM and pretty quickly, so even a single A100 (or whatever cloud GPU you can get) is just fine. I've even seen people pull it off in reasonable time on super cheap T4s, A10s, etc.
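
To illustrate why LoRA needs so little memory for trainable weights, here is a minimal sketch of the parameter arithmetic. LoRA freezes a d_out × d_in weight matrix and trains two low-rank factors instead, B (d_out × r) and A (r × d_in), so the trainable count per adapted matrix is r × (d_in + d_out). The function names, the 4096 × 4096 layer size, and rank 8 are illustrative assumptions, not the settings of any particular model.

```python
def full_params(d_in, d_out):
    """Parameters in a dense weight matrix (all trained in full fine-tuning)."""
    return d_in * d_out


def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds for one weight matrix.

    A has shape (rank, d_in) and B has shape (d_out, rank);
    the original matrix stays frozen.
    """
    return rank * (d_in + d_out)


# Illustrative layer size and rank (assumptions).
d_in, d_out, rank = 4096, 4096, 8
full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full: {full:,}  lora: {lora:,}  fraction: {lora / full:.4%}")
```

Gradients and optimizer state are only kept for the trainable parameters, which accounts for much of the VRAM saving; the frozen base weights still have to fit in memory.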

I doubt anyone reading a blog post is attempting to train a "true" multi-billion param LLM from scratch.