Hacker News
by kkielhofner 1150 days ago
Looking at what they're doing here, probably not as much as you think.

As you note, with the plethora of open/open-ish LLMs today and LoRA + PEFT, you can fine-tune with low VRAM and pretty quickly, so even a single A100 (or whatever cloud GPU you can get) is just fine. I've even seen people pull it off in reasonable time on super cheap T4s, A10s, etc.
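
For a rough sense of why LoRA keeps the VRAM bill low, here's a back-of-the-envelope sketch. The shapes are assumptions for illustration (a hypothetical 7B-class model with hidden size 4096, 32 layers, rank 8, adapting two square projection matrices per layer), not anything taken from the post. LoRA freezes each adapted weight and trains two small low-rank factors instead, so gradients and optimizer state are only needed for those factors.

```python
def lora_trainable_params(d_in, d_out, rank, n_matrices):
    """Parameters added by LoRA: each adapted d_out x d_in weight gets
    two low-rank factors, A (rank x d_in) and B (d_out x rank)."""
    return n_matrices * rank * (d_in + d_out)

# Hypothetical 7B-class shapes, assumed for illustration only.
HIDDEN = 4096
LAYERS = 32
ADAPTED_PER_LAYER = 2          # e.g. the query and value projections
RANK = 8
TOTAL_PARAMS = 7_000_000_000   # round number, not an exact model size

trainable = lora_trainable_params(HIDDEN, HIDDEN, RANK, LAYERS * ADAPTED_PER_LAYER)
fraction = trainable / TOTAL_PARAMS

print(f"{trainable:,} trainable params ({fraction:.4%} of the model)")
```

Under those assumptions that's about 4.2M trainable parameters, well under 0.1% of the model. The frozen base weights still have to fit in memory, which is why people often pair this with quantized or half-precision loading on the smaller cards.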

I doubt anyone reading a blog post is attempting to train a "true" multi-billion param LLM from scratch.