| HN Mirror



	by lvncelot 1201 days ago
	A few months (weeks?) ago I would've said that this already was the case for language models. It's absolutely mind-blowing to me what is happening here, and the same goes for Stable Diffusion. Once DALL-E was out, I was sure there was no way anything like this could run on consumer hardware. I'm very happy to be proven wrong. In a way, things are still moving in this direction, though: 8 or so years ago it was more or less possible to train such models yourself to a certain degree of usefulness, and I think we've now moved way past any feasibility for that.

2 comments

MacsHeadroom 1201 days ago

LLaMA can be fine-tuned in hours on a consumer GPU, or in a free Colab with just 12GB of VRAM (and soon 6GB with 4-bit training), using PEFT.

https://github.com/zphang/minimal-llama#peft-fine-tuning-wit...
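
For a rough sense of why these VRAM figures are plausible, here is a back-of-the-envelope sketch of the arithmetic behind LoRA, one of the methods PEFT provides. Nothing in it comes from the linked repo: the model dimensions (hidden size 4096, 32 layers, roughly 6.7B parameters, i.e. the 7B LLaMA), the rank of 8, and the choice to adapt only the query and value projections are all assumptions made for illustration.

```python
# Back-of-the-envelope LoRA arithmetic. Every constant below is an
# assumption for illustration, not a value taken from the thread.

def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA freezes the matrix and learns two low-rank factors,
    A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def weight_gib(n_params, bits):
    """Memory needed to hold n_params weights at the given bit width, in GiB."""
    return n_params * bits / 8 / 2**30


HIDDEN = 4096            # assumed hidden size (LLaMA 7B)
LAYERS = 32              # assumed number of transformer layers
RANK = 8                 # hypothetical LoRA rank
ADAPTED_PER_LAYER = 2    # hypothetical: query and value projections only
BASE_PARAMS = 6.7e9      # approximate parameter count of the 7B model

trainable = LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)

print(f"trainable LoRA params: {trainable:,}")
print(f"share of base model:   {trainable / BASE_PARAMS:.4%}")
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit frozen weights: ~{weight_gib(BASE_PARAMS, bits):.1f} GiB")
```

Under these assumptions only a few million parameters need gradients and optimizer state, so the frozen base weights dominate memory. Activations, the adapter's optimizer state, and framework overhead come on top of the weight figures, so these are lower bounds rather than a prediction of actual usage.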


TuringTest 1201 days ago

Fortunately, there are still ways to improve training efficiency and reduce model size through more guided attentional learning.

This will make it feasible to train models at least as good as the current batch (though the big players will probably use those same optimizations to create much better large models).
