HN Mirror


	by fbodz 1173 days ago
	Has anyone figured out a way to fine-tune this with 24 GB of VRAM? I have tried DeepSpeed etc. but no luck. Fine-tuning seems to be just out of reach, since it requires 26 GB.

2 comments

csdvrx 1173 days ago

Have you tried quantization? It's often a cheap and simple way to reduce the VRAM requirements.

What hardware are you using? (CPU, RAM, GPU, VRAM)

Have you considered using llama.cpp for mixed CPU+GPU use (if you have enough RAM)?
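
To illustrate why quantization lowers VRAM requirements, here is a minimal back-of-the-envelope sketch. It counts the weights only, and the 7-billion-parameter figure is a hypothetical chosen for illustration, not the size of the model discussed in this thread. Training also needs room for activations, gradients, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Hypothetical parameter count, used only as an example.
N_PARAMS = 7e9

for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Halving the bits per weight halves the weight footprint, which is why going from 16-bit to 8-bit or 4-bit can bring a model within reach of a single consumer GPU.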

fbodz 1173 days ago

Yeah, I am using the default training script with int8 quantisation. It uses PEFT with LoRA, but this still requires 26 GB.
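
For context on why LoRA helps but does not remove the memory cost: LoRA freezes each adapted weight matrix and trains two low-rank factors instead, so only the factors need gradients and optimizer state. The frozen weights and the activations still have to fit in memory. The sketch below counts trainable parameters for one matrix; the 4096 x 4096 shape and rank 8 are hypothetical values, not taken from the training script mentioned here.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical layer shape and rank, for illustration only.
d_in, d_out, rank = 4096, 4096, 8

full = full_params(d_in, d_out)
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full matrix: {full:,} parameters")
print(f"LoRA factors: {lora:,} parameters ({100 * lora / full:.2f}% of full)")
```

The trainable share is well under one percent in this example, so the remaining memory pressure comes mostly from the frozen base weights and the activations kept for backpropagation.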

int_19h 1173 days ago

I'm not sure about this model specifically, but training with 4-bit quantization has been a thing with LLaMA for a while now, although the setup involves manual hacks of various libraries.

freeqaz 1173 days ago

Is it possible to offload some layers to CPU and still train in a reasonable amount of time?

generalizations 1173 days ago

There's also that pruning tool that was on HN in the last couple of weeks. It seemed to work really well on the larger models, and could reduce size by 30-50%.

nl 1173 days ago

You probably don't want to fine-tune a quantized model. They are fine for inference but not great for training.

mirekrusin 1173 days ago

People should be training model sizes that fit-and-fill consumer GPUs, i.e.:

2x 24 GB (dual GPU) ~ 28B model

1x 24 GB ~ 14B model

etc.
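
The comment does not say which precision or how much headroom it assumes, so the following is only a sketch of the arithmetic behind the sizing. It computes the memory budget per parameter implied by pairing 24 GB with a 14B model, and the largest weight-only model that would fit at a given precision. The function names are illustrative.

```python
def implied_bytes_per_param(vram_gb: float, params_billion: float) -> float:
    """Bytes of VRAM available per parameter (GB and billions cancel out)."""
    return vram_gb / params_billion


def max_params_billion(vram_gb: float, bytes_per_param: float) -> float:
    """Largest model, in billions of parameters, whose weights alone fit."""
    return vram_gb / bytes_per_param


# Figures from the comment above: one and two 24 GB cards.
print(f"1x 24 GB, 14B: {implied_bytes_per_param(24, 14):.2f} bytes/param")
print(f"2x 24 GB, 28B: {implied_bytes_per_param(48, 28):.2f} bytes/param")

# Weight-only upper bounds on a single 24 GB card (no activations or cache).
for label, nbytes in (("fp16", 2.0), ("int8", 1.0), ("4-bit", 0.5)):
    print(f"{label}: up to {max_params_billion(24, nbytes):.0f}B parameters")
```

Both pairings imply the same budget per parameter, a little under two bytes, which sits between 8-bit and 16-bit weights once some memory is reserved for everything other than the weights.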
