by nico 1188 days ago

> The code runs on a 8xA100 80GB, but can also run on 8xA100 40GB or 4xA100 with lower batch size and gradient accumulation steps. To get the GPUs, I suggest using Lambda Labs, best pricing for the best hardware.

I wonder what the fine-tuning cost in total, in dollars.

Also, does anyone have some sort of table/formula that relates MB/GB of training data to $ for fine-tuning?
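There is probably no universal table for this, since the bill depends on model size, number of epochs, and hardware as well as on the amount of data. A rough back-of-the-envelope sketch, assuming GPUs are rented by the hour and the wall-clock training time has been measured or estimated, is simply GPUs x hours x hourly price. The function name and the example numbers below are placeholders for illustration, not real quotes from any provider.

```python
def estimate_rental_cost(num_gpus: int, hours: float, price_per_gpu_hour: float) -> float:
    """Rough rental cost, assuming every GPU is billed for the full wall-clock time."""
    if num_gpus <= 0 or hours < 0 or price_per_gpu_hour < 0:
        raise ValueError("num_gpus must be positive; hours and price must be non-negative")
    return num_gpus * hours * price_per_gpu_hour


# Placeholder inputs, NOT real prices: 8 GPUs for 3 hours at $1.50 per GPU-hour.
example_cost = estimate_rental_cost(8, 3.0, 1.50)
print(f"${example_cost:.2f}")
```

To turn a dataset size into hours, time a short run on a small slice of the data and extrapolate, rather than relying on a fixed MB-to-dollars ratio.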

4 comments

menzoic 1188 days ago

Stanford spent only $500 to fine-tune LLaMA for human instruction, using 52k instructions generated by GPT-3. This probably costs less. Using GPT instead of humans to generate the instruction data is where the massive cost reduction comes from. The actual fine-tuning on GPUs is relatively cheap.

IanCal 1188 days ago

Most of that was getting the data; the training itself would cost something like $100, if memory serves.

sahil_chaudhary 1188 days ago

All included, it costs under $70 for the 13B model. I'm training 65B now, so I'll report what that costs.

alex_sf 1188 days ago

For the 65B fine-tune, did you add another A100 node, or just drop the batch size?

Any chance you’re up to sharing the training parameters?

sahil_chaudhary 1188 days ago

Dropping the batch size
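Dropping the per-GPU batch size usually goes together with raising the gradient accumulation steps, as the quoted README notes, so that the effective batch size seen by the optimizer stays the same. A minimal sketch of that arithmetic, with hypothetical function names and made-up numbers (the actual training parameters were not shared in this thread):

```python
def effective_batch_size(per_device_batch: int, num_gpus: int, grad_accum_steps: int) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * num_gpus * grad_accum_steps


def accum_steps_for(target_effective_batch: int, per_device_batch: int, num_gpus: int) -> int:
    """Accumulation steps needed to reach a target effective batch size."""
    samples_per_forward = per_device_batch * num_gpus
    if samples_per_forward <= 0 or target_effective_batch % samples_per_forward != 0:
        raise ValueError("target must be a positive multiple of per_device_batch * num_gpus")
    return target_effective_batch // samples_per_forward


# Hypothetical settings: keep an effective batch of 128 on 8 GPUs.
print(accum_steps_for(128, per_device_batch=4, num_gpus=8))  # 4 accumulation steps
print(accum_steps_for(128, per_device_batch=2, num_gpus=8))  # 8 after halving the per-GPU batch
```

Halving the per-GPU batch lowers peak memory, but each optimizer step then needs twice as many forward/backward passes, so wall-clock time (and rental cost) tends to go up.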

sillysaurusx 1188 days ago

Please do! Also please include how you’re calculating the costs.

NhanH 1188 days ago

Probably in the hundreds of dollars for the 7B model, and maybe a thousand or two for the 13B at worst.

MacsHeadroom 1188 days ago

Far, far less. Alpaca-7B's compute cost was around $60-$70 for Stanford, and around $0.60 (yes, 60 cents) for equivalent fine-tunes using Low-Rank Adaptation (LoRA), a Parameter-Efficient Fine-Tuning (PEFT) strategy.

The repo above can be replicated for similar costs. Easily less than $10 for up to 30B using LoRA (which requires only 24GB of VRAM for 30B/33B and smaller).
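Part of why LoRA is so much cheaper is that it freezes each adapted weight matrix and trains only two small low-rank factors in its place. A rough sketch of the parameter arithmetic for a single matrix; the 4096 dimension and rank 8 are illustrative assumptions, not settings taken from the repo:

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in); the original matrix stays frozen.
    """
    return rank * (d_in + d_out)


# Illustrative numbers only: one 4096 x 4096 projection with rank 8.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

With far fewer trainable parameters, gradients and optimizer state take much less memory, which is consistent with the small VRAM figure mentioned above.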

NhanH 1187 days ago

I thought so too, but newcomers should expect to train the model a dozen times or so :-)

Art9681 1188 days ago

I am interested in this. What would it cost for the public to train the best model possible?

fareesh 1188 days ago

I asked in the issues; let's see.
