Hacker News
by nico 1188 days ago
> The code runs on an 8xA100 80GB, but can also run on 8xA100 40GB or 4xA100 with lower batch size and gradient accumulation steps. To get the GPUs, I suggest using Lambda Labs, best pricing for the best hardware.
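
For anyone wondering how the smaller setups stay equivalent: the usual trick is to keep the effective batch size (samples per optimizer step) constant by trading per-GPU batch size or GPU count against gradient accumulation steps. A minimal sketch of that arithmetic, assuming plain data-parallel training; the function name and all numbers are hypothetical and not taken from the repo:

```python
def effective_batch_size(num_gpus: int, per_gpu_batch: int, grad_accum_steps: int) -> int:
    """Samples contributing to one optimizer step in data-parallel training."""
    return num_gpus * per_gpu_batch * grad_accum_steps


# Hypothetical settings, chosen only to illustrate the trade-off.
full_node = effective_batch_size(num_gpus=8, per_gpu_batch=4, grad_accum_steps=4)
half_node = effective_batch_size(num_gpus=4, per_gpu_batch=4, grad_accum_steps=8)
low_memory = effective_batch_size(num_gpus=8, per_gpu_batch=2, grad_accum_steps=8)

print(full_node, half_node, low_memory)
```

All three configurations take optimizer steps over the same number of samples; the smaller ones just need more forward/backward passes per step, so they run longer.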

I wonder how much it was total in $ for the fine-tuning.

Also, does anyone have some sort of table/formula that relates MB/GB of training data to $ for fine-tuning?
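
I don't know of a standard table, but a back-of-the-envelope version is easy to write down: convert the data to tokens, divide by the throughput you measure on your hardware, and multiply GPU-hours by the hourly price. A rough sketch under those assumptions; the function name, the throughput figure, and the price are placeholders, not measured or quoted values:

```python
def estimate_cost_usd(
    total_tokens: int,
    epochs: int,
    tokens_per_second: float,  # aggregate throughput across all GPUs (assumed, measure your own)
    num_gpus: int,
    usd_per_gpu_hour: float,   # placeholder rental price
) -> float:
    """Rough fine-tuning cost: wall-clock hours times GPUs times hourly price."""
    seconds = total_tokens * epochs / tokens_per_second
    hours = seconds / 3600
    return hours * num_gpus * usd_per_gpu_hour


# Placeholder inputs: 3.6M tokens, 1 epoch, 1,000 tokens/s, 8 GPUs at $1.50 per GPU-hour.
cost = estimate_cost_usd(3_600_000, 1, 1000.0, 8, 1.5)
print(f"${cost:.2f}")
```

The throughput term hides most of the uncertainty: it depends on model size, sequence length, precision, and whether you do full fine-tuning or something like LoRA.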

4 comments

Stanford only spent $500 to fine-tune LLaMA to follow human instructions, using 52k instructions generated by GPT-3. This probably costs less. Using GPT instead of humans to generate the instruction data is where the massive cost reduction comes from. The actual fine-tuning on GPUs is relatively cheap.
Most of that was getting the data; the training would cost something like $100, if memory serves.
All included, it costs under $70 for the 13B model. Training 65B now, so I will report what that costs.
For the 65B fine-tune, did you add another A100 node? Or just drop the batch size?

Any chance you’re up to sharing the training parameters?

Dropping the batch size
Please do! Also please include how you’re calculating the costs.
Probably in the hundreds of dollars for the 7B model, and maybe a thousand or two for the 13B at worst.
Far, far less. Alpaca-7B's compute cost was around $60-$70 for Stanford and around $0.60 (yes, 60 cents) for equivalent fine-tunes using the Parameter-Efficient Fine-Tuning (PEFT) strategy of Low-Rank Adaptation (LoRA).

The repo above can be replicated for similar costs. Easily less than $10 for up to 30B using LoRA (which requires only 24GB of VRAM for 30B/33B and smaller).
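
To give a feel for why LoRA is so much cheaper: instead of updating a full d x k weight matrix, it trains two low-rank factors of shapes d x r and r x k, so the trainable parameter count per adapted matrix drops from d*k to r*(d+k). A small sketch of that count; the layer size and rank below are illustrative assumptions, not the settings used in the repo:

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when the whole d x k weight matrix is updated."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for rank-r LoRA factors of shapes (d x r) and (r x k)."""
    return r * (d + k)


# Illustrative values only: one square 4096 x 4096 projection with rank 8.
d, k, r = 4096, 4096, 8
full = full_params(d, k)
lora = lora_params(d, k, r)
fraction = lora / full

print(full, lora, fraction)
```

With far fewer trainable parameters, gradients and optimizer state take much less memory, and the frozen base weights can be kept in reduced precision, which is a large part of why these runs fit on a single consumer GPU.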

I thought so too, but newcomers should expect to train the model a dozen times or so :-)
I am interested in this. What would it cost to train the best model available to the public?
I asked in the issues, let's see