Hacker News


eiz 1159 days ago

https://arxiv.org/pdf/2302.13971.pdf, Table 15: 1,770,394 A100-80GB GPU-hours to train the entire model suite. At the going rate for cloud 8xA100-80GB nodes (~$12/hr, if you could actually get capacity), that's ~$2.6M, under extremely optimistic assumptions. YMMV on bulk pricing ;) "the more you buy, the more you save"
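
The cost arithmetic above, as a quick sketch: divide GPU-hours by 8 GPUs per node to get node-hours, then multiply by the assumed ~$12/hr node price (on-demand, no bulk discount; the price is the commenter's estimate, not a quote).

```python
# Back-of-envelope cloud cost for training the whole LLaMA suite.
# Assumptions (from the comment, not a price quote): 8 GPUs per node, ~$12/hr per node.
TOTAL_GPU_HOURS = 1_770_394   # A100-80GB GPU-hours, Table 15 of the linked paper
GPUS_PER_NODE = 8
NODE_PRICE_PER_HOUR = 12.0    # USD per 8xA100-80GB node-hour (assumed, optimistic)


def cloud_cost(gpu_hours: float, gpus_per_node: int, node_price_per_hour: float) -> float:
    """Convert GPU-hours to node-hours and multiply by the hourly node price."""
    node_hours = gpu_hours / gpus_per_node
    return node_hours * node_price_per_hour


cost = cloud_cost(TOTAL_GPU_HOURS, GPUS_PER_NODE, NODE_PRICE_PER_HOUR)
print(f"${cost:,.0f}")  # $2,655,591
```

That is about $2.66M, in line with the ~$2.6M figure.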

Robotbeat 1159 days ago

Hmmm… the numbers for the 7B model seem feasible. It needs an order of magnitude fewer GPU-hours, and its lower parameter count presumably means it could fit on a 24GB Radeon RX 7900 XTX, which has higher single-precision FLOPS than the A100 and costs $1,000 instead of $15,000.
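
A rough memory check for the "fits on 24GB" part, as a sketch. It assumes a nominal 7e9 parameters and the common mixed-precision Adam accounting of 16 bytes per parameter (bf16 weights and gradients, fp32 master weights, two fp32 Adam moments), and it ignores activations entirely. Under those assumptions the bf16 weights alone fit on one card, but the full training state would have to be sharded across several.

```python
import math

# Rough memory estimate for a 7B-parameter model (activations ignored).
PARAMS = 7e9          # nominal parameter count (assumption)
GB = 1e9              # decimal gigabytes
CARD_MEMORY_GB = 24   # Radeon RX 7900 XTX

# Bytes per parameter under mixed-precision Adam (assumed breakdown).
BYTES_PER_PARAM = {
    "bf16 weights": 2,
    "bf16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam first moment": 4,
    "fp32 Adam second moment": 4,
}

weights_only_gb = PARAMS * BYTES_PER_PARAM["bf16 weights"] / GB
training_state_gb = PARAMS * sum(BYTES_PER_PARAM.values()) / GB
# Minimum number of 24GB cards just to hold the training state, if perfectly sharded.
min_cards_for_state = math.ceil(training_state_gb / CARD_MEMORY_GB)

print(f"bf16 weights only: {weights_only_gb:.0f} GB")      # 14 GB
print(f"full training state: {training_state_gb:.0f} GB")  # 112 GB
print(f"cards needed to hold that state: {min_cards_for_state}")  # 5
```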

An order of magnitude fewer GPU-hours, plus training for 210 days instead of 21, means you could train a 7B model on about 20 consumer GPUs at $1,000 apiece: $20k, not counting motherboards, etc. Really not bad. It might even be doable as a volunteer project.
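
The cluster-sizing step can be sketched as below. It assumes the 7B figure from Table 15 of the linked paper (82,432 A100-80GB GPU-hours), that each consumer card does the same work per hour as one A100, and perfect linear scaling across cards; all three assumptions are optimistic.

```python
import math

LLAMA_7B_GPU_HOURS = 82_432   # A100-80GB GPU-hours for the 7B model (Table 15 of the linked paper)
CARD_PRICE_USD = 1_000        # assumed price per consumer card


def gpus_needed(gpu_hours: float, days: float) -> int:
    """Cards needed to finish `gpu_hours` of A100-equivalent work within `days`,
    assuming each card matches one A100 and scaling is perfectly linear."""
    wall_clock_hours = days * 24
    return math.ceil(gpu_hours / wall_clock_hours)


cards = gpus_needed(LLAMA_7B_GPU_HOURS, days=210)
print(cards, f"${cards * CARD_PRICE_USD:,}")  # 17 $17,000
```

That lands in the same ballpark as the ~20 cards and $20k estimate.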

nl 1159 days ago

I'm not aware of any efficient transformer training code for AMD cards.

Also, most training is done in bfloat16, not single precision (which is usually used only for accumulators).
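
To illustrate why accumulators are kept in single precision, here is a pure-Python sketch that emulates bfloat16 by rounding an fp32 bit pattern to its top 16 bits (round-to-nearest-even; NaN, Inf and overflow are not handled). It is an illustration, not how any training framework implements bf16. Summing a thousand small bf16 values with a bf16 accumulator stalls once the spacing between representable totals outgrows the increment, while an fp32 accumulator gets the exact sum.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (a double) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bfloat16 by keeping the top 16 bits of the fp32 pattern
    (round-to-nearest-even). Sketch only: NaN/Inf/overflow not handled."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)      # rounding bias for round-to-nearest-even
    bits = ((bits >> 16) << 16) & 0xFFFFFFFF  # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def accumulate(values, round_fn):
    """Sum values, rounding the running total with round_fn after every addition."""
    total = 0.0
    for v in values:
        total = round_fn(total + v)
    return total


step = to_bf16(0.001)    # the inputs themselves are stored in bf16
values = [step] * 1000
print(accumulate(values, to_bf16))  # 0.5 (stalls far short of ~1.0)
print(accumulate(values, to_fp32))  # 0.99945068359375 (exactly 1000 * step)
```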

Robotbeat 1158 days ago

Sure, you would need to rewrite the training code for AMD's ecosystem. If you're using mixed-precision training, I suppose you're right about BF16. That puts the A100 at about 2.5x the performance of the Radeon RX 7900 XTX. It may be better to go with the Nvidia GeForce RTX 4090, which retails for $1,600.
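
Folding that 2.5x BF16 gap into the sizing gives a rough sketch like the one below. It assumes the 7B figure from Table 15 of the linked paper (82,432 A100-80GB GPU-hours), the $1,000 card price from upthread, a 210-day run, and perfect linear scaling. The RTX 4090 is left out because its speed relative to the A100 isn't given in the thread.

```python
import math

LLAMA_7B_GPU_HOURS = 82_432   # A100-80GB GPU-hours for the 7B model (Table 15 of the linked paper)
A100_SPEEDUP = 2.5            # A100 BF16 throughput relative to the consumer card (from the comment)
CARD_PRICE_USD = 1_000        # assumed consumer card price
DAYS = 210


def cards_needed(a100_gpu_hours: float, days: float, a100_speedup: float) -> int:
    """Cards needed to finish the job in `days`, if one A100 does `a100_speedup`
    times the work of one card per hour and scaling is perfectly linear."""
    card_hours = a100_gpu_hours * a100_speedup
    return math.ceil(card_hours / (days * 24))


cards = cards_needed(LLAMA_7B_GPU_HOURS, DAYS, A100_SPEEDUP)
print(cards, f"${cards * CARD_PRICE_USD:,}")                          # 41 $41,000
print(f"${CARD_PRICE_USD * A100_SPEEDUP:,.0f} per A100-equivalent")  # $2,500 per A100-equivalent
```

Even at 2.5x slower, each A100-equivalent of BF16 throughput costs about $2,500 in cards under these assumptions, versus the $15,000 A100 price quoted upthread.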

titaniumtown 1158 days ago

It all works out of the box with PyTorch and Hugging Face's transformers library on ROCm.
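
A small sketch for checking what a given PyTorch install targets, assuming PyTorch is installed. ROCm builds of PyTorch set `torch.version.hip` and reuse the `torch.cuda` namespace, so the same device queries work on AMD cards; `describe_torch_backend` is a hypothetical helper name, not a PyTorch API.

```python
import torch


def describe_torch_backend() -> str:
    """Report which GPU backend this PyTorch build targets and whether bf16 is usable."""
    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    if getattr(torch.version, "hip", None):
        backend = f"ROCm/HIP {torch.version.hip}"
    elif torch.version.cuda:
        backend = f"CUDA {torch.version.cuda}"
    else:
        backend = "CPU-only build"
    if not torch.cuda.is_available():
        return f"{backend}; no GPU visible"
    # On ROCm, torch.cuda.* calls are routed to the AMD GPU.
    name = torch.cuda.get_device_name(0)
    bf16 = torch.cuda.is_bf16_supported()
    return f"{backend}; device 0: {name}; bf16 supported: {bf16}"


print(describe_torch_backend())
```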

slavik81 1157 days ago

You would need to compile a few components from source for Navi 31 if you were to try it today, so out-of-the-box is perhaps an overstatement, but it's certainly doable.