by Robotbeat
1161 days ago
Hmmm… the numbers for the 7B model seem feasible. It needs an order of magnitude fewer GPU-hours, and with the lower parameter count it could presumably fit on a 24GB Radeon RX 7900 XTX, which has higher single-precision FLOPS than the A100 and costs $1000 instead of $15,000. With an order of magnitude fewer GPU-hours, if you also train for 210 days instead of 21, you could do a 7B model with 20 consumer GPUs at $1000 apiece. That's $20k, not counting mainboards, etc. Really not bad. Might even be doable as a volunteer project.
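A rough back-of-envelope sketch of that arithmetic, in Python. The function name `budget` is just illustrative, and the "implied baseline" line assumes a consumer card stands in one-for-one for a datacenter GPU, which is an assumption of this sketch rather than a measured figure.

```python
def budget(num_gpus, days, price_per_gpu_usd):
    """Return (total GPU-hours, hardware cost in USD) for a training run."""
    gpu_hours = num_gpus * days * 24
    hardware_cost = num_gpus * price_per_gpu_usd
    return gpu_hours, hardware_cost

# Figures from the comment: 20 consumer GPUs, 210 days, $1000 each.
gpu_hours, cost = budget(num_gpus=20, days=210, price_per_gpu_usd=1000)

# Working backwards: 10x fewer GPU-hours and 10x longer wall-clock time
# together mean 100x fewer GPUs than the baseline run.
# ASSUMPTION: one consumer GPU does the work of one baseline GPU.
implied_baseline_gpus = 20 * 10 * 10

print(f"GPU-hours available: {gpu_hours}")
print(f"Hardware cost (GPUs only): ${cost}")
print(f"Implied baseline GPU count: {implied_baseline_gpus}")
```

So the plan amounts to about 100,800 GPU-hours on $20,000 of cards, before mainboards, power, and so on.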
Also, most training is done in bfloat16, not single precision (which is usually only used for accumulators).