Hacker News
by lawlessone 793 days ago
1.3 million GPU hours for the 8B model. That would take you around 150 years to train on a single desktop GPU lol.
1 comment

Interesting. Llama was trained on 16K GPUs, so it would have taken them around a quarter. An hour of GPU time costs $2-$3, so training a custom solution on top of Llama should run anywhere from $15K to $1M. I am trying to get started with this myself. A few people suggested 2 GPUs as a good starting point, but I think that would only be enough for around 10K training samples.
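
For anyone who wants to redo the back-of-the-envelope math from this thread, here is a rough sketch. It only uses the numbers quoted above (1.3 million GPU hours, 16K GPUs, $2-$3 per GPU hour). The function names are made up for illustration, and it assumes perfect linear scaling across GPUs and that a desktop GPU is as fast as a datacenter one, neither of which holds in practice, so treat the results as loose lower bounds.

```python
# Back-of-the-envelope training time and cost, using figures quoted in the thread.
# Assumptions (not from any official source): perfect linear scaling across GPUs,
# and every GPU delivering the same throughput as the ones behind the quoted figure.

HOURS_PER_YEAR = 24 * 365


def wall_clock_hours(gpu_hours: float, num_gpus: int) -> float:
    """Ideal elapsed time if the work splits evenly across num_gpus."""
    return gpu_hours / num_gpus


def rental_cost(gpu_hours: float, price_per_gpu_hour: float) -> float:
    """Total spend in dollars at a flat hourly rental price."""
    return gpu_hours * price_per_gpu_hour


gpu_hours = 1.3e6  # quoted for the 8B model

# One GPU running nonstop.
years_single = wall_clock_hours(gpu_hours, 1) / HOURS_PER_YEAR

# The same work spread over 16K GPUs.
days_cluster = wall_clock_hours(gpu_hours, 16_000) / 24

# Renting that many GPU hours at $2 and at $3 per hour.
cost_low = rental_cost(gpu_hours, 2.0)
cost_high = rental_cost(gpu_hours, 3.0)

# Going the other way: GPU hours that a $15K and a $1M budget would buy.
budget_hours_min = 15_000 / 3.0
budget_hours_max = 1_000_000 / 2.0

print(f"single GPU: {years_single:.0f} years")
print(f"16K GPUs: {days_cluster:.1f} days")
print(f"rental cost: ${cost_low:,.0f} to ${cost_high:,.0f}")
print(f"budget buys: {budget_hours_min:,.0f} to {budget_hours_max:,.0f} GPU hours")
```

Under those assumptions, one GPU comes out at roughly 148 years, and the 8B run alone would be a few days on 16K GPUs. The $15K to $1M range buys between 5,000 and 500,000 GPU hours, well short of the 1.3 million quoted, so it only makes sense as a fine-tuning budget rather than training from scratch.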