by antimatter15 806 days ago
It looks like Llama 2 7B took 184,320 A100-80GB GPU-hours to train [1]. This one says it used a 96×H100 GPU cluster for 2 weeks, which is 96 × 24 × 14 = 32,256 GPU-hours. That's 17.5% of the GPU-hours, but H100s are faster than A100s [2]: FP16/bfloat16 performance is ~3x better.

If they had tried to replicate Llama 2 7B identically on their hardware setup, it would have taken roughly 184,320 / 3 ≈ 61,440 H100-hours, so it'd cost a little bit less than twice what their MoE model did.
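For anyone who wants to check the arithmetic, here is a rough sketch in Python. The GPU-hour figures are the ones quoted above; the 3x H100-over-A100 factor is only the ballpark from [2] and is treated here as an assumption, not a measured speedup for this workload.

```python
# Figures quoted in the comment above.
LLAMA2_7B_A100_HOURS = 184_320  # A100-80GB GPU-hours, per the model card [1]
NUM_GPUS = 96                   # H100s in the cluster
TRAINING_DAYS = 14              # "2 weeks"

# Assumption: one H100-hour does the work of ~3 A100-hours (the "~3x" above).
H100_SPEEDUP_OVER_A100 = 3.0

# GPU-hours used by the MoE run.
moe_h100_hours = NUM_GPUS * 24 * TRAINING_DAYS

# Raw share of Llama 2 7B's GPU-hours, ignoring the hardware difference.
raw_ratio = moe_h100_hours / LLAMA2_7B_A100_HOURS

# Llama 2 7B's budget converted to H100-hour equivalents.
llama2_h100_hours = LLAMA2_7B_A100_HOURS / H100_SPEEDUP_OVER_A100

# How many times the MoE run a Llama 2 7B replication would take.
cost_multiple = llama2_h100_hours / moe_h100_hours

print(f"MoE run: {moe_h100_hours:,} H100-hours")
print(f"Raw share of Llama 2 7B hours: {raw_ratio:.1%}")
print(f"Llama 2 7B in H100-hour equivalents: {llama2_h100_hours:,.0f}")
print(f"Llama 2 7B replication vs. MoE run: {cost_multiple:.2f}x")
```

Changing H100_SPEEDUP_OVER_A100 shows how sensitive the "a little less than twice" conclusion is to the assumed speedup.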

[1] https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md#...

[2] https://blog.ori.co/choosing-between-nvidia-h100-vs-a100-per...


They mention the cost was ~$80,000, so for 32,256 GPU-hours it comes to ~$2.48 per GPU-hour. Amazing how cost-effective the compute actually is.
I was paying $1.10 per A100-hour more than a year ago. $2.48 is crazy expensive.
It was for a 96×H100 cluster. Their provider was exabits.ai, which bills itself as a decentralised computing marketplace.
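The per-hour numbers in this subthread can be checked with a quick sketch. It assumes the total was about 80,000 USD (the only reading of the quoted cost that matches ~$2.48 per hour) and uses the $1.10 A100 rate as quoted by the commenter above; neither figure is verified here.

```python
# Assumption: total training cost of roughly 80,000 USD, as discussed above.
TOTAL_COST_USD = 80_000
GPU_HOURS = 96 * 24 * 14        # 96 H100s for 2 weeks = 32,256 GPU-hours

# Rate the earlier commenter reports paying per A100-hour (their figure).
A100_RATE_USD = 1.10

# Effective price per H100 GPU-hour for this run.
cost_per_gpu_hour = TOTAL_COST_USD / GPU_HOURS

# How the two quoted hourly prices compare, with no adjustment for GPU speed.
price_ratio = cost_per_gpu_hour / A100_RATE_USD

print(f"Cost per H100 GPU-hour: ${cost_per_gpu_hour:.2f}")
print(f"H100 rate vs. quoted A100 rate: {price_ratio:.2f}x")
```

Note that the ratio compares sticker prices for two different GPUs, so it says nothing on its own about cost per unit of training work.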