by wongarsu
177 days ago
To get a single knowledge cutoff, they spent 16.5 wall-clock hours on a cluster of 128 NVIDIA GH200 GPUs (about 2,100 GPU-hours), plus a minor amount of time for finetuning. The prerelease_notes.md in the repo is a great description of how one would achieve that.

Also, of course, this is for a single training run; if you need to experiment, you'd have to repeat it for each attempt.
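
For reference, here is the arithmetic behind the GPU-hours figure as a quick sketch. The function name and the run count of 5 are made up for illustration and are not taken from the repo:

```python
def gpu_hours(wall_clock_hours: float, num_gpus: int, runs: int = 1) -> float:
    """Total GPU-hours for `runs` identical training runs (hypothetical helper)."""
    return wall_clock_hours * num_gpus * runs


# One run: 16.5 wall-clock hours on 128 GPUs.
single_run = gpu_hours(16.5, 128)
print(single_run)  # 2112.0, i.e. roughly 2100 GPU-hours

# Assumption for illustration only: 5 experimental runs of the same size.
five_runs = gpu_hours(16.5, 128, runs=5)
print(five_runs)  # 10560.0
```

This ignores the finetuning time and assumes every experimental run costs as much as the full one, so treat it as a rough upper-level estimate.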