Hacker News new | ask | show | jobs
by Trapais 758 days ago
For comparison, here's 8B [Nemotron](https://huggingface.co/nvidia):

> 1,024 A100s were used for 19 days to train the model.

> NVIDIA models are trained on a diverse set of public and proprietary datasets. This model was trained on a dataset containing 3.8 Trillion tokens of text.