| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Trapais 758 days ago

For comparison, here's 8B [Nemotron](https://huggingface.co/nvidia):

> 1,024 A100s were used for 19 days to train the model.

> NVIDIA models are trained on a diverse set of public and proprietary datasets. This model was trained on a dataset containing 3.8 Trillion tokens of text.