by Trapais
758 days ago
For comparison, here's 8B [Nemotron](https://huggingface.co/nvidia):

> 1,024 A100s were used for 19 days to train the model.

> NVIDIA models are trained on a diverse set of public and proprietary datasets. This model was trained on a dataset containing 3.8 Trillion tokens of text.
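To put those quoted figures on a common scale, here is a rough back-of-the-envelope sketch in Python. It only multiplies out the numbers from the quote. The function names are made up for illustration, and the throughput figure assumes the run made exactly one pass over the 3.8 trillion tokens and that all 1,024 GPUs were busy for the full 19 days, neither of which the quote confirms.

```python
# Figures taken from the quoted model card text.
NUM_GPUS = 1024          # A100s
TRAINING_DAYS = 19
DATASET_TOKENS = 3.8e12  # 3.8 trillion tokens

def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours, assuming every GPU runs for the whole period."""
    return num_gpus * days * 24

def tokens_per_gpu_second(tokens: float, total_gpu_hours: float) -> float:
    """Average tokens processed per GPU per second.

    Assumption: the dataset is seen exactly once (a single epoch).
    """
    return tokens / (total_gpu_hours * 3600)

total_hours = gpu_hours(NUM_GPUS, TRAINING_DAYS)
throughput = tokens_per_gpu_second(DATASET_TOKENS, total_hours)

print(f"{total_hours:,.0f} A100-hours")  # 466,944 A100-hours
print(f"{throughput:,.0f} tokens/s per GPU (single-epoch assumption)")
```

So the quoted run works out to roughly 467k A100-hours, which is the number to hold up against other models' reported training budgets.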