|
|
|
|
|
by pellucide
791 days ago
|
|
Somewhere I read that the 8B llama2 model could be undertrained by 100-1000x. So is it possible to train a model with 8B/100 = 80M parameters to perform as good as the llama2 8B model, given enough training time and training tokens? |
|