by ummonk
1144 days ago
Given inference costs and the ability to run on-device, there's an argument to be made for training models that are smaller than Chinchilla-optimal though, especially if you can still eke out improved performance with longer training times.
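To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes the parametric loss fit reported in the Chinchilla paper (L = E + A/N^alpha + B/D^beta), the common approximations of 6·N·D FLOPs for training and 2·N FLOPs per generated token for inference, and a made-up lifetime inference volume. The function names are mine, and the model sizes are only illustrative, so treat the output as a feel for the shape of the argument rather than a real estimate.

```python
# Constants from the parametric loss fit reported in the Chinchilla paper
# (Hoffmann et al., 2022). Other fits give different numbers.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def loss(n_params, n_tokens):
    """Predicted training loss for a model of n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_to_match(target_loss, n_params):
    """Tokens a model of n_params needs to reach target_loss (None if unreachable)."""
    gap = target_loss - E - A / n_params**ALPHA
    if gap <= 0:
        return None  # the model is too small to reach this loss at any data size
    return (B / gap) ** (1 / BETA)


def lifetime_flops(n_params, n_train_tokens, n_inference_tokens):
    """Approximate training (6*N*D) plus inference (2*N per token) compute."""
    return 6 * n_params * n_train_tokens + 2 * n_params * n_inference_tokens


# Chinchilla-sized reference: 70B parameters, 1.4T training tokens.
big_n, big_d = 70e9, 1.4e12
target = loss(big_n, big_d)

# A model half that size, trained for longer until it matches the same loss.
small_n = 35e9
small_d = tokens_to_match(target, small_n)

# Hypothetical number of tokens served over the model's lifetime (an assumption).
served = 1e13

print(f"target loss:            {target:.4f}")
print(f"small model tokens:     {small_d:.3e}")
print(f"big model total FLOPs:  {lifetime_flops(big_n, big_d, served):.3e}")
print(f"small model total FLOPs:{lifetime_flops(small_n, small_d, served):.3e}")
```

The more tokens you expect to serve, the more the 2·N inference term dominates, which is what pushes the balance toward the smaller, longer-trained model.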