Hacker News
by ummonk 1144 days ago
Given inference costs and the ability to run on-device, there's an argument to be made for training models that are smaller than Chinchilla-optimal though, especially if you can still eke out improved performance with longer training times.
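
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes the common rules of thumb of about 6 FLOPs per parameter per training token and about 2 FLOPs per parameter per inference token. The model sizes and token counts are hypothetical, and whether the smaller model actually reaches comparable quality is taken as given here, not computed.

```python
# Back-of-the-envelope compute accounting (rules of thumb, not measurements):
#   training FLOPs  ~ 6 * params * training_tokens
#   inference FLOPs ~ 2 * params * served_tokens


def train_flops(params, train_tokens):
    """Approximate total training compute."""
    return 6 * params * train_tokens


def inference_flops(params, served_tokens):
    """Approximate compute spent serving `served_tokens` tokens."""
    return 2 * params * served_tokens


def lifetime_flops(params, train_tokens, served_tokens):
    """Training plus inference compute over the model's deployment."""
    return train_flops(params, train_tokens) + inference_flops(params, served_tokens)


def breakeven_served_tokens(big_params, big_tokens, small_params, small_tokens):
    """Served tokens after which the smaller, longer-trained model is cheaper overall.

    Assumes small_params < big_params. Returns 0.0 if the smaller model is
    already no more expensive to train.
    """
    extra_training = train_flops(small_params, small_tokens) - train_flops(big_params, big_tokens)
    saving_per_token = 2 * (big_params - small_params)
    return max(0.0, extra_training / saving_per_token)


if __name__ == "__main__":
    # Hypothetical pairing: a 70B model at ~20 tokens per parameter versus
    # a 35B model trained on 4x as many tokens.
    big = (70e9, 1.4e12)
    small = (35e9, 5.6e12)
    tokens = breakeven_served_tokens(*big, *small)
    print(f"break-even after ~{tokens:.2e} served tokens")
```

Past the break-even point every additional served token favors the smaller model, and that is before counting memory footprint or the ability to run on-device at all.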