by wrsh07
793 days ago
My understanding from this tweet thread [1] is that Chinchilla probably overspecified some of the model's hyperparameters.

tl;dr: I'm looking forward to having lots of models trained with a wide range of parameters to narrow down "what is actually optimal".

I think there is an interesting tradeoff between data quality and data volume, though. (E.g., if we train on the highest-quality 10% of our data, does the model improve if we also use the other 90%? What if we increase our data size by 10x?)

[1] https://twitter.com/tamaybes/status/1780639257389904013