
My understanding from this tweet thread [1] is that Chinchilla probably overspecified some of the model's hyperparameters.

tl;dr I'm looking forward to having lots of models (ideally models) trained with a wide range of parameters to narrow down "what is actually optimal"
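For context on what "optimal" means here: the Chinchilla paper fits a parametric loss L(N, D) = E + A/N^α + B/D^β over model size N and training tokens D, then minimizes it under a fixed compute budget. Below is a minimal sketch of that calculation. It assumes the paper's published fit constants (quoted from memory, so check them against the paper) and the common C ≈ 6ND FLOPs approximation. It computes the closed-form compute-optimal split and cross-checks it with a brute-force grid search. The function names and the example budget are illustrative only.

```python
# Assumed constants: parametric fit L(N, D) = E + A / N**alpha + B / D**beta
# from the Chinchilla paper (Hoffmann et al., 2022), quoted from memory.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(flops: float) -> tuple[float, float]:
    """Closed-form minimizer of loss(N, D) subject to the approximation C = 6 * N * D."""
    g = (ALPHA * A / (BETA * B)) ** (1 / (ALPHA + BETA))
    n_opt = g * (flops / 6) ** (BETA / (ALPHA + BETA))
    d_opt = (flops / 6) / n_opt  # spend the rest of the budget on tokens
    return n_opt, d_opt


def grid_search(flops: float, num: int = 2000) -> tuple[float, float]:
    """Brute-force check: scan N on a log grid, set D = C / (6N), keep the lowest loss."""
    best = None
    for i in range(num):
        n = 10 ** (6 + 8 * i / (num - 1))  # 1e6 .. 1e14 parameters
        d = flops / (6 * n)
        candidate = (loss(n, d), n, d)
        if best is None or candidate < best:
            best = candidate
    return best[1], best[2]


if __name__ == "__main__":
    budget = 5.76e23  # hypothetical example budget in FLOPs
    n, d = compute_optimal(budget)
    gn, gd = grid_search(budget)
    print(f"closed form: N ~ {n:.3g}, D ~ {d:.3g}, tokens/param ~ {d / n:.1f}")
    print(f"grid search: N ~ {gn:.3g}, D ~ {gd:.3g}")
```

The "optimal" tokens-per-parameter ratio falls straight out of the fitted constants. That is why refitting them on many more training runs, as suggested above, could move the answer noticeably.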

I think there is an interesting tradeoff between data quality and data volume, though.

(E.g., if we train on the highest-quality 10% of our data, does the model improve when we add the other 90%? What if we increase our data size by 10x?)