


	by eightysixfour 926 days ago
	There is probably some limit beyond which making the dataset larger, with more diverse information, no longer creates meaningful improvements with current architectures. I do not know what that limit is or what it looks like, but I also don’t think we are particularly close to it yet. “The Pile” dataset is the asset we needed to jumpstart this process: it had so much raw data that it could get us over the hump. But Phi and some of the models trained on explicit reasoning make the limitations of random shit people say on the internet pretty clear.

2 comments

verdverm 926 days ago

The Pile dataset for those interested

https://pile.eleuther.ai/

https://arxiv.org/abs/2101.00027

I'm bullish on domain-specific models that start from generalized models. Something of a T-shape analogy, but maybe with a couple of distillation & fine-tuning steps.


pmb22 926 days ago

The eightysixfour rule? You would think that this would follow something similar to Moore's law for a little while
