


	by digdugdirk 924 days ago
	What do you see as the limit to this improvement?

2 comments

eightysixfour 924 days ago

There is probably some limit where making the dataset larger, with more diverse information, does not create meaningful improvements with current architectures. I do not know what that limit is or what it looks like, but I also don’t think we are particularly close to it yet.

“The Pile” dataset was the asset we needed to jumpstart this process: it had so much raw data that it could get us over the hump. But Phi and some of the models trained on explicit reasoning make the limitations of random shit people say on the internet pretty clear.


verdverm 924 days ago

The Pile dataset for those interested

https://pile.eleuther.ai/

https://arxiv.org/abs/2101.00027

I'm bullish on domain-specific models that start from generalized models. Something like a T shape: broad general capability first, then depth in one domain, maybe via a couple of distillation and fine-tuning steps.


pmb22 924 days ago

The eightysixfour rule? You would think that this would follow something similar to Moore's law for a little while


lukeplato 924 days ago

Models trained on GPT output might be more distilled and specialized, but that wouldn't improve generalization.


lukeplato 924 days ago

https://twitter.com/pfau/status/1674766269113937920


eightysixfour 924 days ago

I disagree with this. If you give GPT information that was not part of its dataset and ask it to make question and answer pairs off of that information, you are adding higher quality breadth to the training corpus.

Phi-2 seems like pretty good proof of that.
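
To make the idea concrete, here is a minimal sketch of that question-and-answer generation step. Everything in it is an assumption for illustration: `generate` is a hypothetical stand-in for whatever model call you use, the prompt wording and the `Q:`/`A:` output format are made up, and `fake_generate` just returns canned text so the example runs without a model.

```python
from typing import Callable, Dict, List


def build_prompt(passage: str, n_pairs: int) -> str:
    """Build an instruction asking a model for Q&A pairs grounded in the passage.

    The wording and the 'Q:'/'A:' format are illustrative assumptions.
    """
    return (
        f"Read the passage below and write {n_pairs} question and answer pairs "
        "that can be answered from it alone.\n"
        "Format each pair as 'Q: ...' on one line and 'A: ...' on the next.\n\n"
        f"Passage:\n{passage}"
    )


def parse_pairs(text: str) -> List[Dict[str, str]]:
    """Parse 'Q:'/'A:' lines into dicts, skipping answers with no question."""
    pairs: List[Dict[str, str]] = []
    question = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append({"question": question, "answer": line[2:].strip()})
            question = None
    return pairs


def make_qa_pairs(
    passage: str,
    generate: Callable[[str], str],  # hypothetical: prompt in, model text out
    n_pairs: int = 3,
) -> List[Dict[str, str]]:
    """Turn one unseen passage into training examples via a text generator."""
    return parse_pairs(generate(build_prompt(passage, n_pairs)))


def fake_generate(prompt: str) -> str:
    """Canned response standing in for a real model call."""
    return "Q: What is the Pile?\nA: A large text dataset.\n"


if __name__ == "__main__":
    print(make_qa_pairs("The Pile is a large text dataset.", fake_generate, n_pairs=1))
```

In practice you would swap `fake_generate` for a real model call and filter the pairs for quality before adding them to a training corpus.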


verdverm 924 days ago

That's the point: they get less good at everything, but really good at one or a few things.

The real benefits here are:

1. It's much cheaper and faster to train a bunch of specialized models once you have a single good LLM

2. You probably can't get the same capabilities from a specialized model by training it directly.
