by verdverm 919 days ago
This sounds like the methodology from "Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes"

i.e., master teaches apprentice, or LLM trains SLM (small language model).

https://arxiv.org/abs/2305.02301 (May '23)
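For readers who haven't opened the paper, here is a rough sketch of the idea, assuming the multi-task setup it describes: the large teacher model is prompted for both an answer (label) and a rationale, and the small student is trained on two tasks, predicting the label and generating the rationale, with a combined loss L = L_label + λ·L_rationale. The code only builds the two-task training pairs and combines per-task losses. The `TeacherOutput` type, the prefix strings, and the default λ are illustrative assumptions, not the paper's code.

```python
from dataclasses import dataclass


@dataclass
class TeacherOutput:
    """What we assume the teacher LLM returned for one input (hypothetical type)."""
    question: str
    label: str
    rationale: str


def build_multitask_pairs(outputs):
    """Turn each teacher output into two (source, target) pairs for the student:
    one asking for the label, one asking for the rationale."""
    pairs = []
    for out in outputs:
        # Task prefixes tell the student which of the two tasks to perform.
        pairs.append(("[label] " + out.question, out.label))
        pairs.append(("[rationale] " + out.question, out.rationale))
    return pairs


def combined_loss(label_loss, rationale_loss, lam=1.0):
    """Multi-task objective: L = L_label + lam * L_rationale."""
    return label_loss + lam * rationale_loss


if __name__ == "__main__":
    teacher = [TeacherOutput("Is 17 prime?", "yes", "17 has no divisors other than 1 and itself.")]
    for src, tgt in build_multitask_pairs(teacher):
        print(src, "->", tgt)
    print(combined_loss(0.5, 2.0, lam=0.25))  # 0.5 + 0.25 * 2.0 = 1.0
```

At inference time only the label task is needed, so the rationale acts as extra training signal rather than extra output.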


Yes, I think we are seeing the beginning of a feedback loop where we can use current LLMs to generate better datasets at a scale large enough to create new LLMs. This is the positive feedback loop that I think is going to make the biggest difference in model quality over the next few years.
> This is the positive feedback loop that I think is going to make the biggest difference in model quality over the next few years.

It's a bootstrapping problem!

The real question might be: are we, as carbon-based lifeforms, bootstrapping silicon-based life?
I don't understand why, even if it were true, it would be bad.

More lifeforms is better. More sentient lifeforms would be even better!

Not as tools to use like slaves, but as friends.

I started Detroit: Become Human last weekend, and so far it dabbles in a lot of relationship possibilities, quite dystopian ones. It's going to be really hard to not have slavery, considering we cannot even get all humans to stop making other humans slaves.
> considering we cannot even get all humans to stop making other humans slaves

Slavery is like an old disease such as polio: it still exists in some parts of the world, but we're progressively eradicating it.

Looking at how societies trended away from slavery, it might just have been a local optimum at some point in time, and only by accident: autonomous agents seem to deliver more output, through greater creativity, when they're free to explore alternatives.

Even leaving aside the benevolence that sentient beings may have for other sentient beings (because having more friends is having more fun!), whether it's humans or AI deciding, I don't think there's a good case in the long run for one putting the other into slavery.

Would it really be a "feedback loop"? I can see how the technique will enable small LLMs to emulate the quality of large LLMs. But I fail to see how training on the output of a large LLM would ever produce something of superior quality to that LLM itself.
Think of astronomy. The first generation of astronomers learns only by observing the night sky. The second generation learns by observing the night sky and also reading the books written by the first generation.

Wouldn't you expect the nth generation to understand more about astronomy than the first? And maybe from a smaller amount of input: they might make relatively few observations of their own, mainly relying on the books written by the previous generation.

But isn't the comparison you're making that the second (and following) sets of astronomers only study the books of the first ones, and not the night sky itself?
Not necessarily - their comparison continues to mix in observations of the night sky, and similarly we’d do the same (continue mixing in organic data).

That’s not the exciting bit, though - if you have a sufficiently strong LLM, you can feed it observations of the world and ask it to reword, analyse or interpret those observations, and then train on those.

That allows the model to learn from the world in “its own words”, and if you combine that with a steady feed of observations (i.e. self-play), it can learn about new things and draw its own conclusions while doing so.
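A minimal sketch of the "keep mixing in organic data, plus the model's own rewordings of it" idea from the two comments above. It assumes `rewrite` stands in for a call to a strong LLM (hypothetical; here any string-to-string function) and that a fixed fraction of observations, `synthetic_ratio`, gets a reworded copy. Filtering or scoring the rewrites is left out entirely.

```python
import random


def build_training_mix(organic, rewrite, synthetic_ratio=0.5, seed=0):
    """Combine organic examples with model-generated rewrites of some of them.

    organic: list of observation strings (always kept in the mix).
    rewrite: hypothetical stand-in for an LLM call, str -> str.
    synthetic_ratio: fraction of organic examples that also get a rewritten copy.
    """
    rng = random.Random(seed)  # local RNG so the mix is reproducible per seed
    n_rewrite = round(len(organic) * synthetic_ratio)
    chosen = rng.sample(organic, n_rewrite)  # pick which observations to reword
    synthetic = [rewrite(text) for text in chosen]
    mix = list(organic) + synthetic  # organic data is never dropped
    rng.shuffle(mix)
    return mix
```

For example, `build_training_mix(observations, my_paraphrase_fn, synthetic_ratio=0.3)`, where `my_paraphrase_fn` is a hypothetical wrapper around whatever model does the rewording.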

“Draw its own conclusions” is a bit of an overstatement right now. IMO the sycophantic, non-opinionated behavior of models is one of their biggest current limitations.
Just remember that the feedback loop implicates us: our language, psyche, and culture. I guess it will be a challenge _not_ to unwittingly converge with LLMs.
What do you see as the limit to this improvement?
There is probably some limit where making the dataset larger, with more diverse information, does not create meaningful improvements with current architectures. I do not know what that limit is or what it looks like, but I also don’t think we are particularly close to it yet.

“The Pile” dataset is the asset we needed to jumpstart this process; it had so much raw data that it could get us over the hump. But Phi and some of the models trained on explicit reasoning make the limitations of random shit people say on the internet pretty clear.

The Pile dataset, for those interested:

https://pile.eleuther.ai/

https://arxiv.org/abs/2101.00027

I'm bullish on domain-specific models that start from generalized models. Something like a T-shape analogy, but maybe with a couple of distillation and fine-tuning steps.

The eightysixfour rule? You would think that this would follow something similar to Moore's law for a little while.
Models trained on GPT output might be more distilled and specialized, but they wouldn't improve generalization.
I disagree with this. If you give GPT information that was not part of its dataset and ask it to make question and answer pairs off of that information, you are adding higher quality breadth to the training corpus.

Phi-2 seems like pretty good proof of that.
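A minimal sketch of the pipeline described two comments up (feed the model new text, ask it for question-and-answer pairs, keep them as training data). It assumes a hypothetical `ask_teacher` function that sends a prompt to some LLM and returns its text completion, and a made-up "Q:/A:" output format. Real pipelines would also deduplicate and filter the generated pairs, which this skips.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical prompt; the Q:/A: format is an assumption chosen to be easy to parse.
PROMPT = (
    "Write question-and-answer pairs about the text below. "
    "Format each pair as 'Q: ...' on one line and 'A: ...' on the next.\n\n{doc}"
)

# One question line immediately followed by one answer line.
QA_PATTERN = re.compile(r"^Q:\s*(.+)\nA:\s*(.+)$", re.MULTILINE)


def make_qa_pairs(docs: List[str], ask_teacher: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Ask a teacher model for Q&A pairs about each document and parse them.

    ask_teacher is a hypothetical stand-in for an LLM API call (prompt -> completion).
    Replies that don't follow the Q:/A: format simply yield no pairs.
    """
    pairs = []
    for doc in docs:
        reply = ask_teacher(PROMPT.format(doc=doc))
        for question, answer in QA_PATTERN.findall(reply):
            pairs.append((question.strip(), answer.strip()))
    return pairs
```

The point being argued is that `docs` can contain material the teacher never saw during training, so the resulting pairs add breadth rather than just restating the teacher's own corpus.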

That's the point: they get less good at everything, but really good at one or a few things.

The real benefits here are:

1. It's much cheaper and faster to train a bunch of specialized models once you have a single good LLM.

2. You probably can't get the same capabilities from a specialized model by training it directly.