by wavemode 918 days ago
Would it really be a "feedback loop"? I can see how the technique would enable small LLMs to emulate the quality of large LLMs, but I fail to see how training on the output of a large LLM would ever produce something of superior quality to that LLM itself.

Think of astronomy. The first generation of astronomers learns only by observing the night sky. The second generation learns by observing the night sky and also reading the books written by the first generation.

Wouldn't you expect the nth generation to understand more about astronomy than the first? And maybe from a smaller amount of input - they might make relatively few observations of their own, mainly relying on the books written by the previous generation.

But isn't the comparison you're making that the second (and following) sets of astronomers only study the books of the first ones, and not the night sky itself?

Not necessarily - in their comparison, later generations keep mixing in observations of the night sky, and similarly we’d do the same (continue mixing in organic data).

That’s not the exciting bit, though - if you have a sufficiently strong LLM, you can feed it observations of the world and ask it to reword, analyse or interpret those observations, and then train on those.

That allows the model to learn from the world in “its own words”, and if you combine that with a steady feed of observations (i.e. self-play), it can learn about new things and draw its own conclusions while doing so.
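
To make that concrete, here is a rough sketch of the data-mixing step, not anyone's actual pipeline. The names `build_training_mix`, `toy_rewrite` and `synthetic_ratio` are made up for illustration, and `toy_rewrite` is only a stand-in for a call to a strong LLM. It keeps every organic observation and adds model-reworded versions of a random subset.

```python
import random
from typing import Callable, Dict, List


def build_training_mix(
    observations: List[str],
    rewrite: Callable[[str], str],
    synthetic_ratio: float = 0.5,
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Combine organic observations with model-reworded copies of a subset.

    `rewrite` is a hypothetical hook standing in for a strong LLM that
    rewords, analyses or interprets an observation.
    """
    rng = random.Random(seed)

    # Always keep the organic data (the "night sky" in the analogy).
    mix = [{"text": obs, "source": "organic"} for obs in observations]

    # Add the model's own wording for a random subset of observations.
    n_synthetic = int(len(observations) * synthetic_ratio)
    for obs in rng.sample(observations, n_synthetic):
        mix.append({"text": rewrite(obs), "source": "synthetic"})

    rng.shuffle(mix)
    return mix


def toy_rewrite(observation: str) -> str:
    # Placeholder only: a real setup would query an LLM here.
    return f"Interpretation: {observation}"


if __name__ == "__main__":
    sky = ["Mars moved backwards this week", "Venus is visible at dawn"]
    for sample in build_training_mix(sky, toy_rewrite):
        print(sample["source"], "|", sample["text"])
```

The ratio of synthetic to organic samples is the knob the thread is arguing about: at 0 you train only on the sky, and as it grows you lean more on the previous generation's books.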

“Draw its own conclusions” is a bit of an overstatement right now. IMO the sycophantic, non-opinionated behavior of models is one of their biggest limitations.