Hacker News new | ask | show | jobs
by alister 531 days ago
> short stories generated by GPT-3.5 and GPT-4 to train LMs that are smaller

The loop of development is fascinating:

Millions of humans write literature, Wikipedia, etc.

Large language models are trained on that body of work.

Now large language models generate training data for small language models.

What's the next iteration? A talking Buzz Lightyear toy with one of those small language models that'll teach (human) infants to talk?

3 comments

This is actually a common pattern called "model distilling".[0]

[0] https://platform.openai.com/docs/guides/distillation

I thought that, too. It wasn’t really true, though.

Some papers pointed out that the models start failing after being trained with too much synthetic data. They also need tons of random, Internet data in the first place. Humans don’t have those failure modes. The AI’s also got smarter the more data we produced.

So, there’s some critical differences between what we’re doing and what they’re doing that keep it from being a neat flow like that. What many humans do in training other humans fits that, though.

> A talking Buzz Lightyear toy with one of those small language models that'll teach (human) infants to talk?

Great idea. I was thinking more like a plushie toy with sensors, it would react to touch, sight and speech. I would run the models locally from a computer, keep the toy just lightweight I/O.