by simonpure 40 days ago
I was wondering the same and learned the model doesn't know about itself during training [0]

[0] https://developers.googleblog.com/closing-the-knowledge-gap-...

The model doesn't know itself, but all these larger models are generating a significant amount of synthetic data from the prior models, and the prior models are all context-bloated renditions: you fill the KV cache with whatever alignment you want, and then generate synthetic data.

That training on existing models is what brings out various other things about other models. Then there are models that are just like snowballs: you build one iteration, give it its identity, and then train on that with the same synthetic generation.

So a model's generations could at some point include its own name.

I don’t think what you’re saying makes a lot of sense. You don’t “fill the KV cache with whatever alignment you want.” That doesn’t exist. The KV cache is an inference optimization, and is populated by running tokens through the model.
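To make the KV cache point concrete, here is a minimal pure-Python sketch of single-head attention during decoding. It is a toy under stated assumptions, not how any particular framework implements it: the 2x2 weight matrices and the function names are made up for illustration. The thing to notice is that the cache only ever receives keys and values computed from tokens that were actually run through the model, and that it changes the cost of decoding, not the result.

```python
import math

# Hypothetical 2x2 projection weights, chosen only for illustration.
W_Q = [[0.5, -0.2], [0.1, 0.3]]
W_K = [[0.4, 0.1], [-0.3, 0.2]]
W_V = [[0.2, 0.6], [0.7, -0.1]]

def project(x, w):
    """Vector-matrix product x @ w, with w given as a list of rows."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

def attend(q, keys, values):
    """Softmax attention of one query over all keys/values seen so far."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]

def decode_without_cache(embeddings):
    """Recompute K and V for the whole prefix at every step."""
    outputs = []
    for t in range(len(embeddings)):
        prefix = embeddings[:t + 1]
        keys = [project(x, W_K) for x in prefix]
        values = [project(x, W_V) for x in prefix]
        outputs.append(attend(project(embeddings[t], W_Q), keys, values))
    return outputs

def decode_with_cache(embeddings):
    """Compute K and V once per token and reuse them on later steps."""
    cache_k, cache_v, outputs = [], [], []
    for x in embeddings:
        # The cache grows only by running a token through the projections.
        cache_k.append(project(x, W_K))
        cache_v.append(project(x, W_V))
        outputs.append(attend(project(x, W_Q), cache_k, cache_v))
    return outputs, cache_k, cache_v
```

Both functions produce the same outputs for the same token embeddings; the cached version just avoids redoing the key and value projections for the prefix. Any "alignment" in the cache is therefore whatever came from the prompt tokens themselves, for example a system prompt.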

Synthetic data is generated by other models, and yes, this is often where identity propagates.

By snowballing, I think you mean something like iterative self-distillation? That’s definitely not done unsupervised, because of the risk of model collapse; the data is typically heavily curated and/or mixed with real data.
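As a rough illustration of what "curated and/or mixed with real data" could look like, here is a small Python sketch. Everything in it is an assumption for the sake of the example: the helper names `curate` and `mix`, the phrase-based filter, and the mixing fraction are hypothetical, and real pipelines use far more elaborate filtering than a substring check.

```python
import random

def curate(samples, banned_phrases):
    """Drop synthetic samples that mention any banned phrase (case-insensitive).

    A stand-in for real curation, e.g. removing text where the teacher
    model states its own name.
    """
    lowered = [p.lower() for p in banned_phrases]
    return [s for s in samples if not any(p in s.lower() for p in lowered)]

def mix(real, synthetic, synthetic_fraction, size, seed=0):
    """Build a training set of up to `size` items with a capped synthetic share."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n_syn = min(len(synthetic), round(size * synthetic_fraction))
    n_real = min(len(real), size - n_syn)
    batch = rng.sample(synthetic, n_syn) + rng.sample(real, n_real)
    rng.shuffle(batch)
    return batch
```

For example, `curate(["I am TeacherBot.", "Paris is the capital of France."], ["TeacherBot"])` keeps only the second sample, where "TeacherBot" is a made-up model name. If that filter is skipped, the teacher's name ends up in the student's training data, which is the identity propagation mentioned above.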