by cyanydeez
39 days ago
The model doesn't know itself, but all these larger models are built on a significant amount of synthetic data generated from prior models, and those prior models are all context-bloated renditions: you fill the KV cache with whatever alignment you want, and then generate synthetic data. Training on the output of existing models is what brings out traits of those other models. Then there are models that grow like snowballs: you build one iteration, give it its identity, then train on that with the same kind of synthetic generation. So at some point a model's generations could include its own name.
Synthetic data is generated by other models, and yes, this is often where identity propagates.
By "snowballing", I think you mean something like iterative self-distillation? That's definitely not done unsupervised, because of the risk of model collapse; it's typically heavily curated and/or mixed with real data.
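As a rough illustration of the "curated and/or mixed with real data" step, here is a minimal Python sketch of one round of assembling a training set from real data plus filtered synthetic samples. The function name `build_training_mix`, the `quality` scorer, the threshold, and the cap on the synthetic fraction are all hypothetical choices for illustration, not any lab's actual pipeline.

```python
import random
from typing import Callable, List


def build_training_mix(
    real: List[str],
    synthetic: List[str],
    quality: Callable[[str], float],  # hypothetical scorer, e.g. a reward model or heuristic filter
    threshold: float = 0.5,
    max_synthetic_fraction: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """One round of curated self-distillation data mixing (illustrative only)."""
    # Curation: keep only synthetic samples the scorer rates at or above the threshold.
    kept = [s for s in synthetic if quality(s) >= threshold]

    # Cap the synthetic share so real data still anchors the distribution.
    # If synthetic should be at most fraction f of the final mix, then
    # n_synth <= f / (1 - f) * n_real.
    if max_synthetic_fraction >= 1.0:
        cap = len(kept)
    else:
        cap = int(max_synthetic_fraction / (1.0 - max_synthetic_fraction) * len(real))

    rng = random.Random(seed)
    rng.shuffle(kept)
    mix = real + kept[:cap]
    rng.shuffle(mix)
    return mix


if __name__ == "__main__":
    real = ["r1", "r2", "r3", "r4"]
    synthetic = ["good a", "bad b", "good c", "good d", "good e", "good f"]
    score = lambda s: 1.0 if s.startswith("good") else 0.0
    print(build_training_mix(real, synthetic, score))
```

Capping the synthetic share relative to the real data is one simple way to keep each round from drifting away from the original distribution; real pipelines may instead weight, deduplicate, or human-review samples.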