Hacker News new | ask | show | jobs
by famouswaffles 608 days ago
>Synthetic data can never contain more information than the statistical model from which it is derived: it is simply the evaluation of a non-deterministic function on the model parameters. And the model parameters are simply a function of the training data.

The Information in the data isn't just about the output but its rate of occurrence/distribution. If what your base model has learnt is only enough to have the occasional flash of brilliance say 1 out of 40 responses and you are able to filter out these responses and generate as much as you like then you can very much 'bootstrap a better model' by training on these filtered results. You are only getting a function of the model's parameters if you train on its unfiltered, unaltered output.