by tysam_and 890 days ago

This is, among other things, a very natural consequence of the equations behind Shannon's original noisy channel capacity theorem, where the noise is (in many ways) conditioned upon the structure of the model itself.

It is not necessarily surprising from a purely high-level perspective, but I do think it is good to have the analysis. From a professional standpoint, I do not believe it is distinctive enough as an individual method to need its own name for day-to-day use. From a personal perspective, however, I thought the mad cow disease reference was hilarious, and I applaud whoever came up with the acronym.

I find benefit in the analysis, and the concerns raised about generated data being present in the training data make sense to me (in sufficient quantity, it would plausibly bias the models in a rather significant way).

I particularly enjoyed the humor of this line; the tongue-in-cheek tone is very funny to me here:

"Ascertaining whether an autophagous loop has gone MAD or not (recall Definition 2.1) requires that we measure how far the synthesized data distribution Gt has drifted from the true data distribution Pr over the generations t."

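As a rough illustration of what measuring that drift could look like, here is a toy sketch (not the paper's setup): a fully synthetic loop in which a 1-D Gaussian is refit each generation on samples from the previous fit, with drift from the true distribution tracked via the closed-form 2-Wasserstein distance between Gaussians. The Gaussian model, the choice of metric, and the names `fit_gaussian`, `w2_gaussian`, and `fully_synthetic_loop` are all my own assumptions for illustration.

```python
import random
import statistics


def fit_gaussian(samples):
    """Maximum-likelihood fit of a 1-D Gaussian: returns (mean, std)."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def w2_gaussian(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form 2-Wasserstein distance between two 1-D Gaussians."""
    return ((mu_a - mu_b) ** 2 + (sigma_a - sigma_b) ** 2) ** 0.5


def fully_synthetic_loop(generations, n_samples, seed=0):
    """Refit on the previous model's samples only; record drift per generation."""
    rng = random.Random(seed)
    true_mu, true_sigma = 0.0, 1.0  # assumed "real" distribution
    mu, sigma = true_mu, true_sigma  # generation 0 is assumed to be a perfect fit
    drift = []
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu, sigma = fit_gaussian(samples)
        drift.append(w2_gaussian(mu, sigma, true_mu, true_sigma))
    return drift
```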
I like their use of color in the paper. I saw a similar orange/green color scheme earlier today and enjoyed it very much as an annotation method.

"A fixed real dataset only slows generative model degradation" is, again, a natural consequence of Shannon's noisy channel capacity theorem. One can say with near certainty that a limited neural network will not be able to perfectly fit the distribution of the data it is trained on. It will therefore have bias, variance, or some combination of both, limited ultimately by the model's capacity itself.

W.r.t. the original dataset, this error is noise, and we can choose between collapse and recursively encoding the noise patterns of the previous model (which might happen to have an additive effect, or maybe not! Who knows! I do not know for sure here, I have not figured this one out myself yet).

W.r.t. the real data slowing down degradation: if we are sampling i.i.d., then of course we should still see some proportionate degradation, as this is the nature of empirical risk minimization over maximum likelihood estimation. It is still good that they have shown this, however, I think.
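
To make the fixed-real-data case a bit more concrete, here is a minimal sketch under toy assumptions of my own (a 1-D Gaussian with a maximum-likelihood refit; `fixed_real_loop` and its parameters are hypothetical, not from the paper). The same real samples are reused every generation and topped up with samples drawn from the previous fit.

```python
import random
import statistics


def fixed_real_loop(generations, n_real, n_synth, seed=0):
    """Refit each generation on one fixed real sample plus new synthetic draws."""
    rng = random.Random(seed)
    # The real data is drawn once and reused; it is never refreshed.
    real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
    mu, sigma = statistics.fmean(real), statistics.pstdev(real)
    history = [(mu, sigma)]
    for _ in range(generations):
        # Synthetic data comes from the previous generation's fitted model.
        synth = [rng.gauss(mu, sigma) for _ in range(n_synth)]
        pool = real + synth
        mu, sigma = statistics.fmean(pool), statistics.pstdev(pool)
        history.append((mu, sigma))
    return history
```

Comparing the `sigma` trajectory for different `n_real` to `n_synth` ratios is one way to eyeball how strongly the fixed real data anchors the fit in this toy setting.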

The fresh data loop, I believe, is an example of a kind of noise in and of itself w.r.t. the original input dataset. As long as this 'noise' (from the perspective of the model) has a higher SNR than the (potentially slow) collapse of the model's output distribution, the model should (in some kind of proportion, at least) be constantly playing 'keep-up' with the fresh data.

"First, we find that—regardless of the performance of early generations—the performance of later generations converges to a point that depends only on the amounts of real and synthetic data in the training loop." -- there we are (I saw this after making the SNR point; this makes sense within this framework of interpretation, then).

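A toy version of the fresh data loop can be sketched as follows, again under my own assumptions (1-D Gaussian, maximum-likelihood refit, and an ad hoc error measure; `fresh_data_loop`, `init_mu`, and `init_sigma` are hypothetical names). Each generation trains on newly drawn real samples mixed with samples from the previous model, and the starting model can be made deliberately bad to see whether later generations recover.

```python
import random
import statistics


def fresh_data_loop(generations, n_real, n_synth,
                    init_mu=0.0, init_sigma=1.0, seed=0):
    """Refit each generation on fresh real data plus the previous model's samples."""
    rng = random.Random(seed)
    mu, sigma = init_mu, init_sigma  # possibly a poor generation-0 model
    errors = []
    for _ in range(generations):
        # New real samples every generation (true distribution assumed N(0, 1)).
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
        synth = [rng.gauss(mu, sigma) for _ in range(n_synth)]
        pool = fresh + synth
        mu, sigma = statistics.fmean(pool), statistics.pstdev(pool)
        # Ad hoc error: distance of the fitted parameters from the true (0, 1).
        errors.append(abs(mu) + abs(sigma - 1.0))
    return errors
```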
All in all, I found this paper very aware of itself and of what it was studying. It was well laid out and accessible, and while the points are not necessarily earth-shattering (though I still have to read through some of it, I think), I appreciated having clear empirical evidence about this phenomenon, detailing it and cutting through the forest of (at least seemingly) untested battlegrounds.

Curious to hear what others think about this one. <3 :'))))